Regularly monitor your model in production

There are two core aspects of monitoring for any ML solution:

  • Monitoring as a software product

  • Monitoring model accuracy and performance

Real-time or embedded ML solutions need to be monitored for errors and performance just like any other software solution. With auto-generated ML solutions this becomes essential: model code may be generated that slows predictions down enough to cause timeouts and stop user transactions from processing.

Monitoring can be accomplished using existing off-the-shelf tooling such as Prometheus and Graphite.

You would ideally monitor:

  • Availability

  • Request/Response timings

  • Throughput

  • Resource usage

Alerting should be set up across these metrics to catch issues before they become critical.
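As an illustration, a prediction service can expose request counts, timings and in-flight load as Prometheus metrics directly from the serving code. The sketch below is a minimal example using the Python prometheus_client library; the metric names and the predict function are assumptions for illustration, not part of any particular solution.

```python
import random
import time

from prometheus_client import Counter, Gauge, Histogram, start_http_server

# Hypothetical metric names -- adjust to your own naming conventions.
REQUEST_COUNT = Counter(
    "prediction_requests_total", "Total prediction requests", ["outcome"]
)
REQUEST_LATENCY = Histogram(
    "prediction_request_seconds", "Time spent serving a prediction"
)
IN_FLIGHT = Gauge("prediction_requests_in_flight", "Requests currently being served")


def predict(features):
    # Placeholder for the real model call.
    return random.random()


def handle_request(features):
    IN_FLIGHT.inc()
    start = time.time()
    try:
        result = predict(features)
        REQUEST_COUNT.labels(outcome="success").inc()
        return result
    except Exception:
        REQUEST_COUNT.labels(outcome="error").inc()
        raise
    finally:
        REQUEST_LATENCY.observe(time.time() - start)
        IN_FLIGHT.dec()


if __name__ == "__main__":
    # Expose /metrics for Prometheus to scrape; throughput, error rate and
    # latency percentiles can then be graphed and alerted on.
    start_http_server(8000)
    while True:
        handle_request({"example_feature": 1.0})
        time.sleep(1)
```

Availability and resource usage are usually better captured by the platform's existing exporters (for example node_exporter for host metrics) than by application code.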

ML models are trained on data available at a certain point in time. Data drift or concept drift (see How often do you deploy a model?) can affect the performance of the model, so it's important to monitor the live output of your models to ensure they are still accurate against new data as it arrives. This monitoring can drive when to retrain your models, and dashboards can give additional insight into seasonal events or data skew. Useful measures include:

  • Precision/recall/F1 score

  • Model score or outputs

  • User feedback labels or downstream actions

  • Feature monitoring (data quality outputs such as histograms, variance, completeness; see the sketch after this list)
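As a concrete example of feature monitoring, the sketch below compares the distribution of a single live feature against a reference sample taken at training time using a two-sample Kolmogorov-Smirnov test from scipy. The feature name, sample data and p-value threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

# Illustrative threshold -- tune per feature and per business context.
P_VALUE_THRESHOLD = 0.01


def detect_feature_drift(training_sample, live_sample, feature_name):
    """Flag drift when the live distribution differs significantly
    from the sample the model was trained on."""
    statistic, p_value = ks_2samp(training_sample, live_sample)
    drifted = p_value < P_VALUE_THRESHOLD
    print(
        f"{feature_name}: KS statistic={statistic:.3f}, "
        f"p-value={p_value:.4f}, drift={'YES' if drifted else 'no'}"
    )
    return drifted


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    training_ages = rng.normal(loc=35, scale=10, size=5_000)  # reference data
    live_ages = rng.normal(loc=42, scale=12, size=1_000)      # shifted live data
    detect_feature_drift(training_ages, live_ages, "customer_age")
```

In practice a check like this would run on a schedule over recent production data, per feature, and feed the same dashboards and alerting channels as the software metrics.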

Alerting should be set up on model accuracy metrics to catch any sudden regressions that may occur. This has been seen on projects where old models have suddenly failed against new data (fraud risk scoring can become less accurate as new attack vectors are discovered), or where an AutoML solution has generated buggy model code. Some ideas on alerting are:

  • % decrease in precision or recall

  • Variance change in model score or outputs

  • Changes in dependent user outputs, e.g. the number of search click-throughs for a recommendation engine
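A minimal sketch of such an alerting check is shown below, assuming a baseline precision and recall recorded when the model was last validated, labels that arrive later from user feedback, and a hypothetical send_alert hook into whatever alerting system is in use.

```python
from sklearn.metrics import precision_score, recall_score

# Illustrative threshold -- a 10% relative drop triggers an alert.
MAX_RELATIVE_DROP = 0.10


def send_alert(message):
    # Hypothetical hook: replace with your alerting integration.
    print(f"ALERT: {message}")


def check_for_regression(baseline, y_true, y_pred):
    """Compare live precision/recall against the baseline captured
    when the model was last validated and deployed."""
    live = {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
    }
    for metric, baseline_value in baseline.items():
        drop = (baseline_value - live[metric]) / baseline_value
        if drop > MAX_RELATIVE_DROP:
            send_alert(
                f"{metric} dropped {drop:.0%} "
                f"(baseline {baseline_value:.2f}, live {live[metric]:.2f})"
            )
    return live


if __name__ == "__main__":
    # Labels arrive later via user feedback or downstream actions.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
    check_for_regression({"precision": 0.90, "recall": 0.85}, y_true, y_pred)
```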
