Regularly monitor your model in production

There are two core aspects of monitoring for any ML solution:

  • Monitoring as a software product

  • Monitoring model accuracy and performance

Real-time or embedded ML solutions need to be monitored for errors and performance just like any other software solution. With auto-generated ML solutions this becomes essential: model code may be generated that slows down predictions enough to cause timeouts and stop user transactions from processing.

Monitoring can be accomplished using existing off-the-shelf tooling such as Prometheus and Graphite.

You would ideally monitor:

  • Availability

  • Request/Response timings

  • Throughput

  • Resource usage

Alerting should be set up across these metrics to catch issues before they become critical.
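
To make this concrete, below is a minimal sketch of how a prediction call could be instrumented with the Prometheus Python client to expose request counts, error counts and response timings for scraping. The metric names and the predict() stub are illustrative assumptions rather than a prescribed implementation.

```python
# Minimal sketch: instrumenting a prediction call with prometheus_client.
# Metric names and the predict() stub are assumptions for illustration.
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("model_requests_total", "Prediction requests", ["outcome"])
LATENCY = Histogram("model_request_seconds", "Prediction latency in seconds")


def predict(features):
    # Placeholder for the real model inference call.
    return 0.5


def monitored_predict(features):
    start = time.time()
    try:
        result = predict(features)
        REQUESTS.labels(outcome="success").inc()
        return result
    except Exception:
        REQUESTS.labels(outcome="error").inc()
        raise
    finally:
        LATENCY.observe(time.time() - start)


if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for Prometheus to scrape
    while True:              # in a real service the serving framework drives this
        monitored_predict({"amount": 42.0})
        time.sleep(1)
```

Counters and histograms like these can then drive the availability, timing and throughput alerts described above.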

ML models are trained on data available at a certain point in time. Data drift or concept drift can affect the performance of the model, so it’s important to monitor the live output of your models to ensure they are still accurate against new data as it arrives. This monitoring can drive when to retrain your models, and dashboards can give additional insight into seasonal events or data skew. Useful metrics and signals to monitor include:

  • Precision/recall/F1 score.

  • Model score or outputs.

  • User feedback labels or downstream actions.

  • Feature monitoring (data quality outputs such as histograms, variance, completeness).
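
As an illustration of the first two items above, the sketch below joins live model scores with delayed ground-truth labels and computes precision, recall and F1 with scikit-learn; the results can then be pushed to a dashboard or metrics store. The column names (transaction_id, score, label) and the 0.5 decision threshold are assumptions, not part of any particular system.

```python
# Hedged sketch: batch accuracy monitoring by joining live scores with labels.
# Column names and the 0.5 threshold are assumptions for illustration.
import pandas as pd
from sklearn.metrics import f1_score, precision_score, recall_score


def accuracy_metrics(scores: pd.DataFrame, outcomes: pd.DataFrame) -> dict:
    """Join scored records with ground-truth outcomes and compute metrics."""
    joined = scores.merge(outcomes, on="transaction_id", how="inner")
    predicted = (joined["score"] >= 0.5).astype(int)
    actual = joined["label"].astype(int)
    return {
        "precision": precision_score(actual, predicted, zero_division=0),
        "recall": recall_score(actual, predicted, zero_division=0),
        "f1": f1_score(actual, predicted, zero_division=0),
    }
```

Running a job like this on a schedule and plotting the output over time is often enough to surface gradual drift before it becomes a production incident.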

Alerting should be set up on model accuracy metrics to catch any sudden regressions that may occur. This has been seen on projects where old models have suddenly failed against new data (fraud risk scoring can become less accurate as new attack vectors are discovered), or where an auto-ML solution has generated buggy model code. Some ideas on alerting are:

  • A percentage decrease in precision or recall.

  • A change in the variance of model scores or outputs.

  • Changes in dependent user outputs, e.g. the number of search click-throughs for a recommendation engine.
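
One possible shape for such an alert is sketched below: compare live precision against the precision achieved at training time and fire when the relative drop exceeds a tolerance. The send_alert() hook and the 10% tolerance are assumptions to adapt to your own alerting stack.

```python
# Minimal sketch: alert when live precision drops well below the training
# baseline. send_alert() and the tolerance are assumptions for illustration.
def send_alert(message: str) -> None:
    # Placeholder: wire this to your paging or chat tooling.
    print(f"ALERT: {message}")


def check_precision_regression(live_precision: float,
                               baseline_precision: float,
                               max_relative_drop: float = 0.10) -> bool:
    """Return True and alert if precision dropped beyond the tolerance."""
    if baseline_precision <= 0:
        return False
    relative_drop = (baseline_precision - live_precision) / baseline_precision
    if relative_drop > max_relative_drop:
        send_alert(
            f"Precision dropped {relative_drop:.1%} below the training baseline "
            f"({live_precision:.3f} vs {baseline_precision:.3f})"
        )
        return True
    return False
```

The same pattern applies to recall, F1 or the variance of model scores.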

Experience report

For a fraud detection application, we adopted the usual best practices of cross-validation on the training set, with an auto-ML library for model selection. The auto-ML approach yielded a well-performing model to start with, albeit a rather inscrutable one for a fraud detection setting. Our primary objective was to build the path to live for the fraud scoring application. We followed up shortly afterwards by building model performance monitoring, joining live out-of-sample scores with fraud outcomes and tracking precision, recall and F1 measures in Grafana. Observability is vital to detect model regression, when live performance degrades consistently below what the model achieved during training and validation.

It became clear that we were in an adversarial situation in which bad actors would change their attack patterns, which was reflected in data drift of the model inputs and consequent concept drift. The effort invested in developing the model pipeline and performance monitoring allowed us to detect this drift rapidly and iterate quickly with more interpretable models and better features.

Data scientist, Equal Experts, South Africa
