MLOps

Measure and proactively evaluate quality of training data

ML models are only as good as the data they’re trained on.

In fact, the quality (and quantity) of training data is often a much bigger determinant of your ML project’s success than the sophistication of the model you choose. Put another way: it often pays more to gather better training data for a simple model than to spend time searching for a better model that only uses the data you already have.

To do this deliberately, you should continually evaluate the quality of your training data.

You should try to:

  • Identify and address class imbalance (i.e. find ‘categories’ that are underrepresented). A simple class-balance check is sketched after this list.

  • Actively create more training data if you need it (buy it, crowdsource it, or use techniques like image augmentation to derive more samples from the data you already have). A minimal flip-based augmentation is sketched below.

  • Identify statistical properties of variables in your data, and correlations between variables, so you can spot outliers and training samples that look wrong. A simple profiling and outlier-flagging routine is sketched below.

  • Have processes (even manual random inspection!) that check for bad or mislabelled training samples. Having humans inspect a random sample is a simple but effective technique for image and audio data.

  • Verify that distributions of variables in your training data accurately reflect real life. Depending on the nature of your modelling, it’s also useful to understand when parts of your models rely on assumptions or beliefs (“priors”), for example the assumption that some variable follows a certain statistical distribution. Test these beliefs against reality regularly, because reality changes! A simple drift check along these lines is sketched below.

  • Find classes of input that your model performs badly on (and whose poor performance might be hidden by good overall “evaluation scores” computed across the whole test set). Try to supplement your training data with more samples from these categories to help improve performance. A per-class evaluation report is sketched below.

  • Ideally, you should also benchmark performance against your dataset rather than aiming to push metrics ‘as high as possible’. What is a reasonable expectation for accuracy at human level, or expert level? If human experts can only achieve 70% accuracy against the data you have, developing a model that achieves 75% accuracy is a terrific result! Quantitative benchmarks against your data let you know when to stop searching for better models and start shipping your product.
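
A minimal sketch of the class-balance check mentioned above, assuming a pandas DataFrame with a hypothetical label column; the 5% threshold is purely illustrative:

```python
import pandas as pd

def report_class_balance(df: pd.DataFrame, label_col: str = "label") -> pd.Series:
    """Print a warning for badly under-represented classes and return class shares."""
    shares = df[label_col].value_counts(normalize=True).sort_values()
    for cls, share in shares.items():
        if share < 0.05:  # illustrative threshold, not a recommendation
            print(f"Class '{cls}' is only {share:.1%} of the training data")
    return shares

# Toy example: 97 cats, 3 dogs
print(report_class_balance(pd.DataFrame({"label": ["cat"] * 97 + ["dog"] * 3})))
```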
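
A tiny sketch of deriving extra image samples by augmentation, here just a horizontal flip with numpy; real pipelines typically add rotations, crops, colour jitter and so on:

```python
import numpy as np

def augment_with_flips(images: np.ndarray) -> np.ndarray:
    """images has shape (n_samples, height, width, channels)."""
    flipped = images[:, :, ::-1, :]           # mirror each image left to right
    return np.concatenate([images, flipped])  # doubles the number of samples

batch = np.random.rand(8, 32, 32, 3)          # 8 random 32x32 RGB 'images'
print(augment_with_flips(batch).shape)        # (16, 32, 32, 3)
```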
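
A rough sketch of profiling numeric variables and flagging outlier rows with a simple z-score rule; the DataFrame and the threshold are hypothetical:

```python
import pandas as pd

def profile_and_flag_outliers(df: pd.DataFrame, z_threshold: float = 4.0) -> pd.DataFrame:
    numeric = df.select_dtypes("number")
    print(numeric.describe())   # per-variable summary statistics
    print(numeric.corr())       # pairwise correlations between variables
    # Flag rows where any numeric value sits more than z_threshold standard
    # deviations from that column's mean.
    z_scores = (numeric - numeric.mean()) / numeric.std()
    return df[(z_scores.abs() > z_threshold).any(axis=1)]
```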
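
One way to test a distributional assumption against fresh data is a two-sample Kolmogorov-Smirnov test; the exponential ‘duration’ samples below are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
training_durations = rng.exponential(scale=2.0, size=5_000)    # what we trained on
production_durations = rng.exponential(scale=2.5, size=5_000)  # what we see live

# A small p-value suggests the live distribution has drifted away from the
# training distribution, so the prior behind the model needs revisiting.
statistic, p_value = stats.ks_2samp(training_durations, production_durations)
print(f"KS statistic={statistic:.3f}, p-value={p_value:.3g}")
```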
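
A sketch of looking past a single headline metric with per-class scores; the labels below are toy values, not real model output:

```python
from sklearn.metrics import classification_report

y_true = ["cat", "cat", "cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "cat", "cat", "cat", "dog", "cat", "cat", "bird"]

# Overall accuracy is 6/8 = 75%, but per-class recall shows 'dog' and 'bird'
# are each only half right: good candidates for more training samples.
print(classification_report(y_true, y_pred, zero_division=0))
```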
