MLOps

MLOps is a team effort

Last updated 2 years ago

Turning a model from a prototype to an integrated part of the business requires a cross-functional team working closely together. You will need:

  • Platform/Machine Learning engineer(s) to provide the environment to host the model.

  • Data engineers to create the production data pipelines to retrain the model.

  • Data scientists to create and amend the model.

  • Software engineers to integrate the model into business systems (e.g. a webpage calling a model hosted as a microservice).

MLOps is easier if everyone understands the concerns of the others. Data scientists are typically strong at mathematics and statistics, and may not have strong software development skills. They are focused on algorithm performance and accuracy metrics. The various engineering disciplines are more concerned with testing, configuration control, logging, modularisation and paths to production (to name a few).

It helps if the engineers give the data scientists clear ways of working early in the project, because this makes it easier for the data scientists to hand over their models. How should the model or algorithm code be delivered (probably not as a notebook)? What coding standards should it adhere to? How should it log? What tests are expected? Create a simple document and spend a session taking the data scientists through the development process you have chosen. In turn, engineers should recognise that the most pressing concerns for data scientists are prototyping, experimentation and algorithm performance evaluation.
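A short worked example in the ways-of-working document can make these expectations concrete. The following is only a minimal sketch of what "code delivered as a module, with logging and a test" might look like. The logger name `pricing_model`, the function `clean_mileage` and the record schema with a "mileage" key are all hypothetical, chosen purely for illustration. Your team's standards may differ.

```python
import logging

# Hypothetical module-level logger; the team would agree on the naming scheme.
logger = logging.getLogger("pricing_model")


def clean_mileage(records):
    """Drop records whose mileage is missing or negative.

    `records` is a list of dicts with a "mileage" key (an assumed schema).
    Returns a new list and leaves the input untouched.
    """
    kept = [
        r for r in records
        if r.get("mileage") is not None and r["mileage"] >= 0
    ]
    # Log counts rather than printing, so production runs can be monitored.
    logger.info("clean_mileage kept %d of %d records", len(kept), len(records))
    return kept


def test_clean_mileage_drops_bad_rows():
    """The kind of unit test engineers might ask for alongside the function."""
    rows = [{"mileage": 12000}, {"mileage": None}, {"mileage": -5}, {}]
    assert clean_mileage(rows) == [{"mileage": 12000}]
```

Because the logic lives in a plain function rather than a notebook cell, it can be imported from a notebook during exploration and from a pipeline in production, and a test runner such as pytest can pick up the test function automatically.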

When the team forms, recognise that it is one team and organise yourselves accordingly. Backlogs and stand-ups should be owned by and include the whole team.

Experience report

I started as a data scientist but quickly realised that if I wanted my work to be used I would need to take more interest in how models are deployed and used in production, which has led me to move into data engineering and ML Operations, and now this has become my passion! There are many things that I have learned during this transition.

In general, models are developed by data scientists. They have the maths and stats skills to understand the data and figure out which algorithms to use, whilst the data engineers deploy the models. New features can get added by either of these groups.

In my experience, data scientists usually need to improve their software development practices. They need to become familiar with the separation of environments (e.g. development, staging, production) and how code is promoted between these environments. I’m not saying they should become DevOps experts, but algorithms are software, and if the code is bad or can’t be understood then it can’t be deployed or improved. Try to get your code out of the notebook early, and don’t wait for perfection before thinking about deployment. The more you delay moving into production, the more you end up with a bunch of notebooks that you don’t understand. Right now I’m working with a great data scientist who follows the best practice of developing the code in Jupyter Notebooks and then extracting the key functionality into libraries which can be easily deployed.

For data engineers - find time to pair with data scientists and share best dev practices with them. Recognise that data science code is weird in many respects - lots of stuff is done with data frames or similar structures, and will look strange compared to traditional application programming. It will probably be an easier experience working with the data scientists if you understand that they will be familiar with the latest libraries and papers in machine learning, but not with the latest software dev practices. They should look to you to provide guidance on this - try to provide it in a way that recognises their expertise!

Matteo Guzzo, Data specialist
Equal Experts, EU

Experience report

As the lead data scientist in a recent project, my role was to create an algorithm to estimate prices for used vehicles. There was an intense initial period where I had to understand the raw data, prototype the data pipelines and then create and evaluate different algorithms for pricing. It was a really intense time and my focus was very much on data cleaning, exploration and maths for the models.

We worked as a cross-functional team with a data engineer, UX designer and two user interface developers. We had shared stand-ups, and the data engineering, machine learning and user experience were worked on in parallel. I worked closely with our data engineer to develop the best way to deploy the ETL and model training scripts as data pipelines and APIs. He also created a great CI/CD environment and set up the formal ways of working in this environment, including how code changes in Git should be handled and other coding practices to adopt. He paired with me on some of the initial deployments so I got up to speed quickly in creating production-ready code. As a data scientist I know there are 100 different ways someone can set up the build process - and I honestly don’t have any opinion on which is the right way. I care about the performance of my model! I really appreciated working together on it - that initial pairing meant that we were able to bring all our work together very quickly, and it has supported the iteration of the tool since then.

Adam Fletcher, Data scientist
Equal Experts, UK