Explore
Our playbooks are collections of observations that we have made many times across different sectors and clients. However, there are some emerging technologies and approaches that we have applied in only one or two places to date, but that we think are really promising. We think they will become recommended practices in the future, or are at least worth experimenting with. For now, we recommend that you at least explore them.
Feature stores
Data is central to any ML system: it is needed both online and offline, for exploration and real-time prediction. One of the challenges in operationalising any ML algorithm is ensuring that any data used to train the model is also available in production. The model rarely uses the raw data directly; in most cases the raw data needs to be transformed in some way to create a data feature. (See Provide an Environment which Allows Data Scientists to create and test models for a description of Features and Feature Engineering.)
Creating a feature can be time-consuming, and you need it to be available for both offline and online activities. Furthermore, a feature you have created for one purpose may well be relevant for another task. A feature store is a component that manages the ingestion of raw data (from databases, event streams etc.) and turns it into features which can be used both to train models and as an input to the operational model. It takes the place of the data warehouse and the operational data pipelines, providing a batch API or query mechanism for retrieving feature datasets for model training, as well as a low-latency API to provide data for real-time predictions.
The benefits are that:
You do not need to create a separate data pipeline for the online inference
Exactly the same transforms are used for training as for online inference
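To make these two benefits concrete, here is a minimal in-memory sketch of the idea, not a real feature store. The class MiniFeatureStore, its method names and the basket_value feature are all hypothetical names chosen for illustration; a production feature store would add persistent storage, point-in-time correctness, versioning and monitoring. The sketch shows a transform registered once, applied at ingestion, and then read back through both a batch path (for training) and a per-entity lookup (for online inference).

```python
from typing import Callable, Dict, List

RawRecord = Dict[str, float]
FeatureRow = Dict[str, float]


class MiniFeatureStore:
    """Toy in-memory feature store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # Feature name -> transform from a raw record to a feature value.
        self._transforms: Dict[str, Callable[[RawRecord], float]] = {}
        # Entity id -> latest feature row; stands in for a low-latency store.
        self._online: Dict[str, FeatureRow] = {}
        # Every computed row; stands in for offline (batch) storage.
        self._offline: List[Dict[str, object]] = []

    def register(self, name: str, transform: Callable[[RawRecord], float]) -> None:
        """Define a feature once; both read paths below reuse it."""
        self._transforms[name] = transform

    def ingest(self, entity_id: str, raw: RawRecord) -> None:
        """Turn a raw record into features and store them for both uses."""
        row = {name: fn(raw) for name, fn in self._transforms.items()}
        self._online[entity_id] = row
        self._offline.append({"entity_id": entity_id, **row})

    def get_training_data(self) -> List[Dict[str, object]]:
        """Batch retrieval: the full feature history, for model training."""
        return list(self._offline)

    def get_online_features(self, entity_id: str) -> FeatureRow:
        """Low-latency retrieval: the latest features for one entity."""
        return self._online[entity_id]


store = MiniFeatureStore()
# Assumed example feature: the value of a basket from raw price and quantity.
store.register("basket_value", lambda raw: raw["price"] * raw["quantity"])

store.ingest("customer-1", {"price": 2.5, "quantity": 4})
store.ingest("customer-2", {"price": 10.0, "quantity": 1})
store.ingest("customer-1", {"price": 3.0, "quantity": 2})

training_rows = store.get_training_data()
online_row = store.get_online_features("customer-1")
```

Because both read paths are fed by the same ingest step, there is no second pipeline to build for online inference, and the value served online for customer-1 is computed by the same function that produced the training rows.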
Experience report