
# Solid data foundations

### <mark style="color:blue;">Have a store of good quality, ground-truth historical data that is accessible to your data scientists</mark>

A machine learning solution is fundamentally dependent on the data used to train it. To maintain and operate an ML solution, the maintainers must have access to the data used to develop the model or algorithm. They need it to monitor performance, validate that performance holds up over time, and find improvements. Furthermore, in many cases the algorithm models an external world that is changing, so maintainers will want to update or retrain the model to reflect those changes, which requires a supply of up-to-date data.

The data needs to be accessible to data science teams, and it must also be available to the automated processes that have been set up to retrain the model.

Most applications of ML also depend on ground-truth data, so it is essential to capture these data points alongside the input data.
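As a minimal sketch of capturing ground truth alongside inputs, the example below logs each prediction with its input features under an ID and attaches the real outcome once it becomes known. The names `PredictionRecord`, `PredictionLog` and the `basket_value` feature are hypothetical, and the in-memory dictionary only stands in for a table in a warehouse or lake.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PredictionRecord:
    prediction_id: str
    features: dict
    prediction: float
    ground_truth: Optional[float] = None  # filled in once the real outcome is known


class PredictionLog:
    """Hypothetical in-memory stand-in for a table in a data warehouse or lake."""

    def __init__(self):
        self._records = {}

    def log_prediction(self, prediction_id, features, prediction):
        # Store the inputs together with the prediction so they can be replayed later.
        self._records[prediction_id] = PredictionRecord(
            prediction_id, dict(features), prediction
        )

    def record_ground_truth(self, prediction_id, outcome):
        # Attach the observed outcome to the prediction it belongs to.
        self._records[prediction_id].ground_truth = outcome

    def labelled_examples(self):
        # Only records with a known outcome are usable for monitoring or retraining.
        return [r for r in self._records.values() if r.ground_truth is not None]


log = PredictionLog()
log.log_prediction("p-1", {"basket_value": 42.0}, prediction=0.8)
log.log_prediction("p-2", {"basket_value": 7.5}, prediction=0.1)
log.record_ground_truth("p-1", 1.0)  # the outcome for p-2 is not known yet
labelled = log.labelled_examples()
```

Keying both the inputs and the outcome on the same ID is one way to make the later join straightforward; a real system would persist these records rather than hold them in memory.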

It is common to create data warehouses, data lakes or data lakehouses, with associated data pipelines, to store this data. Our [data pipelines playbook](https://playbooks.equalexperts.com/data-pipeline/) covers our approach to providing this data.

**The diagram below shows the two processes involved in building machine learning systems and the data they need to access:**

* An evaluation process that makes predictions (model scoring). This may be real-time.
* A batch process that retrains the model, based on fresh historical data.

![](/files/HxEjy5icd8625GDZRTfC)
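The two processes can be illustrated with a deliberately small sketch: a batch `retrain` function that fits a model on historical data, and a `score` function that makes a single prediction. Both function names, the one-feature least-squares model and the sample data are assumptions chosen for illustration, not the playbook's own implementation.

```python
from statistics import mean


def retrain(history):
    """Batch process: fit y = slope * x + intercept on (x, y) pairs by least squares.

    Assumes the x values are not all identical (otherwise the variance is zero).
    """
    xs = [x for x, _ in history]
    ys = [y for _, y in history]
    x_bar, y_bar = mean(xs), mean(ys)
    var_x = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in history) / var_x
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def score(model, x):
    """Scoring process: make a prediction for one input; this could run in real time."""
    return model["slope"] * x + model["intercept"]


# Historical inputs paired with their ground-truth outcomes (illustrative values).
history = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
model = retrain(history)
prediction = score(model, 4.0)

# Fresh ground truth arrives as the world changes, so the batch job
# retrains on the updated history and produces a new model.
history += [(4.0, 9.0), (5.0, 11.0)]
model = retrain(history)
```

Both processes read from the same store of historical data, which is why that store needs to be accessible to automated jobs as well as to people.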

