Why automate Machine Learning?
The automation of machine learning pipelines is highly correlated with the maturity of the project. There are several steps between developing a model and deploying it, and a good part of this process relies on experimentation.
Executing this workflow with less manual intervention as possible should result in:
- Fast deployments
- More reliable process
- Easier problem discovering
What can be automated?
In the table below, it's possible to find machine learning stages and what tasks may be automated:
|Data acquiring, validation, and processing
|Model training, evaluation, and testing
|Build and testing
|Deployment new implementation of a model as a service
|Setting alerts based on pre-defined metrics
Levels of automation
Defining the level of automation has a crucial impact on the business behind the project. For example: data scientists spend a good amount of time searching for good features, and this time can cost too much in resources which can impact directly the business return of investment(ROI).
MLOps.org describes a three-level of automation in machine learning projects:
- Manual Process: Full experimentation pipeline executed manually using Rapid Application Development(RAD) tools, like Jupyter Notebooks. Deployments are also executed manually.
- Machine Learning automation: Automation of the experimentation pipeline which includes data and model validation.
- CI/CD pipelines: Automatically build, test and deploy of ML models and ML training pipeline components, providing a fast and reliable deployment.
Common Tools for Machine Learning Pipeline Automation
Currently there are many tools used to create and automate Machine Learning pipelines, experiments and any type of process. Below there is a list of some of their services.
|DVC can be used to make Data Pipelines, which can be automated and reproduced. Very useful if already using DVC for Data and Model versioning. Easily configured and run. Language and framework agnostic.
|Tensorflow Extended (TFX)
|Used for production Machine Learning pipelines. Heavily integrated with Google and GCP. Only works with Tensorflow.
|Kubeflow can build automated pipelines and experiments. Intended to build a complete end-to-end solution for Machine learning, being able to also serve and monitor models. Uses Kubernetes and is based on Tensorflow Extended. Works with Tensorflow and Pytorch.
|Open-source platform for the machine learning lifecycle. Can be used with Python, Conda and Docker. Large community.