# Project Workflow
In this section we walk through an example workflow for creating a new experiment in the project. It also serves as a step-by-step guide to the video presented on the home page.
!!! note
    If you don't have a configured project already, go check out the [**Implementation Guide**](../Structure/project_structure/) first.
## Setup your environment
### Cloning your repository and installing dependencies
Let's start by cloning the project repository you will be working on and making sure that all the necessary dependencies are installed.
!!! tip
    It's a good practice to create a virtual environment for each project you work on. You can do that using `venv` or `conda`.
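For example, a minimal `venv` setup (the environment name `.venv` below is just an illustration) would be:

```bash
# Create an isolated environment for this project and activate it
python3 -m venv .venv
source .venv/bin/activate
```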
```bash
git clone https://github.com/mlops-guide/dvc-gitactions.git
cd dvc-gitactions
pip3 install -r requirements.txt
pre-commit install
```
### Updating the data and checking the project specifications
Your environment is ready! So now let's:

- Download the data
- Check the project pipelines
- Reproduce the main experiment (optional)
- See the current metrics
```bash
dvc pull          # download the data tracked by DVC
dvc dag           # visualize the pipeline stages
dvc repro         # reproduce the main experiment (optional)
dvc metrics show  # display the current metrics
```
!!! info
    If you are confused by these DVC commands, go check out the Versioning section of the Implementation Guide.
## Working on a New Update
Now that you are familiar with the project, let's try a new experiment.
### Edit the necessary files
First, you should edit the files affected by your experiment.
Here we are changing our model from a Logistic Regression classifier to a Random Forest in `model.py`:
```diff
 pipe = Pipeline(
     [
         ("scaler", StandardScaler()),
-        ("LR", LogisticRegression(random_state=0, max_iter=num_estimators)),
+        (
+            "RFC",
+            RandomForestClassifier(
+                criterion="gini",
+                max_depth=10,
+                max_features="auto",
+                n_estimators=num_estimators,
+            ),
+        ),
     ]
 )
```
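Note that a change like this also requires updating the imports at the top of `model.py`. Assuming a standard scikit-learn setup, the swap would look something like:

```python
# Replace the linear-model import with the ensemble one
# (assumes model.py previously imported LogisticRegression)
from sklearn.ensemble import RandomForestClassifier
```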
!!! info
    If you forgot how this project is organized and what each file is responsible for, go check out Tools and Project Structure.
### Create a new branch and reproduce the pipelines
Second, let's create a new branch in our repository to version this new experiment. After that, we can reproduce the experiment and see how the new metrics compare to the current model's metrics.
```bash
git checkout -b RandomForestClassifier
dvc repro
dvc metrics diff
```
!!! note
    Observe that DVC avoided running the preprocess stage, since our model change affected only the train and evaluate stages. To see more about DVC pipelines, go check out Working with pipelines.
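DVC can skip stages because each stage in `dvc.yaml` declares its dependencies and outputs, so only stages whose inputs changed are re-run. A rough sketch of what such a pipeline definition looks like (the stage names and paths below are illustrative, not the project's exact configuration):

```yaml
# Hypothetical dvc.yaml sketch; stage names and paths are assumptions
stages:
  preprocess:
    cmd: python src/preprocess.py
    deps:
      - src/preprocess.py
      - data/raw
    outs:
      - data/processed
  train:
    cmd: python src/train.py
    deps:
      - src/model.py        # our edit invalidates this stage...
      - data/processed
    outs:
      - models/model.pkl
  evaluate:
    cmd: python src/evaluate.py
    deps:
      - models/model.pkl    # ...and, transitively, this one
    metrics:
      - metrics.json
```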
Even though our experiment didn't improve on the current model's metrics, we will consider it good for production in order to demonstrate the rest of the workflow cycle.
### Test and Commit
Our experiment is ready, so now let's:

- Format our code with `black`
- Upload it to our branch in our Github repository
```bash
black .        # format the code
git add .
dvc push       # upload DVC-tracked data and models to remote storage
git commit -m "Random Forest Experiment"
git push origin RandomForestClassifier
```
!!! info
    The tests and format checks that ran after the commit command were executed by pre-commit.
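pre-commit runs whatever hooks are declared in the repository's `.pre-commit-config.yaml`. As a minimal sketch, a configuration with only a `black` hook (the real project may declare more hooks and pin a different version) looks like:

```yaml
# Hypothetical .pre-commit-config.yaml sketch; the project's actual
# file may pin a different rev and include additional hooks
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
```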
## Experiment Deployment
### Pull request and automated report
After uploading to our branch, we can now create a Pull Request on the Github website.
!!! info
    If you forgot or want to know more about how this automated report was generated, go check out Continuous Integration with CML and Github Actions.
### Release and Watson Deployment
Supposing our experiment was merged into the main branch, we can consider it ready for deployment. To do so, let's release a new version of the project using the Github website.
After releasing the new version, CML and Github Actions will trigger a script responsible for deploying our model to Watson ML.
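In other words, the deployment workflow listens for the release event. A rough sketch of how such a Github Actions workflow could be wired (the file name, script path and secret name here are assumptions, not the project's actual configuration):

```yaml
# Hypothetical .github/workflows/deploy.yaml sketch; paths, names
# and secrets are illustrative, not the project's real setup
name: Deploy to Watson ML
on:
  release:
    types: [published]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: pip install -r requirements.txt
      - run: python src/scripts/deploy_model.py  # assumed deploy script
        env:
          IBM_CREDENTIALS: ${{ secrets.IBM_CREDENTIALS }}  # assumed secret
```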
!!! info
    If you forgot or want to know more about how this deployment happens, go check out Continuous Delivery with CML, Github Actions and Watson ML.
## Monitoring
Closing out our workflow cycle, we can use the IBM OpenScale tool to monitor the model in production.
There we can create monitors for Drift, Accuracy and Fairness. We can also explain the model's predictions, understanding which features had the most weight in each decision, and see what changes would be needed for the outcome to change.
!!! info
    If you forgot or want to know more about how to monitor your model in production, go check out Monitoring with IBM OpenScale.