Train & Fine-tune ML Models Using Azure ML SDK (Part 1)

Osama Mosaad
Mar 21, 2021

A series of articles about the Azure Machine Learning SDK and how to use it to build end-to-end ML pipelines.

Graphic Credit: Microsoft for Azure Machine Learning

Introduction

Azure ML provides a rich set of tools to build, fine-tune, and operationalize machine learning models. The service covers supervised learning scenarios, both classification and regression, as well as unsupervised problems such as clustering and dimensionality reduction.

This article is part of a series about the Azure ML SDK and how to use it to build end-to-end machine learning pipelines at scale.

In the first part of this article, I introduce machine learning concepts and the machine learning pipeline in general. In the second part, I explain how to use the Azure Machine Learning SDK for Python to build and train machine learning models with the Azure ML service. The Azure ML SDK covers the following key use cases:

  • Train models either locally or using cloud resources such as Azure compute resources.
  • Manage the life cycle of the data used in a machine learning experiment.
  • Operationalize the trained models by easily deploying them as RESTful services that can be consumed by any client application.
  • Automate the provisioning and scaling of the compute resources required for the ML experiment.

General Intuition

The typical ML lifecycle usually starts from an optimization problem that we try to solve with data: we choose the algorithm that trains a model giving the best approximation of the mapping between the inputs and the outputs.

This is called function approximation, because the model approximates an unknown target function that maps the inputs to the outputs based on the observations in the given data. The input variables are called independent variables, predictors, or features, and the output is the dependent variable or label. The model that approximates this unknown target function and performs the mapping between the predictor variables and the output is also called a hypothesis.

A hypothesis, in simple terms, is an explanation of something that can be tested and supported by evidence. After fitting a model to the data, we have a trained model that represents the function approximation and explains the underlying relationship between the predictor variables and the dependent variable (the output). That model can then be tested in the wild to check whether it generalizes well to unseen data. The ability of the model to generalize to unseen data is the key test of the validity of our hypothesis about the data.
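To make the idea of function approximation concrete, here is a minimal sketch in plain Python. It fits a straight line (the hypothesis) to a few observations with ordinary least squares and then checks the error on data the model has not seen. The toy numbers and the helper names `fit_line` and `mean_squared_error` are made up for illustration only.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of the hypothesis y = w * x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    w = covariance / variance
    b = y_bar - w * x_bar
    return w, b


def mean_squared_error(w, b, xs, ys):
    """Average squared gap between the predictions and the true outputs."""
    return mean((w * x + b - y) ** 2 for x, y in zip(xs, ys))


# Toy observations that roughly follow y = 2x + 1 (illustrative values).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]
# Held-out observations the model never saw during fitting.
test_x = [5.0, 6.0]
test_y = [11.1, 12.9]

w, b = fit_line(train_x, train_y)
train_mse = mean_squared_error(w, b, train_x, train_y)
test_mse = mean_squared_error(w, b, test_x, test_y)
print(f"w={w:.2f}, b={b:.2f}, train MSE={train_mse:.4f}, test MSE={test_mse:.4f}")
```

If the test error stays close to the training error, the hypothesis generalizes to the unseen points; a large gap would suggest it does not.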

Generic Machine Learning Pipeline

In this section, I walk through the steps of the ML pipeline and briefly describe each one. The diagram below illustrates the pipeline steps.

Generic Machine Learning Pipeline

1- Data Preprocessing

We start by gathering data in the problem domain and applying exploratory data analysis (EDA) to understand it better. No data is perfect, so preprocessing is required to make the data ready for further analysis and then modeling. The data might have outliers that need to be clipped or trimmed away. It might have missing values or skewed variable distributions that need to be handled. We might also have redundant or highly correlated features, some of which add no valuable information about the predicted variable, so they are removed.
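As a small illustration of two of these cleanup steps, the sketch below fills missing values with the median and clips outliers to a fixed range. It uses plain Python lists; the helper names, the sample values, and the clipping bounds are assumptions chosen for the example, not part of any Azure ML API.

```python
from statistics import median


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, lower, upper):
    """Clamp every value into the closed range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


# One feature column with a missing value and an obvious outlier.
raw = [4.0, 5.0, None, 6.0, 5.5, 120.0]

filled = impute_missing(raw)
cleaned = clip_outliers(filled, lower=0.0, upper=10.0)
print(cleaned)  # [4.0, 5.0, 5.5, 6.0, 5.5, 10.0]
```

The median is used for imputation here because, unlike the mean, it is not pulled towards the outlier.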

2- Building & Training a Model

This is the best-known part of the story: we fit the model to the training data to get a so-called trained model, which we can save in different formats and use later to serve or generate predictions.

3- Model Validation and Testing

In this step of the workflow, we validate and test the trained model against unseen instances of the data and compare the test score with the training score. Different metrics can be used to calculate such scores, and they vary according to the problem at hand and the nature of the data. A typical metric is accuracy, for instance: comparing the training accuracy with the test accuracy lets us judge how well the model generalizes to new data instances (unseen data).
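The comparison between training and test accuracy can be sketched in a few lines. The labels and predictions below are invented for illustration; in a real experiment they would come from the trained model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels and model predictions on the training split.
train_acc = accuracy([0, 1, 1, 0, 1, 0, 1, 1], [0, 1, 1, 0, 1, 0, 1, 1])
# Hypothetical labels and model predictions on the held-out test split.
test_acc = accuracy([1, 0, 1, 1], [1, 0, 0, 0])

gap = train_acc - test_acc
print(f"train={train_acc:.2f}, test={test_acc:.2f}, gap={gap:.2f}")
```

A perfect training score combined with a much lower test score, as in this toy case, is the usual sign that the model has memorized the training data rather than learned a pattern that generalizes.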

4- Model Optimization & Tuning

Optimizing or tuning a model is the process of finding the tuning parameters (also called “hyperparameters”) that maximize the performance of the trained model. You can think of hyperparameters as the knobs or buttons you keep adjusting until your model gives the best outcome. They are the configurable parameters that let you control the model training process, and they are chosen before the training process starts. The Azure ML SDK includes the “HyperDrive” package, which helps you automate choosing these hyperparameters. For example, you can define the parameter search space as discrete or continuous values, together with the sampling method over that search space. I will come back to these details in upcoming articles of this series.
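To show what a search space and a sampling method look like in principle, here is a plain-Python sketch of random sampling over one discrete and one continuous hyperparameter. It does not use HyperDrive; the search-space layout, the `sample` helper, and the `toy_score` function (a stand-in for a real validation score) are all assumptions made for the example.

```python
import random

# Search space: one discrete parameter and one continuous parameter.
search_space = {
    "batch_size": ("choice", [32, 64, 128]),
    "learning_rate": ("uniform", 0.001, 0.1),
}


def sample(space, rng):
    """Draw one hyperparameter configuration at random from the space."""
    params = {}
    for name, spec in space.items():
        if spec[0] == "choice":
            params[name] = rng.choice(spec[1])
        else:  # "uniform": continuous value between the two bounds
            params[name] = rng.uniform(spec[1], spec[2])
    return params


def toy_score(params):
    """Stand-in for a validation score; a real run would train a model."""
    lr_penalty = abs(params["learning_rate"] - 0.01)
    batch_penalty = abs(params["batch_size"] - 64) / 1000
    return 1.0 - lr_penalty - batch_penalty


rng = random.Random(0)  # fixed seed so the trials are repeatable
trials = [sample(search_space, rng) for _ in range(20)]
best = max(trials, key=toy_score)
print(best)
```

Each trial corresponds to one training run with a different configuration, and the best-scoring configuration is kept.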

5- Model Deployment

Once we have the final model, it’s time to productionize it to serve predictions and show the world the power of our great AI models. Deploying the model as a RESTful web service that can be consumed from any client application is a great deployment approach. Azure Machine Learning provides a convenient way to deploy models as either a real-time or a batch inferencing service, which can be hosted on a containerized platform such as Azure Kubernetes Service (AKS) for production scenarios or Azure Container Instances for development and testing scenarios. We will come back to this point later in much more detail and with real examples.
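From the client side, consuming such a service is an ordinary HTTP POST with a JSON body. The sketch below only builds the request with the standard library and does not send it. The scoring URL is a placeholder, and the `{"data": ...}` payload shape and the bearer-key header are assumptions, since the real format depends on the scoring script and the authentication settings of the deployed service.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows, api_key=None):
    """Build (but do not send) a JSON POST request for a scoring endpoint."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed key-based authentication; omit for unauthenticated services.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


# Placeholder endpoint and one all-zero 28x28 image flattened to 784 values.
request = build_scoring_request("http://localhost:8890/score", [[0.0] * 784])

# Sending it would look like this once a real endpoint exists:
# with urllib.request.urlopen(request) as response:
#     predictions = json.load(response)
```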

6- Model & Data Monitoring

The data can change over time, and new trends in the data might emerge. These changes can degrade the model’s performance. To ensure the model continues to predict accurately, monitoring the deployed model is key. Azure ML has a nice preview feature for monitoring and detecting data drift called “Dataset Monitors”. You can configure it to detect potential data drift between the training dataset and the inference data.
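The intuition behind drift detection can be shown with a very simple check: how far has the mean of a feature in the inference data moved from the training data, measured in training standard deviations? This is only an illustrative sketch and is not how the Dataset Monitors feature computes drift; the sample values and the threshold are assumptions.

```python
from statistics import mean, stdev


def mean_shift(baseline, target):
    """Shift of the target mean, in units of the baseline standard deviation."""
    return abs(mean(target) - mean(baseline)) / stdev(baseline)


# One numeric feature as seen at training time (the baseline).
baseline = [10.0, 11.0, 9.0, 10.5, 9.5]
# The same feature in two batches of inference data.
stable_batch = [10.2, 9.8, 10.1, 9.9]
drifted_batch = [13.0, 14.0, 12.5, 13.5]

THRESHOLD = 2.0  # assumed alert level, to be tuned per feature

for name, batch in [("stable", stable_batch), ("drifted", drifted_batch)]:
    shift = mean_shift(baseline, batch)
    status = "drift suspected" if shift > THRESHOLD else "ok"
    print(f"{name}: shift={shift:.2f} -> {status}")
```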

Back to Azure ML

We can use the ML SDK to interact with Azure ML Service in any Python environment, including Jupyter Notebooks, VS Code, PyCharm, or any other Python IDE.

Let’s focus on Jupyter Notebooks in this article, as they come in very handy in Azure ML. We will start by creating a new Azure resource called “Machine Learning Workspace”. But what is an ML workspace? The ML workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. The workspace keeps a history of all training runs, including logs, metrics, output, and a snapshot of your scripts. You use this information to determine which training run produces the best model.

Machine Learning Workspace Home

After creating the workspace, you can navigate to the resource from the Azure Portal, then click the “Launch Studio” button to open the ML workspace. In the left pane of the workspace, under the “Author” section, you can see the “Notebooks” feature, where you can create new Jupyter Notebooks hosted in the Azure cloud. I created a new notebook to demonstrate a computer vision example in which we build a simple convolutional neural network (CNN) to classify images of digits into their labels. We use the famous MNIST dataset, which you can download from here.

Let’s Build Our Digits Classifier in Azure ML

Here is a sample of the MNIST dataset that we will train our model on. It includes 60,000 images for training and 10,000 images for testing.

Sample Images from the training set

You can view the complete code for this CNN classifier in the GitHub repository below.

1- Loading required Python libraries

2- Loading the ML workspace

3- Provision Compute Resources to Run ML Experiment
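Steps 2 and 3 could look roughly like the following sketch, written against the Azure ML SDK for Python (v1). It assumes a workspace `config.json` is available to `Workspace.from_config()`, and the cluster name, VM size, and node count are placeholder values, not the ones used in the original notebook. The SDK imports sit inside the function so that the file can be loaded even where the SDK is not installed.

```python
def get_or_create_compute(cluster_name="cpu-cluster",
                          vm_size="STANDARD_D2_V2",
                          max_nodes=4):
    """Load the workspace and return an existing or newly created cluster."""
    # Imported lazily: these modules require the azureml-core package.
    from azureml.core import Workspace
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    # Reads the workspace details from a local config.json file.
    ws = Workspace.from_config()

    try:
        # Reuse the cluster if one with this name already exists.
        target = ComputeTarget(workspace=ws, name=cluster_name)
    except ComputeTargetException:
        # Otherwise provision a new cluster that scales up to max_nodes.
        config = AmlCompute.provisioning_configuration(
            vm_size=vm_size, max_nodes=max_nodes
        )
        target = ComputeTarget.create(ws, cluster_name, config)
        target.wait_for_completion(show_output=True)
    return ws, target


# In a notebook connected to a workspace:
# ws, compute_target = get_or_create_compute()
```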

4- Preprocess Dataset

In the function below, we apply the following preprocessing steps to the dataset:

  • Cast the pixel values from unsigned integers to floats.
  • Normalize the values to the range [0, 1].
  • Reshape the dataset to have a single channel, as we are dealing with grayscale images.
  • Finally, apply one-hot encoding to the target labels to convert each label from an integer to a one-hot encoded vector.
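The four steps above can be sketched in plain Python on nested lists, which keeps the example self-contained. This is an illustration of the logic only, not the notebook's code; with NumPy arrays the same steps are typically done with `astype`, a division by 255, and `reshape`. The `preprocess` helper and the tiny 2x2 "image" are assumptions for the demo.

```python
def preprocess(images, labels, num_classes=10):
    """Scale pixels to [0, 1], add a channel axis, and one-hot encode labels.

    images: list of images, each a list of rows of integer pixels (0..255).
    labels: list of integer class labels.
    """
    # Cast to float, normalize, and wrap each pixel in a 1-element list
    # so the result has shape (n, height, width, 1): a single channel.
    x = [
        [[[float(pixel) / 255.0] for pixel in row] for row in image]
        for image in images
    ]
    # One-hot encode: a vector of zeros with a 1.0 at the label's index.
    y = [
        [1.0 if k == label else 0.0 for k in range(num_classes)]
        for label in labels
    ]
    return x, y


# A single 2x2 "image" stands in for a 28x28 MNIST digit.
tiny_images = [[[0, 255], [51, 102]]]
tiny_labels = [3]

x, y = preprocess(tiny_images, tiny_labels)
print(x[0][0])  # [[0.0], [1.0]]
print(y[0])     # 1.0 at index 3, zeros elsewhere
```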

5- Prepare the training script file “digits_classifier_train.py”

6- Create a new Experiment, the Keras Environment object, and the ScriptRunConfig object.

The libraries required on the training cluster are defined in the following YAML file. We then create a new environment object based on this YAML definition, as shown in the Python code snippet below.

channels:
- conda-forge
dependencies:
- python=3.6.2
- pip:
  - azureml-defaults
  - tensorflow==2.0.0
  - keras<=2.3.1
  - matplotlib
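A minimal sketch of this step with the Azure ML SDK for Python (v1) might look as follows. It assumes the conda definition above is saved as `conda_dependencies.yml` next to the training script; that file name, the environment name, and the experiment name are placeholders, while `digits_classifier_train.py` is the training script from step 5. The function returns the `experiment` and `src` objects used when submitting the run.

```python
def build_experiment(ws, compute_target,
                     conda_file="conda_dependencies.yml",
                     experiment_name="digits-classifier"):
    """Create the Experiment and the ScriptRunConfig for the training script."""
    # Imported lazily: these classes require the azureml-core package.
    from azureml.core import Environment, Experiment, ScriptRunConfig

    # Build the training environment from the conda YAML definition.
    env = Environment.from_conda_specification(
        name="keras-env", file_path=conda_file
    )

    # The experiment groups all runs of this training job in the workspace.
    experiment = Experiment(workspace=ws, name=experiment_name)

    # Tie together the script, where it runs, and the environment it needs.
    src = ScriptRunConfig(
        source_directory=".",
        script="digits_classifier_train.py",
        compute_target=compute_target,
        environment=env,
    )
    return experiment, src


# In a notebook, with ws and compute_target already available:
# experiment, src = build_experiment(ws, compute_target)
```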

7- Run the experiment and use Azure ML widgets to monitor the progress of the run.

from azureml.widgets import RunDetails
run = experiment.submit(src)
run.wait_for_completion(show_output=True)
RunDetails(run).show()

Final Output from Azure Widget

Azure Widget Run Details

What Next?

In the next article, we will discuss hyperparameter tuning and model optimization using the HyperDrive library.

Thanks a lot!

Osama Mosaad

Solution Architect and Senior Software Engineer who has a great passion in machine learning, data science and DevOps.