Monitoring AI in Production: Introduction to NannyML

Adnan Karol
4 min read · Feb 11, 2022


Data Scientists and Data Professionals are increasingly interested in learning how to train machine learning and deep learning models, ranging from simple to complicated. They’re also curious about how to put these models in the cloud.

But one thing that has sparked my curiosity is that little is said about what happens when the models are put in production!!

The Problem To Emphasise

Consider a real-world scenario in which, as a Data Scientist or a Machine Learning Engineer, I released a model into production four years ago that could predict whether or not a person is healthy.

Would you still consider this trained model “viable” today?
Will external factors like COVID-19 or a work-from-home situation have an impact on the model?
Yes, they would! And even if they didn’t in this case, there are countless other instances where such factors affect a model’s performance over time.

Two kinds of change matter most when it comes to AI model monitoring in production:

  1. Data Drift: In simple terms, this refers to a change in the distribution of the input data over time. As the data changes, the model may need to be re-trained, and its metrics must be re-validated.
  2. Concept Drift: This refers to a change in the relationship between the input features and the dependent variable (the target). If the nature of the output variable changes, the current model’s predictions become unreliable.
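To make the two kinds of drift concrete, here is a small toy sketch in plain Python. Everything in it is hypothetical: the single feature `x`, the labeling thresholds, and the frozen "model" are made up for illustration and have nothing to do with NannyML itself.

```python
import random

def make_data(n, feature_mean, label_threshold, rng):
    """Draw n (x, y) pairs: x ~ Normal(feature_mean, 1), y = 1 when x > label_threshold."""
    rows = []
    for _ in range(n):
        x = rng.gauss(feature_mean, 1.0)
        rows.append((x, int(x > label_threshold)))
    return rows

def accuracy(rows, model_threshold):
    """Accuracy of a frozen model that predicts 1 when x > model_threshold."""
    hits = sum(int(x > model_threshold) == y for x, y in rows)
    return hits / len(rows)

def mean_feature(rows):
    """Average value of the input feature."""
    return sum(x for x, _ in rows) / len(rows)

rng = random.Random(0)
MODEL_THRESHOLD = 0.0  # hypothetical model, "trained" once and never updated

baseline = make_data(5000, feature_mean=0.0, label_threshold=0.0, rng=rng)
data_drift = make_data(5000, feature_mean=2.0, label_threshold=0.0, rng=rng)     # inputs shift
concept_drift = make_data(5000, feature_mean=0.0, label_threshold=1.0, rng=rng)  # labeling rule shifts

for name, rows in [("baseline", baseline), ("data drift", data_drift), ("concept drift", concept_drift)]:
    print(f"{name:14s} mean x = {mean_feature(rows):5.2f}  accuracy = {accuracy(rows, MODEL_THRESHOLD):.2f}")
```

In this toy setup the data drift moves the inputs but leaves the labeling rule alone, so the frozen model stays accurate, while the concept drift keeps the inputs the same and drops the accuracy to roughly two thirds. Real cases are rarely this clean, which is why drift has to be linked back to performance.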

The Solution to the Problem

Many companies have developed software that can be used to detect these deviations and even evaluate the performance of production models.

NannyML: Estimate real-world model performance

Introducing NannyML, a library that makes model monitoring more productive. It estimates the performance of your models in the absence of the target, detects data drift, and identifies the data drift that’s responsible for any drop in performance.

NannyML focuses on three key points:

  1. Estimate real-world model performance (without access to targets)
  2. Detect multivariate data drift
  3. Link data drift to changes in model performance

Let’s get our hands dirty by learning how to use NannyML.

Practical Dive Down

The first step is to install the nannyml package, which is currently available for beta testing and is expected to be released as open source soon.

1. Step I: Import Dependencies

# Import Dependencies
import pandas as pd
import nannyml as nml

2. Step II: Load Dataset

You can load your own dataset, but NannyML also ships with a synthetic dataset for testing its functionality. Let us go ahead and use that one.

reference, analysis, output = nml.load_synthetic_sample()
data = pd.concat([reference, analysis], ignore_index=True)

The synthetic dataset contains the inputs and predictions of a binary classification model that predicts whether or not an employee will work from home the following weekday.

NannyML uses two partitions of the dataset:

  1. Reference partition: The objective of the reference partition is to set a baseline of expectations for the machine learning model that is being monitored. In the reference partition, the monitored model’s inputs, outputs, and performance results are required.
  2. Analysis Partition: NannyML compares the data drift and performance characteristics of the monitored model in the analysis partition against the reference partition. The analysis partition usually contains the most recent production data, starting from a point after the reference partition ends. The important thing to remember is that the analysis partition does not need to contain the target, i.e. the actual outcome; the model’s inputs and outputs are enough.

So now, we have the dataset partitioned into “reference” and “analysis”.
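If you bring your own data, the split could look roughly like the sketch below. It is plain Python with made-up records; the field names (`date`, `distance`, `y_pred_proba`, `target`) and the cutoff date are hypothetical and only illustrate the idea that the reference partition keeps its targets while the analysis partition goes without them.

```python
from datetime import date

# Hypothetical production log: model input, model output, and the target when it is known
records = [
    {"date": date(2018, 1, 15), "distance": 3.1, "y_pred_proba": 0.91, "target": 1},
    {"date": date(2018, 6, 2), "distance": 7.4, "y_pred_proba": 0.22, "target": 0},
    {"date": date(2019, 2, 20), "distance": 1.8, "y_pred_proba": 0.67, "target": 1},
    {"date": date(2019, 9, 9), "distance": 9.0, "y_pred_proba": 0.35, "target": None},
    {"date": date(2020, 1, 5), "distance": 5.5, "y_pred_proba": 0.58, "target": None},
]

def split_partitions(rows, cutoff):
    """Rows before the cutoff form the reference partition (targets kept);
    rows on or after it form the analysis partition (target field dropped)."""
    reference_rows = [r for r in rows if r["date"] < cutoff]
    analysis_rows = [
        {key: value for key, value in r.items() if key != "target"}
        for r in rows
        if r["date"] >= cutoff
    ]
    return reference_rows, analysis_rows

reference_rows, analysis_rows = split_partitions(records, cutoff=date(2019, 6, 1))
print(len(reference_rows), "reference rows,", len(analysis_rows), "analysis rows")
```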

3. Step III: Extract Metadata

metadata = nml.extract_metadata(data=reference)
metadata.target_column_name = 'work_home_actual'

4. Step IV: Estimate Performance of the Model without Target

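estimator = nml.CBPE(model_metadata=metadata, chunk_size=5000)
# Fit on the reference partition before estimating (assumed to be required)
estimator.fit(reference)
estimated_performance = estimator.estimate(data=data)
figure = estimated_performance.plot(kind='performance')
figure.show()

[Figure: Model Performance]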

The plot shows that the estimated performance of the model deteriorates after mid-2019.
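How can performance be estimated without targets at all? CBPE stands for confidence-based performance estimation, and the rough intuition can be sketched in a few lines of plain Python. This is a simplified illustration and not NannyML’s actual implementation: it assumes the predicted probabilities are well calibrated, and the function names and simulated data are my own.

```python
import random

def estimated_accuracy(probabilities, threshold=0.5):
    """Expected accuracy computed from predicted probabilities alone.
    A prediction of class 1 (p >= threshold) is right with probability p,
    a prediction of class 0 is right with probability 1 - p."""
    total = 0.0
    for p in probabilities:
        total += p if p >= threshold else 1.0 - p
    return total / len(probabilities)

def realized_accuracy(probabilities, targets, threshold=0.5):
    """Ordinary accuracy, which needs the true targets."""
    hits = sum(int(p >= threshold) == y for p, y in zip(probabilities, targets))
    return hits / len(targets)

rng = random.Random(42)
# Simulated model outputs, plus targets drawn so that the outputs are perfectly calibrated
probabilities = [rng.random() for _ in range(20000)]
targets = [int(rng.random() < p) for p in probabilities]

estimate = estimated_accuracy(probabilities)        # uses no targets
actual = realized_accuracy(probabilities, targets)  # uses targets
print(f"estimated accuracy: {estimate:.3f}")
print(f"realized accuracy:  {actual:.3f}")
```

With calibrated probabilities the two numbers land close to each other, which is what makes it possible to track performance on the analysis partition before the real outcomes arrive. If calibration breaks down, the estimate cannot be trusted in the same way.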

5. Step V: Detect Data Drift

Let us now check whether the feature “workday” shows data drift.

univariate_calculator = nml.UnivariateStatisticalDriftCalculator(model_metadata=metadata, chunk_size=5000)
# Fit on the reference partition before calculating drift (assumed to be required)
univariate_calculator.fit(reference)
univariate_results = univariate_calculator.calculate(data=data)
figure = univariate_results.plot(kind='feature_drift', metric='statistic', feature_label="workday")
figure.show()

The plot shows that the workday feature does not drift over time. Let us now check whether the feature “distance_from_office” shows data drift.

figure = univariate_results.plot(kind='feature_drift', metric='statistic', feature_label="distance_from_office")

The plot shows that the distance_from_office feature does drift over time.
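To give a feel for what a univariate drift “statistic” can measure, here is a small sketch of the two-sample Kolmogorov-Smirnov statistic in plain Python. I am assuming this kind of test here because it is a common choice for continuous features; which test NannyML applies to each feature is not covered above, and the simulated distances below are invented.

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distribution functions of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)  # share of a that is <= x
        cdf_b = bisect.bisect_right(b, x) / len(b)  # share of b that is <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

rng = random.Random(7)
# Hypothetical "distance from office" values for the reference period and two later chunks
reference_values = [rng.gauss(5.0, 1.0) for _ in range(2000)]
stable_chunk = [rng.gauss(5.0, 1.0) for _ in range(2000)]   # same distribution
drifted_chunk = [rng.gauss(6.5, 1.0) for _ in range(2000)]  # mean has moved

print(f"stable chunk:  {ks_statistic(reference_values, stable_chunk):.3f}")
print(f"drifted chunk: {ks_statistic(reference_values, drifted_chunk):.3f}")
```

A value near 0 means the chunk looks like the reference data, while a value approaching 1 means the two distributions barely overlap. Computing one such value per chunk and plotting it over time gives a curve in the spirit of the feature drift plots above.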

Thus, in this way, the NannyML package can be used for monitoring AI in production.



Adnan Karol

Full-Stack Data Scientist