Monitoring AI in Production: Introduction to NannyML

The Problem To Emphasise

  1. Data Drift: In simple terms, this is a change in the distribution of the model's input data over time. When the inputs move far enough away from the data the model was trained on, the model may need to be retrained and its metrics re-validated.
  2. Concept Drift: This is a change in the nature or statistical properties of the dependent variable (the target), or in its relationship to the input features. A model trained on the old relationship loses accuracy once the meaning of the output variable changes.
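
The difference between the two kinds of drift can be illustrated with a small, self-contained sketch. It is not taken from NannyML; the threshold model, the synthetic inputs, and all names (`model`, `old_concept`, `new_concept`, `accuracy`) are assumptions made purely for illustration.

```python
# A toy illustration of data drift versus concept drift (hypothetical names, synthetic data).

def model(x):
    """A fixed, already 'trained' model: predicts 1 when x is above 5."""
    return int(x > 5)

def old_concept(x):
    """True relationship between input and target at training time."""
    return int(x > 5)

def new_concept(x):
    """True relationship after concept drift: the boundary moved to 7."""
    return int(x > 7)

def accuracy(inputs, concept):
    """Share of inputs where the fixed model agrees with the true label."""
    hits = sum(model(x) == concept(x) for x in inputs)
    return hits / len(inputs)

def mean(values):
    return sum(values) / len(values)

reference_inputs = [i / 10 for i in range(100)]     # inputs seen at training time
drifted_inputs = [x + 3 for x in reference_inputs]  # data drift: inputs shift upward

print("input mean, reference:", mean(reference_inputs))
print("input mean, drifted:  ", mean(drifted_inputs))
print("accuracy, no drift:     ", accuracy(reference_inputs, old_concept))
print("accuracy, data drift:   ", accuracy(drifted_inputs, old_concept))
print("accuracy, concept drift:", accuracy(reference_inputs, new_concept))
```

In this toy setup the shifted inputs leave accuracy untouched, while the changed concept lowers it even though the inputs look the same. Real cases are rarely this clean, which is why it helps to monitor both drift and performance.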

The Solution to the Problem

NannyML: Estimate real-world model performance

NannyML (https://www.nannyml.com/) is an open-source Python library that can:
  1. Estimate real-world model performance (without access to targets)
  2. Detect multivariate data drift
  3. Link data drift to changes in model performance

Practical Dive Down

  1. Step I: Import Dependencies
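# Import dependencies
import pandas as pd
import nannyml as nml

# Load the synthetic sample dataset that ships with NannyML
reference, analysis, output = nml.load_synthetic_sample()
# Combine both partitions so results can be computed over the full period
data = pd.concat([reference, analysis], ignore_index=True)
print(data.shape)
print(data.head())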
  1. Reference partition: The reference partition sets a baseline of expectations for the machine learning model being monitored. It must contain the monitored model's inputs, its outputs, and the targets from which its performance is known.
  2. Analysis partition: NannyML compares the data drift and performance characteristics of the monitored model in the analysis partition against the reference partition. The analysis partition usually contains the most recent production data, and it must start after the reference partition ends. The important thing to remember is that the analysis partition does not contain the target values (the actual outcomes), because these are typically not yet available in production.
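
As a rough sketch of how the two partitions relate, the snippet below splits a small prediction log by date. Everything in it is hypothetical: the records, the field names (`timestamp`, `distance`, `y_pred`, `target`), and the cut-off date are invented for illustration and are not the columns of the NannyML sample dataset.

```python
from datetime import date

# Hypothetical prediction log; the field names are illustrative only.
records = [
    {"timestamp": date(2021, 1, 5), "distance": 4.2, "y_pred": 0, "target": 0},
    {"timestamp": date(2021, 2, 9), "distance": 9.8, "y_pred": 1, "target": 1},
    {"timestamp": date(2021, 3, 3), "distance": 7.1, "y_pred": 1, "target": 0},
    {"timestamp": date(2021, 4, 20), "distance": 2.5, "y_pred": 0, "target": 1},
]

reference_end = date(2021, 3, 1)  # assumed cut-off between the two periods

# Reference: the earlier period, with targets kept so baseline performance is known.
reference_rows = [r for r in records if r["timestamp"] < reference_end]

# Analysis: the later period, with the target removed to mimic production data.
analysis_rows = [
    {key: value for key, value in r.items() if key != "target"}
    for r in records
    if r["timestamp"] >= reference_end
]

print(len(reference_rows), "reference rows,", len(analysis_rows), "analysis rows")
```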
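# Extract metadata from the reference data and set the target column
metadata = nml.extract_metadata(data=reference)
metadata.target_column_name = 'work_home_actual'
# Fit the CBPE (Confidence-Based Performance Estimation) estimator on the reference partition
estimator = nml.CBPE(model_metadata=metadata, chunk_size=5000)
estimator.fit(reference)
# Estimate performance over the full dataset, in chunks of 5000 rows
estimated_performance = estimator.estimate(data=data)
figure = estimated_performance.plot(kind='performance')
figure.show()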
Model Performance
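
The intuition behind estimating performance without targets can be sketched without any library. The function below is not NannyML's CBPE implementation. It is a simplified illustration for accuracy only, and it assumes the predicted probabilities are well calibrated. The function name `estimate_accuracy` and the example chunks are hypothetical.

```python
def estimate_accuracy(predicted_probabilities, threshold=0.5):
    """Expected accuracy of thresholded predictions, using probabilities only.

    Assumes calibration: a prediction with probability p is a true positive
    with probability p, so no target values are needed.
    """
    expected_correct = 0.0
    for p in predicted_probabilities:
        if p >= threshold:
            expected_correct += p        # predicted 1, correct with probability p
        else:
            expected_correct += 1.0 - p  # predicted 0, correct with probability 1 - p
    return expected_correct / len(predicted_probabilities)

# Two hypothetical chunks of model outputs (probability of the positive class).
confident_chunk = [0.9, 0.1, 0.8, 0.2]
uncertain_chunk = [0.6, 0.4, 0.55, 0.45]

print("confident chunk:", round(estimate_accuracy(confident_chunk), 3))
print("uncertain chunk:", round(estimate_accuracy(uncertain_chunk), 3))
```

The estimate is lower for the chunk where the model is less sure of itself. If the probabilities are poorly calibrated, or the concept itself has drifted, an estimate of this kind can be misleading.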
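# Fit a univariate statistical drift calculator on the reference partition
univariate_calculator = nml.UnivariateStatisticalDriftCalculator(model_metadata=metadata, chunk_size=5000)
univariate_calculator.fit(reference_data=reference)
# Calculate per-feature drift statistics for each chunk of 5000 rows
univariate_results = univariate_calculator.calculate(data=data)
# Plot the drift statistic for the "workday" feature
figure = univariate_results.plot(kind='feature_drift', metric='statistic', feature_label="workday")
figure.show()
# Plot the drift statistic for the "distance_from_office" feature
figure = univariate_results.plot(kind='feature_drift', metric='statistic', feature_label="distance_from_office")
figure.show()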
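
To make the idea of a per-chunk drift statistic concrete, here is a sketch that computes a two-sample Kolmogorov-Smirnov statistic by hand, which is a common choice for a continuous feature. It is an illustration under assumptions, not NannyML's code: the helper names, the integer data, the chunk size of 100, and the alert threshold of 0.2 are all made up for the example.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    n, m = len(a_sorted), len(b_sorted)
    largest_gap = 0
    for value in a_sorted + b_sorted:
        # Compare both CDFs as integer counts over the common denominator n * m.
        count_a = bisect_right(a_sorted, value) * m
        count_b = bisect_right(b_sorted, value) * n
        largest_gap = max(largest_gap, abs(count_a - count_b))
    return largest_gap / (n * m)

def split_into_chunks(values, chunk_size):
    """Cut a sequence into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

# Hypothetical feature values: the second half of the analysis data is shifted.
reference_feature = list(range(100))
analysis_feature = list(range(100)) + [x + 50 for x in range(100)]

ALERT_THRESHOLD = 0.2  # arbitrary value chosen for this sketch

for index, chunk in enumerate(split_into_chunks(analysis_feature, 100)):
    statistic = ks_statistic(reference_feature, chunk)
    print("chunk", index, "statistic", statistic, "alert", statistic > ALERT_THRESHOLD)
```

Only the second chunk, whose values are shifted, crosses the threshold. In practice an alert is usually based on a statistical test rather than a fixed cut-off like this one.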

In this way, the NannyML package can be used to monitor AI models in production.

Adnan Karol

Full-Stack Data Scientist
