ML model monitoring

Extend full-stack observability to machine learning with ML model monitoring

Last updated Feb 22, 2024

Today, New Relic is extending its observability experience to provide a new offering for artificial intelligence (AI) and machine learning (ML) teams to break down visibility silos. This brand new innovation provides AI/ML and DevOps teams one place to monitor and visualize critical signals like recall, precision, and model accuracy alongside their apps and infrastructure.

Start measuring your ML performance in minutes. In this video, see how to set up New Relic ML model performance monitoring for fast time-to-value of your AI and ML applications:

Why is ML model monitoring important?

Monitoring plays a vital role throughout the lifecycle of a machine learning (ML) system. Here are some key reasons why ML model monitoring is important:

Performance tracking:

Concept drift: Real-world data can change over time, and the distribution of data that the model was trained on may no longer represent the distribution of incoming data. This concept drift can lead to a decline in model performance. Monitoring helps identify when such drift occurs and prompts retraining or model updates.
Accuracy and metrics: Monitoring allows tracking key performance metrics such as accuracy, precision, recall, and F1 score. Detecting deviations from expected performance levels helps ensure that the model is delivering accurate predictions.
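
To make the metrics above concrete, here is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed from a batch of binary predictions once ground-truth labels arrive. The function name and the sample data are illustrative assumptions, not part of any New Relic library.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels and predictions (made-up data).
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]
batch_metrics = classification_metrics(labels, predictions)
```

Computing these values per time window, rather than once, is what lets you spot deviations from expected performance levels.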

Data quality assurance:

Input data quality: Monitoring helps ensure that the input data fed into the model remains of high quality. Inconsistent or noisy data can negatively impact model performance.
Label quality: In supervised learning, the quality of labels is crucial. Monitoring can help identify issues with mislabeled data and guide corrective actions.

Compliance and fairness:

Fairness: Monitoring can reveal biases in the model predictions, helping to address fairness concerns. It helps ensure that the model does not exhibit discriminatory behavior across different demographic groups.
Regulatory compliance: In regulated industries, monitoring helps ensure that the model complies with legal and ethical standards. It assists in maintaining transparency and accountability.

User experience and business impact:

Customer satisfaction: Monitoring allows organizations to track user experience and customer satisfaction by ensuring that the model delivers predictions that meet user expectations.
Business metrics: Monitoring helps tie model performance to key business metrics, ensuring that the ML system contributes positively to the organization's goals.

Bring your ML model data into New Relic

AI/ML engineers and data scientists can now send model performance telemetry data into New Relic and, with integrations to leading machine learning operations (MLOps) platforms, proactively monitor ML model issues in production. You can empower your data teams with full visibility through custom dashboards and visualizations that show the performance of your ML investments in action.

Functional vs operational ML model monitoring

Functional monitoring and operational monitoring are two different aspects of monitoring machine learning (ML) models, each serving a specific purpose in ensuring the effectiveness and reliability of the models. Let's delve into the differences between functional and operational monitoring:

Functional monitoring

Functional monitoring focuses on assessing the performance and accuracy of the machine learning model in terms of its predictive capabilities.

Key metrics

Model metrics: This includes metrics related to the model's performance, such as accuracy, precision, recall, F1 score, and area under the receiver operating characteristic (ROC) curve.
Concept drift detection: Functional monitoring helps identify and address drift, where the distribution of the input data or its relationship to the target changes over time, potentially affecting the model's accuracy.

Use cases

Assessing the model's predictive accuracy and generalization to new data.
Detecting and adapting to changes in the underlying data distribution.
Evaluating the model's ability to handle different data scenarios.

Examples: Analyzing classification metrics, assessing model accuracy over time, and detecting shifts in data distribution.
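
One common way to detect shifts in data distribution is the Population Stability Index (PSI), which compares how a feature is distributed in a reference sample (for example, training data) against recent production data. The sketch below is a simplified, pure-Python version using equal-width bins. The function name, the bin count, and the sample data are assumptions for illustration, and the frequently quoted cutoffs (roughly 0.1 for a moderate shift, 0.25 for a major one) are rules of thumb rather than fixed standards.

```python
import math


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between two numeric samples, binned on the reference range.

    Both samples are assumed to be non-empty.
    """
    low, high = min(reference), max(reference)
    width = (high - low) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(index, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(count / len(values), eps) for count in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Made-up data: production values concentrated in the upper half of the range.
training_sample = [i / 100 for i in range(100)]
production_sample = [0.5 + i / 200 for i in range(100)]
drift_score = population_stability_index(training_sample, production_sample)
```

Note that PSI only looks at input distributions. Detecting a change in the relationship between inputs and outcomes still requires comparing predictions against ground-truth labels.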

Operational monitoring

Operational monitoring is concerned with the deployment and runtime aspects of ML models, focusing on their interactions with the production environment.

Key metrics

Resource utilization: Monitoring computational resources such as CPU and memory usage to ensure optimal performance.
Response time: Tracking the time it takes for the model to make predictions, ensuring timely responses.
Error rates and failures: Monitoring for errors or failures in the model's predictions and understanding their causes.
Throughput: Assessing the number of predictions the model can handle within a given time frame.

Use cases

Ensuring the model operates efficiently and reliably in a production environment.
Identifying and addressing performance bottlenecks or issues in real time.
Managing computational resources to meet service-level agreements (SLAs).

Examples: Monitoring server performance, tracking response times, and identifying instances of model failures or errors in a production environment.
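
As a rough illustration of the operational metrics listed above, the following sketch wraps a prediction function and records latency, error rate, and throughput. The class and method names are hypothetical, and a production setup would typically rely on an agent or instrumentation library rather than hand-rolled timers.

```python
import math
import time


class InferenceStats:
    """Collects latency, error, and throughput figures for a model."""

    def __init__(self):
        self.latencies_ms = []
        self.errors = 0
        self.started = time.perf_counter()

    def timed_predict(self, predict_fn, features):
        """Call predict_fn(features), recording latency and failures."""
        start = time.perf_counter()
        try:
            return predict_fn(features)
        except Exception:
            self.errors += 1
            raise  # the caller still sees the original error
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.latencies_ms.append(elapsed_ms)

    def summary(self):
        calls = len(self.latencies_ms)
        elapsed_s = max(time.perf_counter() - self.started, 1e-9)
        ordered = sorted(self.latencies_ms)
        # Nearest-rank 95th percentile.
        p95 = ordered[math.ceil(0.95 * calls) - 1] if calls else 0.0
        return {
            "calls": calls,
            "error_rate": self.errors / calls if calls else 0.0,
            "p95_latency_ms": p95,
            "throughput_per_s": calls / elapsed_s,
        }
```

Tracking a high percentile such as p95 instead of the mean is a deliberate choice here: a small share of slow predictions can breach an SLA while leaving the average almost unchanged.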

To sum it up, functional monitoring is concerned with the model's predictive accuracy and its ability to adapt to changing data. In contrast, operational monitoring focuses on the model's real-time performance and interactions within a production environment. Both aspects are crucial for maintaining a robust and reliable machine learning system throughout its lifecycle.

Complete visibility into ML-powered applications

Unlike ordinary software, AI and ML models are based on both code and the underlying data. Because the real world is constantly changing, models developed on static data can become irrelevant or “drift” over time, becoming less accurate. Monitoring the performance of an ML model in production is essential to continue to deliver relevant customer experiences.

By using New Relic for your ML model performance monitoring, your development and data science teams can:

Bring your own ML data or integrate with data science platforms and monitor ML models and interdependencies with the rest of the application components, including infrastructure, to solve problems faster.
Create custom dashboards to build trust in your ML models and gain insights that improve their accuracy.
Apply predictive alerts from New Relic Alerts and Applied Intelligence to ML models to detect unusual changes and unknowns early, before they impact customers.
Review ML model telemetry data for critical signals to maintain high-performing models.
Collaborate in a production environment and contextualize alerts, notifications, and incidents before they have an impact on the business.
Access data that supports data-driven decisions about innovation, planning, reliability, and customer experience.

Monitoring is fast emerging as one of the biggest and most important aspects of MLOps and I’m excited to see New Relic launch their AI Observability platform.

As companies expand into more complex use cases for AI/ML, full-stack ML application observability needs to be a key focus for any advanced team—and they need the right tools to keep track of their models as they make key decisions in production. At the AI Infrastructure Alliance, we’re dedicated to bringing together the essential building blocks for the Artificial Intelligence applications of today and tomorrow and we are happy to partner with New Relic on that mission.

Best practices for monitoring ML models in production

Monitoring machine learning models in production is critical for ensuring their ongoing performance, accuracy, and reliability. Here are some best practices for effectively monitoring ML models in a production environment:

Choose appropriate metrics

Select metrics that align with the business objectives and provide a comprehensive view of the model's performance. Consider model-specific metrics, concept drift detection, and fairness metrics in addition to traditional evaluation metrics.

Real-time monitoring

Implement real-time monitoring to detect issues and respond promptly. This is particularly important for applications where timely predictions are crucial, such as fraud detection or recommendation systems.

Monitor data quality

Regularly assess the quality of input data to ensure that the model receives accurate and relevant information. Detect and handle anomalies, outliers, and missing values that affect model performance.
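
A basic input-quality check might look like the sketch below, which reports the share of missing and out-of-range values per feature. The function name, the feature names, and the expected ranges are made up for illustration. Real pipelines usually derive the ranges from training data.

```python
import math


def data_quality_report(rows, expected_ranges):
    """Per-feature missing and out-of-range rates.

    rows: list of dicts mapping feature name to a numeric value.
    expected_ranges: dict mapping feature name to (low, high) bounds.
    """
    total = len(rows)
    report = {}
    for feature, (low, high) in expected_ranges.items():
        missing = 0
        out_of_range = 0
        for row in rows:
            value = row.get(feature)
            is_nan = isinstance(value, float) and math.isnan(value)
            if value is None or is_nan:
                missing += 1
            elif not (low <= value <= high):
                out_of_range += 1
        report[feature] = {
            "missing_rate": missing / total if total else 0.0,
            "out_of_range_rate": out_of_range / total if total else 0.0,
        }
    return report


# Hypothetical inference inputs with a few deliberate problems.
incoming = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 212, "income": float("nan")},
    {"age": 45},
]
quality = data_quality_report(incoming, {"age": (0, 120), "income": (0, 1e7)})
```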

Alerting and notification systems

Set up alerting systems to notify relevant stakeholders when the model's performance deviates from expected standards. Define thresholds for key metrics and establish protocols for addressing issues promptly.
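
The threshold logic behind such alerts can be sketched in a few lines. This is only an illustration of the idea: the function name, metric names, and limits are assumptions, and on an observability platform the alert conditions would normally be configured in the platform itself rather than in application code.

```python
def evaluate_thresholds(metrics, thresholds):
    """Return alert messages for metrics that breach their thresholds.

    thresholds maps a metric name to ("min" | "max", limit).
    """
    alerts = []
    for name, (kind, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A silent metric is often a problem in its own right.
            alerts.append(f"{name}: no data reported")
        elif kind == "min" and value < limit:
            alerts.append(f"{name}: {value} is below minimum {limit}")
        elif kind == "max" and value > limit:
            alerts.append(f"{name}: {value} is above maximum {limit}")
    return alerts


# Hypothetical latest values and limits.
latest = {"accuracy": 0.81, "p95_latency_ms": 120.0}
limits = {
    "accuracy": ("min", 0.85),
    "p95_latency_ms": ("max", 250.0),
    "recall": ("min", 0.70),
}
open_alerts = evaluate_thresholds(latest, limits)
```

Treating a missing metric as an alert, as this sketch does, helps catch broken telemetry pipelines that would otherwise look like a healthy model.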

Resource monitoring

Monitor computational resources such as CPU, memory, and GPU usage to ensure optimal performance. Identify and address resource bottlenecks that impact the model's responsiveness and efficiency.

Automated retraining

Establish automated retraining pipelines to periodically update models with fresh data. This helps the model adapt to evolving patterns in the data and maintain its accuracy over time.
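
Retraining pipelines are often triggered by a combination of schedule and monitoring signals. The sketch below shows one possible trigger policy. The function name and the default limits (30 days, a drift score of 0.25, a minimum accuracy of 0.85) are arbitrary assumptions that each team would tune for its own models.

```python
from datetime import datetime, timedelta


def retraining_reasons(last_trained, now, drift_score, accuracy,
                       max_age=timedelta(days=30),
                       drift_limit=0.25,
                       min_accuracy=0.85):
    """Return the list of reasons to retrain; empty means no retrain."""
    reasons = []
    if now - last_trained > max_age:
        reasons.append("model_age")
    if drift_score > drift_limit:
        reasons.append("drift")
    if accuracy < min_accuracy:
        reasons.append("accuracy")
    return reasons


# Hypothetical check: a 45-day-old model with healthy drift and accuracy.
reasons = retraining_reasons(
    last_trained=datetime(2024, 1, 1),
    now=datetime(2024, 2, 15),
    drift_score=0.10,
    accuracy=0.90,
)
```

Returning the reasons rather than a bare boolean makes it easier to log why a retraining run started.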

Fairness monitoring

Monitor for biases and fairness issues in model predictions, especially in applications with potential ethical implications. Use fairness metrics and tools to identify and mitigate biases.
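
One simple fairness metric is the demographic parity difference: the gap between the highest and lowest rate of positive predictions across groups. The sketch below is a minimal illustration with made-up data and hypothetical function names. It is one of several fairness definitions, and which one applies depends on the use case.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups, positive=1):
    """Share of positive predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for prediction, group in zip(predictions, groups):
        totals[group] += 1
        if prediction == positive:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups, positive=1):
    """Gap between the highest and lowest group positive rates."""
    rates = positive_rate_by_group(predictions, groups, positive)
    return max(rates.values()) - min(rates.values()) if rates else 0.0


# Made-up predictions for two groups, "a" and "b".
preds = [1, 1, 0, 1, 0, 0, 0, 1]
group_labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
parity_gap = demographic_parity_difference(preds, group_labels)
```

A value near zero means the groups receive positive predictions at similar rates. A large gap is a signal to investigate, not proof of unfair treatment.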

Get instant value from machine learning model telemetry

With 100GB free per month and ready-made libraries, you can easily bring your own ML model inference and performance data directly from a Jupyter notebook or cloud service into New Relic in minutes to obtain metrics such as data statistics and feature and prediction distributions.
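
As a rough idea of what sending model telemetry can look like without the ready-made libraries, the sketch below prepares a custom event for the New Relic Event API using only the Python standard library. Treat it as an assumption-laden illustration: the endpoint URL and the `Api-Key` header reflect the publicly documented Event API for US accounts and should be verified against the current docs, while the event type `ModelMetricSample`, the attribute names, and the placeholder account ID and key are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed US-region Event API endpoint; verify against current New Relic docs.
EVENT_API_URL = (
    "https://insights-collector.newrelic.com/v1/accounts/{account_id}/events"
)


def build_metric_event(model_name, metrics, event_type="ModelMetricSample"):
    """Build one custom event; event type and attribute names are hypothetical."""
    event = {"eventType": event_type, "modelName": model_name}
    event.update(metrics)
    return event


def build_request(account_id, api_key, events):
    """Prepare (but do not send) a POST request carrying the events."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        EVENT_API_URL.format(account_id=account_id),
        data=body,
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder credentials; replace with real values before sending.
event = build_metric_event("churn-model", {"accuracy": 0.91, "recall": 0.84})
request = build_request("YOUR_ACCOUNT_ID", "YOUR_LICENSE_KEY", [event])
# To send: urllib.request.urlopen(request)
```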

In addition, New Relic’s open-source ecosystem offers flexible quickstarts so you can start getting value from your ML model data faster. A wide range of integrations with leading data science platforms like AWS SageMaker, DataRobot (Algorithmia), Aporia, Superwise, Comet, DAGsHub, Mona, and TruEra include pre-configured performance dashboards and other observability building blocks that give you instant visibility into your models. Getting value from your ML model data has never been easier with New Relic.

Inaccurate ML recommendations or predictions can cost a company millions. New Relic Model Performance Monitoring enables teams to measure ML model performance for maximum return on investments.

Get started with machine learning model performance monitoring

We’re committed to making observability a daily best practice for every engineer. With the launch of New Relic ML Model Performance Monitoring, we deliver a unified data observability platform that gives ML/AI and DevOps teams unprecedented visibility into the performance of their ML-based apps. With everything you need in one place, New Relic is expanding observability into the future.

All the available New Relic ML Model Performance Monitoring observability integrations can be found as part of the New Relic Instant Observability ecosystem, with more on the way.

For more information on how to bring your ML model telemetry to New Relic, check out our Python library and notebook example of an XGBoost model, including a step-by-step explanation of the integration.

Next steps

For more information on how to set up New Relic ML Model Performance Monitoring or integrate ML model performance in your observability infrastructure, visit the New Relic MLOps docs page.

And if you’re new to New Relic but interested in digging in, experience the simplicity of New Relic yourself by signing up for a forever free account.

By Guy Fighel

Guy Fighel is the General Manager of Applied Intelligence and Group Vice President of product engineering at New Relic. He leads New Relic’s AIOps product and engineering, and is responsible for the company’s overall artificial intelligence and machine learning strategy. Guy was the co-founder and chief technology officer of SignifAI, an event-intelligence company, which was acquired by New Relic in 2019.

The views expressed on this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. Please join us exclusively at the Explorers Hub (discuss.newrelic.com) for questions and support related to this blog post. This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve or endorse the information, views or products available on such sites.

This blog post contains “forward-looking” statements, as that term is defined under the federal securities laws, including but not limited to statements regarding the future of AI/ML and related observability needs. The achievement or success of the matters covered by such forward-looking statements are based on New Relic’s current assumptions, expectations, and beliefs and are subject to substantial risks, uncertainties, assumptions, and changes in circumstances that may cause New Relic’s actual results, performance, or achievements to differ materially from those expressed or implied in any forward-looking statement. Further information on factors that could affect New Relic’s financial and other results and the forward-looking statements in this post is included in the filings New Relic makes with the SEC from time to time, including in New Relic’s most recent Form 10-Q, particularly under the captions “Risk Factors” and “Management’s Discussion and Analysis of Financial Condition and Results of Operations.” Copies of these documents may be obtained by visiting the New Relic Investor Relations website at http://ir.newrelic.com or the SEC website at www.sec.gov. New Relic assumes no obligation and does not intend to update these forward-looking statements, except as required by law.