Today, New Relic is extending its observability experience to provide a new offering for artificial intelligence (AI) and machine learning (ML) teams to break down visibility silos. This brand new innovation provides AI/ML and DevOps teams one place to monitor and visualize critical signals like recall, precision, and model accuracy alongside their apps and infrastructure. 

Start measuring your ML performance in minutes.

Why is ML model monitoring important? 

Monitoring machine learning (ML) models is crucial for several reasons, and it plays a vital role in the overall lifecycle of a machine learning system. Here are some key reasons why ML model monitoring is important:

Performance tracking:

  • Data and concept drift: Real-world data changes over time. The distribution of incoming data can move away from the distribution the model was trained on (data drift), or the relationship between the inputs and the target can change (concept drift). Either can degrade model performance. Monitoring helps identify when drift occurs and prompts retraining or model updates.
  • Accuracy and metrics: Monitoring allows tracking key performance metrics such as accuracy, precision, recall, and F1 score. Detecting deviations from expected performance levels helps ensure that the model is delivering accurate predictions.
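The metrics named above are all derived from counts of correct and incorrect predictions. Here is a minimal sketch in plain Python. It assumes binary labels encoded as 1 (positive) and 0 (negative), and it assumes ground-truth labels eventually arrive for the predictions being scored. The function name is illustrative and is not part of any New Relic library.

```python
from __future__ import annotations

from typing import Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    # Guard every ratio so an empty or one-class batch returns 0.0 instead of raising.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Example batch: 4 of 6 correct, 2 true positives, 1 false positive, 1 false negative.
metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

Monitoring these values means recomputing them on each new batch or time window, not only once at deployment.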

Data quality assurance:

  • Input data quality: Monitoring helps ensure that the input data fed into the model remains of high quality. Inconsistent or noisy data can negatively impact model performance.
  • Label quality: In supervised learning, the quality of labels is crucial. Monitoring can help identify issues with mislabeled data and guide corrective actions.
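Input data quality checks often start with a per-feature schema: flag values that are missing, non-numeric, or outside an expected range. The following sketch covers input data only, not label quality. The schema, feature names, and function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Any

# Hypothetical schema for illustration: feature name -> (minimum, maximum) allowed value.
SCHEMA: dict[str, tuple[float, float]] = {"age": (0, 120), "income": (0, 1_000_000)}


def check_record(record: dict[str, Any], schema: dict[str, tuple[float, float]] = SCHEMA) -> list[str]:
    """Return human-readable data-quality issues for one input record (empty list = clean)."""
    issues = []
    for name, (low, high) in schema.items():
        value = record.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            issues.append(f"{name}: missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            issues.append(f"{name}: non-numeric value {value!r}")
        elif not low <= value <= high:
            issues.append(f"{name}: {value} outside [{low}, {high}]")
    return issues


def issue_rate(records: list[dict[str, Any]]) -> float:
    """Fraction of records in a batch with at least one issue: a simple signal to chart or alert on."""
    if not records:
        return 0.0
    return sum(1 for r in records if check_record(r)) / len(records)
```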

Compliance and fairness:

  • Fairness: Monitoring can reveal biases in the model predictions, helping to address fairness concerns. It helps verify that the model does not behave in a discriminatory way across different demographic groups.
  • Regulatory compliance: In regulated industries, monitoring helps ensure that the model complies with legal and ethical standards. It assists in maintaining transparency and accountability.

User experience and business impact:

  • Customer satisfaction: Monitoring allows organizations to track user experience and customer satisfaction by ensuring that the model delivers predictions that meet user expectations.
  • Business metrics: Monitoring helps tie model performance to key business metrics, ensuring that the ML system contributes positively to the organization's goals.

Bring your ML model data into New Relic 

AI/ML engineers and data scientists can now send model performance telemetry data into New Relic and, with integrations to leading machine learning operations (MLOps) platforms, proactively monitor ML model issues in production. Custom dashboards and visualizations give your data teams full visibility into how your ML investments perform in production.


Functional vs operational ML model monitoring

Functional monitoring and operational monitoring are two different aspects of monitoring machine learning (ML) models, each serving a specific purpose in ensuring the effectiveness and reliability of the models. Here is how they differ:

Functional monitoring

Functional monitoring focuses on assessing the performance and accuracy of the machine learning model in terms of its predictive capabilities.

Key metrics

  • Model metrics: This includes metrics related to the model's performance, such as accuracy, precision, recall, F1 score, and area under the receiver operating characteristic (ROC) curve.
  • Drift detection: Functional monitoring helps identify and address drift. Drift can mean the distribution of the input data changes over time (data drift) or the relationship between inputs and outcomes changes (concept drift). Either can reduce the model's accuracy.
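One common way to quantify a shift in a single feature's distribution is the population stability index (PSI). PSI compares the share of values falling in each bin for a reference sample, such as training data, against a production sample. Here is a minimal sketch in plain Python, assuming a numeric feature and equal-width bins. The binning choice and the small floor for empty bins are assumptions, and other drift tests would work as well.

```python
from __future__ import annotations

import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    production: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI of one numeric feature: training-time (reference) sample vs. production sample.

    Bins are equal-width over the reference range; production values outside
    that range are clipped into the first or last bin. Empty bins get a small
    floor (eps) so the log term stays finite.
    """
    low, high = min(reference), max(reference)
    width = (high - low) / bins or 1.0  # avoid division by zero for a constant feature

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - low) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref, prod = proportions(reference), proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))


reference = [i / 100 for i in range(100)]      # roughly uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]  # same feature, now concentrated on [0.5, 1)
```

A frequently quoted rule of thumb treats PSI below 0.1 as a negligible shift and above 0.25 as significant, but limits are best tuned per feature. PSI only measures input (data) drift. Confirming a change in the input-to-outcome relationship also requires ground-truth labels.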

Use cases

  • Assessing the model's predictive accuracy and generalization to new data.
  • Detecting and adapting to changes in the underlying data distribution.
  • Evaluating the model's ability to handle different data scenarios.

Examples: Analyzing classification metrics, assessing model accuracy over time, and detecting shifts in data distribution.
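Tracking accuracy over time usually means evaluating a moving window of recent labeled predictions rather than all history, so a recent decline is not averaged away. Here is a minimal sketch, assuming labels arrive alongside or shortly after each prediction. The class name and default window size are illustrative.

```python
from __future__ import annotations

from collections import deque
from typing import Hashable


class RollingAccuracy:
    """Accuracy over the most recent `window` labeled predictions."""

    def __init__(self, window: int = 500) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self._hits: deque[int] = deque(maxlen=window)  # 1 = correct, 0 = wrong; oldest drop off
        self._correct = 0  # running count of hits currently in the window

    def update(self, y_true: Hashable, y_pred: Hashable) -> None:
        if len(self._hits) == self._hits.maxlen:
            self._correct -= self._hits[0]  # the oldest entry is about to be evicted
        hit = int(y_true == y_pred)
        self._hits.append(hit)
        self._correct += hit

    @property
    def value(self) -> float | None:
        """Current windowed accuracy, or None before any labeled prediction arrives."""
        return self._correct / len(self._hits) if self._hits else None
```

Keeping a running count makes each update O(1), so the tracker can sit directly in a serving path.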

Operational monitoring

Operational monitoring is concerned with the deployment and runtime aspects of ML models, focusing on their interactions with the production environment.

Key metrics

  • Resource utilization: Monitoring computational resources such as CPU and memory usage to ensure optimal performance.
  • Response time: Tracking the time it takes for the model to make predictions, ensuring timely responses.
  • Error rates and failures: Monitoring for errors or failures when the model serves predictions, such as exceptions, timeouts, or failed requests, and understanding their causes.
  • Throughput: Assessing the number of predictions the model can handle within a given time frame.
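Response time, error rate, and throughput are usually reported per time window. Latency is typically summarized as percentiles, because an average hides slow tail requests. The sketch below assumes each request has already been logged with its latency and a success flag. The `Request` record and the function names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Request:
    latency_ms: float  # time taken to return a prediction
    ok: bool           # False if the serving call raised, timed out, or returned an error


def nearest_rank(sorted_values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (q in 0..100) of an already-sorted, non-empty sequence."""
    rank = max(math.ceil(q * len(sorted_values) / 100), 1)
    return sorted_values[rank - 1]


def summarize(requests: Sequence[Request], window_s: float) -> dict[str, float]:
    """Operational summary for all requests observed during one window of window_s seconds."""
    if not requests:
        return {"p50_ms": 0.0, "p95_ms": 0.0, "error_rate": 0.0, "throughput_rps": 0.0}
    latencies = sorted(r.latency_ms for r in requests)
    errors = sum(1 for r in requests if not r.ok)
    return {
        "p50_ms": nearest_rank(latencies, 50),
        "p95_ms": nearest_rank(latencies, 95),
        "error_rate": errors / len(requests),
        "throughput_rps": len(requests) / window_s,
    }


window = [Request(10, True), Request(20, True), Request(30, False), Request(40, True)]
summary = summarize(window, window_s=2.0)
```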

Use cases

  • Ensuring the model operates efficiently and reliably in a production environment.
  • Identifying and addressing performance bottlenecks or issues in real time.
  • Managing computational resources to meet service-level agreements (SLAs).

Examples: Monitoring server performance, tracking response times, and identifying instances of model failures or errors in a production environment.

To sum it up, functional monitoring is concerned with the model's predictive accuracy and its ability to adapt to changing data. In contrast, operational monitoring focuses on the model's real-time performance and interactions within a production environment. Both aspects are crucial for maintaining a robust and reliable machine learning system throughout its lifecycle.

Complete visibility into ML-powered applications

Unlike ordinary software, AI and ML models are based on both code and the underlying data. Because the real world is constantly changing, models developed on static data can become irrelevant or “drift” over time, becoming less accurate. Monitoring the performance of an ML model in production is essential to continue to deliver relevant customer experiences.

By using New Relic for your ML model performance monitoring, your development and data science teams can:

  • Bring your own ML data or integrate with data science platforms and monitor ML models and interdependencies with the rest of the application components, including infrastructure, to solve problems faster.
  • Create custom dashboards that build trust in your ML models and surface insights for making them more accurate.
  • Apply predictive alerts to ML models from New Relic Alerts and Applied Intelligence to detect unusual changes and unknowns early before they impact customers.
  • Review ML model telemetry data for critical signals to maintain high-performing models.
  • Collaborate in a production environment and contextualize alerts, notifications, and incidents before they have an impact on the business.
  • Access data that supports data-driven decisions about innovation, planning, reliability, and customer experience.

Monitoring is fast emerging as one of the biggest and most important aspects of MLOps and I’m excited to see New Relic launch their AI Observability platform.


As companies expand into more complex use cases for AI/ML, full-stack ML application observability needs to be a key focus for any advanced team—and they need the right tools to keep track of their models as they make key decisions in production. At the AI Infrastructure Alliance, we’re dedicated to bringing together the essential building blocks for the Artificial Intelligence applications of today and tomorrow and we are happy to partner with New Relic on that mission.

Best practices for monitoring ML models in production

Monitoring machine learning models in production is critical for ensuring their ongoing performance, accuracy, and reliability. Here are some best practices for effectively monitoring ML models in a production environment:

Choose appropriate metrics

Select metrics that align with the business objectives and provide a comprehensive view of the model's performance. Consider model-specific metrics, concept drift detection, and fairness metrics in addition to traditional evaluation metrics.

Real-time monitoring

Implement real-time monitoring to detect issues and respond promptly. This is particularly important for applications where timely predictions are crucial, such as fraud detection or recommendation systems.
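One lightweight way to get real-time signals is to wrap the prediction function so every call reports its latency and outcome as it happens. This is a minimal sketch, not a New Relic agent API. `record` is a hypothetical callback standing in for whatever ships events to your monitoring backend; here it simply appends to a list. `predict` is a toy model.

```python
from __future__ import annotations

import functools
import time
from typing import Any, Callable, Dict

Event = Dict[str, Any]


def monitored(record: Callable[[Event], None]) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator factory: report latency and success/failure of every call to `record`."""

    def decorator(predict: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(predict)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                result = predict(*args, **kwargs)
            except Exception as exc:
                # Report the failure, then re-raise so callers still see the error.
                record({"model": predict.__name__, "ok": False, "error": type(exc).__name__,
                        "latency_ms": (time.perf_counter() - start) * 1000})
                raise
            record({"model": predict.__name__, "ok": True,
                    "latency_ms": (time.perf_counter() - start) * 1000})
            return result

        return wrapper

    return decorator


events: list[Event] = []  # stand-in sink; a real deployment would forward these events


@monitored(events.append)
def predict(x: float) -> float:
    """Toy model used only to demonstrate the wrapper."""
    if x < 0:
        raise ValueError("negative input")
    return 2 * x
```

Reporting from inside the call path keeps the signal immediate. In a high-traffic service you would likely buffer events and send them in batches.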

Monitor data quality

Regularly assess the quality of input data to ensure that the model receives accurate and relevant information. Detect and handle anomalies, outliers, and missing values that can degrade model performance.
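For one numeric feature, a simple screen combines two checks. It counts missing values and flags outliers with Tukey's fences, meaning values more than k interquartile ranges beyond the first or third quartile. This is a sketch rather than a prescribed method. k = 1.5 is the conventional default, and the function name is hypothetical.

```python
from __future__ import annotations

import math
import statistics
from typing import Optional, Sequence


def screen_feature(values: Sequence[Optional[float]], k: float = 1.5) -> tuple[list[float], int]:
    """Return (outliers, missing_count) for one numeric feature in a batch of inputs."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    missing = len(values) - len(present)
    if len(present) < 2:
        return [], missing  # too few values to estimate quartiles
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in present if v < low or v > high]
    return outliers, missing


outliers, missing = screen_feature([10, 11, 12, 11, 10, None, 12, 11, 95, float("nan")])
```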

Alerting and notification systems

Set up alerting systems to notify relevant stakeholders when the model's performance deviates from expected standards. Define thresholds for key metrics and establish protocols for addressing issues promptly.
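At its simplest, a threshold alert compares each metric against a limit in a given direction. It should also treat a metric that stopped reporting as a problem. The sketch below only illustrates that logic. It is not how New Relic Alerts conditions are configured, and the metric names and limits are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    direction: str  # "below": fire when value < limit; "above": fire when value > limit

    def __post_init__(self) -> None:
        if self.direction not in ("above", "below"):
            raise ValueError(f"direction must be 'above' or 'below', got {self.direction!r}")


def evaluate(metrics: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return one alert message per breached (or unreported) threshold."""
    alerts = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            alerts.append(f"{t.metric}: no data")  # a silent signal is itself worth flagging
            continue
        breached = value < t.limit if t.direction == "below" else value > t.limit
        if breached:
            alerts.append(f"{t.metric}={value} is {t.direction} {t.limit}")
    return alerts


# Illustrative thresholds only; real limits depend on the model and its SLAs.
THRESHOLDS = [
    Threshold("recall", 0.80, "below"),
    Threshold("p95_latency_ms", 250.0, "above"),
    Threshold("feature_psi", 0.25, "above"),
]
```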

Resource monitoring

Monitor computational resources such as CPU, memory, and GPU usage to ensure optimal performance. Identify and address resource bottlenecks impacting the model's responsiveness and efficiency.
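Within a single Python process, the standard library can estimate the cost of an inference call. Wall-clock and CPU time come from the `time` module, and peak heap allocation comes from `tracemalloc`. This is a rough sketch under stated assumptions: it does not see GPU memory, `tracemalloc` adds overhead (so sample calls rather than profiling every request), and it assumes tracing is not already active elsewhere. `toy_predict` is a hypothetical stand-in for a model.

```python
from __future__ import annotations

import time
import tracemalloc
from typing import Any, Callable, Dict, Tuple


def profile_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, Dict[str, float]]:
    """Run fn once; report wall-clock time, process CPU time, and peak Python heap usage.

    tracemalloc only sees memory allocated through Python's allocators, so GPU
    memory is not included. This sketch assumes tracemalloc is not already
    running elsewhere in the process.
    """
    tracemalloc.start()
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    try:
        result = fn(*args, **kwargs)
    finally:
        wall_s = time.perf_counter() - wall_start
        cpu_s = time.process_time() - cpu_start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, {"wall_ms": wall_s * 1000, "cpu_ms": cpu_s * 1000, "peak_kib": peak_bytes / 1024}


def toy_predict(n: int) -> list:
    """Stand-in for a model call that allocates a sizeable buffer."""
    return [0.0] * n


result, usage = profile_call(toy_predict, 100_000)
```

Host-level CPU, memory, and GPU utilization are better collected by infrastructure monitoring. A per-call profile like this helps attribute that load to the model.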

Automated retraining

Establish automated retraining pipelines to periodically update models with fresh data. This helps the model adapt to evolving patterns in the data and maintain its accuracy over time.
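Retraining pipelines are often triggered by a schedule, by monitoring signals, or by both. The sketch below combines the periodic trigger described above with two event-based triggers: an accuracy floor and a drift limit. It returns the reasons so a pipeline can log why it ran. The field names and default limits are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class ModelStatus:
    trained_at: datetime      # timezone-aware training timestamp
    rolling_accuracy: float   # e.g. accuracy over recent labeled predictions
    max_feature_psi: float    # largest drift score across monitored features


def retrain_reasons(
    status: ModelStatus,
    now: Optional[datetime] = None,
    max_age: timedelta = timedelta(days=30),
    min_accuracy: float = 0.90,
    max_psi: float = 0.25,
) -> List[str]:
    """Return why the model should be retrained; an empty list means no retraining needed."""
    now = now or datetime.now(timezone.utc)
    reasons = []
    if now - status.trained_at > max_age:
        reasons.append("scheduled: model older than max_age")
    if status.rolling_accuracy < min_accuracy:
        reasons.append("performance: accuracy below floor")
    if status.max_feature_psi > max_psi:
        reasons.append("drift: feature PSI above limit")
    return reasons
```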

Fairness monitoring

Monitor for biases and fairness issues in model predictions, especially in applications with potential ethical implications. Use fairness metrics and tools to identify and mitigate biases.
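Two widely used group fairness metrics compare groups on two rates. The first is the selection rate, the share predicted positive; the gap between groups is the demographic parity difference. The second is the true positive rate; the gap is the equal opportunity difference. Here is a minimal sketch, assuming binary 0/1 labels and predictions and a known group attribute for each record. These are examples, not the only appropriate metrics. The group values are hypothetical.

```python
from __future__ import annotations

import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def group_rates(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, Dict[str, float]]:
    """Per-group selection rate (share predicted positive) and true positive rate."""
    counts: Dict[Hashable, Dict[str, int]] = defaultdict(
        lambda: {"n": 0, "pred_pos": 0, "actual_pos": 0, "tp": 0}
    )
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += p
        c["actual_pos"] += t
        c["tp"] += t * p  # 1 only when both label and prediction are positive
    return {
        g: {
            "selection_rate": c["pred_pos"] / c["n"],
            "tpr": c["tp"] / c["actual_pos"] if c["actual_pos"] else math.nan,
        }
        for g, c in counts.items()
    }


def max_gap(rates: Dict[Hashable, Dict[str, float]], key: str) -> float:
    """Largest difference in `key` between any two groups (NaN groups are skipped)."""
    values = [r[key] for r in rates.values() if not math.isnan(r[key])]
    return max(values) - min(values) if values else 0.0


rates = group_rates(
    y_true=[1, 0, 1, 0, 1, 1, 0, 0],
    y_pred=[1, 0, 1, 1, 0, 1, 0, 0],
    groups=["a", "a", "a", "a", "b", "b", "b", "b"],
)
```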

Get instant value from machine learning model telemetry 

With 100 GB of free data ingest per month and ready-made libraries, you can bring your own ML model inference and performance data into New Relic in minutes, directly from a Jupyter notebook or cloud service. You can then view metrics such as summary statistics and feature and prediction distributions.
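To illustrate what summary statistics and a distribution for a feature or prediction column involve, here is a minimal sketch in plain Python. It is not New Relic's Python library. The keys in the returned dictionary are hypothetical, so check the library's documentation for what it actually records.

```python
from __future__ import annotations

import statistics
from typing import Any, Dict, Sequence


def summarize_column(values: Sequence[float], bins: int = 5) -> Dict[str, Any]:
    """Summary statistics plus an equal-width histogram for one feature or prediction column."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # constant column: everything lands in bin 0
    histogram = [0] * bins
    for v in values:
        histogram[min(int((v - low) / width), bins - 1)] += 1  # clamp the max value into the last bin
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
        "min": low,
        "max": high,
        "histogram": histogram,
    }


# Hypothetical batch of model scores from one inference run.
predictions = [0.1, 0.2, 0.2, 0.9, 1.0]
prediction_summary = summarize_column(predictions, bins=3)
```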

In addition, New Relic’s open-source ecosystem offers flexible quickstarts so you can start getting value from your ML model data faster. A wide range of integrations with leading data science platforms like Amazon SageMaker, DataRobot (Algorithmia), Aporia, Superwise, Comet, DAGsHub, Mona, and TruEra include pre-configured performance dashboards and other observability building blocks that give you instant visibility into your models. Getting value from your ML model data has never been easier with New Relic.

Inaccurate ML recommendations or predictions can cost a company millions. New Relic Model Performance Monitoring enables teams to measure ML model performance and maximize the return on their ML investments.

Get started with machine learning model performance monitoring

We’re committed to making observability a daily best practice for every engineer. With the launch of New Relic ML Model Performance Monitoring, we deliver a unified data observability platform that gives ML/AI and DevOps teams unprecedented visibility into the performance of their ML-based apps. With everything you need in one place, New Relic is expanding observability into the future.

All the available New Relic ML Model Performance Monitoring observability integrations can be found as part of the New Relic Instant Observability ecosystem, with more on the way. 

For more information on how to bring your ML model telemetry to New Relic, check out our Python library and the notebook example of an XGBoost model, which includes a step-by-step explanation of the integration.