
While AI is propelling modern applications to new heights, it also presents unique challenges for the engineers who build and run AI-powered applications. Unlike traditional applications, AI applications require a new technology stack that incorporates components like large language models (LLMs) and vector data stores. They also generate new kinds of telemetry, such as quality and cost signals, that must be tracked to ensure AI applications are safe, secure, and reliable. Addressing these complexities and optimizing these novel applications is essential for the future of AI, especially in light of the Biden administration's 2023 executive order to establish standards that ensure the safe, reliable, and ethical development and deployment of AI systems.

New Relic AI monitoring is the industry's first APM solution that provides end-to-end visibility for any AI-powered application. Now available in early access, it gives engineers unprecedented visibility and insights across the entire AI stack so they can build and run safe, secure, and responsible AI applications with confidence.

Before diving into the more technical aspects of New Relic AI monitoring, let's take a look at why AI monitoring is important, and what components of the stack need to be monitored to ensure your AI applications are working properly.


What is AI monitoring?


AI monitoring refers to observing, analyzing, and managing artificial intelligence (AI) systems and applications to ensure they function properly, perform well, and comply with established standards and objectives. It typically involves tracking metrics such as accuracy, reliability, efficiency, and fairness to assess an AI system's behavior and outcomes.

Why is AI monitoring important?

There are several reasons AI applications need to be monitored:

  • Quality and accuracy: Monitor for bias, toxicity, and hallucinations in complex AI models to ensure fair and reliable outcomes.
  • Performance tuning: Identify and resolve computational bottlenecks to maintain responsive and efficient AI applications.
  • Cost management: Track token processing to manage AI model costs effectively and stay within budgetary limits.
  • Responsible use: Ensure AI responses are free from bias and toxicity that can cause harm.
  • Security: Monitor AI applications for vulnerabilities and take corrective action to mitigate potential security attacks.

Challenges in AI monitoring and how to overcome them

Some of the most impactful challenges in AI monitoring include:

Data quality and bias: Biases in training data can lead to biased outcomes, affecting the fairness and accuracy of AI systems. Overcoming this challenge involves ensuring representative and diverse training data, regularly auditing data for biases, and employing techniques such as data augmentation and fairness-aware algorithms.

Model drift: As the environment or data distribution changes over time, AI models may become less effective, a phenomenon known as model drift. To address this challenge, continuous monitoring of model performance and data drift is essential, along with periodic model retraining and adaptation strategies.
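To make the idea concrete, here is a minimal drift-check sketch (not New Relic-specific): it compares the live distribution of a single feature against a training-time baseline using a two-sample Kolmogorov-Smirnov test. The threshold and window sizes are illustrative assumptions, not recommendations.

```python
# A minimal drift-detection sketch: compare the distribution of one feature in
# live traffic against a training-time baseline with a two-sample KS test.
# The p-value threshold and sample sizes are illustrative only.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(baseline: np.ndarray, live_window: np.ndarray,
                    p_threshold: float = 0.01) -> bool:
    """Return True if the live window's distribution differs significantly
    from the training baseline for this feature."""
    statistic, p_value = ks_2samp(baseline, live_window)
    return p_value < p_threshold

# Example: baseline captured at training time, live window collected in production.
baseline = np.random.normal(loc=0.0, scale=1.0, size=5_000)
live = np.random.normal(loc=0.4, scale=1.0, size=1_000)  # shifted mean simulates drift
if feature_drifted(baseline, live):
    print("Possible data drift detected; consider investigating or retraining.")
```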

Interpretability and explainability: Many AI models, particularly deep learning models, are often seen as black boxes, making it difficult to understand their decisions. Overcoming this challenge involves using interpretable models, incorporating explainability techniques such as feature importance analysis and model-agnostic methods like LIME or SHAP, and providing transparency into the decision-making process.
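As an illustration, here is a short sketch of applying a model-agnostic explainability library such as SHAP to a trained tree-based model; the model and dataset are placeholders chosen only to keep the example self-contained.

```python
# A minimal SHAP sketch: compute per-feature contributions for individual
# predictions of a trained scikit-learn model. Model and data are placeholders.
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree-based models.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])

# shap_values holds per-feature contributions for each prediction, which can be
# logged or visualized (for example with shap.summary_plot) to audit which
# features drive the model's decisions.
```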

Security and adversarial attacks: AI systems are vulnerable to various security threats, including adversarial attacks where malicious inputs are crafted to deceive the model. To mitigate this challenge, employing robust security measures such as input validation, adversarial training, and monitoring for suspicious activities is crucial.
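As a simplified illustration of the input-validation piece, the sketch below screens prompts before they reach a model. The length limit and blocklist patterns are illustrative assumptions; a real deployment would layer this with adversarial testing, output filtering, and rate limiting.

```python
# A simplified input-validation sketch for an LLM-backed endpoint.
# The limits and patterns below are illustrative, not a complete defense.
import re

MAX_PROMPT_CHARS = 4_000
SUSPICIOUS_PATTERNS = [
    r"ignore (all|previous) instructions",  # common prompt-injection phrasing
    r"\bBEGIN SYSTEM PROMPT\b",
]

def validate_prompt(prompt: str) -> str:
    # Drop non-printable control characters but keep normal whitespace.
    cleaned = "".join(ch for ch in prompt if ch.isprintable() or ch in "\n\t")
    if len(cleaned) > MAX_PROMPT_CHARS:
        raise ValueError("Prompt exceeds maximum allowed length.")
    for pattern in SUSPICIOUS_PATTERNS:
        if re.search(pattern, cleaned, flags=re.IGNORECASE):
            raise ValueError("Prompt rejected by input validation policy.")
    return cleaned
```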

Resource constraints: Monitoring AI systems in real-time can be resource-intensive, requiring significant computational power and storage. Overcoming this challenge involves optimizing monitoring processes, leveraging scalable infrastructure, and prioritizing critical monitoring tasks.

Regulatory compliance and ethical concerns: Ensuring that AI systems comply with legal regulations and ethical standards poses a significant challenge. Organizations must implement robust governance frameworks, adhere to relevant regulations such as GDPR or HIPAA, and consider ethical implications throughout the AI lifecycle.

To meet these challenges, organizations should adopt a comprehensive approach to AI monitoring that combines technical solutions, best practices, and organizational processes: investing in monitoring tools and technologies, establishing clear monitoring objectives and metrics, fostering cross-disciplinary collaboration between data scientists, domain experts, and ethicists, and continuously updating monitoring strategies to adapt to evolving threats and requirements. Promoting transparency, accountability, and a culture of responsible AI within the organization is equally essential.

New Relic AI monitoring brings the power of observability to engineers working on AI by providing the necessary insights to debug, monitor, and improve AI applications, ensuring that they operate as intended, deliver accurate results, and meet emerging standards for responsible use.

Decoding the AI stack

AI stacks are complex sets of tools and technologies used to develop and deploy AI applications. As mentioned earlier, they not only produce a new set of telemetry data; they also often require more data, more computing resources, and more specialized tools and technologies than traditional tech stacks.

Key components of an AI tech stack include:

  • Infrastructure layer: Provides the foundation for AI development and deployment, including powerful GPUs and CPUs to train and deploy AI models along with cloud computing platforms such as AWS, Azure, and Google Cloud Platform (GCP) that provide a scalable way to deploy AI applications.
  • Data storage/vector datastores: AI applications need to store and access large amounts of data. Vector databases are specialized databases that are designed to store and query high-dimensional data that's often used in AI applications. 
  • Model layer: Contains the AI models used to make predictions or generate outputs. Popular options for content generation include OpenAI's GPT-4, Anthropic's Claude, Cohere's models, Meta's Llama 2, and models served through Amazon Bedrock.
  • Orchestration framework: Orchestration frameworks like LangChain provide a way to chain together different components of an AI application, such as data processing, model invocation, and post-processing (see the sketch after this list).
  • Application layer: Contains the user-facing applications that interact with AI models.
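To illustrate how the model and orchestration layers fit together, here is a minimal sketch in the style of LangChain's expression language that chains a prompt template, a chat model call, and an output parser. The model name is illustrative, and the example assumes an OPENAI_API_KEY is available in the environment.

```python
# A minimal orchestration sketch using LangChain-style chaining: a prompt
# template feeds a chat model, whose output is parsed to a plain string.
# Model name is illustrative; assumes OPENAI_API_KEY is set in the environment.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Summarize the following support ticket in one sentence:\n\n{ticket}"
)
llm = ChatOpenAI(model="gpt-4", temperature=0)
chain = prompt | llm | StrOutputParser()

summary = chain.invoke({"ticket": "Customer reports intermittent 502 errors after deploy."})
print(summary)
```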

New Relic AI monitoring tool: APM for AI

New Relic AI monitoring brings the power of observability to the entire AI stack. Just as engineers monitor their application stack with New Relic APM, New Relic AI monitoring gives them full visibility into every component of the AI stack, so they can easily monitor, debug, and improve their AI applications for performance, quality, and cost, and ensure compliance.

Quick and easy setup

New Relic agents provide quick and easy setup for AI monitoring, with no additional instrumentation required. They include built-in support for popular model providers and services such as OpenAI and Amazon Bedrock, as well as orchestration frameworks like LangChain. This gives you complete end-to-end visibility and deep trace insights across your AI stack, enabling you to identify and analyze the performance of individual components, trace the flow of data, and pinpoint potential bottlenecks in your AI applications.
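As a rough sketch of what this looks like in practice, the example below assumes a recent New Relic Python agent with AI monitoring enabled in its configuration (for example, an ai_monitoring.enabled setting in newrelic.ini); under that assumption, the agent is expected to capture the OpenAI call automatically, with no extra instrumentation code.

```python
# A minimal sketch of an LLM call in an app instrumented by the New Relic
# Python agent. Assumes AI monitoring is enabled in newrelic.ini (for example,
# ai_monitoring.enabled = true) and OPENAI_API_KEY is set in the environment.
import newrelic.agent
from openai import OpenAI

newrelic.agent.initialize("newrelic.ini")  # or launch via `newrelic-admin run-program`

client = OpenAI()

@newrelic.agent.background_task(name="summarize-ticket")
def summarize(ticket_text: str) -> str:
    # With AI monitoring enabled, the agent is expected to record this call's
    # latency, errors, and token usage for the AI monitoring views.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Summarize: {ticket_text}"}],
    )
    return response.choices[0].message.content

print(summarize("Customer reports intermittent 502 errors after deploy."))
```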

Debug faster with complete visibility of your entire AI stack

New Relic AI monitoring integrates seamlessly with New Relic APM 360 to provide end-to-end visibility across your entire AI stack, from the service layer to infrastructure to the AI models. You can now correlate your AI application performance with upstream and downstream trends to understand how issues impact other parts of your application in real time. This eliminates guesswork and makes troubleshooting intuitive and efficient for all engineers. 

Below is a screenshot of the New Relic APM 360 summary with the integrated AI monitoring view. This unified view gives you instant insights into the AI layer’s key metrics, such as the total number of requests, average response time, token usage, user feedback, and response error rates, alongside your APM golden signals, infrastructure insights, and logs. Now imagine you see a spike in application errors and also in the AI response errors shown in the APM 360 summary view. You can quickly isolate the issue to the AI layer and drill down into the AI responses view to find the root cause of the problem.
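For teams that prefer pulling these signals programmatically, a sketch along the following lines could run an NRQL query through NerdGraph. The account ID and API key are placeholders, and the LlmChatCompletionSummary event name is an assumption about how the agent reports LLM responses; check the data explorer for the exact event and attribute names your agent emits.

```python
# A sketch of querying AI response counts programmatically via NerdGraph.
# ACCOUNT_ID is a placeholder; LlmChatCompletionSummary is an assumed event
# name, so verify the actual schema in your account's data explorer.
import os
import requests

NERDGRAPH_URL = "https://api.newrelic.com/graphql"
ACCOUNT_ID = 1234567  # placeholder account ID

nrql = "SELECT count(*) FROM LlmChatCompletionSummary SINCE 1 hour ago"
graphql_query = f"""
{{
  actor {{
    account(id: {ACCOUNT_ID}) {{
      nrql(query: "{nrql}") {{
        results
      }}
    }}
  }}
}}
"""

response = requests.post(
    NERDGRAPH_URL,
    json={"query": graphql_query},
    headers={"API-Key": os.environ["NEW_RELIC_API_KEY"]},
    timeout=30,
)
response.raise_for_status()
print(response.json()["data"]["actor"]["account"]["nrql"]["results"])
```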

Optimize AI application performance, quality, and cost with deep insights 

New Relic AI monitoring provides deep traces for every response, giving you the visibility you need to understand how your AI applications are working and make informed decisions about how to improve performance, address quality issues such as bias, toxicity, and hallucination, and manage costs. With the New Relic AI monitoring response UI you can:

  • Identify outliers and trends: AI monitoring provides a consolidated, roll-up view of all AI responses. This makes it easy to identify outliers and trends in your responses.
  • Trace the entire lifecycle of every response: New Relic AI monitoring lets you see the end-to-end lifecycle of each response, from the prompt through every stage of the application's components, in an easy-to-understand waterfall view, as seen in the screenshot below.

You can also view the details of each response, including the prompt, negative feedback, and metadata, so you can quickly spot and fix issues related to performance or quality.
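If you want to correlate end-user feedback with specific responses, the New Relic Python agent exposes an LLM feedback API. The sketch below is assumption-laden: the record_llm_feedback_event call, its parameter names, and the rating and category values shown should all be verified against your agent version's AI monitoring documentation.

```python
# An assumption-laden sketch of attaching user feedback to an LLM response.
# Verify record_llm_feedback_event and its parameters against your agent
# version's AI monitoring API documentation before relying on this.
import newrelic.agent

def handle_thumbs_down(comment: str) -> None:
    # Capture linking metadata for the transaction that produced the response.
    linking = newrelic.agent.get_linking_metadata()
    trace_id = linking.get("trace.id")
    if trace_id:
        newrelic.agent.record_llm_feedback_event(
            trace_id=trace_id,
            rating="negative",      # assumed rating value
            category="inaccurate",  # assumed category label
            message=comment,
        )
```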

Compare performance and cost across models

Model comparison is a key part of AI monitoring: it allows you to identify the best model for your needs, track performance over time, and optimize costs. New Relic AI monitoring provides a single, easy-to-use view for troubleshooting, comparing, and optimizing different LLM prompts and responses for performance, cost, and quality issues such as hallucination, bias, and toxicity, across all models.

Optimizing AI application cost is one common use case for AI monitoring's model comparison. By tracking the token usage across AI models, you can identify which models are the most expensive to run. You can then choose less expensive models to optimize your AI application architecture.
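As a back-of-the-envelope illustration of that comparison, the sketch below estimates spend per model from observed token counts. The per-token prices and usage numbers are placeholder values, not actual vendor pricing or measured traffic.

```python
# A back-of-the-envelope cost comparison from observed token usage.
# Prices per 1K tokens and usage figures are placeholders, not real pricing.
PRICE_PER_1K_TOKENS = {            # (prompt, completion) placeholder USD prices
    "model-a": (0.03, 0.06),
    "model-b": (0.0005, 0.0015),
}

observed_usage = {                 # token totals over some monitoring window
    "model-a": {"prompt_tokens": 1_200_000, "completion_tokens": 400_000},
    "model-b": {"prompt_tokens": 1_200_000, "completion_tokens": 400_000},
}

for model, usage in observed_usage.items():
    prompt_price, completion_price = PRICE_PER_1K_TOKENS[model]
    cost = (usage["prompt_tokens"] / 1000) * prompt_price \
         + (usage["completion_tokens"] / 1000) * completion_price
    print(f"{model}: estimated ${cost:,.2f} for the window")
```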

Instantly monitor any AI ecosystem with the largest set of integrations

New Relic AI monitoring provides more than 50 integrations for the AI ecosystem, including popular LLMs, machine learning (ML) libraries, and vector databases, as well as frameworks not currently supported by New Relic agents. These integrations include preconfigured dashboards, alerts, and other observability building blocks that give you instant visibility into your AI application’s performance and health.

New Relic: Leading the way with AI monitoring

New Relic is leading the way in observability with the introduction of AI monitoring. AI monitoring gives you unprecedented visibility, seamless integration, and deep insights into the entire AI stack. With its integration with New Relic APM 360, you can easily identify performance, cost, and quality issues affecting AI applications. By extending observability to AI, it empowers organizations to confidently adopt AI in their applications, build trust with customers and partners, and stay ahead of regulators in the ever-changing landscape of artificial intelligence.