If you're already using or planning to use OpenAI's GPT large language models, such as GPT-3 and GPT-4, at scale, it's important to monitor metrics like average request time, total requests, and total cost. Doing so helps you ensure that OpenAI's GPT APIs, including ChatGPT, are working as expected, especially when those services power important functions like customer service and support.

In this post, you’ll learn how to easily set up the New Relic OpenAI integration and how to:

  • Monitor OpenAI usage and track costs.
  • Analyze and optimize model performance.
  • Understand user engagement.

Start monitoring OpenAI in minutes

New Relic is focused on delivering valuable AI and machine learning (ML) tools that provide in-depth monitoring insights and integrate with your current technology stack. Our industry-first MLOps integration with OpenAI’s GPT-3, GPT-3.5, and GPT-4 provides a seamless path for monitoring this service. Our lightweight library helps you monitor OpenAI completion queries and simultaneously records useful statistics about your ChatGPT requests in a New Relic dashboard.

In this video, see how you can start monitoring OpenAI with just two lines of code and get a pre-built dashboard to help you achieve better GPT model performance.

Simply import the monitor module from the nr_openai_monitor library, and the integration automatically generates a dashboard that displays a variety of key GPT performance metrics, such as cost, requests, average response time, and average tokens per request.
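Conceptually, a monitoring wrapper of this kind times each completion call and records per-request statistics such as model name, latency, and token usage. Here's a minimal sketch of that idea in plain Python — the `monitor_completion` decorator, the in-memory `metrics` list, and the `fake_completion` stand-in are hypothetical illustrations, not the library's actual API:

```python
import time
from functools import wraps

# Hypothetical in-memory sink standing in for New Relic's metric ingestion.
metrics = []

def monitor_completion(func):
    """Time a completion call and record request statistics."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        response = func(*args, **kwargs)
        metrics.append({
            "model": kwargs.get("model", "unknown"),
            "response_time_ms": (time.time() - start) * 1000,
            "total_tokens": response.get("usage", {}).get("total_tokens", 0),
        })
        return response
    return wrapper

@monitor_completion
def fake_completion(**kwargs):
    # Stand-in for an OpenAI completion call; returns a canned response.
    return {"usage": {"total_tokens": 42}}

fake_completion(model="text-davinci-003", prompt="Hello")
print(metrics[0]["model"], metrics[0]["total_tokens"])
```

The real library forwards these statistics to New Relic rather than a local list, which is what populates the pre-built dashboard.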

To get started, install the OpenAI Observability quickstart from New Relic Instant Observability. Watch the Data Bytes video or read our OpenAI integration documentation for details on how to integrate New Relic with your GPT apps and deploy the custom dashboard.

Get the pre-built OpenAI monitoring dashboard by installing the quickstart from New Relic Instant Observability.

Track and allocate costs based on token usage

Usage of the OpenAI API is charged based on token consumption, with costs varying across models. OpenAI’s most powerful Davinci model costs $0.12 per 1,000 tokens (equivalent to approximately 750 words), which adds up quickly and can make operating at scale expensive.

Naturally, one of the most valuable metrics you’ll want to monitor is the cost of operating ChatGPT. The New Relic integration for OpenAI provides real-time cost tracking and token consumption. The pre-built dashboard surfaces the financial implications of your OpenAI usage and helps you determine more efficient use cases.
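As a back-of-envelope check on the dashboard's cost figures, token-based pricing is a straightforward multiplication. A quick sketch (the default rate below is the Davinci figure cited above; other models are priced differently):

```python
def estimate_cost(total_tokens: int, price_per_1k: float = 0.12) -> float:
    """Estimate OpenAI API cost in dollars from token usage."""
    return total_tokens / 1000 * price_per_1k

# Roughly 750 words of prompt plus completion is about 1,000 tokens.
print(estimate_cost(1000))                  # → 0.12
print(round(estimate_cost(250_000), 2))     # → 30.0
```

At a quarter-million tokens the bill is already around $30, which is why real-time cost tracking matters at scale.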

Monitor and allocate costs based on token usage in the OpenAI Observability quickstart dashboard.

Analyze and optimize model performance

Monitoring the speed of your ChatGPT, Whisper API, and other GPT requests can help you improve your models and quickly deliver the value of your OpenAI applications to your customers. Our integration helps you monitor the performance of the OpenAI API by showing average response times. With the pre-built dashboard, you can analyze the aggregate response time metric over time, or see a breakdown of response times by model. With visibility into your model performance, you can understand your usage, troubleshoot faster, and improve the efficiency of your ML models.
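The per-model breakdown boils down to grouping recorded request samples by model and averaging their response times. A minimal sketch with made-up sample data (field names are illustrative, not the integration's actual schema):

```python
from collections import defaultdict

# Hypothetical request samples like those a monitoring library might record.
samples = [
    {"model": "gpt-3.5-turbo", "response_time_ms": 420},
    {"model": "gpt-3.5-turbo", "response_time_ms": 380},
    {"model": "gpt-4", "response_time_ms": 1150},
]

def avg_response_time_by_model(samples):
    """Return the average response time (ms) for each model."""
    sums = defaultdict(lambda: [0.0, 0])
    for s in samples:
        sums[s["model"]][0] += s["response_time_ms"]
        sums[s["model"]][1] += 1
    return {model: total / count for model, (total, count) in sums.items()}

print(avg_response_time_by_model(samples))
# → {'gpt-3.5-turbo': 400.0, 'gpt-4': 1150.0}
```

In the dashboard, the same grouping is done server-side over the events the integration reports, charted over time.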

With the OpenAI Observability quickstart dashboard, you can monitor the average response time of your API and see how each model is performing over time.

Understand user engagement

Other metrics included on the New Relic dashboard are total requests, average tokens per request, model names, and samples. These metrics provide valuable information about the usage and effectiveness of ChatGPT and OpenAI, and they can help you enhance performance around your GPT use cases.
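These engagement metrics are simple aggregates over the same kind of request samples. For example (the sample data and field names are illustrative):

```python
# Hypothetical recorded request samples with token counts.
samples = [
    {"model": "gpt-3.5-turbo", "total_tokens": 120},
    {"model": "gpt-3.5-turbo", "total_tokens": 80},
    {"model": "gpt-4", "total_tokens": 400},
]

total_requests = len(samples)
avg_tokens_per_request = sum(s["total_tokens"] for s in samples) / total_requests

print(total_requests, avg_tokens_per_request)  # → 3 200.0
```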

With this integration, you can even gain visibility into user prompts and completions. This allows you to better understand user engagement with your applications to optimize your parameters and settings.

See prompts and completions to better understand user engagement with your applications.

Overall, our OpenAI integration is fast and easy to use, and it gives you access to real-time metrics that can help you optimize your usage, enhance ML models, reduce costs, and achieve better performance from your GPT models.