
Speed meets precision: Watch Duty's real-time data model in wildfire emergencies

Location: California

For fire-affected areas, Watch Duty provides real-time information on danger levels, wind direction, evacuation risk, and emergency response services. Watch Duty is a small organization whose mission requires both urgency and accuracy. To achieve that mission, Watch Duty uses intelligent observability to track whether the platform is working as expected. With New Relic, the team can see at a glance whether the application and systems are meeting demand, and can drill into more granular data such as real-time logs, transactions, and error reports.

New Relic has supported Watch Duty since its creation. Watch Duty CTO Dave Merrit decided to use a ready-made observability platform, rather than building a homemade monitoring solution with an ELK stack and Kibana, so the team could focus on projects that serve their mission.

“New Relic is our place to find opportunities and also verify what we expect to see when we have a ton of users accessing our services,” Dave explains. “Our developers can immediately dig into issues as they arise—by seeing the reduction in throughput and response times—before they impact our users.”

“One or two engineers can operate as a full scale tech company with the New Relic tools available.”

Alert thresholds: Separating signal from noise 

One of the first areas that Watch Duty wanted to improve with New Relic was alerting. 

As is common when first setting up alerts, Watch Duty cast a wide net. Any blip outside the normal was reported, but this quickly created too much noise, especially with the level of concurrent users Watch Duty was supporting.

“We started with a small threshold,” says Staff Engineer Josh Frederick. “We need to be able to scale our alerts as we look at them in production. We know that's something we have to create on the fly and modify as a team. This led us to changing a lot of the thresholds to focus on what was really important, and not get annoyed by a ton of alerts. So we set the threshold at a higher level. We had our personal learning experiences where getting thresholds right helped us make sure our graphs looked good,” he says. “We had existing data so it was very easy. You could preview what the report would look like based on historical data.”
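To make the idea of previewing a threshold against historical data concrete, here is a minimal sketch. It is not New Relic's implementation or API: the function name `preview_alerts`, its parameters, and the sample data are all hypothetical. It simply counts how many incidents a candidate threshold would have opened on past samples, which is one way to compare a noisy setting with a quieter one.

```python
from typing import Sequence


def preview_alerts(samples: Sequence[float], threshold: float,
                   min_consecutive: int = 1) -> int:
    """Count the incidents a threshold would have opened on past data.

    An incident opens once `min_consecutive` samples in a row exceed
    `threshold`; the streak resets at the first sample at or below it.
    (Hypothetical helper for illustration only.)
    """
    incidents = 0
    run = 0  # length of the current streak above the threshold
    for value in samples:
        if value > threshold:
            run += 1
            if run == min_consecutive:
                incidents += 1  # streak just became long enough to alert
        else:
            run = 0
    return incidents


# Made-up per-minute error rates (percent), not Watch Duty data.
history = [0.2, 0.4, 1.2, 0.3, 0.2, 1.5, 0.4, 5.0, 6.1, 5.5, 0.3, 1.1]

noisy = preview_alerts(history, threshold=1.0)
quiet = preview_alerts(history, threshold=3.0, min_consecutive=2)
print(noisy, quiet)  # prints: 4 1
```

On this made-up series, the low threshold fires on every small blip, while the higher threshold with a short duration requirement fires only on the sustained spike.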

“When an alert is triggered, especially one that was set up a while ago, it’s like you thank your past self that you had set that up, even if it had been quiet for some time, instead of somebody else telling you about a problem. You don't want the user to tell you when something isn’t working,” says Josh.

In the February 2025 California fires, Watch Duty needed to support 1.5 million concurrent users, a tenfold jump over the previous year, according to Dave. Usage peaked at up to 9 million users that week. These users were finding out in real time whether the wind was changing direction or whether a spark had grown into another fire. Real-time alerts were critical during this period to confirm that systems kept working. Errors were immediately reported via PagerDuty to Slack, which a small group of developers monitors around the clock.

Screenshot: Watch Duty provides real-time wildfire maps and alerts. Users receive notifications about the status and conditions on the ground as they change.

Two views in: Logs and transactions

Josh shares two ways he scans New Relic dashboards to identify areas he may need to focus on to keep systems highly performant. “Logs are where I go if I'm investigating something I already know about. Transactions is where I go if I'm starting with very limited information, and I need to find some breadcrumbs,” says Josh.

“With logs, I usually know the context, so what I'm looking for is very narrow,” says Josh. “If I’m trying to find some insight into an emerging problem, I use transactions. I look at any of the transactions to see if those are aberrant over the time period. If we get an alert in the last 15 minutes, I'll probably look at a four-hour window in transactions and see if I see anything. When I identify a spike, I then go to distributed tracing.” The transactions view is also a key tool for Josh and the team when they introduce a new product feature.
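The triage habit Josh describes, widening from a recent alert to a four-hour window and looking for transactions that stand out, can be sketched roughly as follows. This is an illustrative assumption about how one might flag a spike, not how New Relic or Watch Duty does it: `find_spikes`, the z-score cutoff, and the 15-minute bucket values are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def find_spikes(durations_ms: Sequence[float], recent: int,
                z_cutoff: float = 3.0) -> list[int]:
    """Return indexes among the last `recent` samples far above baseline.

    The baseline is every sample before the last `recent` ones.
    (Hypothetical helper for illustration only.)
    """
    if recent < 1:
        raise ValueError("recent must be at least 1")
    baseline = durations_ms[:-recent]
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = pstdev(baseline)
    start = len(durations_ms) - recent
    spikes = []
    for offset, value in enumerate(durations_ms[start:]):
        if sigma == 0:
            # Flat baseline: anything above the mean stands out.
            is_spike = value > mu
        else:
            is_spike = (value - mu) / sigma > z_cutoff
        if is_spike:
            spikes.append(start + offset)
    return spikes


# Made-up average durations for sixteen 15-minute buckets (four hours).
window = [120, 118, 125, 122, 119, 121, 124, 120,
          123, 118, 122, 121, 119, 120, 122, 310]

print(find_spikes(window, recent=1))  # prints: [15]
```

The flagged bucket is where a deeper look, such as distributed tracing, would start.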

With an increasing number of users relying on Watch Duty, push notifications are an essential feature. They need to work in both directions: reporters on the ground need reassurance that real-time updates are getting through, and when that data has implications for residents, push notifications need to alert those affected. Using the transactions view, Josh and his team can see spikes that follow updates. From there, the team can dig into distributed tracing to identify the causes and address them head on, making push notifications more performant at scale.

1.5M+ concurrent users supported
3M daily active users supported

Caching for scalability

Another rollout Dave and Josh oversaw this year was a move to increased caching. With over 70% of Watch Duty’s traffic coming in and out via APIs, maintaining the health of backend APIs and managing the flow of data is crucial.

Watch Duty uses Fastly as its content delivery network. Increasing caching, which now covers up to 40% of traffic, was necessary so that ten times as many concurrent users could all make queries or request updates at the same time. New Relic’s integration with Fastly provided observability from the moment the change was introduced, so, as Dave describes, the team could see the feature working effectively as it rolled out and adjust settings when initial teething problems arose.

“What’s more, we already had some caching in place, so we were able to use that data to review logs and set thresholds when we increased the frontend caching,” Josh adds. “And we were really able to see that impact dramatically change. That was an easy graph to create and read: we could instantly see in New Relic that we were going from many requests to being able to serve responses from the caching. So we could set thresholds and immediately be able to see our graph lines flatten out to very little. We were adjusting and working on this in real-time and able to see when requests were crashing and fix those and stop them from repeating.”
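As a rough illustration of why the graph flattens, the arithmetic behind a cache hit ratio is sketched below. The function name and the request counts are hypothetical, and reading the 40% figure as the share of requests answered from cache is an assumption.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Share of requests answered from cache (hypothetical helper)."""
    total = hits + misses
    if total == 0:
        return 0.0  # no traffic yet, so no meaningful ratio
    return hits / total


# Made-up counts: 400,000 requests served from cache, 600,000 from the backend.
ratio = cache_hit_ratio(hits=400_000, misses=600_000)
print(f"{ratio:.0%}")  # prints: 40%
```

Every request counted as a hit is one the backend APIs never see, which is why the backend request line drops as the ratio climbs.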

Confidence during a crisis

By having an observability solution like New Relic in place, the team can keep growing and focus on providing their services at crucial times to people who rely on them for the latest, most up-to-date data on fire risks around their home. 

“It gives us confidence that nothing is going wrong,” says Josh. “During the fires, it was an intense period of user growth and higher usage with it, but we were also confident it was working. So it wasn't even so much that we were looking for issues or spikes, we knew that we would see alerts when they arose and that if there wasn't, we didn’t need to be worried that our logging was wrong.”

Unfortunately, this fire season, Watch Duty had more than enough work to do. The work ranged from onboarding reporters across fire-stricken areas, registering new users, and keeping them informed to adding new data and maps as reports came in and pushing out notifications to alert communities of new dangers. More than ever before, people are relying on Watch Duty’s analysis and communications.

By relying on New Relic, Watch Duty can keep its developers focused on improving those crucial services as traffic continues to scale, rather than building observability tooling to monitor their own infrastructure and applications. With New Relic’s easy-to-understand, flexible configurations, Watch Duty can see what is going on in the physical communities they serve, and can put out fires in the digital world where they manage their services.
