Why New Relic
Full visibility into an extended application environment
- The New Relic Platform enables Zapier to look at monitoring data from multiple services in a single dashboard
- New Relic drives productivity by diagnosing the root cause of issues in minutes, not hours or days
- New Relic proactively identifies problems long before those problems affect users
Zapier runs 50 Linux servers on the Amazon Elastic Compute Cloud (EC2). The company’s web frontend is a Django application split across several servers, with Amazon Elastic Load Balancing (ELB) distributing traffic among them. The backend consists of a dynamic number of Celery task workers fed by messages published to a RabbitMQ cluster. Zapier also maintains a number of internal web services on nginx in front of Gunicorn and Node.js processes. Redis serves as a simple key-value store, with logging handled by Graylog2 and ElasticSearch.
When James Carr joined Zapier as a Systems Engineer in January 2013, his first priority was automating and provisioning servers for optimal performance. He knew, however, that no level of automation would be sufficient without an effective monitoring solution in place. “If something’s breaking, I want to know immediately,” he says. “For me, monitoring isn’t really an option. It’s a must-have. And if you don’t have the right tool, it can be an incredibly time-consuming task. One of my previous employers had a dedicated team of six people working on monitoring. That’s just not practical for us. We only have six people in our entire company. So I started sniffing around for the best option available.”
“We no longer need to run our own custom metric tracking for all of our resources because we can now get all of that information inside the New Relic Platform. We can aggregate analytics in a single app. We don’t waste time looking for data — it’s all in one dashboard. And with each new plugin, we have a new opportunity to save a little more time.”
For Carr, New Relic quickly emerged as the solution to beat. “I tested a few different tools and I was very happy with what I saw in New Relic,” he says. “Best of all, it was easy to implement. I was able to drop it into our environment and just run with it.”
Today, Carr relies on New Relic to monitor CPU utilization. He also makes heavy use of the Python agent to gain greater insight into Zapier’s Django application. But for him, the most useful feature of all is App Map. “It’s almost like the App Map was built for us,” he says. “We integrate with more than 200 external services. We make hundreds of millions of requests to third-party APIs every month and we are anticipating that we will be approaching one billion API requests per month before the year is out. With such a complex extended application environment, a graphical representation of our entire ecosystem is just incredibly helpful. By looking at the map in New Relic, we can see which services are taking the longest and which ones have experienced issues in the last 30 minutes or so. Then we start digging to find out more.”
With the introduction of the New Relic Platform, Carr saw an opportunity to gain even greater visibility into all metrics relevant to Zapier. “Some parts of our environment — like Redis, nginx, RabbitMQ and HAProxy — are major pieces of the business,” he says. “We’re constantly logging into those tools to run a test, then checking New Relic, then going back to run another test. It’s clearly inefficient. I was already trying to figure out how I could create more integration between New Relic and these other services. Then I heard about the platform and it was clear that New Relic was on the same page.”
The New Relic Platform is community-driven, open to any developer who wants to create a plugin either for public consumption or private use. Zapier is already relying on three of the platform’s many extensions:
- The RabbitMQ plugin keeps track of queue backlogs over time, proactively identifying scalability issues by generating alerts when Zapier crosses a given threshold of messages in the queue.
- The Memcached plugin monitors the amount of memory and number of commands executed per second on cache nodes. The Zapier team uses this data to determine how much of its cache is being utilized — even forecasting necessary adjustments to the caching infrastructure.
- The Redis plugin tracks the number of active connections to each Redis instance, the number of commands executed per minute and the total number of keys stored. That way, Zapier can set thresholds for notification when additional resources need to be allocated.
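The threshold-based alerting described in the list above can be illustrated with a minimal Python sketch. It is not New Relic's API or Zapier's configuration: the `Threshold` class, the `breached` function, the metric names and the limits are all hypothetical, chosen only to show the idea of comparing the latest sample of each metric against a configured limit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Threshold:
    """A hypothetical alert rule: notify when `metric` rises above `limit`."""
    metric: str
    limit: float


def breached(samples: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return the names of metrics whose latest sample exceeds its limit.

    Metrics with no sample yet are treated as 0 and never trigger.
    """
    return [t.metric for t in thresholds if samples.get(t.metric, 0.0) > t.limit]


# Illustrative values only; these names and numbers are assumptions.
latest_samples = {
    "rabbitmq.queue.messages": 12000,   # backlog of messages in a queue
    "redis.active_connections": 40,     # open client connections
}
rules = [
    Threshold("rabbitmq.queue.messages", 10000),
    Threshold("redis.active_connections", 500),
]

alerts = breached(latest_samples, rules)
```

In this sketch only the queue backlog crosses its limit, so it is the only metric that would generate a notification.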
Carr is especially impressed with the plugins created by New Relic customer MeetMe, which include the above-mentioned RabbitMQ and Redis extensions. “I subscribe to the platform developer mailing list and I check out any new plugin related to the services we use,” he says. “Before long, we might even create a plugin to share with the community. When it comes to open source, the possibilities are pretty much endless.”
“New Relic showed us that our response times began skyrocketing in conjunction with a new feature announcement. We identified the endpoint that was causing the issue and provisioned new servers to handle the increased load while we fixed the problem. We were at a coffee shop when all of this happened. The diagnosis took maybe 10 minutes, and I honestly have no idea how long it would’ve taken without New Relic.”
With help from New Relic, Zapier gains insight into problems it wasn’t aware of and receives notifications before those problems affect users. “Shortly after implementing this tool, we identified a number of issues that were minor, but had the potential to become major,” says Carr. “For example, we had complex SQL queries running on tables that didn’t have indexes. New Relic alerted me to the issue by sending a notification to my phone and we addressed the problem before it really became a problem. That kind of proactive insight is invaluable.”
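To make the missing-index problem concrete, here is a small, self-contained sketch. It uses Python's built-in `sqlite3` module purely as a stand-in, since the case study does not say which database Zapier runs, and the `tasks` table, its columns and the index name are hypothetical. The query plan is inspected before and after adding an index on the filtered column.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's human-readable plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table; not taken from Zapier's schema.
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)"
)

sql = "SELECT * FROM tasks WHERE user_id = 42"

# Without an index on user_id, SQLite has to scan the whole table.
plan_before = query_plan(conn, sql)

# With an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_tasks_user_id ON tasks (user_id)")
plan_after = query_plan(conn, sql)

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a scan of the table, while the second reports a search that uses `idx_tasks_user_id`. On a large table, that difference is the kind of slowdown a monitoring tool surfaces as a slow query.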
By diagnosing issues quickly and with great precision, New Relic is a major productivity enhancer for the Zapier team. “Just last week, we launched a new Chrome extension and we immediately saw reports of sluggishness on the site,” says Carr. “New Relic showed us that our response times were skyrocketing. We identified the endpoint that was causing the issue and we provisioned new servers to fix the problem. We were at a coffee shop when all of this happened. The diagnosis took maybe 10 minutes and I honestly have no idea how long it would’ve taken without New Relic.”
With the introduction of the New Relic Platform, Carr foresees even greater improvements in productivity. “A good example is the Memcached plugin,” he says. “We no longer need to run a separate script to monitor Memcached, because we can get all of that information inside the New Relic Platform. We can aggregate analytics in a single app. We don’t waste time looking for data; it’s all in one dashboard. And with each new plugin, we have a new opportunity to save a little more time, which is great for me and great for Zapier.”