Tokopedia achieves end-to-end observability to maintain e-commerce pole position
- With New Relic, Tokopedia can correlate performance telemetry, key business metrics, and deployment velocity in a single view
- Deploying New Relic has enabled devs to pinpoint issues quickly and easily and focus on performance engineering
- New Relic enables much more accurate alerting by setting alert intervals to the exact "magic number" required by Tokopedia
Tokopedia is an Indonesian technology company with a mission to democratise and empower commerce through technology for everyone, everywhere. Founded in 2009, Tokopedia now reaches more than 99% of districts and empowers more than 11 million merchants across the archipelago. Tokopedia’s vision is to build a Super Ecosystem where anyone can start and discover anything. To achieve that, the company is working hand-in-hand with a range of strategic partners through its marketplace and digital products, fintech and payments, logistics and fulfillment, and new retail businesses. It also provides more than 500,000 payment points across Indonesia and offers 40+ digital products to simplify people’s lives.
The challenge of a fiercely competitive e-commerce market
Indonesians are among the world’s biggest users of digital technology. According to ecommerceDB, the Indonesian e-commerce market is the eleventh largest in the world. Analysis by GlobalData projects that e-commerce sales will grow by 19.2% between 2020 and 2024, reaching US$51 billion in 2024.
This growth has attracted intense competition, with local and global players keen to compete for a slice of the pie.
As such, it’s critical for e-commerce companies to be as competitive as possible, providing reliable, robust services that enable a flawless customer experience. As “Indonesia’s Amazon” - the dominant player in the market - Tokopedia knew it couldn’t rest on its laurels in the face of increasing competition. To remain the biggest, it also needed to be the best.
“Customer experience is critical to succeeding in the digital sphere. But our ecosystem relies on a lot of different partners across payments, logistics, and warehousing. It was very hard to measure whether latency was network-related or internal and even connecting with our partners was quite difficult. We also serve a wide range of devices and it was difficult to figure out what sort of features were not working on some of these,” explains Ryan de Melo, VP of Engineering, Tokopedia.
With its stack growing and becoming increasingly complex, Tokopedia needed better visibility. It wanted to identify and solve technical issues at speed while gaining deeper insights into business metrics. In the past, it had leveraged a number of independent tools. The company wanted to consolidate its monitoring and needed a future-proof solution that could scale with its rapid growth.
Tokopedia evaluated a few partners in terms of capability and commercials, but it was New Relic’s programmability capabilities that won the deal, de Melo says.
Achieving true observability: the Map of Indonesia
New Relic worked closely with Tokopedia to ensure a successful migration to the new platform. While the migration was still in progress, Tokopedia could already see benefits with deep application visibility, robust telemetry from the frontend, and a unified observability platform.
During the initial phase, New Relic was applied to a service where Tokopedia could see a direct correlation between the status of a transaction and its monetary value (revenue leakage). With New Relic, Tokopedia can now identify shortfalls in the number of transactions and tie them to the corresponding user. This identification could not be made easily in the past using tracing with open-source tooling. Tokopedia also took an observability-first approach for its frontend by building its signature “The Map of Indonesia.” This business dashboard focuses on Tokopedia’s end-users by incorporating custom event data, backend APM, and Core Web Vitals mapped to geographical points.
“Thanks to deploying New Relic, Tokopedia could quickly hook up heat map dashboards and instantly see where there was a surge of request tickets happening at any given hour, and plan for any necessary mitigation strategies,” de Melo says.
Happier developers, happier customers
Thanks to New Relic, Tokopedia can correlate performance telemetry, key business metrics, and deployment velocity in a single view. Engineers can set up custom dashboards to quickly focus on performance engineering, analyse the quality of the code in production, and identify hotspots or bottlenecks that are causing issues. The solution also helps the sellers using Tokopedia’s platform.
“When we provided our vendor-customers with a simple New Relic interface, they could actually see the last touchpoint of a customer by searching their mobile number and seeing the last set of errors that customer experienced. This context revealed what the real frustrations were, and whether it was a partner-related problem preventing a transaction going through,” de Melo explains.
In one recent example, the user registration section hit a roadblock and the success rate dropped significantly. However, Tokopedia engineers were able to identify the root cause within minutes, thanks to New Relic.
Tokopedia has also seen an improvement in infrastructure alerts and notifications compared with its previous open-source solutions. One example involved Tokopedia’s most important service: Payments.
“There was a case where a previous tool we were using didn’t alert us to any drop (in performance), but the alert in New Relic showed a drop of four minutes. After some investigation, we found out that the drop interval was below the evaluation interval (10 minutes) and it could only provide static intervals (1, 5, 10 minutes, etc). Changing the interval to 1 minute caused too many false-positive alerts and mental fatigue for engineers,” de Melo recalls.
“With New Relic, we can achieve better cardinality by adjusting the interval to three minutes: the ‘magic number’ we need for accurate alerting.”
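The trade-off de Melo describes can be illustrated with a small simulation. The following is a minimal sketch, not New Relic's or any other tool's alerting logic: it assumes an alert that fires when the mean of a trailing window of per-minute success rates falls below a threshold. The data, the 0.90 threshold, and the function names are all hypothetical, chosen only to show why a four-minute drop can vanish in a 10-minute window while a 1-minute window reacts to every blip.

```python
from typing import List


def firing_minutes(series: List[float], window: int, threshold: float) -> List[int]:
    """Return each minute at which the mean of the trailing `window`
    samples is below `threshold` (only full windows are evaluated)."""
    fired = []
    for end in range(window - 1, len(series)):
        chunk = series[end - window + 1 : end + 1]
        if sum(chunk) / window < threshold:
            fired.append(end)
    return fired


def count_incidents(minutes: List[int]) -> int:
    """Group runs of consecutive firing minutes into single incidents."""
    incidents = 0
    previous = None
    for minute in minutes:
        if previous is None or minute != previous + 1:
            incidents += 1
        previous = minute
    return incidents


# Synthetic, made-up data: one hour of per-minute payment success rates.
success_rate = [0.99] * 60
for blip in (5, 20, 50):          # isolated one-minute noise blips
    success_rate[blip] = 0.80
for minute in range(30, 34):      # the real four-minute drop
    success_rate[minute] = 0.85

THRESHOLD = 0.90                  # hypothetical alert threshold

for window in (1, 3, 10):
    fired = firing_minutes(success_rate, window, THRESHOLD)
    print(f"{window:>2}-minute window: {count_incidents(fired)} incident(s), "
          f"firing at minutes {fired}")
```

With these made-up numbers, the 1-minute window raises four incidents (three of them noise), the 10-minute window raises none because the drop is averaged away, and the 3-minute window raises exactly one incident covering the real drop. The right interval for a real service depends on its own traffic and noise profile.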