
The New Relic global engineering organization responsible for log management relies extensively on New Relic's own products to deliver exceptional service to internal and external customers. The organization operates at significant scale, handling tens of petabytes of logs and billions of log-focused queries every month. It runs in a true continuous deployment mode, deploying frequently (often dozens of times per day) and using New Relic for observability. Because changes ship this frequently, reliable validation of each deployment is paramount, and New Relic provides the necessary insight.

By using New Relic on New Relic, and the logging product in particular, we’re achieving significant results:

  • Quality of Service: The primary value is delivering a certain quality of service to customers by helping to ensure the logging product works effectively internally.
  • Proactive Issue Identification: Using New Relic daily allows teams to proactively identify and address issues before they reach production, minimizing customer impact.
  • Safer Releases: The ability to identify and address issues early enables safer releases of new features and updates.
  • Faster Incident Response: As part of the New Relic Emergency Response Force (NERF), the teams rely on the New Relic logging product for effective incident response. PagerDuty alerts linked to New Relic alerts provide charts of key metrics and runbook links for quick diagnosis and resolution, eliminating context switching.

While many metrics are observed, here are some of the most useful to maintain high reliability:

  • Service Level Indicators (SLIs): Top-level SLIs are regularly reviewed for key experiences, such as endpoint latency for log ingestion and compliance across various integrations (for example, AWS Kinesis Firehose, TCP, syslog).
  • Service Level Objectives: The New Relic platform has a high availability target for its customers. This metric reflects New Relic's commitment to data integrity and reliability.
  • JavaScript Errors: Monitored by environment, browser, user, and product component to track user experience and identify potential issues.
  • Data Lag: Monitoring increases and decreases in lag is crucial, particularly for incident response, because New Relic customers depend on high availability of the platform.

The log management engineering organization uses many New Relic functionalities including:

  • Service Levels, APM, Infrastructure Observability, and Logs: These core platform capabilities and insights are used to ensure that key services operate within designated error budgets, and to proactively troubleshoot and resolve issues.
  • Proactive Alerting: On-call engineers rely on alerting as a crucial component of their incident response, particularly when they are paged for potentially high-severity issues. Each page links directly to the New Relic alert, which provides charts for immediate diagnosis. This integrated alerting process, coupled with established runbooks, significantly cuts response time and helps engineers proactively identify and address issues.
  • Comprehensive Integrations: Integrations with major cloud providers' services and open source tooling, coupled with New Relic agents, allow data to be ingested and logs to be correlated across tooling, powering comprehensive observability.
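
To make the idea of operating "within designated error budgets" concrete, here is a minimal Python sketch of how a latency SLI and the remaining error budget could be computed. It is illustrative only and does not use or describe the New Relic Service Levels API; the names `latency_sli` and `ErrorBudget`, the 500 ms threshold, and the 99% target are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Sequence


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of requests served within threshold_ms (hypothetical SLI)."""
    if not latencies_ms:
        return 1.0  # no traffic: treat the indicator as fully met
    good = sum(1 for value in latencies_ms if value <= threshold_ms)
    return good / len(latencies_ms)


@dataclass
class ErrorBudget:
    """Tracks how much of an SLO's allowance for bad requests is left."""
    slo_target: float    # assumed target, e.g. 0.99 means 99% good requests
    total_requests: int
    bad_requests: int

    @property
    def allowed_bad(self) -> float:
        # Number of bad requests the SLO tolerates over this window.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; 0.0 or below means it is spent.
        if self.allowed_bad == 0:
            return 0.0 if self.bad_requests else 1.0
        return 1.0 - self.bad_requests / self.allowed_bad


if __name__ == "__main__":
    sample = [100, 200, 900, 150]  # made-up ingestion latencies in ms
    print("SLI:", latency_sli(sample, threshold_ms=500))
    budget = ErrorBudget(slo_target=0.99, total_requests=1000, bad_requests=5)
    print("Budget remaining:", round(budget.remaining_fraction, 3))
```

In this sketch a team would alert when `remaining_fraction` falls toward zero, which is one common way to decide whether a release cadence is still safe under the objective.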