Apache Kafka is a distributed streaming platform designed for scalability and fault tolerance, and used to build real-time data pipelines and streaming applications. Kafka can handle thousands of messages per second at very low latency, which makes it a good fit for rapid data processing. But what if your Kafka system suffers a major failure? Thousands of messages can be lost very quickly, so it's vital to have the right monitoring in place.
Retention and replication
See how much data you have stored on disk for each topic partition, and how many copies of that data you have.
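As a rough illustration of what this kind of check involves, the sketch below sums on-disk size per topic and flags partitions whose in-sync replica count has dropped below the number of assigned replicas. The `PartitionStatus` record and its field names are hypothetical, invented for this example; in practice the numbers would come from whatever monitoring agent or admin tooling you use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionStatus:
    # Hypothetical record; the field names are assumptions for this sketch.
    topic: str
    partition: int
    size_bytes: int        # bytes stored on disk for one copy of this partition
    replicas: int          # number of replicas assigned to the partition
    in_sync_replicas: int  # replicas currently caught up with the leader


def bytes_per_topic(statuses):
    """Sum on-disk bytes across all partitions of each topic."""
    totals = defaultdict(int)
    for s in statuses:
        totals[s.topic] += s.size_bytes
    return dict(totals)


def under_replicated(statuses):
    """Return (topic, partition) pairs with fewer in-sync replicas than assigned."""
    return [
        (s.topic, s.partition)
        for s in statuses
        if s.in_sync_replicas < s.replicas
    ]
```

A partition that shows up in `under_replicated` has fewer up-to-date copies than intended, which is usually the first thing worth alerting on.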
When building a high-velocity pipeline, monitor the throughput your cluster is processing, measured in messages per second or bytes per second.
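Throughput is typically derived from two readings of a cumulative counter, such as a broker's running count of messages or bytes received. The sketch below shows that arithmetic; the `(timestamp, count)` sample format and the function name are assumptions made for this example, not part of any Kafka API.

```python
def rate_per_second(earlier, later):
    """Average rate between two (timestamp_seconds, cumulative_count) samples.

    Returns None if the counter went backwards (for example after a restart),
    since no meaningful rate can be computed from that pair of samples.
    """
    t0, c0 = earlier
    t1, c1 = later
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if c1 < c0:
        return None
    return (c1 - c0) / elapsed


# Two hypothetical samples taken 10 seconds apart.
messages_per_sec = rate_per_second((0.0, 1_000), (10.0, 6_000))
```

The same function works for bytes per second if the samples carry a byte counter instead of a message counter.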
See how far your Kafka consumer applications have fallen behind your Kafka producers. If there is lag, you can scale out the number of consumers or give them more resources.
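Consumer lag for a partition is the gap between the latest offset written to the log and the offset the consumer group has committed. A minimal sketch of that calculation follows; the dictionaries keyed by partition number are an assumed input shape, and treating a missing commit as offset 0 is a simplification that ignores any data already removed by retention.

```python
def partition_lag(end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus the group's committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Simplification: a partition with no committed offset counts from 0.
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero in case the two readings were taken at different moments.
        lag[partition] = max(end - committed, 0)
    return lag


def total_lag(end_offsets, committed_offsets):
    """Total number of messages the consumer group still has to process."""
    return sum(partition_lag(end_offsets, committed_offsets).values())
```

A total that keeps growing over time suggests the consumers cannot keep up with the producers, which is the signal to add consumers or resources.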