AxonOps for Kafka Monitoring

Kafka Monitoring & Observability

Unified metrics, logs, and service checks for every Kafka cluster in one platform. Set up monitoring and alerting in seconds, not days.

[Image: AxonOps Kafka monitoring dashboard]

Unified Kafka observability

Most teams piece together Kafka visibility from multiple tools: broker metrics in Grafana, consumer lag in a separate dashboard, logs in Kibana, and alerts scattered across systems. When something goes wrong, the first task is assembling context rather than fixing the problem. AxonOps brings broker metrics, consumer group lag, logs, service checks, and operational alerts into a single correlated timeline.

Faster incident resolution

Metrics, logs, and consumer lag are correlated on one timeline, so you can go from alert to root cause without switching tools or assembling context manually.

Complete operational coverage

Broker health, KRaft controller state, consumer group lag, connector failures, and service checks all monitored from one platform.

Confidence in cluster health

Continuous monitoring of brokers, KRaft controllers, consumer groups, and connectors, with alerts routed to the right people when something needs attention.

The most comprehensive monitoring platform for Apache Kafka

Metrics from brokers, KRaft, Connect, Schema Registry, and consumer groups, with logs, health checks, and alerting in one platform.

Capability | What you get | Detail

Metrics
Broker metrics | Request rates, throughput, ISR shrink/expand, under-replicated partitions, network I/O | 5-second collection interval
KRaft controller metrics | Controller state, election rates, metadata log lag | Full KRaft support
Topic and partition metrics | Per-topic throughput, partition count, log size, retention | Granular topic-level visibility
Connect worker metrics | Task status, throughput, error rates, rebalance counts across all Connect workers | Connector-level visibility
Schema Registry metrics | Request latency, schema counts, compatibility check rates | Registry health at a glance
Consumer group lag | Real-time lag per consumer group, topic, and partition | Alerting on lag thresholds
JVM internals | Heap usage, GC pauses, thread counts, buffer pools | Correlated with broker metrics
OS telemetry | CPU, memory, disk I/O, network throughput per broker | Infrastructure and application in one view

Logs
Kafka logs | Broker log ingestion with full-text search | Time-aligned with metric anomalies
Log-based charts | Create charts from log data and display them alongside metric charts | Correlate log patterns with performance trends
Log-based alerting | Define alert rules that trigger on log patterns, frequencies, or error rates | Catch issues that metrics alone cannot surface

Health checks
Service checks | Broker availability, port connectivity, cluster reachability | Configurable intervals and escalation
Replica rack placement | Monitors replica distribution across racks and alerts when replicas are not spread across all available racks | Protects against rack-level failures
User-definable checks | Custom health checks tailored to your environment and applications | Scriptable checks with configurable thresholds

Alerting & routing
Resource thresholds | Disk space, CPU saturation, memory utilization | Per-broker alerting with severity levels
Alert routing | Route by metric, cluster, or severity | PagerDuty, Slack, Teams, email, webhooks
Long-term retention | Weeks or months of metric history | Capacity planning and SLA reporting

Integrations
PromQL-compatible API | Query AxonOps metrics using standard PromQL from Grafana or any compatible tool | No data duplication required
Enterprise dashboard integration | Expose Kafka metrics to existing Grafana, Datadog, or custom dashboards | Fits into your existing observability stack
Alert delivery channels | Slack, Microsoft Teams, ServiceNow, PagerDuty, OpsGenie, Generic Webhook, Custom SMTP, Email | Route to existing incident workflows

Governance
Role-based access | Assign edit or read-only rights per user, with access scoped to specific clusters | Prevent unauthorized changes
SSO authentication | Optional Enterprise SAML integration for single sign-on | Centralized identity management
Audit history | Full log of configuration changes, alert rule updates, and dashboard modifications | Who changed what and when

Automation
Terraform provider | Automate alert rules, notification channels, and adaptive alert routing as code | Codify your Kafka observability
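
To make the replica rack placement check more concrete, here is a minimal Python sketch of the kind of logic such a check might apply. It is not AxonOps code: the function name, the input shapes (a broker-to-rack map and a partition-to-replica map), and the rule itself are assumptions made for illustration. The rule assumed here flags a partition when its replicas occupy fewer distinct racks than they could, that is, fewer than the smaller of its replication factor and the number of racks.

```python
from typing import Dict, List


def poorly_spread_partitions(
    broker_racks: Dict[int, str],
    assignments: Dict[str, List[int]],
) -> List[str]:
    """Return partitions whose replicas use fewer racks than they could.

    broker_racks maps broker id -> rack name (hypothetical input shape).
    assignments maps partition name -> replica broker ids.
    """
    all_racks = set(broker_racks.values())
    flagged = []
    for partition, replicas in assignments.items():
        used_racks = {broker_racks[broker] for broker in replicas}
        # A partition cannot cover more racks than it has replicas.
        best_possible = min(len(replicas), len(all_racks))
        if len(used_racks) < best_possible:
            flagged.append(partition)
    return flagged


# Illustrative data: brokers 1 and 2 share rack "a".
racks = {1: "a", 2: "a", 3: "b", 4: "c"}
replicas = {"orders-0": [1, 3, 4], "orders-1": [1, 2, 3]}
print(poorly_spread_partitions(racks, replicas))
```

In this example `orders-1` is flagged because two of its three replicas sit in rack "a", so losing that rack would remove a majority of its replicas.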

Full-stack Kafka Metrics

AxonOps collects metrics at 5-second resolution from every layer of your Kafka deployment: brokers, consumer groups, KRaft controllers, Connect workers, and Schema Registry. This gives operators granular visibility into request rates, throughput, partition health, and consumer lag.

  • 5-second metric resolution across all components, no sampling
  • Pre-built dashboards curated by Kafka engineers, ready to use immediately
  • Real-time consumer group lag monitoring per topic and partition
  • KRaft controller metrics for clusters running without ZooKeeper
  • Infrastructure metrics (CPU, disk I/O, memory, network) correlated alongside Kafka internals

[Image: Kafka broker metrics dashboard]

Broker Overview

See every broker across all your Kafka clusters in one view. Quickly identify brokers under pressure, check version consistency, and spot uneven partition distribution without switching between clusters or CLI tools.

  • All brokers across all clusters in a single view
  • Partition count, leader count, and replica distribution per broker
  • Broker version and uptime at a glance
  • Drill into any broker for detailed metrics

[Image: Kafka brokers overview in AxonOps]
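
As an illustration of what "uneven partition distribution" can mean in practice, the following Python sketch flags brokers whose partition count deviates from the cluster mean by more than a tolerance. The function name, the 20% default tolerance, and the input shape are assumptions chosen for this example, not a description of how AxonOps computes it.

```python
from typing import Dict, List


def uneven_brokers(partition_counts: Dict[int, int], tolerance: float = 0.2) -> List[int]:
    """Return broker ids whose partition count is more than `tolerance`
    (as a fraction of the mean) away from the cluster mean."""
    if not partition_counts:
        return []
    mean = sum(partition_counts.values()) / len(partition_counts)
    return sorted(
        broker
        for broker, count in partition_counts.items()
        if abs(count - mean) > tolerance * mean
    )


# Illustrative counts: broker 3 carries noticeably more partitions.
print(uneven_brokers({1: 100, 2: 104, 3: 160}))
```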

Consumer Group Monitoring

Track consumer group lag per topic and partition in real time. Identify slow consumers, stalled partitions, and rebalancing events before they affect downstream systems.

  • Real-time lag per consumer group, topic, and partition
  • Alert on lag thresholds to catch consumers falling behind
  • Track group membership and rebalance events
  • Historical lag trends for capacity planning

[Image: Kafka consumer group monitoring in AxonOps]
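
For readers new to the term, consumer lag for a partition is the distance between the latest offset in the log and the offset the consumer group has committed. The Python sketch below shows that arithmetic and a simple threshold check. The function names and the choice to treat a missing committed offset as zero are assumptions for illustration only; how AxonOps computes and stores lag is not described here.

```python
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


def partition_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Lag per partition = log end offset - committed offset, floored at 0."""
    lag = {}
    for tp, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts from 0.
        committed = committed_offsets.get(tp, 0)
        lag[tp] = max(end - committed, 0)
    return lag


def partitions_over_threshold(
    lag: Dict[TopicPartition, int], threshold: int
) -> List[TopicPartition]:
    """Partitions whose lag exceeds the alert threshold."""
    return sorted(tp for tp, value in lag.items() if value > threshold)


# Illustrative offsets for one consumer group.
end = {("orders", 0): 1500, ("orders", 1): 900}
committed = {("orders", 0): 1450, ("orders", 1): 900}
lag = partition_lag(end, committed)
print(lag, sum(lag.values()), partitions_over_threshold(lag, threshold=40))
```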

Correlated Log Analytics

Kafka broker logs are ingested and indexed alongside metrics. Move from a throughput drop to the underlying log entries without switching tools. Logs can be charted as time-series data and placed alongside metric charts, and alerts can be configured based on log patterns.

  • Full-text search across Kafka broker and Connect logs
  • Time-aligned correlation between metric anomalies and log events
  • Chart log data as time-series alongside metric dashboards
  • Configure alerts based on log patterns and frequency
  • Filter by broker, cluster, log level, and custom patterns

[Image: Kafka correlated log analytics]
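
The idea behind charting and alerting on log frequency can be sketched in a few lines of Python: bucket matching log lines per minute, then compare each bucket to a threshold. This sketch assumes log lines that start with a `[yyyy-MM-dd HH:mm:ss,SSS] LEVEL` prefix; the sample lines are synthetic, and the function names are hypothetical. It does not reflect how AxonOps parses or indexes logs.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumed line prefix: "[2024-05-01 12:00:03,120] ERROR ..."
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}\] (\w+) ")


def counts_per_minute(lines: Iterable[str], level: str = "ERROR") -> Counter:
    """Count lines of the given level, keyed by minute."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) == level:
            counts[match.group(1)] += 1
    return counts


def minutes_over_threshold(lines: Iterable[str], threshold: int, level: str = "ERROR") -> List[str]:
    """Minutes in which the number of matching lines reached the threshold."""
    counts = counts_per_minute(lines, level)
    return sorted(minute for minute, n in counts.items() if n >= threshold)


# Synthetic sample lines, for illustration only.
SAMPLE = [
    "[2024-05-01 12:00:03,120] ERROR sample message one (example.Logger)",
    "[2024-05-01 12:00:41,007] ERROR sample message two (example.Logger)",
    "[2024-05-01 12:01:10,555] INFO sample message three (example.Logger)",
    "[2024-05-01 12:01:30,001] ERROR sample message four (example.Logger)",
]
print(minutes_over_threshold(SAMPLE, threshold=2))
```

The per-minute counts are the time series you would chart; the threshold comparison is the alert rule.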

Alert Routing

Route alerts to the right team based on what triggered them. Infrastructure teams receive broker health alerts, application teams receive consumer lag alerts, and platform teams receive connector failure notifications.

  • Route by metric type, cluster, severity, or custom labels
  • Deliver to PagerDuty, Slack, Microsoft Teams, email, or webhooks
  • Full alert history for incident review and post-mortems

[Image: Kafka alert routing configuration]
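
Label-based routing of this kind can be pictured as an ordered list of match rules where the first matching rule decides the destination. The Python sketch below is a conceptual model only: the label keys (`type`, `severity`), the destination names, and the first-match-wins behavior are assumptions for illustration, not the AxonOps configuration format.

```python
from typing import Dict, List, Tuple

Route = Tuple[Dict[str, str], str]  # (labels that must match, destination)

# Hypothetical routes, ordered from most to least specific.
ROUTES: List[Route] = [
    ({"type": "consumer_lag"}, "application-team"),
    ({"type": "connector_failure"}, "platform-team"),
    ({"type": "broker_health", "severity": "critical"}, "infrastructure-oncall"),
    ({"type": "broker_health"}, "infrastructure-team"),
]


def route_alert(alert: Dict[str, str], routes: List[Route] = ROUTES, default: str = "catch-all") -> str:
    """Return the destination of the first route whose labels all match."""
    for match, destination in routes:
        if all(alert.get(key) == value for key, value in match.items()):
            return destination
    return default


print(route_alert({"type": "broker_health", "severity": "critical", "cluster": "prod"}))
print(route_alert({"type": "consumer_lag", "cluster": "prod"}))
```

Ordering specific rules before general ones keeps a critical broker alert from being absorbed by the broader broker health route.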

Service Health Checks

Continuous health checks validate that brokers are reachable, ports are open, and the cluster is responding to requests. Combined with resource threshold monitoring, you get early warning before performance degrades.

  • Broker availability and port connectivity checks
  • Disk space, CPU, and memory threshold alerts per broker
  • User-definable checks for custom validation logic

[Image: Kafka service health checks]
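
A port connectivity check is, at its simplest, an attempt to open a TCP connection within a timeout. The following Python sketch shows that idea using only the standard library. The function names are hypothetical and the broker host names in the usage note are placeholders; this illustrates the concept rather than the checks AxonOps runs.

```python
import socket
from typing import Dict, Iterable, Tuple


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_brokers(brokers: Iterable[Tuple[str, int]], timeout: float = 2.0) -> Dict[str, bool]:
    """Run the connectivity check for each (host, port) pair."""
    return {f"{host}:{port}": port_is_open(host, port, timeout) for host, port in brokers}
```

For example, `check_brokers([("broker-1.example.internal", 9092), ("broker-2.example.internal", 9092)])` would return a map of address to reachability. A successful TCP connect only shows the port is open; it does not prove the broker is serving requests, which is why it is paired with cluster-level checks.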

PromQL Queries & Custom Dashboards

Build custom dashboards and charts using a PromQL-compatible query language. Query any metric collected by AxonOps, create ad-hoc visualizations for troubleshooting, and build team-specific dashboards tailored to your operational workflows.

  • PromQL-compatible query language for all collected metrics
  • Build custom charts and dashboards alongside pre-built views
  • Ad-hoc queries for live troubleshooting and investigation
  • Expose metrics to external dashboards like Grafana via the PromQL-compatible API

[Image: PromQL metrics query in AxonOps]
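
As a rough sketch of what querying a PromQL-compatible API from a script could look like, the Python below builds an instant-query URL and parses a vector response. It assumes the endpoint follows the Prometheus HTTP API conventions (an `/api/v1/query` path and the standard JSON response shape); whether AxonOps exposes exactly this path is an assumption, as are the base URL and the metric name `kafka_consumer_group_lag`, which are placeholders.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus-style instant query URL (path is an assumption)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from a Prometheus-style vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's value is [timestamp, "value as string"].
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


# Placeholder host and metric name, for illustration only.
url = instant_query_url(
    "https://axonops.example.com/",
    'sum(kafka_consumer_group_lag{group="billing"})',
)
print(url)
```

The URL could then be fetched with any HTTP client, with whatever authentication your deployment requires, and the decoded JSON passed to `parse_vector`.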

See Kafka monitoring in action

[Image: AxonOps Kafka dashboard]