Every Spring Boot service should expose Prometheus metrics from day one. Here is a practical setup, the dashboards that actually get used, and the alerts that actually matter.
The Setup
Add two dependencies: micrometer-registry-prometheus and spring-boot-starter-actuator. Configure the actuator to expose the Prometheus endpoint:
management:
  endpoints:
    web:
      exposure:
        include: health,prometheus
  metrics:
    tags:
      application: ${spring.application.name}
That is it. The service now exposes /actuator/prometheus with JVM metrics, HTTP request metrics, and connection pool metrics.
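In Gradle form, assuming the Spring Boot plugin manages versions, the two dependencies look like this:

```groovy
dependencies {
    // Exposes the /actuator endpoints, including /actuator/prometheus
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
    // Prometheus-format MeterRegistry for Micrometer
    implementation 'io.micrometer:micrometer-registry-prometheus'
}
```

With Maven, the same two artifacts go into the dependencies section of the POM.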
The Dashboards That Matter
Three Grafana dashboards per service cover most needs:
- Service health: Request rate, error rate, p50/p99 latency, JVM heap usage. This is the on-call dashboard for incident response.
- JVM deep dive: GC pause times, thread counts, metaspace usage, direct buffer usage. This is the performance tuning dashboard.
- Business metrics: Custom counters for domain events (orders processed, payments completed). This is the product dashboard.
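The service-health panels map directly onto the `http_server_requests_seconds` metrics that Micrometer records by default. A sketch of the PromQL behind them, where the `application="my-service"` label selector is a placeholder matching the tag configured above:

```promql
# Request rate (req/s)
sum(rate(http_server_requests_seconds_count{application="my-service"}[5m]))

# Error rate: share of 5xx responses
sum(rate(http_server_requests_seconds_count{application="my-service", status=~"5.."}[5m]))
  / sum(rate(http_server_requests_seconds_count{application="my-service"}[5m]))

# p99 latency
histogram_quantile(0.99,
  sum(rate(http_server_requests_seconds_bucket{application="my-service"}[5m])) by (le))

# JVM heap in use
sum(jvm_memory_used_bytes{application="my-service", area="heap"})
```

Note that the `histogram_quantile` query only works if histogram buckets are published, which requires setting `management.metrics.distribution.percentiles-histogram.http.server.requests: true`.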
The Alerts That Matter
Keep the alert count per service small. Too many alerts create noise; too few miss real issues. A reasonable set:
- Error rate exceeds threshold for sustained period. Something is broken.
- P99 latency exceeds multiple of normal for sustained period. Something is slow.
- JVM heap consistently high. Memory leak or undersized.
- Pod restart count elevated in short window. Crash loop.
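As Prometheus alerting rules, the set above might look like the following sketch. The thresholds, durations, service name, and runbook URLs are all placeholders to tune per service, and the restart-count rule assumes kube-state-metrics is scraped:

```yaml
groups:
  - name: my-service-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_server_requests_seconds_count{application="my-service", status=~"5.."}[5m]))
            / sum(rate(http_server_requests_seconds_count{application="my-service"}[5m])) > 0.05
        for: 10m
        annotations:
          runbook_url: https://runbooks.example.com/my-service/high-error-rate
      - alert: HighHeapUsage
        expr: |
          sum(jvm_memory_used_bytes{application="my-service", area="heap"})
            / sum(jvm_memory_max_bytes{application="my-service", area="heap"}) > 0.9
        for: 15m
        annotations:
          runbook_url: https://runbooks.example.com/my-service/high-heap
      - alert: CrashLooping
        expr: increase(kube_pod_container_status_restarts_total{pod=~"my-service.*"}[15m]) > 3
        annotations:
          runbook_url: https://runbooks.example.com/my-service/crash-loop
```

A latency rule follows the same shape, wrapping the p99 `histogram_quantile` query in a threshold comparison with a `for:` duration.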
Every alert should have a runbook link in the annotation. If an alert fires and the on-call engineer does not know what to do, the alert is useless.