Response Time SLA

Percentiles, histograms, and missed SLA counters

Response time (or latency) percentiles are a ubiquitous measure of system performance. The P999 (99.9th percentile) is a high bar, which makes it a good basis for an SLA. For example, my team runs a system with an 800 millisecond response time SLA defined by the P999: to meet that SLA, 99.9% of all requests must take <= 800 ms. The system meets the SLA, but here’s its P999 graph:
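As an aside, the SLA condition itself (99.9% of requests at or under 800 ms) can be checked two equivalent ways: by computing the P999 and comparing it to the threshold, or by counting the requests that missed the threshold. Here is a minimal Python sketch of both, assuming raw response times in milliseconds are available as a list and using the nearest-rank percentile definition. The function names and the per-mille constant are hypothetical, chosen for illustration, and are not taken from the system described here.

```python
SLA_MS = 800          # response time threshold from the text
SLA_PER_MILLE = 999   # P999 expressed as an integer to avoid float rounding


def required_rank(n, per_mille=SLA_PER_MILLE):
    """1-based nearest-rank position: ceil(n * per_mille / 1000), in integers."""
    return max((n * per_mille + 999) // 1000, 1)


def nearest_rank_percentile(samples_ms, per_mille=SLA_PER_MILLE):
    """Smallest sample such that at least per_mille/1000 of samples are <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    return ordered[required_rank(len(ordered), per_mille) - 1]


def missed_sla_count(samples_ms, threshold_ms=SLA_MS):
    """Number of requests slower than the threshold."""
    return sum(1 for s in samples_ms if s > threshold_ms)


def meets_sla(samples_ms, threshold_ms=SLA_MS, per_mille=SLA_PER_MILLE):
    """True when the count of slow requests fits inside the allowed budget."""
    n = len(samples_ms)
    allowed_misses = n - required_rank(n, per_mille)
    return missed_sla_count(samples_ms, threshold_ms) <= allowed_misses


# Synthetic data for illustration only: 10,000 requests, 10 of them slow.
ok = [100] * 9990 + [900] * 10
# One more slow request pushes the P999 over the threshold.
bad = [100] * 9989 + [900] * 11
```

With 10,000 requests the budget is 10 misses, so `ok` meets the SLA and `bad` does not. The counter-based check only needs two integers per window (total requests and missed requests), which is one reason a missed SLA counter can be cheaper to maintain than a full percentile calculation.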