Response Time SLA
Percentiles, histograms, and missed SLA counters
Response time (or latency) percentiles are a ubiquitous measure of system performance. The P999 (99.9th percentile) is a high bar, so it’s a good metric to determine an SLA. For example, my team runs a system with an 800 millisecond response time SLA determined by the P999. To meet that SLA, 99.9% of all requests must take <= 800ms. The system meets the SLA, but here’s its P999 graph:
Read more...