Percentile
Latest revision as of 00:36, 1 August 2024
Internal
Overview
The Xth percentile (where X is between 0 and 100) is the value that the measurement stays at or below X% of the time.
For example, the 95th percentile means that 95% of the time the measured value is at or below that amount. For the remaining 5% of the time, the measured value is above that amount.
The nth percentile, or quantile (for example the 99th, abbreviated P99), is the value at which n% (99%) of the measurements are better and the remaining (100-n)% are worse.
Averaging percentiles, whether to reduce time resolution or to combine data from several machines, is mathematically meaningless. The right way to aggregate performance metric data is to add the histograms and compute the percentile from the combined histogram (see: https://www.vividcortex.com/blog/why-percentiles-dont-work-the-way-you-think).
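The difference between averaging percentiles and adding histograms can be illustrated with a small sketch. The machine names, the latency values and the helper functions are hypothetical, and the nearest-rank definition of a percentile is assumed (other conventions interpolate between readings).

```python
import math
from collections import Counter


def nearest_rank(values, p):
    """Nearest-rank percentile (assumed definition): the smallest reading
    such that at least p% of the readings are at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def histogram_percentile(hist, p):
    """Same definition, computed from a histogram mapping value -> count."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    target = max(1, math.ceil(p * total / 100))
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen >= target:
            return value


# Hypothetical latency readings (ms) from two machines with different load.
machine_a = [10] * 99 + [500]   # 100 readings
machine_b = [400] * 10          # 10 readings

p95_a = nearest_rank(machine_a, 95)
p95_b = nearest_rank(machine_b, 95)

# Averaging the per-machine percentiles ignores how many readings each
# machine contributed and where they fall in the combined distribution.
averaged = (p95_a + p95_b) / 2

# Adding the histograms keeps the counts, so the percentile is computed
# over the combined distribution.
merged = Counter(machine_a) + Counter(machine_b)
true_p95 = histogram_percentile(merged, 95)

print(averaged, true_p95)
```

In this made-up data set the averaged value corresponds to no reading that either machine ever produced, while the merged histogram gives the same answer as computing the percentile over all 110 readings at once.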
A naive implementation of percentile computation maintains a list of all performance metric readings for a time window and sorts the list periodically, so memory grows with the number of readings in the window. Better algorithms are:
- Forward decay (http://dimacs.rutgers.edu/~graham/pubs/papers/fwddecay.pdf)
- T-digest (https://github.com/tdunning/t-digest)
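For comparison with the algorithms listed above, the naive approach can be sketched as follows. This is a minimal illustration, not a production implementation: the class name `WindowedPercentile`, its methods and the nearest-rank definition are assumptions made for the example, and timestamps are passed in explicitly as seconds.

```python
import math
from collections import deque


class WindowedPercentile:
    """Naive percentile tracker (hypothetical helper): keeps every reading
    of the last `window_seconds` and sorts them on each query."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.readings = deque()  # (timestamp, value) pairs, oldest first

    def record(self, timestamp, value):
        self.readings.append((timestamp, value))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop readings that have fallen out of the time window.
        cutoff = now - self.window_seconds
        while self.readings and self.readings[0][0] <= cutoff:
            self.readings.popleft()

    def percentile(self, p, now):
        self._evict(now)
        if not self.readings:
            raise ValueError("no readings in window")
        # Sorting all readings costs O(n log n) per query and O(n) memory.
        values = sorted(value for _, value in self.readings)
        rank = max(1, math.ceil(p * len(values) / 100))  # nearest rank
        return values[rank - 1]


tracker = WindowedPercentile(window_seconds=100)
for t in range(100):
    tracker.record(timestamp=t, value=t + 1)  # readings 1..100, one per second

p95_full = tracker.percentile(95, now=99)     # all 100 readings in window
p95_later = tracker.percentile(95, now=149)   # only readings from t=50..99 remain
```

Every reading in the window is stored and re-sorted on each query, which is the cost that the algorithms listed above are designed to avoid.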