Percentile: Difference between revisions

From NovaOrdis Knowledge Base
=Internal=
* [[Statistical_Concepts#Concepts|Statistical Concepts]]
* [[Performance_Concepts#Response_Time_Percentiles|Performance Concepts]]
=Overview=




For example, the 95th percentile means that 95% of the time, the measured value is at or below that amount. For the remaining 5% of the time, the measured value is above that amount.
The n<sup>th</sup> percentile, or quantile (e.g. the 99<sup>th</sup>, abbreviated P99), is the value at which n% (99%) of the measurements are better and (100-n)% are worse.
Averaging percentiles, whether to reduce time resolution or to combine data from several machines, is mathematically meaningless. The right way to aggregate performance metric data is to add the underlying histograms together and compute the percentile from the combined histogram <font color=darkkhaki>(see: https://www.vividcortex.com/blog/why-percentiles-dont-work-the-way-you-think)</font>.
A naive implementation of a percentile computation algorithm is to maintain a list of all performance metric readings for a time window and sort the list periodically. Better algorithms are:
* Forward decay (http://dimacs.rutgers.edu/~graham/pubs/papers/fwddecay.pdf)
* T-digest (https://github.com/tdunning/t-digest)

Latest revision as of 00:36, 1 August 2024

Internal

Overview

The Xth percentile (where X is between 0 and 100) is the value that the measurement stays at or below X% of the time.

For example, the 95th percentile means that 95% of the time, the measured value is at or below that amount. For the remaining 5% of the time, the measured value is above that amount.

The nth percentile, or quantile (e.g. the 99th, abbreviated P99), is the value at which n% (99%) of the measurements are better and (100-n)% are worse.
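
To make the definition concrete, here is a minimal sketch that computes a percentile by sorting every reading and picking a value by rank. It assumes the nearest-rank convention (other conventions interpolate between neighbors), and the function name `percentile` and the sample latencies are made up for illustration.

```python
import math


def percentile(readings, n):
    """Nearest-rank percentile: the smallest reading such that at least
    n% of all readings are at or below it."""
    if not readings:
        raise ValueError("readings must not be empty")
    if not 0 < n <= 100:
        raise ValueError("n must be in (0, 100]")
    ordered = sorted(readings)
    # 1-based rank of the reading that covers n% of the sample.
    rank = math.ceil(n * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds.
latencies_ms = [12, 15, 11, 250, 14, 13, 16, 12, 18, 900]
p50 = percentile(latencies_ms, 50)  # 14
p90 = percentile(latencies_ms, 90)  # 250
p99 = percentile(latencies_ms, 99)  # 900
```

With only ten readings, P99 is simply the largest value. High percentiles need a large sample before they say anything about the tail.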

Averaging percentiles, whether to reduce time resolution or to combine data from several machines, is mathematically meaningless. The right way to aggregate performance metric data is to add the underlying histograms together and compute the percentile from the combined histogram (see: https://www.vividcortex.com/blog/why-percentiles-dont-work-the-way-you-think).
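
The following sketch illustrates the difference on made-up data from two hypothetical machines, one busy and fast, the other nearly idle and slow. The bucket width `BUCKET_MS`, the helper names, and the readings are all assumptions chosen for illustration. Percentiles read from a histogram are only accurate to within one bucket.

```python
import math
from collections import Counter

BUCKET_MS = 10  # hypothetical bucket width in milliseconds


def histogram(readings):
    """Count readings per bucket, keyed by the bucket's upper bound in ms."""
    return Counter(math.ceil(r / BUCKET_MS) * BUCKET_MS for r in readings)


def percentile_from_histogram(hist, n):
    """Upper bound of the first bucket at which the cumulative count
    reaches n% of all readings."""
    total = sum(hist.values())
    target = math.ceil(n * total / 100)
    seen = 0
    for upper_bound in sorted(hist):
        seen += hist[upper_bound]
        if seen >= target:
            return upper_bound
    raise ValueError("histogram must not be empty")


machine_a = [10] * 1000          # busy machine, every request is fast
machine_b = [400] * 9 + [1000]   # nearly idle machine, every request is slow

hist_a, hist_b = histogram(machine_a), histogram(machine_b)
p99_a = percentile_from_histogram(hist_a, 99)   # 10
p99_b = percentile_from_histogram(hist_b, 99)   # 1000

# Averaging the two P99 values ignores how many requests each machine served.
averaged_p99 = (p99_a + p99_b) / 2              # 505.0

# Adding the histograms first weights every request equally.
merged_p99 = percentile_from_histogram(hist_a + hist_b, 99)  # 10
```

Here the averaged figure (505 ms) describes neither machine nor the combined traffic, while the merged histogram shows that 99% of all 1010 requests completed within 10 ms.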

A naive implementation of a percentile computation algorithm is to maintain a list of all performance metric readings for a time window and sort the list periodically. Better algorithms are: