Thus:
Often it is necessary to interpolate between data values to accomplish this, as in the following example.
    i   x[i]
    1   102
    2   105
        ------------- first quartile, Q1 = (105+106)/2 = 105.5
    3   106
    4   109
        ------------- second quartile, Q2 = (109+110)/2 = 109.5
    5   110
    6   112
        ------------- third quartile, Q3 = (112+115)/2 = 113.5
    7   115
    8   118

Taking the mean of the values on either side of each quartile is an arbitrary decision: in the example above, the quartiles could be any values in the ranges [105, 106], [109, 110] and [112, 115].
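The arithmetic in this example can be reproduced with a short Python sketch. It assumes a sample whose size is a multiple of four; the function names `quartile_ranges` and `midpoints` are illustrative only and do not come from any library.

```python
def quartile_ranges(data):
    """For a sample whose size is a multiple of four, return, for each of the
    three quartiles, the pair of neighbouring values it falls between."""
    values = sorted(data)
    n = len(values)
    if n == 0 or n % 4 != 0:
        raise ValueError("sample size must be a positive multiple of four")
    ranges = []
    for k in (1, 2, 3):
        cut = k * n // 4  # number of data values below the k-th quartile
        ranges.append((values[cut - 1], values[cut]))
    return ranges


def midpoints(ranges):
    """Take the mean of each pair; this is the arbitrary choice made in the example."""
    return [(low + high) / 2 for low, high in ranges]


sample = [102, 105, 106, 109, 110, 112, 115, 118]
print(quartile_ranges(sample))             # [(105, 106), (109, 110), (112, 115)]
print(midpoints(quartile_ranges(sample)))  # [105.5, 109.5, 113.5]
```

Any other rule for picking a point inside each range (for instance, always taking the lower value) would be equally defensible, which is why the midpoint is kept in a separate function.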
If the sample size is not a multiple of four, some of the quartiles may be numbers in the original data set, as in this example:
    i   x[i]
    1   102
    2   105   -- Q1 = 105
    3   106
        ------------- Q2 = 107.5
    4   109
    5   110   -- Q3 = 110
    6   112

In both of the above cases, the first and third quartiles can be taken to be the median values of the lower and upper halves of the data, respectively. However, there is more than one school of thought on how to apply this definition when the overall median is one of the original data values. The next two examples illustrate some of the rules of thumb which have been adopted; neither always produces correct results, and it would be better to use a precise formulation as shown later.
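The "median of each half" rule, together with the two rules of thumb for odd-sized samples illustrated next, can be sketched in Python as follows. This is only an illustration: the function name `quartiles_by_halves` and the flag `include_median` are hypothetical, and the sketch relies on `statistics.median` from the standard library.

```python
from statistics import median


def quartiles_by_halves(data, include_median=False):
    """Return (Q1, Q2, Q3), taking Q1 and Q3 as the medians of the lower and
    upper halves of the sorted data.

    For an odd-sized sample the overall median is itself a data value;
    include_median decides whether it is counted in both halves or in neither.
    For an even-sized sample the flag has no effect.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    if n % 2 == 0:
        lower, upper = values[:half], values[half:]
    elif include_median:
        lower, upper = values[:half + 1], values[half:]
    else:
        lower, upper = values[:half], values[half + 1:]
    return median(lower), median(values), median(upper)


six = [102, 105, 106, 109, 110, 112]
nine = [102, 105, 106, 109, 110, 112, 115, 118, 120]
print(quartiles_by_halves(six))                        # (105, 107.5, 110)
print(quartiles_by_halves(nine, include_median=True))  # (106, 110, 115)
print(quartiles_by_halves(nine))                       # (105.5, 110, 116.5)
```

The two calls on the nine-value sample give different first and third quartiles, which is the ambiguity the text describes.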
One may include the median in both "halves" of the data:
    i   x[i]
    1   102
    2   105
    3   106   -- Q1 = 106
    4   109
    5   110   )- Q2 = 110 (note line 5 has been duplicated
    5   110   )            to illustrate the point)
    6   112
    7   115   -- Q3 = 115
    8   118
    9   120

Or not include the median in either "half":
    i   x[i]
    1   102
    2   105
        ------------- Q1 = 105.5
    3   106
    4   109
    5   110   -- Q2 = 110
    6   112
    7   115
        ------------- Q3 = 116.5
    8   118
    9   120

More precise mathematical formulations are possible: the quartiles of the distribution of a random variable X can be defined as the values x such that:

    P(X ≤ x) ≥ p and P(X ≥ x) ≥ 1 - p,

with p = 1/4, 1/2 and 3/4 for the first, second and third quartiles respectively.
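For a finite sample in which every value carries equal weight, a probabilistic definition of this kind can be checked simply by counting. The sketch below assumes the usual quantile conditions, P(X ≤ x) ≥ p and P(X ≥ x) ≥ 1 - p, and uses exact fractions to avoid rounding problems; the function name `is_quantile` is hypothetical.

```python
from fractions import Fraction


def is_quantile(data, x, p):
    """Return True if x satisfies P(X <= x) >= p and P(X >= x) >= 1 - p
    for the empirical distribution of data (each value has weight 1/n)."""
    n = len(data)
    p = Fraction(p)
    at_or_below = Fraction(sum(1 for v in data if v <= x), n)
    at_or_above = Fraction(sum(1 for v in data if v >= x), n)
    return at_or_below >= p and at_or_above >= 1 - p


nine = [102, 105, 106, 109, 110, 112, 115, 118, 120]
quarter = Fraction(1, 4)

print(is_quantile(nine, 106, quarter))          # True
print(is_quantile(nine, 105.5, quarter))        # False: P(X <= 105.5) = 2/9 < 1/4
print(is_quantile(nine, 110, Fraction(1, 2)))   # True
print(is_quantile(nine, 115, Fraction(3, 4)))   # True
```

Under these assumed conditions the nine-value sample has a single first quartile, 106, so the value 105.5 produced by leaving the median out of both halves does not qualify.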
For the nine-value example above, P(X ≤ 106) = 1/3 and P(X ≥ 106) = 7/9; P(X ≤ 110) = 5/9 and P(X ≥ 110) = 5/9; and P(X ≤ 115) = 7/9 and P(X ≥ 115) = 1/3.

See also: Summary statistics, Quantile, Percentile