
About Order Statistics

Given a random variable $X \sim f$, we can draw a sample of size $n$. At times the cost of sorting can be lower than that of further calculation, so we form the so-called order statistics by sorting the sample in ascending order: $X_1, X_2, \ldots, X_n \to X_{(1)}, X_{(2)}, \ldots, X_{(n)}$. In doing so, we lose all information about the order in which the observations were drawn.
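
As a minimal sketch (the exponential distribution and the sample size below are arbitrary choices, purely for illustration), obtaining the order statistics is just a sort:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw a sample of size n from some continuous distribution f
# (an exponential is used here purely for illustration).
n = 10
sample = rng.exponential(scale=1.0, size=n)

# Sorting in ascending order gives the order statistics
# X_(1) <= X_(2) <= ... <= X_(n); the draw order is discarded.
order_stats = np.sort(sample)
print(order_stats)
```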

Cumulative Distribution and Density Functions

Notice that $X_{(k)} \le x$ means that at least $k$ observations in the sample are less than or equal to $x$, so we have $F_{(k)}(x) = P(X_{(k)} \le x) = \sum_{j=k}^{n} \binom{n}{j} F(x)^{j} (1-F(x))^{n-j}$. By differentiating $F_{(k)}(x)$, we can obtain its density function: $f_{(k)}(x) = \sum_{j=k}^{n} \frac{n!}{j!(n-j)!} \left( j F(x)^{j-1} (1-F(x))^{n-j} f(x) - (n-j) F(x)^{j} (1-F(x))^{n-j-1} f(x) \right)$, which telescopes and simplifies into $f_{(k)}(x) = \frac{n!}{(k-1)!(n-k)!} F(x)^{k-1} (1-F(x))^{n-k} f(x)$. Heuristically, this can be understood as selecting $k-1$ elements and requiring them to be less than $X_{(k)}$, then selecting $n-k$ elements and requiring them to be greater than $X_{(k)}$, and finally multiplying by the density at the $k$-th value.
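
A quick numerical sanity check of the density formula (a sketch; the standard normal distribution and the values of $n$ and $k$ are arbitrary choices): simulate the $k$-th order statistic many times and compare its sample mean with the mean implied by $f_{(k)}$.

```python
import numpy as np
from math import factorial
from scipy import stats, integrate

rng = np.random.default_rng(1)
n, k = 10, 3           # sample size and which order statistic (1-indexed)
reps = 200_000         # number of simulated samples

# Simulate the k-th order statistic of n standard normal draws.
samples = rng.standard_normal((reps, n))
x_k = np.sort(samples, axis=1)[:, k - 1]

# f_(k)(x) = n!/((k-1)!(n-k)!) * F(x)^(k-1) * (1-F(x))^(n-k) * f(x)
coef = factorial(n) / (factorial(k - 1) * factorial(n - k))

def f_k(x):
    F, f = stats.norm.cdf(x), stats.norm.pdf(x)
    return coef * F ** (k - 1) * (1 - F) ** (n - k) * f

# The simulated mean and the mean computed from f_(k) should agree closely.
mean_from_density, _ = integrate.quad(lambda x: x * f_k(x), -np.inf, np.inf)
print(x_k.mean(), mean_from_density)
```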

$X_{(k)}$ as an Estimator

An interesting property of $X_{(k)}$ is that it is an unbiased estimator of the $k$-th of the $n+1$ quantiles of $f$, i.e. the $k/(n+1)$ quantile. To deal with quantiles, a lemma comes in handy:

$Y = F(X) \sim U(0,1)$ for any continuous distribution $X \sim f$, where $F(x) = \int_{-\infty}^{x} f(t)\,dt$ is the CDF.
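
A small numerical illustration of the lemma (a sketch; the exponential distribution is an arbitrary choice): pushing the draws through their own CDF should yield values indistinguishable from $U(0,1)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Draw from an arbitrary continuous distribution (exponential here)
# and apply its own CDF to each draw.
x = rng.exponential(scale=2.0, size=100_000)
y = stats.expon(scale=2.0).cdf(x)

# Y = F(X) should be uniform on (0, 1); a Kolmogorov-Smirnov test
# against U(0, 1) should report a large p-value.
print(stats.kstest(y, "uniform"))
```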

The lemma is easy to prove: $P(Y \le y) = P(X \le F^{-1}(y)) = F(F^{-1}(y)) = y$, thus $Y \sim U(0,1)$. We can now proceed to prove that $E(Y_{(k)}) = \frac{k}{n+1}$: $E(Y_{(k)}) = \int_0^1 \frac{n!}{(k-1)!(n-k)!}\, y^{k-1} (1-y)^{n-k}\, y \, dy = \frac{\Gamma(n+1)}{\Gamma(k)\,\Gamma(n-k+1)}\, B(k+1, n-k+1) = \frac{k}{n+1}$. Now notice that $Y_{(k)} = F(X_{(k)})$, so $P(X \le E(X_{(k)})) = F(E(F^{-1}(Y_{(k)}))) = \frac{k}{n+1}$. Admittedly, the last step is very clumsy and will need to be fixed.
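
Equivalently, $Y_{(k)} \sim \mathrm{Beta}(k, n-k+1)$, whose mean is $k/(n+1)$. A small simulation can confirm this (a sketch; the values of $n$ and $k$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 9, 4
reps = 500_000

# k-th order statistic of n i.i.d. U(0,1) draws, averaged over many samples.
u = rng.uniform(size=(reps, n))
y_k = np.sort(u, axis=1)[:, k - 1]

print(y_k.mean(), k / (n + 1))   # the two numbers should be close
```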

#"hs-maths"