# Cumulative distribution function (CDF)

The probability that a random variable X will be found to have a value less than or equal to x.

Let's begin with the usual random variable X that takes some values at random. Think of those values as the result of an experiment. We have previously seen that a probability density function (PDF) gives the probability that X is between two values, say a and b. A cumulative distribution function (CDF) gives the probability that X is less than or equal to a value, say x.

A CDF is usually written as F(x) and can be described as:

$$F_X(x) = P(X \le x)$$

I like to subscript the X under the function name so that I know what random variable I'm processing. The image below shows a typical cumulative distribution function for a continuous random variable X.

Figure 1. Typical plot of a cumulative distribution function of a continuous random variable.
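To make the definition concrete, here is a minimal sketch that estimates F_X(x) = P(X <= x) from a handful of observed outcomes. The function name `empirical_cdf` and the sample values are made up for illustration; they are not part of the original article.

```python
def empirical_cdf(samples, x):
    """Estimate F_X(x) = P(X <= x) as the fraction of samples that are <= x."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples if s <= x) / len(samples)


# Hypothetical outcomes of an experiment repeated ten times.
samples = [0.2, 0.5, 0.5, 0.9, 1.3, 1.7, 2.0, 2.4, 3.1, 4.8]

print(empirical_cdf(samples, 1.0))  # fraction of outcomes <= 1.0
print(empirical_cdf(samples, 5.0))  # every outcome is <= 5.0
```

With more samples the estimate gets closer to the true CDF of the random variable that produced them.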

## Common properties of a CDF

### Boundaries, continuity and growth

Any cumulative distribution function is bounded below by 0 and above by 1, because a probability cannot go below 0 or above 1. It also has to increase, or at least not decrease, as the input x grows, because we keep adding up the probabilities of more outcomes. This property makes the CDF a non-decreasing function (also called monotonically increasing). Finally, the CDF of a continuous random variable is a continuous function, which roughly means its graph has no jumps; the CDF of a discrete random variable climbs in steps instead.
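These properties can be checked numerically. The sketch below assumes a standard normal random variable (my choice for illustration, not something fixed by the text above) and uses its well-known CDF written with the error function; `normal_cdf` is a hypothetical helper name.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal random variable, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Evaluate the CDF on an evenly spaced grid from -5 to 5.
grid = [i / 10.0 for i in range(-50, 51)]
values = [normal_cdf(x) for x in grid]

# Every value is a probability, and the values never go down as x grows.
bounded = all(0.0 <= v <= 1.0 for v in values)
never_decreases = all(a <= b for a, b in zip(values, values[1:]))

print(bounded, never_decreases)
print(values[0], values[-1])  # close to 0 on the far left, close to 1 on the far right
```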

### The CCDF

Sometimes you want to ask the opposite question: how often is the random variable above a particular level? You can tweak the CDF and turn it into the so-called complementary cumulative distribution function (or CCDF):

$$\overline{F}_X(x) = P(X > x) = 1 - F_X(x)$$
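As a small sketch of the CCDF, assume an exponential random variable with rate 1, whose CDF has the closed form 1 - e^(-x) for x >= 0. Both the distribution and the helper names `exponential_cdf` and `ccdf` are assumptions made for this example.

```python
import math


def exponential_cdf(x, rate=1.0):
    """F_X(x) = P(X <= x) for an exponential random variable."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def ccdf(cdf, x):
    """Complementary CDF: P(X > x) = 1 - F_X(x)."""
    return 1.0 - cdf(x)


x = 2.0
below = exponential_cdf(x)        # probability of landing at or below x
above = ccdf(exponential_cdf, x)  # probability of landing above x
print(below, above, below + above)  # the two probabilities add up to 1
```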

## Relationship between CDF and PDF

Cumulative distribution functions are tightly bound to probability density functions. The image below shows the relationship between the PDF (upper graph) and the CDF (lower graph) for a continuous random variable with a bell-shaped probability curve.

Figure 2. Relationship between a PDF (above) and its CDF (below).

A point on the CDF corresponds to an area under the curve of the PDF. For example, say I want to know the probability that my random variable X takes on values less than or equal to 0.8: this is the area under the PDF to the left of 0.8 (from 0 to 0.8 if X cannot be negative).

In general, if you want to know the probability that X is less than or equal to a, in the PDF you are asking for P(-∞ < X <= a), and we know (from the article on Probability Distributions) that

$$P(-\infty < X \le a) = \int_{-\infty}^{a} \rho_X(x)\,dx$$

There is no lower limit on the values we are adding up, so P(X <= a) = P(-∞ < X <= a). The previous equation becomes:

$$P(X \le a) = \int_{-\infty}^{a} \rho_X(x)\,dx$$

From the definition of the CDF we know that

$$F_X(a) = P(X \le a)$$

so we can conclude that

$$\int_{-\infty}^{a} \rho_X(x)\,dx = F_X(a)$$

Here we go: the CDF is the integral of the PDF. Sometimes the relationship is written as:

$$\int_{-\infty}^{x} \rho_X(t)\,dt = F_X(x)$$

That's the same statement with x as the free variable and t as the integration variable. If X can only take nonnegative values, the PDF is zero below 0 and the lower limit of the integral can be replaced by 0.
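The claim that the CDF is the integral of the PDF can be checked numerically. The sketch below assumes an exponential random variable with rate 1, which only takes nonnegative values, so the integration can start at 0. The distribution, the helper names and the trapezoidal rule are all choices made for this illustration.

```python
import math


def exponential_pdf(x, rate=1.0):
    """PDF of an exponential random variable (zero for negative x)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def exponential_cdf(x, rate=1.0):
    """Closed-form CDF of the same random variable."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def integrate(f, lower, upper, steps=10_000):
    """Approximate the integral of f over [lower, upper] with the trapezoidal rule."""
    width = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for i in range(1, steps):
        total += f(lower + i * width)
    return total * width


a = 0.8
area = integrate(exponential_pdf, 0.0, a)  # area under the PDF up to a
print(area, exponential_cdf(a))  # the two numbers should agree closely
```

The area under the PDF up to 0.8 and the value of the CDF at 0.8 match up to the error of the numerical integration.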

## CDF: a discrete example

Cumulative distribution functions also work with discrete random variables. The following example deals with the classic toss of a fair six-sided die. We have a 1 in 6 chance of getting any of the possible values of the random variable (1, 2, 3, 4, 5 or 6), and the plot below is the CDF of that random variable.

Figure 3. Discrete cumulative distribution function of a die toss.

Each step has a height of 1/6 ≈ 0.1667: the probability of each single outcome. For example, I want to know the probability that the outcome is less than or equal to 3:

$$F_X(3) = P(X \le 3) = P(X = 1) + P(X = 2) + P(X = 3) = \frac{3}{6} = 0.5$$

Or less than or equal to 6:

$$F_X(6) = P(X \le 6) = 1$$

The latter is quite funny: you are asking for the probability of getting any value the random variable can take. Something will happen for sure, so the chance of tossing a six-sided die and getting a value between 1 and 6 is 100%.
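The die example can be reproduced with a few lines of Python. This is a minimal sketch that uses exact fractions so the steps of 1/6 add up without rounding; the names `pmf` and `die_cdf` are made up for the example.

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def die_cdf(x):
    """F_X(x) = P(X <= x): add up the probabilities of all faces <= x."""
    return sum((p for face, p in pmf.items() if face <= x), Fraction(0))


# Print the CDF at each integer from 0 to 7 to see the staircase.
for x in range(0, 8):
    print(x, die_cdf(x))
```

The function also accepts values between the faces, for example `die_cdf(3.5)`, which gives the same result as `die_cdf(3)` because the CDF stays flat between two steps.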
