Cumulative distribution function (CDF)

The probability that a random variable X will be found to have a value less than or equal to x.

Let's begin with the usual random variable §X§ that takes some values at random. Think of those values as the result of an experiment. We have previously seen that a probability density function (PDF) gives the probability that §X§ is between two values, say §a§ and §b§. A cumulative distribution function (CDF) gives the probability that §X§ is less than or equal to a value, say §x§.

A CDF is usually written as §F(x)§ and can be described as:

§F_X(x) = P(X <= x)§

I like to subscript the X under the function name so that I know what random variable I'm processing. The image below shows a typical cumulative distribution function for a continuous random variable §X§.

Cumulative distribution function (CDF)
1. Typical plot of a cumulative distribution function of a continuous random variable.
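To make the definition §F_X(x) = P(X <= x)§ concrete, here is a minimal sketch that estimates it from observed data: the fraction of outcomes that are less than or equal to §x§. The function name `empirical_cdf` and the list of outcomes are made up for illustration; they are not part of the article's original material.

```python
def empirical_cdf(samples, x):
    """Fraction of observed samples that are less than or equal to x."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples if s <= x) / len(samples)

# Hypothetical outcomes of an experiment repeated eight times.
outcomes = [0.3, 1.2, 0.7, 2.5, 1.9, 0.1, 1.2, 3.0]

for x in (0.0, 1.2, 3.0):
    print(x, empirical_cdf(outcomes, x))
```

With more and more samples, this estimate should get closer to the true CDF of the random variable behind the experiment.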

Common properties of a CDF

Boundaries, continuity and growth

Any cumulative distribution function is bounded below by 0 and above by 1, because it does not make sense to have a probability that goes below 0 or above 1. It also has to increase, or at least not decrease, as the input §x§ grows, because we are adding up the probabilities for each outcome. The latter property makes the CDF a non-decreasing function, also called monotonically increasing. Finally, the CDF of a continuous random variable is a continuous function, which roughly means it has no "holes" or jumps in the graph. (The CDF of a discrete random variable, as we will see later, is a staircase with jumps.)
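These properties can be checked numerically. The sketch below uses a normal distribution as the example, an assumption of mine since the article does not name the distribution in the plot, and computes its CDF through the standard error function `math.erf`. It then verifies on a grid of points that the values stay between 0 and 1 and never decrease.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Grid of inputs from -6 to 6 in steps of 0.1.
xs = [i / 10 for i in range(-60, 61)]
values = [normal_cdf(x) for x in xs]

# Bounded below by 0 and above by 1.
bounded = all(0.0 <= v <= 1.0 for v in values)

# Each value is at least as large as the previous one.
non_decreasing = all(a <= b for a, b in zip(values, values[1:]))

print(bounded, non_decreasing)
print(values[0], values[-1])  # close to 0 on the far left, close to 1 on the far right
```

A check on a finite grid is not a proof, of course, but it is a quick sanity test for any CDF you implement yourself.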


Sometimes you want to ask the opposite question: how often is the random variable above a particular level? You can tweak the CDF to obtain the so-called complementary cumulative distribution function (or CCDF):

§\overline{F}_X(x) = P(X > x) = 1 - F_X(x)§
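As a small illustration of the complement rule, the sketch below computes both functions for a standard normal random variable. The choice of distribution and the level 1.96 are assumptions picked only for the example.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal random variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_ccdf(x, mu=0.0, sigma=1.0):
    """P(X > x), obtained as 1 - F_X(x)."""
    return 1.0 - normal_cdf(x, mu, sigma)

level = 1.96
below = normal_cdf(level)    # probability of being at or below the level
above = normal_ccdf(level)   # probability of being above the level

print(below, above, below + above)
```

The two probabilities always add up to 1 (up to floating-point rounding), because the random variable is either at or below the level, or above it.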

Relationship between CDF and PDF

Actually, cumulative distribution functions are tightly bound to probability density functions. The image below shows the relationship between a PDF (upper graph) and its CDF (lower graph) for a continuous random variable with a bell-shaped probability curve.

CDF and PDF relationship
2. Relationship between a PDF (above) and its CDF (below).

A point on the CDF corresponds to an area under the curve of the PDF. Assume for now that §X§ only takes non-negative values, so that its PDF is zero below 0. For example, I want to know the probability that my random variable §X§ takes on values less than or equal to 0.8: this is the area under the PDF from 0 to 0.8.

In general, if you want to know the probability that §X§ is less than or equal to §a§, in the PDF you are actually asking for §P(0 <= X <= a)§, and we know (from the article on Probability Distributions) that

§P(0 <= X <= a) = int_{0}^{a} rho_X(x) dx§

We also know that in a CDF we are accumulating all the probability up to §a§, and by assumption §X§ can't take values lower than 0. So §P(X <= a) = P(0 <= X <= a)§. The previous equation becomes:

§P(X <= a) = int_{0}^{a} rho_X(x) dx§

From the definition of the CDF we know that

§F_X(a) = P(X <= a)§

so we can conclude that

§int_{0}^{a} rho_X(x) dx = F_X(a)§

Here we go: the CDF is the integral of the PDF. When §X§ can also take negative values, the integration has to start from minus infinity instead of 0, and the relationship is written as:

§int_{-oo}^{x} rho_X(t) dt = F_X(x)§

That's the general version, with §t§ as the integration variable so that §x§ can be used as the upper limit.
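The relationship can be verified numerically. The sketch below assumes an exponential distribution with rate 1, chosen by me because it only takes non-negative values (so the integral can start at 0) and because its CDF has a simple closed form. It integrates the PDF from 0 to 0.8 with the trapezoidal rule and compares the area with the CDF at 0.8. The helper `integrate` is a hand-written illustration, not a library function.

```python
import math

def exponential_pdf(x, rate=1.0):
    """PDF of an exponential random variable (zero for negative x)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def exponential_cdf(x, rate=1.0):
    """Closed-form CDF of the same random variable."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def integrate(f, lower, upper, steps=10_000):
    """Approximate the integral of f over [lower, upper] with the trapezoidal rule."""
    h = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for i in range(1, steps):
        total += f(lower + i * h)
    return total * h

a = 0.8
area = integrate(exponential_pdf, 0.0, a)  # area under the PDF from 0 to a
print(area, exponential_cdf(a))            # the two numbers should nearly coincide
```

For a distribution that also takes negative values, the lower limit of the integration would have to be pushed far enough to the left to capture practically all of the area.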

CDF: a discrete example

Cumulative distribution functions also work with discrete random variables. The following example deals with the classic roll of a fair 6-sided die. Of course we have a 1 in 6 chance of getting any of the possible values of the random variable (1, 2, 3, 4, 5 or 6), and the plot below is the CDF of that random variable.

Discrete CDF
3. Discrete cumulative distribution function of a die roll.

Each step has a height of 0.166666666667: the probability of each single outcome, or 1 in 6. For example, I want to know the probability that the outcome is less than or equal to 3:

§F_X(3) = P(X <= 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.5§

Or less than or equal to 6:

§F_X(6) = P(X <= 6) = 1§

The latter is quite funny: you are asking for the probability of getting any value from the random variable. Something will happen for sure, so the chance of rolling a six-sided die and getting a value between 1 and 6 is 100%.
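The die example is easy to reproduce in code. The sketch below builds the CDF by adding up the probabilities of all faces less than or equal to §x§, using exact fractions to avoid rounding. The names `pmf` and `die_cdf` are chosen here for illustration.

```python
from fractions import Fraction

# Probability of each face of a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
pmf = {face: Fraction(1, 6) for face in faces}

def die_cdf(x):
    """P(X <= x): sum of the probabilities of all faces not greater than x."""
    return sum((p for face, p in pmf.items() if face <= x), Fraction(0))

for x in (0, 3, 3.5, 6):
    print(x, die_cdf(x))
```

Note that the function is defined for any input, not just the six faces: between two faces the CDF stays flat, which is what produces the staircase shape in the plot.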

