— Written by Triangles on December 19, 2015 • updated on November 10, 2019 —
The probability that a random variable §X§ will be found to have a value less than or equal to §x§.
Let's begin with the usual random variable §X§ that takes some values at random. Think of those values as the result of an experiment. We have previously seen that a probability density function (PDF) gives the probability that §X§ falls between two values, say §a§ and §b§. A cumulative distribution function (CDF) gives the probability that §X§ is less than or equal to a value, say §x§.
A CDF is usually written as §F(x)§ and can be described as:
§F_X(x) = P(X <= x)§
I like to subscript the X under the function name so that I know what random variable I'm processing. The image below shows a typical cumulative distribution function for a continuous random variable §X§.
Any cumulative distribution function is always bounded below by 0 and bounded above by 1, because it does not make sense to have a probability that goes below 0 or above 1. It also has to increase, or at least not decrease, as the input §x§ grows, because we are accumulating the probabilities of the outcomes. The latter property makes the CDF a non-decreasing function, also called monotonically non-decreasing. Finally, a CDF is a right-continuous function: it may jump, as in the discrete example further below, but approaching any point from the right leaves no "holes" in the graph.
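If you want to poke at these properties yourself, here is a minimal sketch in Python, assuming SciPy is available and picking the standard normal distribution purely for illustration:

```python
import numpy as np
from scipy.stats import norm

# Evaluate the CDF of a standard normal variable on a dense grid.
xs = np.linspace(-5, 5, 1001)
F = norm.cdf(xs)

# Bounded below by 0 and above by 1.
assert F.min() >= 0 and F.max() <= 1

# Non-decreasing: every step between consecutive points is >= 0.
assert np.all(np.diff(F) >= 0)

print(F[0], F[-1])  # ~0 at the far left, ~1 at the far right
```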
Sometimes you want to ask the opposite question: how often is the random variable above a particular level? You can tweak the CDF and turn it into the so-called complementary cumulative distribution function (or CCDF):
§\overline{F}_X(x) = P(X > x) = 1 - F_X(x)§
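As a quick sketch of that identity, again assuming SciPy and a standard normal variable (SciPy happens to call the CCDF the "survival function"):

```python
from scipy.stats import norm

x = 0.8

ccdf_by_hand = 1 - norm.cdf(x)  # 1 - F_X(x)
ccdf_scipy = norm.sf(x)         # SciPy's built-in CCDF, the "survival function"

print(ccdf_by_hand, ccdf_scipy)  # both ~0.2119 for a standard normal
```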
Actually, cumulative distribution functions are tightly bound to probability density functions. The image below shows the relationship between the PDF (upper graph) and the CDF (lower graph) for a continuous random variable with a bell-shaped probability curve.
A point on the CDF corresponds to the area under the curve of the PDF. For example, say I want to know the probability that my random variable §X§ takes on values less than or equal to 0.8: this is the sum of all the probabilities from 0 to 0.8 in the PDF, i.e. the area under the PDF between 0 and 0.8.
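As a toy worked example (the density here is made up just for illustration), suppose the PDF were §rho_X(x) = 2x§ for §x§ between 0 and 1. Then §P(X <= 0.8) = int_{0}^{0.8} 2x dx = 0.8^2 = 0.64§: the area under that PDF from 0 to 0.8 is exactly the value the CDF takes at 0.8.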
In general, if you want to know the probability that §X§ is less than or equal to §a§, in the PDF you are actually asking for §P(0 <= X <= a)§, and we know (from the article on Probability Distributions) that
§P(0 <= X <= a) = int_{0}^{a} rho_X(x) dx§
We also know that in a CDF we are summing up all the probabilities from 0 to §a§, and our random variable never takes values below 0, so no probability sits to the left of 0. Hence §P(X <= a) = P(0 <= X <= a)§. The previous equation becomes:
§P(X <= a) = int_{0}^{a} rho_X(x) dx§
From the definition of the CDF we know that
§F_X(a) = P(X <= a)§
so we can conclude that
§int_{0}^{a} rho_X(x) dx = F_X(a)§
Here we go: the CDF is the integral of the PDF. Sometimes the relationship can be written as:
§int_{0}^{x} rho_X(t) dt = F_X(x)§
That's just a more generic version, where §t§ is a dummy integration variable so that the name §x§ stays free for the upper limit.
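To see the integral relationship in action, here is a small sketch that numerically integrates a PDF and compares the result with the corresponding CDF. I picked the exponential distribution because, like the variable in the text, it never goes below 0; the choice is an assumption for illustration only:

```python
from scipy.integrate import quad
from scipy.stats import expon

a = 0.8

# Area under the PDF from 0 to a, computed by numerical integration...
area, error = quad(expon.pdf, 0, a)

# ...matches the CDF evaluated at a.
print(area, expon.cdf(a))  # both ~0.5507
```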
Cumulative distribution functions also work with discrete random variables. In fact, the following example deals with the classic roll of a fair 6-sided die. Of course we have a 1 in 6 chance of getting any of the possible values of the random variable (1, 2, 3, 4, 5 or 6), and the plot below is the CDF of that random variable.
Each step has a height of 1/6 ≈ 0.1667: the probability of each single outcome. For example, say I want to know the probability that the outcome is less than or equal to 3:
§F_X(3) = P(X <= 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.5§
Or less than or equal to 6:
§F_X(6) = P(X <= 6) = 1§
The latter is quite funny: you are asking for the probability of getting any value at all from the random variable. Something will happen for sure, so the chance of rolling a six-sided die and getting a value between 1 and 6 is 100%.
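That step-shaped CDF is easy to rebuild by hand; here is a minimal sketch with NumPy, where the CDF is just the running sum of the six outcome probabilities:

```python
import numpy as np

# A fair six-sided die: outcomes 1..6, each with probability 1/6.
outcomes = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

# The CDF is the running total of the probabilities: a step function.
cdf = np.cumsum(pmf)

for value, prob in zip(outcomes, cdf):
    print(f"F_X({value}) = {prob:.4f}")

# F_X(3) = 0.5000 and F_X(6) = 1.0000, matching the text.
```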