— Written by Triangles on August 21, 2015 • ID 13 —

Variance

Or the variability from an average value.

Variance is a measurement of the spread between numbers in a data set, or elements in a population. It's a numerical value that indicates how widely those elements vary from the mean.

If the variance value is zero, it means that all the values of the data set are identical. A large variance indicates that the values are far from the mean and each other. A small variance means the opposite.
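To make these three cases concrete, here is a quick check in Python. It is only a sketch: the three lists are made-up numbers, and it relies on `statistics.pvariance` from the standard library, which computes the population variance described further down.

```python
import statistics

# Made-up data sets, chosen only for illustration
identical = [5, 5, 5, 5]   # all values equal
tight = [4, 5, 6]          # values close to the mean (5)
wide = [0, 5, 10]          # values far from the mean (5)

for name, values in [("identical", identical), ("tight", tight), ("wide", wide)]:
    print(name, statistics.pvariance(values))
```

The first list gives a variance of zero, and the widely spread list gives a larger variance than the tight one.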

The concept of variance plays an important role in finance, among other fields. Variance can be seen as a risk indicator for a specific asset: the higher the variance, the greater the investment's volatility, because prices then tend to spread across a wider range.

How to compute the variance of a population

The variance of a population (i.e. the whole data set) is defined as follows:

§ sigma^2 = 1/N sum_{i=1}^{N} (X_i - μ)^2 §

In the formula above, σ² (σ is the Greek letter sigma) is the population variance (yes, it's a squared value!), μ is the population mean, X_i is the i-th element of the population and N is the number of elements in the population.

In words, you have to go through three steps:

1) compute the mean of all the elements of the population;
2) for each element of the population (i.e. for each number), subtract the mean and square the result;
3) sum up all the values from the previous step and divide the total by the number of elements in the population.
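The three steps above can be sketched in Python as follows. This is a minimal illustration rather than an optimized implementation: the function name `population_variance` and the sample numbers are my own choices, and the result is compared against `statistics.pvariance` from the standard library.

```python
import statistics


def population_variance(values):
    """Population variance, following the three steps literally."""
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one value")
    mean = sum(values) / n                             # step 1: the mean
    squared_diffs = [(x - mean) ** 2 for x in values]  # step 2: squared differences
    return sum(squared_diffs) / n                      # step 3: sum, then divide by N


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative numbers, mean is 5
print(population_variance(data))
print(statistics.pvariance(data))  # should agree with the line above
```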

It's quite important to square the differences in step 2). If you just added up the differences from the mean, the negatives would cancel the positives (as in -4 and +4). Squaring also produces a bigger value when the differences are more spread out, which is exactly what we need.
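A short Python check shows the cancellation in action. The two lists are made-up numbers with the same mean; the helper name `deviations` is hypothetical.

```python
def deviations(values):
    """Differences between each value and the mean of the list."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


narrow = [3, 5, 7]   # mean 5, differences -2, 0, +2
spread = [1, 5, 9]   # mean 5, differences -4, 0, +4

for values in (narrow, spread):
    diffs = deviations(values)
    plain_sum = sum(diffs)                     # negatives cancel positives
    squared_sum = sum(d ** 2 for d in diffs)   # squaring keeps the spread visible
    print(values, plain_sum, squared_sum)
```

The plain sum is zero for both lists, so it says nothing about spread, while the sum of squares is larger for the more spread-out list.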

How to compute the variance of a sample

Sometimes (and I would say quite often) you don't have the full set of data, but only a part of it - a sample. Computing the variance of a sample requires a few small tweaks to the original formula:

§ s^2 = {1}/{n-1} sum_{i=1}^{n} (x_i - x̄)^2 §

First of all we changed the notation: s² is the variance of a sample, n is the number of elements in the sample, x_i is the i-th element of the sample, x̄ is the mean of the sample. That's the correct notation when you are dealing with a sample instead of the whole population.

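Then we divide by n - 1 instead of n: this is the so-called Bessel's correction, used to avoid bias. The differences are measured from the sample mean x̄, which sits closer to the sample's own values than the true population mean does, so dividing by n would tend to underestimate the population variance. Dividing by n - 1 compensates for that.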
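One way to see the effect of Bessel's correction is to take a tiny population, list every possible sample of a given size, and average the variances computed with and without the correction. The sketch below does that in Python under some assumptions of mine: the population is made up, samples have size 2 and are drawn with replacement, and `sample_variance` is a hypothetical helper name.

```python
import itertools
import statistics


def sample_variance(sample):
    """Sample variance with Bessel's correction (divide by n - 1)."""
    n = len(sample)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


population = [1, 2, 3, 4, 5]                 # made-up population
sigma2 = statistics.pvariance(population)    # true population variance: 2

# Every ordered sample of size 2, drawn with replacement (25 in total)
samples = list(itertools.product(population, repeat=2))

avg_corrected = sum(sample_variance(s) for s in samples) / len(samples)
avg_uncorrected = sum(statistics.pvariance(s) for s in samples) / len(samples)

print(sigma2)           # population variance
print(avg_corrected)    # average of the n - 1 version: matches sigma2
print(avg_uncorrected)  # average of the divide-by-n version: too small
```

On average, the corrected version lands on the population variance, while dividing by n gives half of it for samples of size 2.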

Computing the variance: an example

What follows is a generic set x of numerical values. I pretend it's not a sample, so I compute the variance of the whole population:

§ x = {21, 21, 22, 16, 20} §

Let's calculate the mean first:

§ μ = 1/n sum_{i=1}^{n} x_i §

§ μ = {21 + 21 + 22 + 16 + 20} / 5 = 20 §

Then process each element as described in steps 2) and 3):

§ sigma^2 = {(21-20)^2 + (21-20)^2 + (22-20)^2 + (16-20)^2 + (20-20)^2} / 5 §

§ sigma^2 = {1 + 1 + 4 + 16 + 0} / 5 = 4.4 §
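The same calculation can be reproduced in a few lines of Python. This is just a sanity check of the arithmetic above; the variable names are my own, and the last line shows, for contrast, what `statistics.variance` would return if the same numbers were treated as a sample.

```python
import statistics

x = [21, 21, 22, 16, 20]

mu = sum(x) / len(x)                        # mean: 20.0
squared_diffs = [(xi - mu) ** 2 for xi in x]  # [1.0, 1.0, 4.0, 16.0, 0.0]
sigma2 = sum(squared_diffs) / len(x)        # population variance: 4.4

print(mu, squared_diffs, sigma2)
print(statistics.pvariance(x))  # standard library, population formula
print(statistics.variance(x))   # sample formula (divides by n - 1): 5.5
```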

Do I really have to work with a squared value?

In the variance formula you square the differences, so the result is naturally expressed in squared units. Let's take a look at the previous example: the outcome was 4.4, but the input values (the population) were "plain" numbers, obviously not squared. Hence it is hard to reason about the computed variance directly until we transform it into something more useful.

The solution is actually simple: just return to the non-squared numbers by taking the square root of the variance, obtaining the so-called standard deviation. I will be writing about that in one of my next posts.
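As a small preview, here is that square root applied to the example population used earlier, sketched in Python. `statistics.pstdev` is the standard library function for the population standard deviation; the variable names are mine.

```python
import math
import statistics

x = [21, 21, 22, 16, 20]

sigma2 = statistics.pvariance(x)  # population variance, 4.4
sigma = math.sqrt(sigma2)         # back to the original, non-squared scale

print(round(sigma, 4))            # 2.0976
print(statistics.pstdev(x))       # same value, computed directly
```

A standard deviation of about 2.1 is on the same scale as the input values, which makes it easier to interpret than 4.4.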

