# Random variables

How they work, how discrete and continuous ones differ, and how to use them in probability.

A random variable is a function that assigns a number to each possible outcome of a random phenomenon. When you observe a random process like throwing a die, the set of all its possible outcomes is called the sample space, and a random variable simply maps those outcomes to numbers. Random variables are generally written in capital letters, and the set of values they can take acts as a numerical version of the sample space.

Other common names are aleatory variable, from the Latin aleator (the dice player), denoting something that depends on the throw of a die or on chance, and stochastic variable, from the Greek word στόχος (stokhos: aim, target).

## The classical coin toss example

Let's tackle a classical example: I toss a coin an indefinite number of times. There are only two possible outcomes, heads or tails, so I map heads to 0 and tails to 1. I could map them to any other pair of numbers, like 30430 and 80000, but zeros and ones are more practical. Finally, I define the random variable X as:

$$
X = \begin{cases} 0 & \text{heads} \\ 1 & \text{tails} \end{cases}
$$

That's it. I have just given a numerical description of the sample space of my coin toss experiment. Mapping outcomes to numbers is useful because you can do mathematical operations and reasoning with random variables. Several notations also come up:

• X (capital letter) — the random variable;
• x (lower-case letter) — the value of the random variable;
• P(X = x) = p — the probability p that the random variable X is equal to a particular value x.
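The notation above can be tried out in a few lines of Python. This is only a minimal sketch, assuming a fair coin (the text does not say the coin is fair); the names `OUTCOME_TO_VALUE`, `toss` and `estimate_probability` are made up for illustration. It estimates P(X = x) as the fraction of simulated tosses in which X took the value x.

```python
import random

# Map each outcome of the coin toss to a number, as in the definition of X.
OUTCOME_TO_VALUE = {"heads": 0, "tails": 1}


def toss(rng):
    """Toss the coin once and return the value of X (assumes a fair coin)."""
    outcome = rng.choice(["heads", "tails"])
    return OUTCOME_TO_VALUE[outcome]


def estimate_probability(x, n_tosses=100_000, seed=0):
    """Estimate P(X = x) as the fraction of tosses where X equals x."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_tosses) if toss(rng) == x)
    return hits / n_tosses


print("P(X = 0) is roughly", estimate_probability(0))
print("P(X = 1) is roughly", estimate_probability(1))
```

With enough tosses both estimates should land close to 0.5, although the exact digits depend on the seed.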

For example, you can now do useful things like computing P(X = x) + P(X = y), along with many other calculations based on probability (which I will dig deeper into in the next articles).
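As a small worked instance of such a sum, assume the coin is fair (an assumption made here for illustration), so that each value has probability 1/2. Heads and tails cannot both occur on the same toss, so adding the two probabilities gives the probability that either one occurs:

$$
P(X = 0) + P(X = 1) = \frac{1}{2} + \frac{1}{2} = 1
$$

In other words, every toss is certain to produce one of the two values.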

## Discrete random variables vs continuous random variables

When the sample space is limited to a fixed number of outcomes, the random variable is called a discrete random variable, because it can only take on a finite number of values (more generally, a discrete random variable may also take countably infinitely many values, such as 0, 1, 2, ...). Say you pick a random pixel from a 16-bit greyscale image: there are only 2^16 = 65536 possible values (i.e. shades of grey), so X takes values in {0, 1, ..., 65535}. No in-between values like 45.7.
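The pixel example can be sketched in Python as follows. The snippet assumes, purely for illustration, that every shade is equally likely, which a real image would not guarantee; `pick_pixel_value` is a hypothetical helper, not a function from any imaging library.

```python
import random

BIT_DEPTH = 16
N_SHADES = 2 ** BIT_DEPTH  # number of grey levels in a 16-bit image


def pick_pixel_value(rng):
    """Return one value of X: an integer grey level from 0 to N_SHADES - 1."""
    return rng.randrange(N_SHADES)


rng = random.Random(42)
samples = [pick_pixel_value(rng) for _ in range(10)]
print(samples)

# Every sample is a whole number inside the set of allowed values,
# so an in-between value such as 45.7 can never appear.
print(all(isinstance(v, int) and 0 <= v < N_SHADES for v in samples))
```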

On the other hand, continuous random variables can take on any value in a certain range. Say you define X as the frequency of a random note played between 27.7 Hz and 4186 Hz: there are infinitely many, in fact uncountably many, values between those two frequencies.
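A continuous random variable can be sketched in a similar way. The snippet below assumes the frequency is drawn uniformly between the two bounds, a simplification chosen only for illustration (the text does not specify a distribution); `random_frequency` is a made-up name.

```python
import random

LOW_HZ = 27.7
HIGH_HZ = 4186.0


def random_frequency(rng):
    """Return one value of X: a real-valued frequency between the two bounds."""
    return rng.uniform(LOW_HZ, HIGH_HZ)


rng = random.Random(7)
samples = [random_frequency(rng) for _ in range(5)]
print(samples)

# Unlike the discrete case, values between whole numbers are perfectly valid.
print(all(LOW_HZ <= f <= HIGH_HZ for f in samples))
```

Keep in mind that floating-point numbers only approximate a true continuum: a computer can represent a very large but finite set of values in that range.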