Last time, we discussed probability theory. We learned that a random event is an event where we know which possible outcomes can occur but we don’t know which specific outcome will occur. We talked about the sample spaces from which these things are drawn and how to model random events using a probability function \(P\).
Now, we’ll push those ideas further by introducing random variables and their distributions.
A random variable is simply a random process with a numerical outcome. When we flip a coin and get either a head or a tail, that’s a random event. If we identify a head with a 1 and a tail with a 0, we’ve generated a random variable from that event. The advantage is that we can now perform arithmetic operations and analysis.
We often denote a random variable with the symbol \(X\). Generally, a random variable can take one of many values; but there is a particular probability associated with each value. The list of all these values together with their probabilities is called the distribution of the random variable. The fact that we even can list these out means that we are working with a discrete random variable. We will deal with continuous random variables a little later.
An important distinction:
The distribution of a discrete random variable is simply a table of all the possible outcomes together with their probabilities. For examples 1, 2, and 3 above, the distributions are as follows:
Notes:
The expectation of a discrete random variable is \[ E(X) = \sum x_i P(X=x_i) = \sum x_i p_i.\] We might think of this as a weighted mean.
Weighted die roll: The expectation of our weighted die roll in example 3 above is \[1\cdot\frac{3}{10} + 2\cdot\frac{4}{10} + 3\cdot\frac{3}{10} = 2.\] Weighted coin flip: I’ve got a weighted coin that comes up heads \(75\%\) of the time, in which case I write down a one. If it comes up tails, I write down a zero. The expectation associated with one flip is \[E(X) = 1\times \frac{3}{4} + 0 \times \frac{1}{4} = \frac{3}{4}.\]
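These weighted sums are easy to check by hand or by machine. Here's a minimal sketch of the expectation computation, using exact fractions so the results match the hand calculations above (the `expectation` helper and the variable names are my own, not from the text):

```python
from fractions import Fraction

def expectation(dist):
    """Expectation of a discrete random variable,
    given as a list of (value, probability) pairs."""
    return sum(x * p for x, p in dist)

# Weighted die roll from example 3: values 1, 2, 3 with probabilities 3/10, 4/10, 3/10.
die = [(1, Fraction(3, 10)), (2, Fraction(4, 10)), (3, Fraction(3, 10))]
print(expectation(die))   # → 2

# Weighted coin flip: heads (1) with probability 3/4, tails (0) with probability 1/4.
coin = [(1, Fraction(3, 4)), (0, Fraction(1, 4))]
print(expectation(coin))  # → 3/4
```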
The variance of a discrete random variable \(X\) is \[\sigma^2(X) = \sum (x_i - \mu)^2 p_i.\] We might think of this as a weighted average of the squared difference of the possible values from the mean. The standard deviation is the square root of the variance.
Weighted die roll: The variance of our weighted die roll in example 3 above is \[(1-2)^2\cdot\frac{3}{10} + (2-2)^2\cdot\frac{4}{10} + (3-2)^2\cdot\frac{3}{10} = \frac{6}{10}.\] Weighted coin flip: The variance of our weighted coin flip is \[\sigma^2(X) = (1-3/4)^2\cdot\frac{3}{4} + (0-3/4)^2 \cdot\frac{1}{4}=\frac{3}{16}.\]
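The variance formula can be sketched the same way: compute the mean first, then take the weighted average of squared deviations (again, the helper name and exact-fraction representation are my own choices):

```python
from fractions import Fraction

def variance(dist):
    """Variance of a discrete random variable,
    given as a list of (value, probability) pairs."""
    mu = sum(x * p for x, p in dist)
    return sum((x - mu) ** 2 * p for x, p in dist)

die = [(1, Fraction(3, 10)), (2, Fraction(4, 10)), (3, Fraction(3, 10))]
coin = [(1, Fraction(3, 4)), (0, Fraction(1, 4))]

print(variance(die))   # → 3/5  (i.e. 6/10 in lowest terms)
print(variance(coin))  # → 3/16
```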
One nice thing about expectation and variance is that they are additive. That is, if \(X_1\) and \(X_2\) are both random variables, then \[E(X_1 + X_2) = E(X_1) + E(X_2),\] and, provided \(X_1\) and \(X_2\) are independent, \[\sigma^2(X_1 + X_2) = \sigma^2(X_1) + \sigma^2(X_2).\]

#### Example
Suppose I flip my weighted coin that comes up heads 75% of the time 100 times, and let \(X_i\) denote the value of my \(i^{\text{th}}\) flip. Thus, \[X_1 + X_2 + \cdots + X_{100}\] represents the total number of heads that I get and, by the additivity of expectation, we get \[E(X_1 + X_2 + \cdots + X_{100}) = 100 \times \frac{3}{4} = 75.\] Similarly, since the flips are independent, the variance is \[\sigma^2(X_1 + X_2 + \cdots + X_{100}) = 100 \times \frac{3}{16} = \frac{75}{4}.\] Of course, this means that the standard deviation is \(\sqrt{75/4}\).
Note: The standard deviation of one flip is \(\sqrt{3/16} \approx 0.433013\) and the standard deviation of 100 flips is \(\sqrt{75/4} \approx 4.33013\). The second is 10 times larger in magnitude but, relative to the total number of flips, it’s \(0.0433013\), which is ten times smaller.
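We can sanity-check these theoretical values with a quick simulation: repeat the 100-flip experiment many times and compare the sample mean and variance of the head counts with 75 and \(75/4 = 18.75\). This is just an illustrative sketch; the function names and the number of simulated runs are arbitrary choices:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def count_heads(n, p=0.75):
    """Flip a p-weighted coin n times and count the heads."""
    return sum(1 for _ in range(n) if random.random() < p)

# Simulate 10,000 runs of 100 flips each.
runs = [count_heads(100) for _ in range(10_000)]
mean = sum(runs) / len(runs)
var = sum((x - mean) ** 2 for x in runs) / len(runs)

print(mean, var)  # both should land close to 75 and 18.75
```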
Suppose we flip a coin 5 times and count how many heads we get. This will generate a random number \(X\) between 0 and 5, but the possible values are not all equally likely. The probabilities are:
Note that the probability of getting any particular sequence of 5 heads and tails is \[\frac{1}{2^5} = \frac{1}{32}.\] That explains the denominator of 32 in the list of probabilities. The numerator is the number of ways to get that value for the sum. For example, there are 10 ways to get 2 heads in 5 flips:
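One way to see where the numerators come from is to enumerate all \(2^5 = 32\) equally likely sequences directly and count how many produce each number of heads. A small sketch (using `itertools.product` to generate the sequences, with 1 for heads and 0 for tails):

```python
from itertools import product
from collections import Counter

# Every sequence of 5 flips, encoded as 0s (tails) and 1s (heads);
# summing a sequence gives its number of heads.
counts = Counter(sum(seq) for seq in product((0, 1), repeat=5))

for k in range(6):
    print(f"P(X = {k}) = {counts[k]}/32")
# → numerators 1, 5, 10, 10, 5, 1 — in particular, 10 ways to get 2 heads
```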
If we plot the possible outcomes vs their probabilities, we get something like the following:
Not too exciting just yet, but we can start to see the emergence of a bell curve. To really see it, we’ll need to up the flip count.
The first, long and fairly complicated-looking line states the formula that is actually used to compute the probability that the number of successes is equal to \(k\). We’ll use the mean and standard deviation computations in the next line later, when we relate the binomial distribution to the normal distribution.
There are four conditions to check to verify that the random variable is binomial:
I have an unfair coin that comes up heads 60% of the time. Suppose I flip that coin 10 times, count the number of heads, and call the result \(X\). What are (a) \(P(X=3)\) and (b) \(P(X<3)\)?
Solution: This is a canonical binomial distribution example. To apply the formula, we simply identify \(p\), \(n\), and \(k\).
For part (a), we have \(p=0.6\), \(n=10\), and \(k=3\). Thus,
\[P(X=3) = {10\choose 3} \times 0.6^3 \times 0.4^7 = \frac{10\times9\times8}{3\times2\times1}\times 0.0003538944 \approx 0.04246733\]
For part (b), we need to add the probabilities for \(X=0\), \(X=1\), and \(X=2\). Thus, we get \[P(X<3) = 0.4^{10} + \left({10\choose 1} \times 0.6 \times 0.4^9\right) + \left({10\choose 2} \times 0.6^2 \times 0.4^8\right) \approx 0.01229455\]
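Both parts are easy to check in code. Here's a minimal sketch of the binomial probability formula using `math.comb` for the binomial coefficient (the `binom_pmf` name is my own):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Part (a): P(X = 3) with n = 10, p = 0.6.
part_a = binom_pmf(3, 10, 0.6)
print(part_a)  # ≈ 0.04246733

# Part (b): P(X < 3) = P(X=0) + P(X=1) + P(X=2).
part_b = sum(binom_pmf(k, 10, 0.6) for k in range(3))
print(part_b)  # ≈ 0.01229455
```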