Last time, we discussed probability theory. We learned that a random event is one where we know which outcomes are possible but not which specific outcome will occur. We talked about the sample spaces from which these outcomes are drawn and how to model random events using a probability function \(P\).

Now, we’ll push those ideas further by introducing random variables and their distributions.

Random variables and their distributions

A random variable is simply a random process with a numerical outcome. When we flip a coin and get either a head or a tail, that’s a random event. If we identify a head with a 1 and a tail with a 0, we’ve generated a random variable from that event. The advantage is that we can now perform arithmetic operations and analysis.

We often denote a random variable with the symbol \(X\). Generally, a random variable can take one of many values; but there is a particular probability associated with each value. The list of all these values together with their probabilities is called the distribution of the random variable. The fact that we even can list these out means that we are working with a discrete random variable. We will deal with continuous random variables a little later.

Examples

  • Example 0: Choose \(X\) to be the sum of the scores in the game between Ohio State and Rutgers this weekend
  • Example 1: Flip a coin and write down
    • \(X=1\) if the coin lands heads or
    • \(X=0\) if the coin lands tails
  • Example 2: Roll a standard six sided die and write down the number that comes up.
  • Example 3: Roll a 10 sided die and write down
    • \(X=1\) if the roll comes up 1, 2, or 3,
    • \(X=2\) if the roll comes up 4, 5, 6, or 7, or
    • \(X=3\) if the roll comes up 8, 9, or 10
  • Example 4: Find somebody and choose \(X\) to be their height
  • Example 5: Randomly choose a college and choose \(X\) to be the average salary of all the professors.

An important distinction:

  • Examples 0 through 3 are examples of discrete random variables because their possible values can be listed out; here, they happen to be integers.
  • Examples 4 and 5 are examples of continuous random variables because they can (in principle) produce any real number. We are mostly focused today on the discrete case.

Distributions

The distribution of a discrete random variable is simply a table of all the possible outcomes together with their probabilities. For examples 1, 2, and 3 above, the distributions are as follows:

  • Example 1: \(P(X=1) = P(X=0) = 1/2\)
  • Example 2: \(P(X=1) = P(X=2) = P(X=3) = P(X=4) = P(X=5) = P(X=6) = 1/6\)
  • Example 3:
    • \(P(X=1) = 3/10\)
    • \(P(X=2) = 4/10\)
    • \(P(X=3) = 3/10\)

Notes:

  • We’ve introduced the common notation \[P(X=x_i)=p_i\]
  • There can be any number of outcomes
  • Those outcomes need not be equally likely

Mean or expectation

The expectation of a discrete random variable is \[ E(X) = \sum x_i P(X=x_i) = \sum x_i p_i.\] We might think of this as a weighted mean.

Examples

Weighted die roll: The expectation of our weighted die roll in example 3 above is \[1\cdot\frac{3}{10} + 2\cdot\frac{4}{10} + 3\cdot\frac{3}{10} = 2.\]

Weighted coin flip: I’ve got a weighted coin that comes up heads \(75\%\) of the time, in which case I write down a one. If it comes up tails, I write down a zero. The expectation associated with one flip is \[E(X) = 1\times \frac{3}{4} + 0 \times \frac{1}{4} = \frac{3}{4}.\]
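These weighted means are easy to check with a short Python sketch (the `expectation` helper and the list-of-pairs encoding are just one convenient setup, not anything from the notes; `Fraction` keeps the arithmetic exact):

```python
from fractions import Fraction as F

def expectation(dist):
    """E(X) = sum of x_i * p_i over a list of (value, probability) pairs."""
    return sum(x * p for x, p in dist)

# Example 3: the weighted die roll
die = [(1, F(3, 10)), (2, F(4, 10)), (3, F(3, 10))]
print(expectation(die))  # 2

# The weighted coin flip
coin = [(1, F(3, 4)), (0, F(1, 4))]
print(expectation(coin))  # 3/4
```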

Standard deviation

The variance of a discrete random variable \(X\) is \[\sigma^2(X) = \sum (x_i - \mu)^2 p_i.\] We might think of this as a weighted average of the squared difference of the possible values from the mean. The standard deviation is the square root of the variance.

Examples

Weighted die roll: The variance of our weighted die roll in example 3 above is \[(1-2)^2\frac{3}{10} + (2-2)^2\frac{4}{10} + (3-2)^2\frac{3}{10} = \frac{6}{10}.\]

Weighted coin flip: The variance of our weighted coin flip is \[\sigma^2(X) = (1-3/4)^2\frac{3}{4} + (0-3/4)^2 \frac{1}{4}=\frac{3}{16}.\]
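The variance formula can be checked the same way (a quick sketch; the `variance` helper and `Fraction` pairs are my own encoding, not from the notes):

```python
from fractions import Fraction as F

def variance(dist):
    """sigma^2(X) = sum of (x_i - mu)^2 * p_i over (value, probability) pairs."""
    mu = sum(x * p for x, p in dist)
    return sum((x - mu) ** 2 * p for x, p in dist)

# Example 3: the weighted die roll
die = [(1, F(3, 10)), (2, F(4, 10)), (3, F(3, 10))]
print(variance(die))   # 3/5, i.e. 6/10

# The weighted coin flip
coin = [(1, F(3, 4)), (0, F(1, 4))]
print(variance(coin))  # 3/16
</```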

Combining distributions

One nice thing about expectation and variance is that they are additive. That is, if \(X_1\) and \(X_2\) are both random variables, then \[E(X_1 + X_2) = E(X_1) + E(X_2)\] and, as long as \(X_1\) and \(X_2\) are independent, \[\sigma^2(X_1 + X_2) = \sigma^2(X_1) + \sigma^2(X_2).\]

Example

Suppose I flip my weighted coin that comes up heads 75% of the time 100 times and let \(X_i\) denote the value of my \(i^{\text{th}}\) flip. Then \[X_1 + X_2 + \cdots + X_{100}\] represents the total number of heads that I get and, by the additivity of expectation, we get \[E(X_1 + X_2 + \cdots + X_{100}) = 100 \times \frac{3}{4} = 75.\] Similarly, for the variance we get \[\sigma^2(X_1 + X_2 + \cdots + X_{100}) = 100 \times \frac{3}{16} = \frac{75}{4}.\] Of course, this means that the standard deviation is \(\sqrt{75/4}\).

Note: The standard deviation of one flip is \(\sqrt{3/16} \approx 0.433013\) and the standard deviation of 100 flips is \(\sqrt{75/4} \approx 4.33013\). The second is 10 times larger in magnitude but, relative to the total number of flips, it’s \(0.0433013\), which is ten times smaller.
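The additivity computation is short enough to sketch directly in Python (exact fractions for the mean and variance, a float only for the final square root):

```python
from fractions import Fraction as F
from math import sqrt

# One flip of the weighted coin: mean 3/4, variance 3/16
mean_one, var_one = F(3, 4), F(3, 16)

# By additivity, 100 independent flips have:
n = 100
mean_total = n * mean_one   # 75
var_total = n * var_one     # 75/4
sd_total = sqrt(var_total)  # about 4.33

print(mean_total, var_total, round(sd_total, 5))
```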

The binomial distribution

And the emergence of the bell curve

Suppose we flip a coin 5 times and count how many heads we get. This generates a random number \(X\) between 0 and 5, but the six possible values are not all equally likely. The probabilities are:

  • \(P(X=0)=1/32\)
  • \(P(X=1)=5/32\)
  • \(P(X=2)=10/32\)
  • \(P(X=3)=10/32\)
  • \(P(X=4)=5/32\)
  • \(P(X=5)=1/32\)

Note that the probability of getting any particular sequence of 5 heads and tails is \[\frac{1}{2^5} = \frac{1}{32}.\] That explains the denominator of 32 in the list of probabilities. The numerator is the number of ways to get that value for the sum. For example, there are 10 ways to get 2 heads in 5 flips: HHTTT, HTHTT, HTTHT, HTTTH, THHTT, THTHT, THTTH, TTHHT, TTHTH, TTTHH.
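We can confirm these counts by brute force; here’s a quick sketch that enumerates all 32 sequences:

```python
from itertools import product

# All 2^5 = 32 equally likely length-5 sequences of heads (H) and tails (T)
seqs = ["".join(s) for s in product("HT", repeat=5)]

# Count how many sequences give each number of heads
for k in range(6):
    ways = sum(1 for s in seqs if s.count("H") == k)
    print(f"P(X={k}) = {ways}/32")

two_heads = [s for s in seqs if s.count("H") == 2]
print(len(two_heads))  # 10 ways to get exactly 2 heads
```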

If we plot the possible outcomes vs their probabilities, we get something like the following:

Not too exciting just yet, but we can start to see the emergence of a bell curve. To really see it, we’ll need to up the flip count.



Computations with the binomial distribution

Key formulae

Suppose the probability of a single trial being a success is \(p\). Then the probability of observing exactly \(k\) successes in \(n\) independent trials is given by \[\begin{eqnarray} {n\choose k}p^k(1-p)^{n-k} = \frac{n!}{k!(n-k)!}p^k(1-p)^{n-k} \label{binomialFormula} \end{eqnarray}\] Additionally, the mean, variance, and standard deviation of the number of observed successes are \[\begin{align} \mu &= np &\sigma^2 &= np(1-p) &\sigma &= \sqrt{np(1-p)} \label{binomialStats} \end{align}\]

The first, fairly complicated-looking formula is the one actually used to compute the probability that the number of successes is equal to \(k\). We’ll use the mean and standard deviation formulas in the second line later, when we relate the binomial distribution to the normal distribution.
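The formulas translate directly into code. Here’s a small sketch using Python’s `math.comb` for the binomial coefficient (the helper names are my own):

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_stats(n, p):
    """Mean, variance, and standard deviation of the number of successes."""
    mu = n * p
    var = n * p * (1 - p)
    return mu, var, sqrt(var)

# Sanity check against the fair-coin example: P(X=2) in 5 flips is 10/32
print(binomial_pmf(2, 5, 0.5))  # 0.3125
```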

There are four conditions to check to verify that the random variable is binomial:


  1. The trials are independent.
  2. The number of trials, \(n\), is fixed.
  3. Each trial outcome can be classified as a success or a failure.
  4. The probability of a success, \(p\), is the same for each trial.

Example

I have an unfair coin that comes up heads 60% of the time. Suppose I flip that coin 10 times, count the number of heads, and call the result \(X\).

  1. What is \(P(X=3)\)?
  2. What is \(P(X<3)\)?

Solution: This is a canonical binomial distribution example. To apply the formula, we simply identify \(p\), \(n\), and \(k\).

For part (a), we have \(p=0.6\), \(n=10\), and \(k=3\). Thus,

\[P(X=3) = {10\choose 3} \times 0.6^3 \times 0.4^7 = \frac{10\times9\times8}{3\times2\times1}\times 0.0003538944 \approx 0.04246733\]

For part (b), we need to add the probabilities for \(X=0\), \(X=1\), and \(X=2\). Thus, we get \[P(X<3) = 0.4^{10} + \left({10\choose 1} \times 0.6 \times 0.4^9\right) + \left({10\choose 2} \times 0.6^2 \times 0.4^8\right) \approx 0.01229455\]
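As a check, both parts can be computed directly with Python’s `math.comb`:

```python
from math import comb

p, n = 0.6, 10

# Part (a): P(X = 3)
p_eq_3 = comb(n, 3) * p**3 * (1 - p) ** (n - 3)
print(round(p_eq_3, 8))  # 0.04246733

# Part (b): P(X < 3) = P(X=0) + P(X=1) + P(X=2)
p_lt_3 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))
print(round(p_lt_3, 8))  # 0.01229455
```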