Random variables and distributions

Random variables

On page 105, the concept of a random variable \(X\) is defined to be a random process with some numerical outcome.

Example 0: Choose \(X\) to be the sum of the scores of the next Ohio State-Michigan game.
Example 1: Flip a coin and write down
- \(X=1\) if the coin lands heads or
- \(X=0\) if the coin lands tails
Example 2: Roll a standard six-sided die and write down the number that comes up.
Example 3: Roll a 10-sided die and write down
- \(X=1\) if the roll comes up 1, 2, or 3,
- \(X=2\) if the roll comes up 4, 5, 6, or 7, or
- \(X=3\) if the roll comes up 8, 9, or 10
Example 4: Find somebody and choose \(X\) to be their height
Example 5: Randomly choose a college and choose \(X\) to be the average salary of all the professors.

An important distinction:

Examples 0 through 3 are examples of discrete random variables because they produce only integer values.
Examples 4 and 5 are examples of continuous random variables because they can (in principle) produce any real number.

Distributions

Roughly, the distribution of a random variable tells you how likely that random variable is to produce certain outputs. The specifics are particular to the distribution in question but divide into two main classes: the discrete distributions and the continuous distributions.

Distributions for discrete random variables

On page 82, the distribution of a discrete random variable is defined to be a table of all the possible outcomes together with their probabilities. For examples 1, 2, and 3 above, the distributions are as follows:

Example 1: \(P(X=1) = P(X=0) = 1/2\)
Example 2: \(P(X=1) = P(X=2) = P(X=3) = P(X=4) = P(X=5) = P(X=6) = 1/6\)
Example 3:
- \(P(X=1) = 3/10\)
- \(P(X=2) = 4/10\)
- \(P(X=3) = 3/10\)

Notes:

We’ve introduced the common notation \[P(X=x_i)=p_i\]
There can be any number of outcomes
Those outcomes need not be equally likely
We can visualize a discrete distribution using a basic plot. Here’s the third one:

Often, it makes more sense to do this with a larger distribution. Here’s a uniform distribution on 50 numbers, i.e. the numbers 1 through 50 are all equally likely.
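One way to make these tables concrete is to store a distribution as a mapping from outcomes to probabilities. The following is only a sketch: the names `example_3`, `uniform_50`, `is_valid_distribution`, and `text_plot` are made up for illustration, and exact fractions are used so that the probabilities sum to exactly 1.

```python
from fractions import Fraction

# Example 3: the 10-sided die grouped into three outcomes.
example_3 = {1: Fraction(3, 10), 2: Fraction(4, 10), 3: Fraction(3, 10)}

# Uniform distribution on the numbers 1 through 50.
uniform_50 = {k: Fraction(1, 50) for k in range(1, 51)}


def is_valid_distribution(dist):
    """Check that all probabilities are nonnegative and sum to exactly 1."""
    return all(p >= 0 for p in dist.values()) and sum(dist.values()) == 1


def text_plot(dist, width=40):
    """Return a crude text bar chart with one row per outcome."""
    biggest = max(dist.values())
    rows = []
    for x, p in sorted(dist.items()):
        bar = "#" * round(width * p / biggest)
        rows.append(f"{x:>3} {bar} {float(p):.2f}")
    return "\n".join(rows)


print(is_valid_distribution(example_3), is_valid_distribution(uniform_50))
print(text_plot(example_3))
```

The text chart is just a stand-in for a real plot; any plotting library would do the same job.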

Distributions for continuous random variables

Section 2.5 discusses the somewhat deeper idea of a distribution for a continuous random variable. Rather than assigning a probability to the choice of any single number, we assign probabilities to intervals of numbers. Thus, if \(X\) is a continuous random variable, we might write \[P(0<X<5) = 0.2\] or, more generally, \[P(a<X<b) = p.\]

One convenient way to describe this sort of thing is as the area under the graph of a function. As we’ve already seen, this is exactly how the normal distribution works. Here’s a representation of \(P(-0.5<X<2)\) when \(X\) has a standard normal distribution:
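As an alternative to a table, this area can be estimated numerically. The sketch below assumes the standard identity relating the normal cumulative distribution to the error function; the function names are invented for this note.

```python
import math


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_interval_probability(a, b):
    """P(a < Z < b): area under the standard normal curve between a and b."""
    return standard_normal_cdf(b) - standard_normal_cdf(a)


area = normal_interval_probability(-0.5, 2.0)
print(round(area, 4))  # about 0.6687
```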

Another example is the so-called uniform distribution on an interval, which states that the probability of picking a number out of a subinterval is proportional to the length of that subinterval. For example, if our main interval is \[[-1,1] = \{x: -1\leq x \leq 1\},\] then the probability of picking a number out of the left half is \(1/2\); in symbols: \[P(-1<X<0) = 1/2.\]
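The "proportional to length" rule is easy to sketch in code. The helper `uniform_probability` below is hypothetical; it clips the requested interval to the main interval and divides the overlap by the total length.

```python
def uniform_probability(a, b, low=-1.0, high=1.0):
    """P(a < X < b) for X uniform on the main interval [low, high]."""
    # Only the part of (a, b) that lies inside [low, high] can occur.
    left = max(a, low)
    right = min(b, high)
    overlap = max(0.0, right - left)
    return overlap / (high - low)


print(uniform_probability(-1, 0))  # 0.5, the left half of [-1, 1]
```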

Note that the uniform distribution is the simplest of all continuous distributions. Far more often, we will be interested in the normal distribution and will read such areas (or probabilities) off of a table.

The binomial distribution

The binomial distribution is a discrete distribution that plays a special role in statistics for many reasons. Importantly for us, the binomial distribution allows us to see how a bell curve (in fact the normal curve) arises as a limit of other types of distributions.

The general idea of the binomial distribution is as follows: Suppose that a single experiment has probability of success \(p\) and probability of failure \(1-p\). We turn this into a random variable by assigning numeric values, say success yields a \(1\) and failure yields a \(0\). We then run the experiment a number of times, say \(n\), and count the number of successes. This yields an integer between \(0\) and \(n\) inclusive. The binomial distribution tells us the probability of each of those \(n+1\) outcomes.
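These notes do not write the probabilities out as a formula, but the standard one is \(P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}\), assuming the \(n\) runs are independent. Here is a minimal sketch of it; the function names are our own.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent runs."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_distribution(n, p):
    """List of the n + 1 probabilities for k = 0, 1, ..., n."""
    return [binomial_pmf(k, n, p) for k in range(n + 1)]


dist = binomial_distribution(5, 0.5)
print(dist)
```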

Flipping a fair coin

Suppose our experiment is just flipping a coin, that a head represents success, and that a tail represents failure. Thus, with one flip, we can get a \(0\) or a \(1\) with equal probability \(1/2\) each.

Now suppose we flip a coin 5 times and count how many heads we get. This will generate a random number \(X\) between 0 and 5 but they are not all equally likely. The probabilities are:

\(P(X=0)=1/32\)
\(P(X=1)=5/32\)
\(P(X=2)=10/32\)
\(P(X=3)=10/32\)
\(P(X=4)=5/32\)
\(P(X=5)=1/32\)

Note that the probability of getting any particular sequence of 5 heads and tails is \[\frac{1}{2^5} = \frac{1}{32}.\] That explains the denominator of 32 in the list of probabilities. The numerator is the number of ways to get that value for the sum. For example, there are \(\binom{5}{2} = 10\) ways to get 2 heads in 5 flips.
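The sequences with exactly 2 heads can be listed by brute force. This sketch enumerates all 32 sequences of five flips and tallies them by head count; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# All 2^5 = 32 sequences of heads (H) and tails (T).
flips = list(product("HT", repeat=5))

# The sequences containing exactly two heads.
two_heads = ["".join(s) for s in flips if s.count("H") == 2]

# Number of sequences for each possible head count 0 through 5.
counts = Counter(s.count("H") for s in flips)
numerators = [counts[k] for k in range(6)]

print(len(flips))
print(two_heads)
print(numerators)
```

The tallies in `numerators` are the numerators in the list of probabilities above.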

If we plot the possible outcomes vs their probabilities, we get something like the following:

Not too exciting just yet but we can start to see the emergence of a bell curve. To really see it, we’ll need to up the flip count.

The curves in the above image are just the normal curves with the same mean and standard deviation as the corresponding binomial distribution; the key observation is that these normal curves approximate their corresponding binomial distributions very well.
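How well is "very well"? One rough check is to compare the binomial probabilities with the heights of the matching normal curve. The sketch below assumes 100 flips of a fair coin (our choice, not a number from the plots) and uses the formulas \(\mu = np\) and \(\sigma = \sqrt{np(1-p)}\) that appear later in these notes.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent runs."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


n, p = 100, 0.5  # assumed flip count, for illustration only
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Largest vertical gap between the bars and the curve.
largest_gap = max(
    abs(binomial_pmf(k, n, p) - normal_pdf(k, mu, sigma)) for k in range(n + 1)
)
print(largest_gap)
```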

An unfair coin

I’ve got a coin that comes up heads \(2/3\) of the time and tails only \(1/3\) of the time; we might call it an unfair coin. Now, suppose I flip that coin \(60\) times and count the number of heads. The probability distribution looks like so:

I suppose it makes sense that we’d expect more heads now. If the probability of getting a head is \(p=2/3\) and we flip the coin 60 times, then I guess we’d expect to get \(60\times2/3 = 40\) heads. To approximate the discrete binomial distribution with a normal distribution, we’ll also need to know the standard deviation.
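We can sanity check that guess by computing the distribution directly. This sketch uses the standard binomial formula, which assumes the flips are independent.

```python
from math import comb

n, p = 60, 2 / 3

# Probability of k heads for k = 0, 1, ..., 60.
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# The single most likely number of heads.
most_likely = max(range(n + 1), key=lambda k: pmf[k])

# The weighted average of the possible head counts.
mean = sum(k * pmf[k] for k in range(n + 1))

print(most_likely, round(mean, 6))
```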

Mean and standard deviation

The mean and standard deviation that we learned for data can be extended to random variables using the idea of a weighted average. In this context, the mean is often called the expectation.

Mean or expectation

The expectation of a discrete random variable is \[ E(X) = \sum x_i P(X=x_i) = \sum x_i p_i.\] We might think of this as a weighted mean.

Examples

Weighted die roll: The expectation of our weighted die roll in example 3 above is \[1\times\frac{3}{10} + 2\times\frac{4}{10} + 3\times\frac{3}{10} = 2.\]
Weighted coin flip: The expectation of our weighted coin that comes up heads \(2/3\) of the time is \[E(X) = 1\times \frac{2}{3} + 0 \times \frac{1}{3} = \frac{2}{3}.\]
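The weighted sum translates directly into code. A minimal sketch, with a distribution stored as a dictionary from values to probabilities (a representation we chose for illustration):

```python
from fractions import Fraction


def expectation(dist):
    """Weighted mean: sum of x_i * p_i over all outcomes."""
    return sum(x * p for x, p in dist.items())


weighted_die = {1: Fraction(3, 10), 2: Fraction(4, 10), 3: Fraction(3, 10)}
weighted_coin = {1: Fraction(2, 3), 0: Fraction(1, 3)}

print(expectation(weighted_die), expectation(weighted_coin))  # 2 2/3
```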

Standard deviation

The variance of a discrete random variable \(X\) with mean \(\mu = E(X)\) is \[\sigma^2(X) = \sum (x_i - \mu)^2 p_i.\] We might think of this as a weighted average of the squared difference of the possible values from the mean. The standard deviation is the square root of the variance.

Examples

Weighted die roll: The variance of our weighted die roll in example 3 above is \[(1-2)^2\times\frac{3}{10} + (2-2)^2\times\frac{4}{10} + (3-2)^2\times\frac{3}{10} = \frac{6}{10}.\]
Weighted coin flip: The variance of our weighted coin flip is \[\sigma^2(X) = (1-2/3)^2\times\frac{2}{3} + (0-2/3)^2 \times\frac{1}{3}=\frac{2}{9}.\]
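The same two variances can be checked with a few lines of code. As before, this is a sketch that stores each distribution as a dictionary from values to probabilities.

```python
import math
from fractions import Fraction


def expectation(dist):
    """Weighted mean: sum of x_i * p_i."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Weighted average of squared distance from the mean."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


weighted_die = {1: Fraction(3, 10), 2: Fraction(4, 10), 3: Fraction(3, 10)}
weighted_coin = {1: Fraction(2, 3), 0: Fraction(1, 3)}

print(variance(weighted_die), variance(weighted_coin))  # 3/5 2/9
print(math.sqrt(variance(weighted_coin)))  # standard deviation of one flip
```

Note that \(6/10\) prints in lowest terms as \(3/5\).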

Combining distributions

One nice thing about expectation and variance is that they are additive. That is, if \(X_1\) and \(X_2\) are both random variables, then \[E(X_1 + X_2) = E(X_1) + E(X_2)\] and, provided \(X_1\) and \(X_2\) are independent, \[\sigma^2(X_1 + X_2) = \sigma^2(X_1) + \sigma^2(X_2).\]

Example

Suppose I flip my weighted coin, which comes up heads \(2/3\) of the time, 60 times and let \(X_i\) denote the value of my \(i^{\text{th}}\) flip. Thus, \[X_1 + X_2 + \cdots + X_{60}\] represents the total number of heads that I get and, by the additivity of expectation, we get \[E(X_1 + X_2 + \cdots + X_{60}) = 60 \times \frac{2}{3} = 40.\] Similarly, since the flips are independent, for the variance we get \[\sigma^2(X_1 + X_2 + \cdots + X_{60}) = 60 \times \frac{2}{9} = \frac{40}{3}.\] Of course, this means that the standard deviation is \(\sqrt{40/3}\).

Note: The standard deviation of one flip is \(\sqrt{2/9} \approx 0.471405\) and the standard deviation of 60 flips is \(\sqrt{60\times2/9} \approx 3.65148\). The second is larger in magnitude but, relative to the total number of flips, it’s \(0.0608581\), which is much smaller.
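The additivity shortcut can be checked against a direct computation over all 61 possible head counts. The sketch below uses exact fractions and the standard binomial formula, assuming independent flips.

```python
import math
from fractions import Fraction
from math import comb

n = 60
p = Fraction(2, 3)

# Exact probability of k heads for each k from 0 to 60.
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

mean = sum(k * q for k, q in pmf.items())
var = sum((k - mean) ** 2 * q for k, q in pmf.items())

print(mean, var)  # 40 40/3
print(math.sqrt(var) / n)  # spread relative to the number of flips
```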

Generalization

Suppose that the probability of success is \(p\); then, the probability of failure is necessarily \(1-p\). The mean and standard deviation for a single experiment are then

\[ \begin{align} \mu &= p &\sigma^2 &= p(1-p) &\sigma &= \sqrt{p(1-p)} \end{align} \]

It follows that the mean and standard deviation for the corresponding binomial distribution are then

\[ \begin{align} \mu &= np &\sigma^2 &= np(1-p) &\sigma &= \sqrt{np(1-p)} \end{align} \]

Handedness

Question: Supposedly, about \(10\%\) of people are left handed. Suppose we pick 100 people at random. What is the probability that at least 12 of them are left handed?

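Solution: We can approximate with a normal distribution with mean \(\mu=100\times0.1 = 10\) and standard deviation \[ \sigma = \sqrt{100\times 0.1 \times 0.9} = 3. \] Since 12 lies \((12-10)/3 \approx 0.67\) standard deviations above the mean, the probability is roughly the area under the standard normal curve to the right of \(z=0.67\), which is about \(0.25\).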
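To see how the normal approximation stacks up here, we can compute it alongside the exact binomial sum. This is a sketch under the assumption that the 100 people are chosen independently; the continuity-corrected variant (starting the area at 11.5 rather than 12) is a common refinement that these notes do not use.

```python
import math
from math import comb

n, p, threshold = 100, 0.1, 12
mu = n * p
sigma = math.sqrt(n * p * (1 - p))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal X with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Plain normal approximation: area to the right of 12.
normal_estimate = 1 - normal_cdf(threshold, mu, sigma)

# Same idea with a continuity correction: area to the right of 11.5.
corrected_estimate = 1 - normal_cdf(threshold - 0.5, mu, sigma)

# Exact binomial probability of 12 or more left-handed people.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(threshold, n + 1))

print(normal_estimate, corrected_estimate, exact)
```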

The central limit theorem

The central limit theorem is the theoretical explanation of why the normal distribution appears so often in practice. It’s explored in section 4.4 of our text. The statement involves the repetition of a random variable \(X\). More precisely, suppose that \(X\) is a random variable which we evaluate independently a number of times to produce a sequence of numbers: \[X_1, X_2, \ldots, X_n.\] We then compute the average of those values to produce a new value \(\bar{X}\) defined by \[\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}.\] The central limit theorem asserts that, when \(n\) is large, the random variable \(\bar{X}\) is approximately normally distributed. Furthermore, if \(X\) has mean \(\mu\) and standard deviation \(\sigma\), then the mean and standard deviation of \(\bar X\) are \(\mu\) and \(\sigma/\sqrt{n}\).

Note that all of this is true regardless of the shape of the distribution of \(X\), provided it has a finite mean and standard deviation!
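A small simulation can make the statement about \(\bar{X}\) feel less abstract. In this sketch \(X\) is a roll of a fair six-sided die, with \(n = 30\) rolls per average and 5000 averages; those numbers and the fixed seed are arbitrary choices for illustration.

```python
import random
import statistics


def sample_mean_of_die_rolls(n, rng):
    """Average of n rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30
trials = 5000
means = [sample_mean_of_die_rolls(n, rng) for _ in range(trials)]

# Mean and standard deviation of a single fair die roll.
mu = 3.5
sigma = (35 / 12) ** 0.5

print(statistics.mean(means), mu)
print(statistics.stdev(means), sigma / n**0.5)
```

The two numbers on each printed line should be close; a histogram of `means` would show the bell shape even though a single die roll is uniform.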

A look at the language of chapters 2 and 3