On page 105, a random variable \(X\) is defined to be a random process with some numerical outcome.
An important distinction:
On page 82, the distribution of a discrete random variable is defined to be a table of all the possible outcomes together with their probabilities. For examples 1, 2, and 3 above, the distributions are as follows:
Notes:
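Rolling two dice
Suppose we roll two standard, six-sided dice and add the results to get a number \(X\). Then \(X\) is a whole number between 2 and 12, but these values are not all equally likely. Table 2.5 on page 82 shows us the probability distribution associated with this random process. Those probabilities are, for the sums 2 through 12 in order, \[\frac{1}{36},\ \frac{2}{36},\ \frac{3}{36},\ \frac{4}{36},\ \frac{5}{36},\ \frac{6}{36},\ \frac{5}{36},\ \frac{4}{36},\ \frac{3}{36},\ \frac{2}{36},\ \frac{1}{36}.\]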
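As a quick check on the two-dice distribution, here is a minimal R sketch that enumerates all 36 equally likely pairs of rolls and tabulates their sums. The variable names `sums` and `dice_dist` are illustrative choices, not names from the text.

```r
# All 36 equally likely (first die, second die) pairs, added together.
sums <- outer(1:6, 1:6, "+")

# Count how often each total from 2 to 12 occurs, then convert to probabilities.
dice_dist <- table(sums) / 36

print(dice_dist)
```

The most likely total is 7, which can occur in 6 of the 36 ways.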
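Suppose we flip a coin 5 times and count how many heads we get. This will generate a random number \(X\) between 0 and 5, but these values are not all equally likely. The probabilities of 0, 1, 2, 3, 4, and 5 heads are, in order, \[\frac{1}{32},\ \frac{5}{32},\ \frac{10}{32},\ \frac{10}{32},\ \frac{5}{32},\ \frac{1}{32}.\]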
Note that the probability of getting any particular sequence of 5 heads and tails is \[\frac{1}{2^5} = \frac{1}{32}.\] That explains the denominator of 32 in the list of probabilities. The numerator is the number of sequences that produce that many heads. For example, there are \({5\choose 2} = 10\) ways to get 2 heads in 5 flips: HHTTT, HTHTT, HTTHT, HTTTH, THHTT, THTHT, THTTH, TTHHT, TTHTH, and TTTHH.
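The counting argument above can be sketched in R. This is only an illustration: `ways`, `probs`, and `two_heads` are names chosen here, and `combn` is used to list which flips land heads.

```r
n <- 5

# Number of sequences of n flips with exactly k heads, for k = 0, ..., n.
ways <- choose(n, 0:n)

# Every sequence has probability 1/2^n, so divide the counts by 32.
probs <- ways / 2^n

# Each column of combn(n, 2) gives the two positions that hold the heads.
head_positions <- combn(n, 2)
two_heads <- apply(head_positions, 2, function(pos) {
  flips <- rep("T", n)
  flips[pos] <- "H"
  paste(flips, collapse = "")
})

print(ways)
print(probs)
print(two_heads)
```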
If we plot the possible outcomes vs their probabilities, we get something like the following:
Not too exciting just yet, but we can start to see the emergence of a bell curve. To really see it, we’ll need to up the flip count.
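To see what happens with more flips, here is a rough R sketch that computes the same kind of probabilities for a fair coin and plots them. The helper name `flip_dist` and the choice of 100 flips are assumptions made for illustration.

```r
# Hypothetical helper: probabilities of 0, 1, ..., n heads in n fair flips,
# computed as (number of sequences with k heads) / (number of sequences).
flip_dist <- function(n) choose(n, 0:n) / 2^n

p5 <- flip_dist(5)
p100 <- flip_dist(100)

# Bar chart of the 100-flip distribution; the bell shape is much clearer.
barplot(p100, names.arg = 0:100,
        xlab = "Number of heads", ylab = "Probability")
```

With 100 flips, the bars peak at 50 heads and fall off symmetrically on either side.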
Yesterday, we discussed the binomial distribution from a primarily conceptual point of view. Today, we’ll think a bit more computationally.
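The binomial distribution is described in section 3.4.1 of our text and everything is summarized in the box at the top of page 147, which looks something like so:
\[P(X=k) = {n\choose k}\, p^k (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}\]
\[\mu = np, \qquad \sigma^2 = np(1-p), \qquad \sigma = \sqrt{np(1-p)}\]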
The first, long and fairly complicated looking line states the formula that is actually used to compute the probability that the number of successes is equal to \(k\). We’ll use the mean and standard deviation computations in the next line later, when we relate the binomial distribution to the normal distribution.
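The formula for \(P(X=k)\), together with the mean \(np\) and standard deviation \(\sqrt{np(1-p)}\), can be checked numerically. The following is a minimal sketch: the function name `binom_prob` and the values \(n=10\), \(p=0.6\) are illustrative assumptions.

```r
# Hypothetical helper (not from the text): P(X = k) for n independent trials,
# each with success probability p.
binom_prob <- function(k, n, p) choose(n, k) * p^k * (1 - p)^(n - k)

n <- 10
p <- 0.6
k <- 0:n
probs <- binom_prob(k, n, p)

# The probabilities over all possible k should add up to 1.
total <- sum(probs)

# Mean and standard deviation computed directly from the distribution ...
mu <- sum(k * probs)
sigma <- sqrt(sum((k - mu)^2 * probs))

# ... compared with the shortcut formulas np and sqrt(np(1-p)).
print(c(mu, n * p))
print(c(sigma, sqrt(n * p * (1 - p))))
```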
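The text also mentions the four conditions to check to verify that the random variable is binomial: (1) the trials are independent, (2) the number of trials \(n\) is fixed, (3) each trial outcome can be classified as a success or a failure, and (4) the probability of success \(p\) is the same for each trial.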
I have an unfair coin that comes up heads 60% of the time. Suppose I flip that coin 10 times, count the number of heads, and call the result \(X\).
Solution: This is a canonical binomial distribution example. To apply the formula, we simply identify \(p\), \(n\), and \(k\).
For part (a), we have \(p=0.6\), \(n=10\), and \(k=3\). Thus,
\[P(X=3) = {10\choose 3} \times 0.6^3 \times 0.4^7 = \frac{10\times9\times8}{3\times2\times1}\times 0.0003538944 = 120 \times 0.0003538944 \approx 0.04246733\]
For part (b), we need to add the probabilities for \(X=0\), \(X=1\), and \(X=2\). Thus, we get \[P(X<3) = 0.4^{10} + \left({10\choose 1} \times 0.6 \times 0.4^9\right) + \left({10\choose 2} \times 0.6^2 \times 0.4^8\right) \approx 0.01229455\]
Computations with R:
While it’s important to understand the formulae behind these computations, we’ll typically do the computations on the computer. In R, we can use the \(\texttt{choose}\) command to compute \(n\) choose \(k\) or, better yet, we can use the \(\texttt{dbinom}\) command to compute the binomial distribution directly. Thus, we can compute
\[P(X=k) = \texttt{dbinom(k,n,p)}.\] Sometimes, as in part (b) of the problem, we want the probability that a random variable lies in some range - that is, \(P(a < X \leq b)\). We can compute this using the cumulative binomial distribution function \(\texttt{pbinom}\). That is
\[P(X\leq k) = \sum_{i=0}^k P(X=i)= \texttt{pbinom(k,n,p)}\] and \[P(a<X\leq b) = P(X\leq b) - P(X\leq a) = \texttt{pbinom(b,n,p) - pbinom(a,n,p)}.\] As a special case, we could compute \(P(X>k)\) using \(1-P(X\leq k)\).
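As a sketch of how these commands apply to the unfair coin with \(n=10\) and \(p=0.6\), the R code below computes part (a), \(P(X=3)\), and part (b), \(P(X<3) = P(X\leq 2)\), in more than one way. The variable names and the range \(2 < X \leq 5\) in the last line are illustrative choices.

```r
n <- 10
p <- 0.6

# Part (a): P(X = 3), once from the formula and once with dbinom.
part_a_formula <- choose(n, 3) * p^3 * (1 - p)^7
part_a <- dbinom(3, n, p)

# Part (b): P(X < 3) is the same as P(X <= 2) for a whole-number count.
part_b <- pbinom(2, n, p)
part_b_sum <- sum(dbinom(0:2, n, p))

# A range probability: P(2 < X <= 5) = P(X <= 5) - P(X <= 2).
range_prob <- pbinom(5, n, p) - pbinom(2, n, p)

print(c(part_a_formula, part_a))   # both approximately 0.04246733
print(c(part_b, part_b_sum))       # both approximately 0.01229455
print(range_prob)
```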
Continuing with 10 flips of the unfair coin above that comes up heads 60% of the time, what is the probability that I get more than 7 heads in 10 flips?
Solution: Using the parameters above, we get
\[P(X>7) = 1-P(X\leq 7) = \texttt{1-pbinom(7,10,0.6)} \approx 0.1672898.\]
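The same upper-tail probability can be reached in a few equivalent ways in R. This is a small sketch with illustrative variable names. The `lower.tail = FALSE` argument asks `pbinom` for \(P(X>k)\) directly.

```r
# P(X > 7) for 10 flips of a coin that comes up heads 60% of the time.
upper_complement <- 1 - pbinom(7, 10, 0.6)

# Same quantity, asking pbinom for the upper tail directly.
upper_tail <- pbinom(7, 10, 0.6, lower.tail = FALSE)

# Same quantity again, adding P(X = 8), P(X = 9), and P(X = 10).
upper_sum <- sum(dbinom(8:10, 10, 0.6))

print(c(upper_complement, upper_tail, upper_sum))  # each approximately 0.1672898
```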
Here’s problem 3.26 from our text:
Information Center estimates that 90% of Americans have had chickenpox by the time they reach adulthood.
Solution:
Part (a): The binomial distribution is certainly appropriate. We have 100 trials, each with a success probability of 70% and independent of one another, and we want the probability of 97 successes (though that’s an unusual way to think about contracting chickenpox). This is exactly when the binomial distribution works.
Part (b): \[P(X=97) = \texttt{dbinom(97,100,0.7)} \approx 4.117027\times10^{-12}.\]
Part (c): \[P(X\neq3) = 1-\texttt{dbinom(3,100,0.7)} \approx 1-1.058683\times10^{-46},\] which is effectively 100%.
Part (d): We now reduce the number of trials to 10: \[P(X=1) = \texttt{dbinom(1,10,0.7)} \approx 0.000137781.\]
Part (e): \[1-P(X\leq3) = 1-\texttt{pbinom(3,10,0.7)} \approx 0.9894079.\]
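The computations in parts (b) through (e) can be collected into one short R sketch. It assumes the success probability \(p = 0.7\) that appears in the solution above, and the variable names are illustrative.

```r
# Success probability as used in the solution above (an assumption of this sketch).
p <- 0.7

part_b <- dbinom(97, 100, p)      # P(X = 97) in 100 trials
part_c <- 1 - dbinom(3, 100, p)   # P(X != 3) in 100 trials
part_d <- dbinom(1, 10, p)        # P(X = 1) in 10 trials
part_e <- 1 - pbinom(3, 10, p)    # P(X > 3) in 10 trials

print(c(part_b, part_c, part_d, part_e))
```

Because the probability subtracted in part (c) is on the order of \(10^{-46}\), R prints that result as exactly 1.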