Conditional Probability

Section 2.4 of our text describes conditional probability. Here's a little complementary, data-oriented material.

Let's start by grabbing our CDC data:

In [1]:
import pandas as pd
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
df.head()
Out[1]:
   Unnamed: 0    genhlth  exerany  hlthplan  smoke100  height  weight  wtdesire  age gender
0           1       good        0         1         0      70     175       175   77      m
1           2       good        0         1         1      64     125       115   33      f
2           3       good        1         1         1      60     105       105   49      f
3           4       good        1         1         0      66     132       124   42      f
4           5  very good        0         1         0      61     150       130   55      f

Suppose we'd like to examine the relationship between exercise (exerany) and smoking (smoke100). One easy approach is to lump everything together into a contingency table.

In [2]:
table = pd.crosstab(df.exerany, df.smoke100, normalize=True, margins=True)
table
Out[2]:
smoke100        0        1     All
exerany
0         0.12715  0.12715  0.2543
1         0.40080  0.34490  0.7457
All       0.52795  0.47205  1.0000

From here, it's pretty easy to read off basic probabilities or compute conditional probabilities. For example, the probability that someone from this sample exercises is 0.7457, the entry in the exerany = 1 row and the All column. The probability that someone from this sample both exercises and smokes is 0.3449, the entry in the exerany = 1 row and the smoke100 = 1 column.
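Those two lookups can also be done in code. The following is a minimal sketch that assumes the table has the same row and column labels as the output above; to keep it runnable without a download, it rebuilds the table by hand from the printed proportions rather than from the CSV. The variable names are just illustrative.

```python
import pandas as pd

# Proportions copied from the printed contingency table above,
# so this sketch runs without fetching the CDC data.
table = pd.DataFrame(
    [[0.12715, 0.12715, 0.2543],
     [0.40080, 0.34490, 0.7457],
     [0.52795, 0.47205, 1.0000]],
    index=pd.Index([0, 1, "All"], name="exerany"),
    columns=pd.Index([0, 1, "All"], name="smoke100"),
)

# Marginal probability: row exerany = 1, column "All"
p_exercise = table.loc[1, "All"]

# Joint probability: row exerany = 1, column smoke100 = 1
p_exercise_and_smoke = table.loc[1, 1]

print(p_exercise, p_exercise_and_smoke)
```

With the table computed by `pd.crosstab` in the earlier cell, the same two `.loc` lookups should work unchanged.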

Conditional probabilities can be computed using the formula $$P(A|B) = \frac{P(A\cap B)}{P(B)}.$$ For example, the probability that someone smokes, given that they exercise, is $$\frac{0.3449}{0.7457} \approx 0.462518.$$
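Here is one way that computation might look in code. It's a sketch, not the only approach: the table is again rebuilt by hand from the printed proportions so that the cell is self-contained, and the names `p_smoke_given_exercise` and `given_exercise` are made up for illustration.

```python
import pandas as pd

# Proportions copied from the printed contingency table.
table = pd.DataFrame(
    [[0.12715, 0.12715, 0.2543],
     [0.40080, 0.34490, 0.7457],
     [0.52795, 0.47205, 1.0000]],
    index=pd.Index([0, 1, "All"], name="exerany"),
    columns=pd.Index([0, 1, "All"], name="smoke100"),
)

# P(smokes | exercises) = P(smokes and exercises) / P(exercises)
p_smoke_given_exercise = table.loc[1, 1] / table.loc[1, "All"]
print(round(p_smoke_given_exercise, 6))

# Dividing every row by its "All" entry conditions on exerany all at once:
# each row then holds the distribution of smoke100 given that exerany value.
given_exercise = table.div(table["All"], axis=0)
print(given_exercise)
```

Working from the raw data instead, passing `normalize='index'` to `pd.crosstab` normalizes each row to sum to one, which should give the same conditional probabilities directly.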