Section 2.4 of our text describes conditional probability. Here's a little complementary, data-oriented material.
Let's start by grabbing our CDC data:
```python
import pandas as pd
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
df.head()
```
Suppose we'd like to examine the relationship between exercise (`exerany`) and smoking (`smoke100`). One easy approach is to lump everything together into a contingency table.
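```python
# normalize=True turns counts into proportions of the whole sample;
# margins=True adds an 'All' row and column holding the marginal totals.
table = pd.crosstab(df.exerany, df.smoke100, normalize=True, margins=True)
table
```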
From here, it's pretty easy to read off basic probabilities or compute conditional probabilities. For example, the probability that someone in this sample exercises is 0.7457, and the probability that someone in this sample exercises and smokes is 0.3449.
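Reading these values off by hand works, but they can also be pulled out of the table with `.loc`. The sketch below uses a small hypothetical toy sample in place of the CDC data so that it runs offline; it assumes, as in the CDC columns, that both variables are coded 0/1. With the real data you would pass `df.exerany` and `df.smoke100` instead.

```python
import pandas as pd

# Hypothetical toy sample standing in for the CDC data; both columns are
# assumed to be coded 0/1 like exerany and smoke100.
toy = pd.DataFrame({
    "exerany":  [1, 1, 1, 0, 1, 0, 1, 1],
    "smoke100": [1, 0, 1, 0, 0, 1, 1, 0],
})

# normalize=True divides every count by the grand total, so interior cells
# are joint probabilities; margins=True adds the 'All' marginals.
table = pd.crosstab(toy.exerany, toy.smoke100, normalize=True, margins=True)

p_exercise = table.loc[1, "All"]        # P(exercises): row 1, margin column
p_exercise_and_smoke = table.loc[1, 1]  # P(exercises and smokes): cell (1, 1)
print(p_exercise, p_exercise_and_smoke)
```

In this toy sample 6 of the 8 people exercise and 3 both exercise and smoke, so the two values are 0.75 and 0.375.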
Conditional probabilities can be computed using the formula $$P(A \mid B) = \frac{P(A\cap B)}{P(B)}.$$ For example, the probability that someone smokes, given that they exercise, is $$P(\text{smokes} \mid \text{exercises}) = \frac{0.3449}{0.7457} \approx 0.4625.$$
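The same computation can be done in code, and pandas offers a shortcut: `pd.crosstab` with `normalize='index'` divides each row by its own total, so each row becomes a conditional distribution given that row's value. The sketch below checks that both routes agree, again on a hypothetical 0/1-coded toy sample standing in for the CDC data (swap in `df` for the real thing).

```python
import pandas as pd

# Hypothetical 0/1-coded toy sample standing in for the CDC data.
toy = pd.DataFrame({
    "exerany":  [1, 1, 1, 0, 1, 0, 1, 1],
    "smoke100": [1, 0, 1, 0, 0, 1, 1, 0],
})

# Route 1: apply P(A|B) = P(A and B) / P(B) to the jointly normalized table.
joint = pd.crosstab(toy.exerany, toy.smoke100, normalize=True, margins=True)
p_smoke_given_exercise = joint.loc[1, 1] / joint.loc[1, "All"]

# Route 2: normalize='index' divides each row by its row total, so row 1
# holds the distribution of smoke100 among people with exerany == 1.
by_row = pd.crosstab(toy.exerany, toy.smoke100, normalize="index")
p_smoke_given_exercise_rows = by_row.loc[1, 1]

print(p_smoke_given_exercise, p_smoke_given_exercise_rows)
```

Both routes give 0.5 here, since 3 of the 6 exercisers in the toy sample smoke. Row normalization is convenient when you want every conditional probability at once rather than one cell at a time.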