Conditional Probability

Section 2.4 of our text describes conditional probability. Here's a little complementary, data-oriented material.

Let's start by grabbing our CDC data:

import pandas as pd
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
df.head()

Suppose we'd like to examine the relationship between exercise (exerany) and smoking (smoke100). One easy approach is to lump everything together into a contingency table.

table = pd.crosstab(df.exerany, df.smoke100, normalize=True, margins=True)
table

From here, it's pretty easy to read off basic probabilities or compute conditional probabilities. For example, the probability that someone from this sample exercises is 0.7457. The probability that someone from this sample exercises and smokes is 0.3449.

Conditional probabilities can be computed using the formula $$P(A|B) = \frac{P(A\cap B)}{P(B)}.$$ For example, the probability that someone smokes, given that they exercise, is $$\frac{0.3449}{0.7457} \approx 0.462518.$$
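As a rough check on that arithmetic, here is a minimal sketch in plain Python. It assumes the four joint probabilities are simply typed in from the contingency table in this section rather than read from the data; the names `joint`, `marginal`, and `conditional` are hypothetical helpers made up for this illustration, not part of pandas.

```python
# Joint probabilities copied by hand from the normalized contingency table.
# Keys are (exerany, smoke100) pairs; this dict is an illustrative stand-in
# for the pandas table, not something the notebook itself defines.
joint = {
    (0, 0): 0.12715,
    (0, 1): 0.12715,
    (1, 0): 0.40080,
    (1, 1): 0.34490,
}


def marginal(joint, position, value):
    """Sum the joint probabilities whose key has `value` at `position`."""
    return sum(p for key, p in joint.items() if key[position] == value)


def conditional(joint, a_position, a_value, b_position, b_value):
    """Return P(A | B) = P(A and B) / P(B).

    A is the event key[a_position] == a_value,
    B is the event key[b_position] == b_value.
    """
    p_b = marginal(joint, b_position, b_value)
    p_a_and_b = sum(
        p
        for key, p in joint.items()
        if key[a_position] == a_value and key[b_position] == b_value
    )
    return p_a_and_b / p_b


# Position 0 is exerany, position 1 is smoke100.
p_exercise = marginal(joint, 0, 1)                       # P(exercises)
p_smoke_given_exercise = conditional(joint, 1, 1, 0, 1)  # P(smokes | exercises)
p_exercise_given_smoke = conditional(joint, 0, 1, 1, 1)  # P(exercises | smokes)

print(round(p_exercise, 4))
print(round(p_smoke_given_exercise, 6))
print(round(p_exercise_given_smoke, 6))
```

Note that the two conditional probabilities differ: conditioning on exercise divides by the exerany row total, while conditioning on smoking divides by the smoke100 column total. With the pandas `table` built earlier, the first of these could presumably be read off as `table.loc[1, 1] / table.loc[1, 'All']`, assuming the row and column labels are the integers 0 and 1 as displayed.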

	Unnamed: 0	genhlth	exerany	hlthplan	smoke100	height	weight	wtdesire	age	gender
0	1	good	0	1	0	70	175	175	77	m
1	2	good	0	1	1	64	125	115	33	f
2	3	good	1	1	1	60	105	105	49	f
3	4	good	1	1	0	66	132	124	42	f
4	5	very good	0	1	0	61	150	130	55	f

smoke100	0	1	All
exerany
0	0.12715	0.12715	0.2543
1	0.40080	0.34490	0.7457
All	0.52795	0.47205	1.0000