For my data, my random seed was 5, so my initial input was:
import pandas as pd
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
from numpy import random as r
r.seed(14)
audreys_sample = df.sample(1000)
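As a side note, here is a minimal sketch of another way to make the draw repeatable: `DataFrame.sample` accepts a `random_state` argument, so the seed can be passed directly rather than set globally through NumPy. The small `demo` frame is made up purely for illustration and is not the CDC data.

```python
import pandas as pd

# Hypothetical stand-in data for illustration; not the CDC data set.
demo = pd.DataFrame({"height": [60, 62, 64, 66, 68, 70, 72, 74]})

# Passing random_state makes the draw repeatable without touching
# NumPy's global seed.
first_draw = demo.sample(4, random_state=14)
second_draw = demo.sample(4, random_state=14)

# Both draws select the same rows in the same order.
print(first_draw.equals(second_draw))
```

Keeping the seed inside the call means the sample does not depend on whatever other random calls ran earlier in the notebook.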
And I generated the following histogram.
The copy/pasteable code for it is:
%matplotlib inline
heights = audreys_sample['height']
heights.hist(bins = 20, grid = False, edgecolor='black');
(I cannot run Anaconda on my laptop at this time, or I would redo the image to remove the superfluous code.)
Interestingly, after some quick googling, I found that the average height of an American woman is around 64" and that of an American man is around 69", which roughly matches the two large spikes in my histogram.
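One way to check whether the two spikes really come from the two groups would be to compute the mean height per group. The sketch below assumes the data has a column named `gender`; that column name and its values are assumptions about the file's layout, and the rows in `demo` are invented for illustration only.

```python
import pandas as pd

# Hypothetical rows for illustration only; the column name "gender"
# and its values "f"/"m" are assumptions, not taken from the CDC file.
demo = pd.DataFrame({
    "gender": ["f", "f", "f", "m", "m", "m"],
    "height": [61, 63, 65, 67, 70, 73],
})

# Mean height within each group.
group_means = demo.groupby("gender")["height"].mean()
print(group_means)
```

If the real sample has such a column, the same `groupby` call on it would show how close each group's mean sits to a spike.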
To determine the mean height of my sample, I entered the code
m=heights.mean()
m
which output
67.226
and to determine the standard deviation, I entered the code
heights.std()
which output
4.1218823265486355
So my sample’s mean height is 67.226 inches, with a standard deviation of approximately 4.122 inches.
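For what it's worth, pandas' `std()` computes the sample standard deviation by default (it divides by n - 1, not n). Here is a small sketch of the difference using Python's built-in `statistics` module; the list `heights_demo` is invented for illustration and is not drawn from my sample.

```python
import statistics

# Hypothetical heights in inches, for illustration only.
heights_demo = [62, 64, 66, 68, 70]

mean_height = statistics.mean(heights_demo)

# Sample standard deviation: divides the squared deviations by n - 1.
sample_sd = statistics.stdev(heights_demo)

# Population standard deviation: divides by n instead.
population_sd = statistics.pstdev(heights_demo)

print(mean_height, sample_sd, population_sd)
```

With 1000 observations the two versions differ only slightly, so the choice barely changes the figure reported above.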