An archive of the questions from Mark's Fall 2018 Stat 225.

Sampling some random heights

Mark

(10 pts)

Here’s a cool way to grab a random sample from our CDC data set:

import pandas as pd
from numpy import random as r
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
r.seed(1)
my_sample = df.sample(1000)
my_sample.head()

Using that code, grab a random sample of size 1000 and do the following for the heights:

  1. Generate a histogram
  2. Compute the mean
  3. Compute the standard deviation

Be sure to seed the random number generator using your secret number from class!!
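A side note on seeding: `df.sample` also accepts a `random_state` argument, which keeps the seed local to that one call instead of relying on NumPy's global generator. Here is a minimal sketch of both styles, using a small made-up data frame in place of the CDC data. The `toy` frame and the seed 42 are placeholders, not part of the assignment, so still use `r.seed` with your own number for your answer.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the CDC data; only the column name matches.
toy = pd.DataFrame({"height": np.arange(60, 80)})

# Style used in the assignment: seed NumPy's global generator, then sample.
np.random.seed(42)
first = toy.sample(5)
np.random.seed(42)
second = toy.sample(5)

# Alternative: hand the seed straight to sample().
third = toy.sample(5, random_state=42)
fourth = toy.sample(5, random_state=42)

print(first.equals(second))   # re-seeding before each call repeats the draw
print(third.equals(fourth))   # the same random_state repeats the draw
```

Either way, the point is that anyone who reruns the code with the same seed should see the same rows.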

audrey

I grabbed my data like so:

import pandas as pd
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
from numpy import random as r
r.seed(14)
audreys_sample = df.sample(1000)

Then I generated a histogram:

%matplotlib inline
heights = audreys_sample['height']
heights.hist(grid=False, edgecolor='black');

audrey_hist
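For anyone curious what the histogram is counting, here is a rough sketch with `numpy.histogram`, which returns the bin counts and bin edges without drawing anything. The heights are simulated (the mean of 67 and spread of 4 are just plausible-looking assumptions), not the CDC sample. Like `hist` in pandas, it uses 10 equal-width bins unless you pass `bins`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated heights in inches, rounded to whole numbers; NOT the CDC data.
heights = np.round(rng.normal(67, 4, size=1000))

# Default: 10 equal-width bins spanning min to max.
counts10, edges10 = np.histogram(heights)

# bins=20 covers the same range with bins half as wide.
counts20, edges20 = np.histogram(heights, bins=20)

print(len(counts10), len(edges10))
print(len(counts20), len(edges20))
print(counts10.sum(), counts20.sum())  # every height lands in exactly one bin
```

There is always one more edge than there are bins, and the counts add up to the sample size regardless of how many bins you choose.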

Here’s the mean of my sample:

audreys_sample['height'].mean()
# Out: 67.115

btucker

grabbed data

import pandas as pd
from numpy import random as r
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
r.seed(8)
bens_sample = df.sample(1000)
bens_sample.head()

created histogram

%matplotlib inline
heights = bens_sample['height']
heights.hist(bins = 20, grid=False, edgecolor='black');

untitled

calculate mean: 67.016

m = heights.mean()
m

calculate std: 4.0583511694054035

heights.std()

flyassfish

Generated Sample

import pandas as pd
from numpy import random as r
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
r.seed(11)
fishs_sample = df.sample(100)
fishs_sample.head()

Generated Histogram

%matplotlib inline
heights = fishs_sample['height']
heights.hist(bins = 20, grid=False, edgecolor='black');

hist

Calculated Mean

m = heights.mean()
m

Which Gave

67.63

Calculated Standard Deviation

std = heights.std()
std

Which Gave

3.9661573393404375

Rebecca

For my data, my random seed was 5, so my initial input was:

import pandas as pd
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
from numpy import random as r
r.seed(5)
my_sample = df.sample(1000)

And I generated the following histogram.

image

The copy/pasteable code for which is

%matplotlib inline 
heights = my_sample['height']
heights.hist(bins = 20, grid = False, edgecolor='black');

(I cannot run Anaconda on my laptop at this time, or I would redo the image to remove the superfluous code.)

It’s interesting to me that, after some quick googling, I found that the average height of an American woman is around 64" and the average height of an American man is around 69"… roughly equivalent to those huge spikes in my histogram.
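One way to check that hunch would be to split the heights by sex and compare the group means. Below is a rough sketch with simulated data. The `gender` column name, the `f`/`m` codes, and the numbers 64, 69, and 3 are assumptions for illustration only, not values taken from the CDC file.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Simulated mixture of two groups; NOT the CDC data.
sim = pd.DataFrame({
    "gender": ["f"] * n + ["m"] * n,
    "height": np.concatenate([
        rng.normal(64, 3, n),   # assumed center for women
        rng.normal(69, 3, n),   # assumed center for men
    ]),
})

# Mean height within each group, and for everyone together.
group_means = sim.groupby("gender")["height"].mean()
overall_mean = sim["height"].mean()

print(group_means)
print(overall_mean)
```

With equal group sizes, the overall mean lands between the two group means, which is the kind of pattern that can produce two bumps in a combined histogram.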

To determine the mean heights of my histogram, I entered the code

m=heights.mean()
m

which output

67.226

and to determine the standard deviation, I entered the code

heights.std()

which output

4.1218823265486355

So my sample’s mean height is 67.226 with a standard deviation of approx. 4.12188.

vscala

Data:

import pandas as pd
from numpy import random as r
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
r.seed(2)
smpl = df.sample(1000)
smpl.head()

Histogram:

%matplotlib inline
heights = smpl['height']
heights.hist(grid = False, edgecolor='black')

index

Standard Deviation:

heights.std()

4.046771597256303

Mean/Average:

heights.mean()

67.004

dpulse

grabbed data like so:

        import pandas as pd
        from numpy import random as r
        df = pd.read_csv("https://marksmath.org/data/cdc.csv")
        r.seed(4)
        Davids_sample = df.sample(1000)
        Davids_sample.head()

Created histogram like so:

          %matplotlib inline
          heights = Davids_sample['height']
          heights.hist(bins = 20, grid = False, edgecolor = 'black')

computed mean like so:

          m = heights.mean()
          m

Mean = 67.184

computed Standard Deviation like so:

           Davesstd = heights.std()
           Davesstd

Standard Deviation = 4.214
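A note on what `std()` computes: by default pandas divides by n - 1 (the sample standard deviation), while NumPy's `np.std` divides by n unless you pass `ddof=1`. So the two can disagree slightly on the same numbers. A small sketch with made-up values (not from the CDC data):

```python
import numpy as np
import pandas as pd

# Five made-up heights, just to keep the arithmetic visible.
vals = pd.Series([64.0, 66.0, 67.0, 69.0, 72.0])

sample_sd = vals.std()              # pandas default: ddof=1, divide by n - 1
pop_sd = np.std(vals.to_numpy())    # NumPy default: ddof=0, divide by n

# The same sample standard deviation, written out by hand.
squared_devs = (vals - vals.mean()) ** 2
by_hand = np.sqrt(squared_devs.sum() / (len(vals) - 1))

print(sample_sd, by_hand, pop_sd)
```

With a sample of 1000 the difference between the two is tiny, but it is worth knowing which one you are reporting.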

john

Hello.

import pandas as pd
from numpy import random as r

# get the entire data frame
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')

# only get a random portion of it 
r.seed(10)
my_sample = df.sample(1000)

# looking at the head
my_sample.head()

# generating a histogram of the sample
%matplotlib inline
heights = my_sample['height']
heights.hist(bins = 20, grid=True, edgecolor='green');

# getting the standard deviation
print("Standard Deviation: ")
print(heights.std())

# getting the mean height
print("Mean Height: ")
print(heights.mean())

Output:
Standard Deviation:
4.2491229737495315
Mean Height:
67.253

download (1)

goodmorning

I collected the data and drew a sample of size 1000 using my seed, 9:

import pandas as pd
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
from numpy import random as r
r.seed(9)
my_sample = df.sample(1000)
my_sample.head()

then put into a histogram by

%matplotlib inline
heights = my_sample['height']
heights.hist(bins = 20, grid=False, edgecolor='black');

untitled

then using the codes

m = heights.mean()
m

to find the mean of 67.14 and

heights.std()

to find the standard deviation of 4.047070394591473

megan

I grabbed my data like so:

import pandas as pd 
from numpy import random as r
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
r.seed(12)
my_sample = df.sample(1000)

Then I generated a histogram:

%matplotlib inline
heights = my_sample['height']
heights.hist(bins = 20, grid=False, edgecolor='black');

histogram

Then I computed the mean of 67.226 and standard deviation of 4.2342:

heights.mean()

heights.std()

joshua

I got my code:

import pandas as pd
from numpy import random as r
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
r.seed(1)
joshua_sample = df.sample(1000)
joshua_sample.head()

Then I generated my histogram:

%matplotlib inline
heights = joshua_sample['height']
heights.hist(bins = 20, grid=False, edgecolor='black');

image

Then I calculated the mean using:

m = heights.mean()
m

And found the mean to be:
67.198

Then I calculated the standard deviation using:

heights.std()

And found the standard deviation to be:
4.141974402619113

mac

I grabbed my data sample with a sample size of 1000. My given seed was 6.

import pandas as pd
from numpy import random as r
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
r.seed(6)
my_sample = df.sample(1000)
my_sample.head()

I created my histogram of my_sample

%matplotlib inline
heights = my_sample['height']
heights.hist(bins = 20, grid=False, edgecolor='black');

download

Next I found the mean to be 67.155 using the code:

m = heights.mean()
m

Then I found the standard deviation to be 4.05948943273098 with the code:

heights.std()

Tripp

I grabbed my data like so.

import pandas as pd
from numpy import random as r
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
r.seed(3)
Tripps_sample = df.sample(1000)
Tripps_sample.head()

Then I generated a histogram:

%matplotlib inline
heights = Tripps_sample['height']
heights.hist(bins = 20, grid=False, edgecolor='black');

histogram

I computed Mean and Standard deviation from my data.

m = heights.mean()
m

My_std = heights.std()
My_std 

mean= 67.348
standard deviation=4.170166218064486

dennis

Data grab:

import pandas as pd
from numpy import random as r
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
r.seed(13)
dennis_sample = df.sample(1000)
dennis_sample.head()

Histogram:

%matplotlib inline
heights = dennis_sample['height']
heights.hist(grid=False, edgecolor='black');

dennis

Calculate mean:

dennis_sample['height'].mean()

67.091

Calculate standard deviation:

heights.std()

4.197897252356391

Garrett

Generated Sample:

import pandas as pd
from numpy import random as r
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
r.seed(7)
garretts_sample = df.sample(1000)
garretts_sample.head()

Generated Histogram:

%matplotlib inline
heights = garretts_sample['height']
heights.hist(bins = 20, grid=False, edgecolor='black');

Stat

Calculated Mean:

m = heights.mean()
m

Which Gave:

67.217

Calculated Standard Deviation

heights.std()

Which Gave:

4.132189006566132