(10 pts)
I’ve got a CSV file on my web space that contains data on the 2015 Peach Tree Road Race. You can grab it and take a look like so:
import pandas as pd
df = pd.read_csv('https://www.marksmath.org/data/peach_tree2015.csv')
df.head()
Index | Unnamed | Div Place | Name | Bib | Age | Place | Gender Place | Clock Time | Net Time | Hometown | Gender |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6451 | 1 | SCOTT OVERALL | 72 | 32 | 1 | 1 | 29.500 | 29.500 | SUTTON, UNITED KINGDOM | M |
1 | 6452 | 2 | BEN PAYNE | 74 | 33 | 2 | 2 | 29.517 | 29.517 | COLORADO SPRINGS, CO | M |
2 | 4092 | 1 | GRIFFITH GRAVES | 79 | 25 | 3 | 3 | 29.633 | 29.633 | BLOWING ROCK, NC | M |
3 | 4093 | 2 | SCOTT MACPHERSON | 87 | 28 | 4 | 4 | 29.800 | 29.783 | COLUMBIA, MO | M |
4 | 6453 | 3 | ELKANAH KIBET | 77 | 32 | 5 | 5 | 29.883 | 29.883 | FAYETTEVILLE, NC | M |
Filter this data so that you’ve got a data frame with just the men or women, as you prefer.
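A minimal sketch of that kind of filter, using a tiny made-up data frame rather than the real file. Only the column names mirror the race data; the rows are invented, and the 'F' label for women is an assumption, since the preview above shows only 'M'.

```python
import pandas as pd

# Tiny made-up stand-in for the race data; the values are invented
# and only the column names mirror the real file.
toy = pd.DataFrame({
    "Name": ["RUNNER A", "RUNNER B", "RUNNER C", "RUNNER D"],
    "Age": [32, 41, 25, 54],
    "Net Time": [29.5, 61.2, 48.0, 75.3],
    "Gender": ["M", "F", "F", "M"],  # 'F' is an assumed label
})

# Boolean mask: keep only the rows whose Gender is 'F' (use 'M' for the men).
women = toy[toy["Gender"] == "F"]
print(women)
```

The same pattern should carry over to df once you've checked which labels actually appear in its Gender column, for example with df['Gender'].unique().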
Grab a sample of size 100 from this data and store the sample in a variable. When grabbing the random sample, use the sum of the alphabet positions of the letters in your name as the random_state. For example, ‘Mark’ yields 13 + 1 + 18 + 11 = 43.
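One way to compute that sum and feed it to the sampler might look like the sketch below. The helper name_to_seed and the toy data frame are hypothetical, made up for illustration; the toy sample has 5 rows rather than 100 just to keep the output short.

```python
import pandas as pd

def name_to_seed(name):
    """Sum the alphabet positions (a=1, ..., z=26) of the letters in name.

    Hypothetical helper; anything that is not a-z (spaces, hyphens) is skipped.
    """
    return sum(ord(c) - ord("a") + 1 for c in name.lower() if "a" <= c <= "z")

seed = name_to_seed("Mark")  # 13 + 1 + 18 + 11 = 43

# Made-up stand-in for the filtered race data.
ages = list(range(20, 70))
toy = pd.DataFrame({"Age": ages, "Net Time": [50 + 0.3 * a for a in ages]})

# Passing the same random_state always draws the same rows.
sample = toy.sample(5, random_state=seed)
print(seed)
print(sample)
```

With the real data you'd call sample(100, random_state=seed) on your filtered data frame instead.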
After getting the data, run a linear regression to examine the relationship between Age and Net Time. Answer the following questions:
- To a 99% level of confidence, is there a genuine relationship between Age and Net Time?
- What Net Time does the model predict for a 54-year-old person of the gender in your sample?
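The sketch below shows one possible workflow for both questions, assuming scipy.stats.linregress is an acceptable regression tool here (the problem doesn't name one). The data is synthetic, generated only so the example runs on its own, so its numbers say nothing about the actual race.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for a 100-row sample; these numbers are invented
# and are NOT the race data.
rng = np.random.default_rng(0)
age = rng.integers(18, 75, size=100)
net_time = 50 + 0.3 * age + rng.normal(0, 8, size=100)
sample = pd.DataFrame({"Age": age, "Net Time": net_time})

# Fit Net Time = intercept + slope * Age by least squares.
fit = stats.linregress(sample["Age"], sample["Net Time"])

# The reported p-value tests the null hypothesis that the slope is zero.
# A 99% level of confidence corresponds to comparing it against 0.01.
alpha = 0.01
significant = fit.pvalue < alpha

# Plug Age = 54 into the fitted line to get the predicted Net Time.
predicted = fit.intercept + fit.slope * 54

print(f"slope={fit.slope:.3f}, intercept={fit.intercept:.3f}, p={fit.pvalue:.3g}")
print(f"significant at the 99% level: {significant}")
print(f"predicted Net Time at age 54: {predicted:.2f}")
```

Swapping in your own sample for the synthetic one is the only change needed; the comparison against 0.01 and the prediction step stay the same.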