An archive of the questions from Mark's Summer 2018 Stat 185.

Random CDC-like data

mark

(10 pts)

I’ve got a random data generator on my webserver. You can download data directly into R and view the first couple of rows like so:

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=mark')
head(mydf,2)

#    first_name last_name age gender height weight income smoke100 exerany
#  1      Donna     Dinan  35 female  65.37 164.26   1947        0       1
#  2      Ramon     Davis  26   male  71.81 193.70  39311        1       1

Note that the username field must match your forum username and everyone gets different data.
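
Since only the username changes from one student to the next, the URL can be built from it. The following is a minimal sketch; the helper name `random_data_url` is hypothetical and not something the course provides.

```r
# Hypothetical helper (not part of the original post): build the download
# URL for a given forum username.
random_data_url <- function(username) {
  base <- "https://marksmath.org/cgi-bin/random_data.csv"
  # URLencode guards against characters that are not safe in a query string.
  paste0(base, "?username=", URLencode(username, reserved = TRUE))
}

url_for_mark <- random_data_url("mark")
print(url_for_mark)

# To download the data, pass the URL to read.csv (needs network access):
# mydf <- read.csv(url_for_mark)
```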

The problem: Do the following using your data:

Generate a histogram of the heights.
Generate a contingency table relating gender and exercise. Does there appear to be a relationship? If so, what is that relationship?

Be sure to include the code you typed to get your answer. Code blocks are created by indenting your input four spaces.

Henry

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=Henry')
head(mydf,2)
  first_name last_name age gender height weight income smoke100 exerany handedness
1      Sunny    Gaiser  31 female  64.01 182.12   1825        N       N          R
2      Roger      Dunn  53   male  67.85 153.32  53977        Y       Y          R
hist(mydf$height)
table(mydf$gender,mydf$exerany)
          N  Y
  female 22 41
  male   10 27

Based on the contingency table, males are slightly more likely to exercise than females. 73% of male respondents reported that they exercised while only 65% of females reported that they exercised.
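
The percentages quoted above are row proportions of the contingency table. As a rough sketch, they can be reproduced with `prop.table`; here the counts are typed in by hand from the table in this post rather than downloaded, and the names `tab` and `row_pct` are just illustrative.

```r
# Counts copied from the contingency table above
# (rows: gender, columns: exerany).
tab <- as.table(matrix(c(22, 41,
                         10, 27),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(gender = c("female", "male"),
                                       exerany = c("N", "Y"))))

# margin = 1 divides each row by its own total, so each row sums to 100%.
row_pct <- round(100 * prop.table(tab, margin = 1), 1)
print(row_pct)
```

With the downloaded data, the same idea would be `prop.table(table(mydf$gender, mydf$exerany), margin = 1)`.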

jthomps6

Here are my results and code from the following questions stated above.

**Code:**
mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=jthomps6')
head(mydf,2)

hist(mydf$height)
table(mydf$gender, mydf$exerany)

**Results/Answers:**
first_name last_name age gender height weight income smoke100 exerany handedness
1     Ronald    Fields  40   male  66.66 164.40  18571        N       N          R
2     Audrey     Bower  35 female  62.32 133.95   6828        N       N          R

Contingency Table of Gender And Exercise

       N  Y
female 12 45
male   11 32

Females seemed to exercise slightly more than males: about 79% of females (45 of 57) versus about 74% of males (32 of 43).

audrey

OK, here’s my attempt. First, I’ll read in the data:

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=audrey')
head(mydf, 2)

#Output:
 first_name last_name age gender height weight income smoke100 exerany handedness
 1     George  Howerton  51   male  70.98 222.96 216670        Y       Y          R
 2        Rae    Cherry  53 female  64.33 127.77   9488        Y       Y          R

The histogram

Here’s how I generated the histogram for the heights. I used a couple of options to make the picture a little prettier.

hist(mydf$height, xlab='Heights', main='', col='gray')

heights_hist

The contingency table

Here’s the contingency table:

table(mydf$gender, mydf$exerany)/100
    
        N    Y
female 0.16 0.29
male   0.14 0.41

Looks pretty close percentage-wise; perhaps a mosaic plot would help?

mosaicplot(table(mydf$gender, mydf$exerany), main='')

mosaic

From this sample, it looks like a slightly higher percentage of men exercise than women but it’s hard to say if it’s statistically significant.
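
One way to probe the significance question is a chi-squared test of independence. This goes beyond what the assignment asks for, so treat it as a sketch only. It assumes the sample has 100 rows, so that the proportions in the table above correspond to the counts 16, 29, 14 and 41; the names `counts` and `result` are illustrative.

```r
# Counts recovered from the proportions above, ASSUMING 100 rows in the sample.
counts <- matrix(c(16, 29,
                   14, 41),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("female", "male"),
                                 exerany = c("N", "Y")))

# Chi-squared test of independence between gender and exercise.
# For 2x2 tables chisq.test applies Yates' continuity correction by default.
result <- chisq.test(counts)
print(result$p.value)
```

A large p-value would be consistent with the impression that the difference in this sample is not statistically significant.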

albeatty

Hi, here’s my code y’all:

allisons_data <- read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=albeatty')
head(allisons_data)
  first_name  last_name age gender height weight income smoke100 exerany
1      Retha      Reese  41 female  63.91 166.41  24811        Y       Y
2      Edwin       Hamm  56   male  70.29 199.83  42624        N       Y

  handedness
1          L
2          L

this is what the distribution of height looks like in this data:

hist(allisons_data$height)

histo

let’s look at the relationship b/w gender and exercise next:

table(allisons_data$gender,allisons_data$exerany)

      N  Y
female   9 45
male     9 37

how about a better visualization of that…

mosaicplot(table(allisons_data$gender,allisons_data$exerany))

mosaicPlot

there appears to be no overt relationship in my random data, though slightly more females than males exercise.

robin

![Rplot|454x395](upload://yoeONB23cmS7HHS6SaJ6QQmIY9Z.png)
mydf = read.csv("https://marksmath.org/cgi-bin/random_data.csv?username=robin")
head(mydf,2)

  first_name last_name age gender height weight income smoke100 exerany handedness
1   Kathleen    Miller  36 female  65.03 185.22  21171        N       Y          R
2       Gary    Murphy  46   male  69.96 162.88  30152        Y       Y          R

table(mydf$gender,mydf$exerany)

         N  Y
  female 11 38
  male    7 44

It looks like males are more likely to exercise than females because they have a higher proportion of exercisers to non-exercisers.

AlexisBrandt

Code:

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=AlexisBrandt')
head(mydf,2) 

  
table(mydf$gender,mydf$exerany)

       N  Y
female 16 29
male   17 38

mmealie

First I defined the data set.

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=mmealie')

Histogram

hist(mydf$height)

Contingency Table

table(mydf$gender,mydf$exerany)
           N  Y
   female 14 42
   male    8 36

There doesn’t seem to be a strong relationship between gender and exercise.

mdavis9

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=mdavis9')
 head(mydf,2)
  first_name last_name age gender height weight income
1     Hattie      Pike  41 female  61.30 197.38   3663
2    Dorothy     Suggs  26 female  65.64 165.44    880


hist(mydf$height)

Rplot01

table(mydf$gender,mydf$exerany)
         N  Y
  female 13 39
  male    9 39

Lumpyhead00

my attempt.

mydf=read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=Lumpyhead00')

head(mydf,2)
  first_name last_name age gender height weight income
1        Lin     Myers  29 female  62.65 235.39   3337
2     Daniel   Slatton  31   male  63.78 130.87  20993
smoke100 exerany handedness
1        N       Y          R
2        N       Y          R
hist(mydf$height)

my histogram

KBiehler1

mydf=read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=KBiehler1') 
head(mydf, 2)

first_name last_name age gender height weight income smoke100 exerany handedness
1      Julie    Maybee  25 female  62.43 125.03  31627        N       Y          R
2     Sandra  Robinson  29 female  64.87 173.35  55357        Y       Y          R

hist(mydf$height)

mosaicplot(table(mydf$gender, mydf$exerany))

table(mydf$gender, mydf$exerany)

        N  Y
female 14 40
male    8 38

Based on the table generated, there are 46 male responses and 54 female responses. 82.6% of male respondents stated they exercised regularly while 74% of female respondents stated they exercised regularly.
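
The group totals mentioned above (46 male and 54 female responses) are the row sums of the table. A small sketch using `addmargins`, with the counts typed in by hand from this post; the names `tab` and `with_totals` are illustrative.

```r
# Counts copied from the contingency table above
# (rows: gender, columns: exerany).
tab <- as.table(matrix(c(14, 40,
                          8, 38),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(gender = c("female", "male"),
                                       exerany = c("N", "Y"))))

# addmargins appends a "Sum" row and a "Sum" column holding the totals.
with_totals <- addmargins(tab)
print(with_totals)
```

With the downloaded data, this would be `addmargins(table(mydf$gender, mydf$exerany))`.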

philycheesestk

Here is what I did:

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=philycheesestk')
head(mydf,2)
 first_name last_name age gender height weight income smoke100 exerany handedness
 1     Julius   Vaccaro  43   male  69.12 191.55      5        N       Y          R
 2    William     White  33   male  66.92 163.11  11071        Y       Y          R

here is the code for the histogram:

hist(mydf$height)

Contingency table:

        N  Y
female 10 30
male   14 46

Based on this data, males were only slightly more likely to exercise than females: about 77% of males (46 of 60) versus 75% of females (30 of 40).

mark

@mmealie This looks really close! Could you mention how you defined mydf in the first place?

mark

@jgilfill This looks really close! I think you’re formatting your “code” as a block quote, though.

This is a block quote. The text looks like the rest of the text and can be formatted.

We can input a block quote with a greater than sign. Thus the previous line was input like so:

> This is a **block quote**. The text looks like the rest of the text and can be formatted.

But that line is code. It’s mono-spaced and can’t be formatted.

mark

@jgilfill I edited your post so that the first portion is formatted correctly. If you press the little edit button that looks like a pencil, you can see how I did it. Do you think you can get the second portion?

jgilfill

mydf=read.csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=jgilfill')    
head(mydf,2)

 first_name last_name    age gender height   weight    income  smoke100 exerany handedness
       John     Gable     36   male  68.58   196.09      7105       Y       Y           R
       Ellen Phillips     34 female  65.51   173.75    207614       N       Y           R

hist(mydf$height)

table(mydf$gender,mydf$exerany)

         N  Y
female   8 39
male    14 39

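Based on the table generated, the same number of males and females (39 each) reported exercising. Because the groups differ in size, though, that is about 83% of females (39 of 47) versus about 74% of males (39 of 53), so females in this sample were somewhat more likely to exercise.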

mosaicplot(table(mydf$gender, mydf$exerany))

KBC2019

This is my attempt:

 mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=KBC2019')

head(mydf)
first_name last_name age gender height weight income
1     Amanda   Hamlett  27 female  62.49 164.51   4147
2    Barbara  Samantha  32 female  64.99 158.84   9763

    hist(mydf$height)

Rplot01

smoke100 exerany handedness
1        Y       Y          R
2        Y       N          R
3        N       Y          R
4        Y       Y          R
5        Y       Y          R
6        N       N          R

jmahan

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=jmahan')
head(mydf,2)

hist(mydf$height)

Jmahan Height Histogram mydf

table(mydf$gender,mydf$exerany)
    
      N  Y
female 14 42
male   11 33

ktaylor4

First, I read in my data:

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=ktaylor4')
head(mydf,2)

#Output:
 first_name last_name age gender height weight income
1      Keith  Crutcher  23   male  69.52 229.40   7959
2      Jerry     Fahie  40   male  68.50 182.57   6022

Then, I generated my histogram:

hist(mydf$height)

 table(mydf$gender,mydf$exerany)
Here's my contingency table:         
     N  Y
  female 19 35
  male    6 40

Conclusion

Females in this set exercise less than males, but it’s hard to say that they exercise significantly less.

mark

@ktaylor4 Getting close! There are some issues, though:

- You should create a code block by indenting four spaces.
  - Currently, you’re starting your code block with a greater than symbol > yielding a blockquote.
- Code and output should be properly arranged so that we can see which code block created which output.
- It would be nice to have some prose indicating what’s going on.

I edited your post so that the first couple of parts were properly formatted. If you hit the edit button, you can see how I did so. Do you think you can finish?