An archive of the questions from Mark's Summer 2018 Stat 185.

Random CDC-like data

mark

(10 pts)

I’ve got a random data generator on my webserver. You can download data directly into R and view the first couple of rows like so:

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=mark')
head(mydf,2)

#    first_name last_name age gender height weight income smoke100 exerany
#  1      Donna     Dinan  35 female  65.37 164.26   1947        0       1
#  2      Ramon     Davis  26   male  71.81 193.70  39311        1       1

Note that the username field must match your forum username and everyone gets different data.
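
Since the username is just a query parameter at the end of the address, one way to avoid typos or stray line breaks in the URL is to build it from a variable. This is only a sketch; the variable names `username` and `data_url` are illustrative and not part of the generator itself.

```r
# Hypothetical variable holding your forum username
username <- "mark"

# Build the address in one piece so no spaces or line breaks creep in
data_url <- paste0(
  "https://marksmath.org/cgi-bin/random_data.csv?username=",
  username
)
print(data_url)

# Uncomment to download the data (needs a network connection):
# mydf <- read.csv(data_url)
# head(mydf, 2)
```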

The problem: Do the following using your data:

  • Generate a histogram of the heights
  • Generate a contingency table relating gender and exercise. Does there appear to be a relationship? If so, what is that relationship?

Be sure to include the code you typed to get your answer. Code blocks are created by indenting your input four spaces.

Henry
mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=Henry')
head(mydf,2)
  first_name last_name age gender height weight income smoke100 exerany handedness
1      Sunny    Gaiser  31 female  64.01 182.12   1825        N       N          R
2      Roger      Dunn  53   male  67.85 153.32  53977        Y       Y          R
hist(mydf$height)
table(mydf$gender,mydf$exerany)
          N  Y
  female 22 41
  male   10 27

Based on the contingency table, males are slightly more likely to exercise than females. 73% of male respondents reported that they exercised while only 65% of females reported that they exercised.
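
Percentages like these can be computed with `prop.table` rather than by hand. The sketch below re-enters the counts from the table above so that it runs without downloading anything; with the real data, `table(mydf$gender, mydf$exerany)` would presumably take the place of the hand-built table.

```r
# Counts copied by hand from the contingency table above
tab <- as.table(matrix(
  c(22, 41,
    10, 27),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("female", "male"), exerany = c("N", "Y"))
))

# margin = 1 divides each row by its row total, giving the share of
# each gender that does (Y) or does not (N) exercise
row_props <- prop.table(tab, margin = 1)
print(round(100 * row_props, 1))
```

Using row proportions matters here because the two groups have different sizes (63 females and 37 males), so raw counts alone are not comparable.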

jthomps6

Here are my code and results for the questions stated above.

**Code:**
mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=jthomps6')
head(mydf,2)

hist(mydf$height)
table(mydf$gender, mydf$exerany)

**Results/Answers:**
first_name last_name age gender height weight income smoke100 exerany handedness
1     Ronald    Fields  40   male  66.66 164.40  18571        N       N          R
2     Audrey     Bower  35 female  62.32 133.95   6828        N       N          R

Contingency Table of Gender And Exercise

       N  Y
female 12 45
male   11 32

Females seemed to exercise slightly more than males: 45 of 57 females (about 79%) exercised, versus 32 of 43 males (about 74%).

audrey

OK, here’s my attempt. First, I’ll read in the data:

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=audrey')
head(mydf, 2)

#Output:
 first_name last_name age gender height weight income smoke100 exerany handedness
 1     George  Howerton  51   male  70.98 222.96 216670        Y       Y          R
 2        Rae    Cherry  53 female  64.33 127.77   9488        Y       Y          R

The histogram

Here’s how I generated the histogram for the heights. I used a couple of options to make the picture a little prettier.

hist(mydf$height, xlab='Heights', main='', col='gray')

[Figure: histogram of heights]

The contingency table

Here’s the contingency table:

table(mydf$gender, mydf$exerany)/100
    
        N    Y
female 0.16 0.29
male   0.14 0.41

Looks pretty close percentage-wise; perhaps a mosaic plot would help?

mosaicplot(table(mydf$gender, mydf$exerany), main='')

[Figure: mosaic plot of gender vs. exercise]

From this sample, it looks like a slightly higher percentage of men exercise than women but it’s hard to say if it’s statistically significant.
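
One common way to check significance is a chi-squared test of independence. The sketch below uses the counts implied by the proportions above (each multiplied by 100) instead of the downloaded data. Whether this is the test the course has in mind is an assumption.

```r
# Counts recovered from the table of proportions above (n = 100)
tab <- matrix(
  c(16, 29,
    14, 41),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("female", "male"), exerany = c("N", "Y"))
)

# Chi-squared test of independence between gender and exercise.
# For 2x2 tables, chisq.test applies Yates' continuity correction by default.
result <- chisq.test(tab)
print(result$p.value)
```

A p-value above the usual 0.05 cutoff would mean the sample gives no strong evidence of a relationship between gender and exercise.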

albeatty

Hi, here’s my code y’all:

allisons_data <- read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=albeatty')
head(allisons_data)
  first_name  last_name age gender height weight income smoke100 exerany
1      Retha      Reese  41 female  63.91 166.41  24811        Y       Y
2      Edwin       Hamm  56   male  70.29 199.83  42624        N       Y

  handedness
1          L
2          L

this is what the distribution of height looks like in this data:

hist(allisons_data$height)

[Figure: histogram of heights]

let’s look at the relationship b/w gender and exercise next:

table(allisons_data$gender,allisons_data$exerany)

          N  Y
  female  9 45
  male    9 37

how about a better visualization of that…

mosaicplot(table(allisons_data$gender,allisons_data$exerany))

[Figure: mosaic plot of gender vs. exercise]

there appears to be no overt relationship in my random data, though slightly more females than males exercise.

robin
[Figure: Rplot]
 mydf = read.csv("https://marksmath.org/cgi-bin/random_data.csv?username=robin")
head(mydf,2)

  first_name last_name age gender height weight income
1   Kathleen    Miller  36 female  65.03 185.22  21171
2       Gary    Murphy  46   male  69.96 162.88  30152
  smoke100 exerany handedness
1        N       Y          R
2        Y       Y          R
table(mydf$gender,mydf$exerany)
          N  Y
  female 11 38
  male    7 44
It looks like males are more likely to exercise, because they have a higher ratio of exercisers to non-exercisers than females do.
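
The comparison of exercisers to non-exercisers made here corresponds to what statisticians call the odds. Here is a small sketch with the counts from the table above entered by hand; the variable names are illustrative.

```r
# Counts copied by hand from the contingency table above
exercisers     <- c(female = 38, male = 44)
non_exercisers <- c(female = 11, male = 7)

# Odds of exercising: exercisers per non-exerciser, by gender
odds <- exercisers / non_exercisers

# Odds ratio comparing males to females (values above 1 favor males)
odds_ratio <- odds[["male"]] / odds[["female"]]

print(round(odds, 2))
print(round(odds_ratio, 2))
```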

AlexisBrandt

image

Code:

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=AlexisBrandt')
head(mydf,2) 

  
table(mydf$gender,mydf$exerany)

       N  Y
female 16 29
male   17 38

mmealie

First I defined the data set.

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=mmealie')

Histogram

hist(mydf$height)

Contingency Table

table(mydf$gender,mydf$exerany)
           N  Y
   female 14 42
   male    8 36

There doesn’t seem to be a strong relationship between gender and exercise.

mdavis9
mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=mdavis9')
 head(mydf,2)
  first_name last_name age gender height weight income
1     Hattie      Pike  41 female  61.30 197.38   3663
2    Dorothy     Suggs  26 female  65.64 165.44    880


hist(mydf$height)

[Figure: histogram of heights]

table(mydf$gender,mydf$exerany)
         N  Y
  female 13 39
  male    9 39

Lumpyhead00

my attempt.

mydf=read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=Lumpyhead00')

head(mydf,2)
  first_name last_name age gender height weight income
1        Lin     Myers  29 female  62.65 235.39   3337
2     Daniel   Slatton  31   male  63.78 130.87  20993
smoke100 exerany handedness
1        N       Y          R
2        N       Y          R
hist(mydf$height)

[Figure: histogram of heights]

KBiehler1
mydf=read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=KBiehler1') 
head(mydf, 2)

first_name last_name age gender height weight income smoke100 exerany handedness
1      Julie    Maybee  25 female  62.43 125.03  31627        N       Y          R
2     Sandra  Robinson  29 female  64.87 173.35  55357        Y       Y          R

hist(mydf$height)

mosaicplot(table(mydf$gender, mydf$exerany))

table(mydf$gender, mydf$exerany)

        N  Y
female 14 40
male    8 38

Based on the table generated, there are 46 male responses and 54 female responses. 82.6% of male respondents stated they exercised regularly while 74% of female respondents stated they exercised regularly.

philycheesestk

Here is what I did:

 mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=philycheesestk')
 head(mydf,2)
 first_name last_name age gender height weight income smoke100 exerany handedness
 1     Julius   Vaccaro  43   male  69.12 191.55      5        N       Y          R
 2    William     White  33   male  66.92 163.11  11071        Y       Y          R

here is the code for the histogram:

hist(mydf$height)

Contingency table:

        N  Y
female 10 30
male   14 46

Based on this data, males were slightly more likely to exercise than females: about 77% of males versus 75% of females.

mark

@mmealie This looks really close! Could you mention how you defined mydf in the first place?

mark

@jgilfill This looks really close! I think you’re formatting your “code” as a block quote, though.

> This is a **block quote**. The text looks like the rest of the text and can be formatted.

We can input a block quote with a greater than sign. Thus the previous line was input like so:

    > This is a **block quote**. The text looks like the rest of the text and can be formatted.

But that line is code. It’s mono-spaced and can’t be formatted.

mark

@jgilfill I edited your post so that the first portion is formatted correctly. If you press the little edit button that looks like a pencil, you can see how I did it. Do you think you can get the second portion?

jgilfill
mydf=read.csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=jgilfill')    
head(mydf,2)

 first_name last_name    age gender height   weight    income  smoke100 exerany handedness
       John     Gable     36   male  68.58   196.09      7105       Y       Y           R
       Ellen Phillips     34 female  65.51   173.75    207614       N       Y           R

hist(mydf$height)

[Figure: histogram of heights]

table(mydf$gender,mydf$exerany)

         N  Y
female   8 39
male    14 39

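Based on the table generated, the same number of males and females (39 each) reported exercising, but the proportions differ: 39 of 47 females (about 83%) versus 39 of 53 males (about 74%).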

mosaicplot(table(mydf$gender, mydf$exerany))

[Figure: mosaic plot of gender vs. exercise]

KBC2019

This is my attempt:

 mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=KBC2019')

head(mydf)
first_name last_name age gender height weight income
1     Amanda   Hamlett  27 female  62.49 164.51   4147
2    Barbara  Samantha  32 female  64.99 158.84   9763

    hist(mydf$height)

[Figure: histogram of heights]

smoke100 exerany handedness
1        Y       Y          R
2        Y       N          R
3        N       Y          R
4        Y       Y          R
5        Y       Y          R
6        N       N          R

jmahan
mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=jmahan')
head(mydf,2)

hist(mydf$height)

[Figure: histogram of heights]

table(mydf$gender,mydf$exerany)
    
      N  Y
female 14 42
male   11 33

ktaylor4

First, I read in my data:

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=ktaylor4')
head(mydf,2)

#Output:
 first_name last_name age gender height weight income
1      Keith  Crutcher  23   male  69.52 229.40   7959
2      Jerry     Fahie  40   male  68.50 182.57   6022

Then, I generated my histogram:

hist(mydf$height)

[Figure: histogram of heights]

 table(mydf$gender,mydf$exerany)
Here's my contingency table:         
     N  Y
  female 19 35
  male    6 40

Conclusion

Females in this set exercise less than males, but it's hard to say that they exercise significantly less.

mark

@ktaylor4 Getting close! There are some issues, though:

  1. You should create a code block by indenting four spaces.
    • Currently, you’re starting your code block with a greater than symbol (>), which yields a blockquote.
  2. Code and output should be properly arranged so that we can see which code block created which output.
  3. It would be nice to have some prose indicating what’s going on.

I edited your post so that the first couple of parts were properly formatted. If you hit the edit button, you can see how I did so. Do you think you can finish?