An archive of the questions from Mark's Summer 2018 Stat 185.

Unpaired book prices

mark

I’ve got two CSV files on my webspace that you can read into R as follows:

df1 = read.csv("http://www.marksmath.org/data/book_prices_arts.csv")
df2 = read.csv("http://www.marksmath.org/data/book_prices_sciences.csv")
  1. Use the head or tail command to examine the first few rows of these data files. Can you conjecture what these files contain?
  2. Use the t.test command to examine whether the Amazon price of one of these classes of books is, on average, different from the other.
  3. Use the mean and sd commands to find the means and standard deviations of the data so that you can do the comparison by hand.
jthomps6

Here is what I found from the CSV files you provided, starting with the code that I used.

df1 = read.csv("http://www.marksmath.org/data/book_prices_arts.csv")
df2 = read.csv("http://www.marksmath.org/data/book_prices_sciences.csv")

head(df1)
head(df2)

The head output for df1 and df2 shows that each data set lists books along with their new prices at the bookstore and on Amazon; df1 contains books in the arts, while df2 contains books in the sciences.

t.test(x = df1$bookstr_new_price, y = df1$amazon_new_price, alternative = "two.sided")

                Welch Two Sample t-test

data:  df1$bookstr_new_price and df1$amazon_new_price
t = 1.5207, df = 86.852, p-value = 0.132
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.274907 24.603369
sample estimates:
mean of x mean of y 
40.45269  29.78846 
##############################################################################
t.test(x = df2$bookstr_new_price, y = df2$amazon_new_price, alternative = "two.sided")

               Welch Two Sample t-test

data:  df2$bookstr_new_price and df2$amazon_new_price
t = 2.5254, df = 83.33, p-value = 0.01345
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
9.641998 81.123002
sample estimates:
mean of x mean of y 
170.8825  125.5000 
###########################################################################
mean(df1$bookstr_new_price)
40.45269
mean(df1$amazon_new_price)
29.78846
mean(df2$bookstr_new_price)
170.8825
mean(df2$amazon_new_price)
125.5

sd(df1$bookstr_new_price)
42.57564
sd(df1$amazon_new_price)
27.28882
sd(df2$bookstr_new_price)
91.52316
sd(df2$amazon_new_price)
76.37286

HAND DATA

Arts Books

Bookstore
xbar = 40.45
sigma = 42.57
n = 52

Amazon
xbar = 29.78
sigma = 27.28
n = 52

T-Statistic = (difference in means)/SE = (40.45-29.78)/sqrt(42.57^2/52 + 27.28^2/52) = 1.52

Science Books

Bookstore
xbar = 170.88
sigma = 91.52
n = 44

Amazon
xbar = 125.5
sigma = 76.37
n = 44

T-Statistic = (difference in means)/SE = (170.88-125.5)/sqrt(91.52^2/44 + 76.37^2/44) = 2.53
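
The hand calculation above can be checked in R. The following is a minimal sketch, not part of the original post; the helper name `welch_t` is hypothetical, and the inputs are the means, standard deviations, and sample sizes reported above.

```r
# Hypothetical helper: Welch t-statistic from summary statistics.
# t = (difference in sample means) / (standard error of the difference)
welch_t = function(m1, s1, n1, m2, s2, n2) {
  se = sqrt(s1^2 / n1 + s2^2 / n2)
  (m1 - m2) / se
}

# Summary statistics as reported by mean() and sd() in this thread
t_arts = welch_t(40.45269, 42.57564, 52, 29.78846, 27.28882, 52)
t_sci  = welch_t(170.8825, 91.52316, 44, 125.5, 76.37286, 44)

print(round(c(arts = t_arts, sciences = t_sci), 4))
```

Both values should agree with the t values reported by t.test above, up to rounding of the inputs.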

philycheesestk

After importing the data into R with the code provided in the prompt and running the head command, I was able to learn a few things about these data sets. Both contain textbooks that can be ordered through either Amazon or the UNCA Bookstore. The tables show the title of each text and the price at which it is listed. Data set “df1” contains textbooks used in the liberal arts, and “df2” contains textbooks used in the sciences.

Here are my t-test calculations from R:

t.test(x = df1$bookstr_new_price, y = df1$amazon_new_price, alternative = "two.sided")

       Welch Two Sample t-test

data:  df1$bookstr_new_price and df1$amazon_new_price
t = 1.5207, df = 86.852, p-value = 0.132
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  -3.274907 24.603369
sample estimates:
mean of x mean of y 
     40.45269  29.78846 

t.test(x = df2$bookstr_new_price, y = df2$amazon_new_price, alternative = "two.sided")

             Welch Two Sample t-test

data:  df2$bookstr_new_price and df2$amazon_new_price
t = 2.5254, df = 83.33, p-value = 0.01345
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
   9.641998 81.123002
sample estimates:
mean of x mean of y 
    170.8825  125.5000

mean(df1$bookstr_new_price)
40.45269
mean(df1$amazon_new_price)
29.78846
mean(df2$bookstr_new_price)
170.8825
mean(df2$amazon_new_price)
125.5

sd(df1$bookstr_new_price)
42.57564
sd(df1$amazon_new_price)
27.28882
sd(df2$bookstr_new_price)
91.52316
sd(df2$amazon_new_price)
76.37286

Hand Data

Liberal Arts

Bookstore
x = 40.45
σ = 42.57
n = 52

Amazon
x = 29.78
σ = 27.28
n = 52

t-stat = (40.45 − 29.78) / √( (42.57^2)/52 + (27.28^2)/52 ) = 1.52

Science Books

Bookstore
x = 170.88
σ = 91.52
n = 44

Amazon
x = 125.5
σ = 76.37
n = 44

t-stat = (170.88 − 125.5) / √( (91.52^2)/44 + (76.37^2)/44 ) = 2.53
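
The hand calculation stops at the t-statistic, while the t.test output also reports degrees of freedom and a p-value. As a rough sketch (not from the original post), these can be recovered for the arts books from the same summary statistics using the Welch-Satterthwaite approximation; the variable names are hypothetical.

```r
# Summary statistics for the arts books, as reported above
m1 = 40.45269; s1 = 42.57564; n1 = 52   # bookstore
m2 = 29.78846; s2 = 27.28882; n2 = 52   # Amazon

v1 = s1^2 / n1   # estimated variance of the bookstore sample mean
v2 = s2^2 / n2   # estimated variance of the Amazon sample mean
t_stat = (m1 - m2) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch = (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value from the t distribution
p_value = 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

The non-integer degrees of freedom in the t.test output come from this approximation, which does not assume the two groups have equal variances.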

jgilfill
 df1 = read.csv("http://www.marksmath.org/data/book_prices_arts.csv")
 head(df1)

 df2 = read.csv("http://www.marksmath.org/data/book_prices_sciences.csv")
 head(df2)

t.test(x = df1$bookstr_new_price, y = df1$amazon_new_price, alternative = "two.sided")

Welch Two Sample t-test

data: df1$bookstr_new_price and df1$amazon_new_price
t = 1.5207, df = 86.852, p-value = 0.132
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.274907 24.603369
sample estimates:
mean of x mean of y
40.45269 29.78846

t.test(x = df2$bookstr_new_price, y = df2$amazon_new_price, alternative = "two.sided")

Welch Two Sample t-test

data: df2$bookstr_new_price and df2$amazon_new_price
t = 2.5254, df = 83.33, p-value = 0.01345
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
9.641998 81.123002
sample estimates:
mean of x mean of y
170.8825 125.5000


Command R
arts = df1$bookstr_new_price
c(mean(arts), sd(arts))
[1] 40.45269 42.57564

c(sample(arts))
[1] 39.95 159.25 66.75 69.50 24.95 38.00 44.99 128.25 26.00 89.75 5.95 66.75 14.50 16.00
[15] 9.99 66.95 15.95 43.25 14.00 16.99 12.00 9.95 19.99 22.00 35.99 39.95 18.00 15.00
[29] 32.95 19.00 46.75 18.95 43.95 16.95 15.95 18.00 46.75 15.95 26.00 38.00 13.00 69.50
[43] 16.50 21.00 239.00 15.99 19.50 4.00 88.50 12.00 108.75 26.00

52 books for arts at book store, so n=52
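
Printing a shuffled copy with sample and reading off the last index works, but the count can also be obtained directly. A small sketch, assuming only the first 14 bookstore prices from the output above (a subset chosen for illustration); the names `prices`, `n`, and `n_obs` are hypothetical.

```r
# First 14 bookstore prices from the output printed above (illustrative subset)
prices = c(39.95, 159.25, 66.75, 69.50, 24.95, 38.00, 44.99,
           128.25, 26.00, 89.75, 5.95, 66.75, 14.50, 16.00)

n = length(prices)            # number of entries, including any NA
n_obs = sum(!is.na(prices))   # number of non-missing entries

print(c(n = n, n_obs = n_obs))
```

Applied to the full column, length(df1$bookstr_new_price) should return the same 52 found above. Note that sample returns the values in random order, so the printed order is not the order in the file.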

arts = df1$amazon_new_price
c(mean(arts), sd(arts))
[1] 29.78846 27.28882

c(sample(arts))
[1] 34 29 38 10 26 28 12 40 80 11 60 14 17 8 74 28 8 56 15 32 9 10 14 20
[25] 12 72 10 12 13 40 1 10 16 18 24 34 73 13 93 14 23 11 108 59 15 12 3 6
[49] 18 15 114 37

52 books for arts at amazon, so n=52

Art Books at Bookstore and Amazon

Bookstore
X = 40.45269
σ = 42.57564
n = 52

Amazon
X = 29.78846
σ = 27.28882
n = 52

t-stat
(40.45 - 29.79) / √((42.58^2/52) + (27.29^2/52)) ≈ 1.52

Science Books at Bookstore and Amazon

Bookstore
X = 170.88250
σ = 91.52316
n = 44

Amazon
X = 125.50000
σ = 76.37286
n = 44

t-stat
(170.88 - 125.50) / √((91.52^2/44) + (76.37^2/44)) ≈ 2.53
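
The same standard error also gives the confidence interval that t.test reports. A minimal sketch for the science books, assuming the summary statistics above and the Welch approximation for the degrees of freedom; the variable names are hypothetical.

```r
# Summary statistics for the science books, as reported above
m1 = 170.8825; s1 = 91.52316; n1 = 44   # bookstore
m2 = 125.5;    s2 = 76.37286; n2 = 44   # Amazon

v1 = s1^2 / n1
v2 = s2^2 / n2
se = sqrt(v1 + v2)   # standard error of the difference in means
df_welch = (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# 95% confidence interval: difference in means +/- critical value * SE
t_crit = qt(0.975, df = df_welch)
ci = (m1 - m2) + c(-1, 1) * t_crit * se

print(round(ci, 3))
```

The interval should match the "95 percent confidence interval" lines in the df2 output above, up to rounding of the inputs.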

Command R
science = df2$bookstr_new_price
c(mean(science), sd(science))
[1] 170.88250 91.52316

c(sample(science))
[1] 254.75 315.75 35.00 143.25 183.50 232.00 283.50 283.50 76.00 39.99 283.50 84.99 199.95 129.25
[15] 227.25 315.75 45.00 125.75 143.25 118.50 79.95 214.00 143.25 125.75 143.25 114.00 77.50 102.00
[29] 254.75 80.00 91.25 262.75 350.75 282.50 227.25 305.00 230.75 26.00 202.50 64.95 129.25 65.25
[43] 282.50 143.25

44 books for science at bookstore, so n=44

science = df2$amazon_new_price
c(mean(science), sd(science))
[1] 125.50000 76.37286

c(sample(science))
[1] 124 134 18 65 38 50 228 48 95 159 83 85 59 161 280 94 93 226 217 63 288 97 256 217
[25] 145 63 163 68 229 71 47 34 178 198 40 144 43 98 241 172 27 200 96 87

44 books for science at amazon, so n=44
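
Question 2 of the prompt asks whether the Amazon price differs between the two classes of books, which is a comparison across the two files rather than within one. Here is a sketch under that reading, using the Amazon prices printed above instead of re-reading the CSV files; the names `arts_amazon` and `sci_amazon` are hypothetical.

```r
# Amazon prices as printed above; sample() shuffled the order,
# which does not matter for an unpaired comparison.
arts_amazon = c(34, 29, 38, 10, 26, 28, 12, 40, 80, 11, 60, 14, 17, 8, 74, 28,
                8, 56, 15, 32, 9, 10, 14, 20, 12, 72, 10, 12, 13, 40, 1, 10,
                16, 18, 24, 34, 73, 13, 93, 14, 23, 11, 108, 59, 15, 12, 3, 6,
                18, 15, 114, 37)
sci_amazon = c(124, 134, 18, 65, 38, 50, 228, 48, 95, 159, 83, 85, 59, 161,
               280, 94, 93, 226, 217, 63, 288, 97, 256, 217, 145, 63, 163, 68,
               229, 71, 47, 34, 178, 198, 40, 144, 43, 98, 241, 172, 27, 200,
               96, 87)

# Welch two-sample t-test: arts vs. sciences, Amazon prices only
res = t.test(x = arts_amazon, y = sci_amazon, alternative = "two.sided")
print(res)
```

Because the two files list different books, the two samples are independent, so the default unpaired Welch test is a reasonable choice here.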