(10 points - due by Monday, July 25 at 8:00 PM)
There are a number of sites on the web that will generate random YouTube videos for you - for example Random-ize and YT Roulette. Google will find more. Using one of these sites, generate a table of 5 random YouTube videos. Using Random-ize, I generated the following five videos:
I entered the table like so:
| Kid Appropriate | Comment | Link |
| :---: | :---: | :---: |
| N | Language | [link](https://www.youtube.com/watch?v=EovG1QzAL2k) |
| Y | It's *for* kids! | [link](https://www.youtube.com/watch?v=eadOJLjvvC0) |
| Y | | [link](https://www.youtube.com/watch?v=EBM854BTGL0) |
| Y | | [link](https://www.youtube.com/watch?v=WmujtTeoUdA) |
| N | kinda gross | [link](https://www.youtube.com/watch?v=U1O2bjD08ZI) |
Results
The results are in and I’ve got the data stored in a CSV file:
df = read.csv('https://www.marksmath.org/data/AppropriatenessOfYouTubeVideosForChildren.csv')
df
# Out:
Y N B
1 3 2 0
2 3 1 1
........
15 3 2 0
Each row tallies the results of one post. For example, the second row records Audrey’s post, which found 3 Ys, 1 N, and 1 B. We can combine these into one overall tally by summing each column over all the rows:
colSums(df)
Y N B
45 21 9
Now, suppose we are interested in the following question: what proportion of YouTube videos are inappropriate for kids, either clearly inappropriate (with an N) or borderline (with a B)? Coding each of the 45 Y videos as 0 and each of the 21 + 9 = 30 N or B videos as 1, we could ask R for a confidence interval as follows:
d = c(rep(0,45), rep(1,30))
t.test(d)
# Out:
One Sample t-test
data: d
t = 7.0238, df = 74, p-value = 8.876e-10
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.2865257 0.5134743
sample estimates:
mean of x
0.4
Looks like our 95% confidence interval runs from about 0.29 to about 0.51, that is, from just over a quarter to just over a half.
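To see where those two endpoints come from, here is a minimal sketch that rebuilds the interval by hand. It assumes the tallies from the `colSums(df)` output (45 Y, 21 N, 9 B) and uses the usual one-sample t interval, mean plus or minus a critical value times the standard error. The variable names (`n_appropriate`, `p_hat`, `t_star`, and so on) are my own choices for illustration and are not part of the original analysis.

```r
# Tallies copied from the colSums(df) output (assumed, not re-read from the CSV)
n_appropriate   <- 45        # Y
n_inappropriate <- 21 + 9    # N + B
n <- n_appropriate + n_inappropriate

# Same 0/1 coding as d: 1 marks an inappropriate or borderline video
d <- c(rep(0, n_appropriate), rep(1, n_inappropriate))

p_hat  <- mean(d)                  # sample proportion of 1s
se     <- sd(d) / sqrt(n)          # standard error of the mean
t_star <- qt(0.975, df = n - 1)    # two-sided 95% critical value, 74 df

ci <- c(lower = p_hat - t_star * se, upper = p_hat + t_star * se)
print(round(ci, 4))
```

The endpoints should agree with the `t.test` output. Swapping `t_star` for `qnorm(0.975)` and `sd(d)` for `sqrt(p_hat * (1 - p_hat))` gives the normal-approximation interval for a proportion, which comes out slightly narrower.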
Additional exercises:
- List any problems you think there might be with our sampling technique.
- Use the basic formulae for confidence intervals to compute the confidence interval directly.