We started playing with data in the first week of the semester. To this point, the data we've played with has all come from some CSV file or has been scraped off of the web. In Part III of the text, we learn how data is gathered at a more fundamental level. In particular, we'll explore how to draw a sample from a large population and how to use statistics computed from that sample to estimate characteristics of the population as a whole.
To understand the terms population, parameter, sample, and statistic, it helps to keep in mind that we are often working with some specific study. In that context,

- the population is the entire group we would like to understand,
- a parameter is a numerical summary of the population,
- a sample is the subset of the population that we actually examine, and
- a statistic is the corresponding numerical summary computed from the sample.
Suppose we’d like to know the average height of women enrolled at UNCA. According to the UNCA Factbook, there were 2147 women enrolled in the Fall of 2017. I’m not even sure how many are enrolled this year; it might be hard to round up all of them.
So, here's a more practical approach: we have 42 women enrolled in this class who filled out our discourse survey, and the average height of those women is 5'5''. We might hope that this could be a good estimate of the average height of all women at UNCA.
This is a fundamental idea in statistics: we wish to study some parameter for a large population, but doing so is too unwieldy or downright impossible. Thus, we choose a manageable sample from the population and estimate the parameter with the corresponding statistic computed from the sample. In the example above, the population is all 2147 women enrolled at UNCA, the parameter is their average height, the sample is the 42 women in our class who took the survey, and the statistic is their average height of 5'5''.
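To make the distinction concrete, here is a minimal sketch in Python. The population of heights is simulated (these are made-up numbers, not actual UNCA data), so it exists only to show a statistic estimating a parameter:

```python
import random

random.seed(1)

# Simulated heights (in inches) for a hypothetical population of 2147 women.
population_heights = [random.gauss(65, 3) for _ in range(2147)]

# The parameter: the population mean, which we usually can't compute in practice.
parameter = sum(population_heights) / len(population_heights)

# The statistic: the mean of a sample of 42, which we can compute.
sample = random.sample(population_heights, 42)
statistic = sum(sample) / len(sample)

print(f"parameter (population mean): {parameter:.2f} inches")
print(f"statistic (sample mean): {statistic:.2f} inches")
```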
A number of key questions arise:

- How close can we expect a statistic computed from a sample to be to the actual population parameter?
- How should we choose the sample in the first place?
The first question will consume much of the second half of our semester under the general topic of inference.
Section 10.1 of the book sums up the answer to the second question using three big ideas:

1. Examine a part of the whole.
2. Randomize.
3. It's the sample size that matters, not the fraction of the population sampled.
This approach is in contrast to the idea of just grabbing the whole population, often called a census.
Here are a few strategies to implement these big ideas:
The first strategy is the simple random sample: choose \(n\) people from the population independently and with equal probability. The idea is quite simple, but it's a bit hard to achieve in practice.
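Here's a minimal sketch of drawing a simple random sample in Python, assuming we somehow had a complete roster of the population; the roster below is hypothetical:

```python
import random

# Hypothetical roster standing in for the 2147 women enrolled at UNCA.
roster = [f"student_{i}" for i in range(1, 2148)]

# random.sample chooses 42 individuals without replacement, giving every
# individual on the roster the same chance of being selected.
sample = random.sample(roster, 42)
print(sample[:5])
```

The catch is the roster itself: a true simple random sample requires a complete list of the population, which is often exactly what we don't have.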
Example: Randomly generated phone numbers
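One classic way to approximate a simple random sample of households is to dial randomly generated phone numbers. Below is a hedged sketch; the 828 area code (Asheville) and the rule that an exchange can't begin with 0 or 1 are simplifying assumptions, not a complete model of US telephone numbering:

```python
import random

def random_phone_number(area_code="828"):
    """Generate a random ten-digit phone number in a fixed area code."""
    # Assumption: exchanges don't begin with 0 or 1, loosely mimicking
    # US numbering rules; 828 (Asheville) is assumed for illustration.
    exchange = random.randint(200, 999)
    line = random.randint(0, 9999)
    return f"({area_code}) {exchange}-{line:04d}"

# Dial a few randomly generated numbers.
for _ in range(3):
    print(random_phone_number())
```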