An archive of the questions from Mark's Summer 2018 Stat 185.

When to use N vs N-1 in equations

jmahan

I am confused about when we should use n vs n-1 in the equations for standard deviation (standard error), etc.

Does this pertain to sample size or some other factor regarding questions?

For example, in question 9 we are computing the SE to determine the confidence interval.

robin

n-1 is for when you are using a sample from a larger data set. For example, if you are sampling 100 people from the CDC data set.

jthomps6

As far as I know from Chemistry (which I'm not sure applies here), N-1 is used when dealing with a sample, while N is used when dealing with a whole population.

mark

To elaborate on @robin’s comment, let’s be clear that the n-1 arises precisely when computing the standard deviation of a sample of numerical data. This is something that’s typically done on a computer and done by hand only when you’ve got a very small list of numbers. Thus, this is not an issue in problem number 9.
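To make the n vs n-1 distinction concrete, here is a minimal Python sketch. The list `data` is made up purely for illustration (it is not from the class or the CDC data set). It computes the standard deviation both ways by hand and compares the results against the standard library's `statistics.pstdev` (divide by n) and `statistics.stdev` (divide by n-1).

```python
import math
import statistics

# Hypothetical small list of numbers, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
ss = sum((x - mean) ** 2 for x in data)

# Divide by n: treats the list as the whole population.
population_sd = math.sqrt(ss / n)

# Divide by n-1: treats the list as a sample from a larger population.
sample_sd = math.sqrt(ss / (n - 1))

print("population SD (n):  ", population_sd)
print("sample SD (n-1):    ", sample_sd)

# The standard library exposes both versions directly.
print("statistics.pstdev:  ", statistics.pstdev(data))
print("statistics.stdev:   ", statistics.stdev(data))
```

The two answers differ noticeably for a list this short, and the gap shrinks as n grows, which is part of why the choice matters most for small samples.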

While we’re at it, let’s take a look at problem 9, which states:

Suppose now that we are interested in the proportion of in-state UNCA students who are from outside Western NC this year. Since data on the entire student body is not yet available for this year, we draw a simple random sample of 64 in-state UNCA students and find that 34 of them are from outside Western NC. Use this data to write down a 95% confidence interval for the proportion of in-state UNCA students who are from outside Western NC.

In this problem, the confidence interval will have the form

[\hat{p} - z^* \times SE, \hat{p} + z^* \times SE] ,

where \hat{p} = 34/64 = 0.53125, z^* = 2 (or z^* = 1.96, if you want to be more precise), and the standard error is

SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{(34/64)(30/64)}{64}} \approx 0.0624.

Thus, the confidence interval is:

[0.53125 - 2\times 0.0624, 0.53125 + 2\times 0.0624] \approx [0.40645, 0.65605].
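As a check on the arithmetic, the same computation can be sketched in Python. The function name `proportion_ci` and its parameters are hypothetical, introduced only for this illustration; the formula is the one used above for the standard error of a sample proportion.

```python
import math

def proportion_ci(successes, n, z_star=2.0):
    """Return (lower, upper, se) for a confidence interval on a proportion.

    Uses SE = sqrt(p_hat * (1 - p_hat) / n), with no n-1 anywhere.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se, se

# Problem 9: 34 of 64 sampled students are from outside Western NC.
lower, upper, se = proportion_ci(34, 64)
print("SE:", round(se, 4))
print("95% CI with z* = 2:   ", round(lower, 4), round(upper, 4))

# The slightly more precise multiplier.
lower_196, upper_196, _ = proportion_ci(34, 64, z_star=1.96)
print("95% CI with z* = 1.96:", round(lower_196, 4), round(upper_196, 4))
```

The endpoints printed here can differ from the hand computation in the fourth decimal place, since the hand computation rounds the SE to 0.0624 before multiplying.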