Data recap

Data-wise, we started the semester looking at dataframes stored in CSV files scraped off the web. Just before the first exam, we explored sample surveys as a tool to generate new data. Now, we’ll learn more about more carefully designed experiments and studies.

Retrospective and prospective studies

Retrospective studies

Retrospective studies are often observational. That is, we observe the recent past.

Our book’s example mentions a study that relates GPA and music study. The observation is that music students had a higher overall grade point average (3.59) than students who were not enrolled in a music class (2.91).

Can we conclude that studying music improves academic performance?

Probably not. Some confounding variables include:

  • Music students may have better parental support,
  • Music students may come from wealthier homes,
  • Maybe smarter kids get interested in playing music in the first place.

Retrospective studies are prone to these problems but can often provide valuable clues.

Designed experiements

The experimental approach to the Music / GPA question might go like so: Select 100 third graders. Randomly assign them into one of two groups - one who takes music lessons and one that doesn’t. Examine the groups over the course of several years and compare their grades.

Four key principles of experiments

  • Controls - for something to compare against
  • Randomization - to equalize effects of unknowns
  • Blocking -
  • Replication

Example

Suppose we want to explore the efficacy of a drug in preventing heart attacks. We might randomly select 432 patients on which to perform an experiment.

  • Control: We split the group of patients into two groups:
    • A treatment group that receives the experimental drug
    • A control group that doesn’t receive the drug; they might receive a placebo.
  • Randomization: The groups should be chosen randomly to prevent bias and to even out confounding factors.
  • Replication: The results should be reproducible
  • Blocking: We might break the control and treatment groups in to smaller groups or blocks.
    • Reduces variabilty in the groups
    • Allows us to identify confounding factors
    • Example: We might block by gender, age, or degree of risk.

If we find differences between the groups we can examine whether they are statistically significant or not.