Let’s say you want to measure the impact of COVID-19 on the views of American voters about the 2020 election. You could try to ask every registered voter, but that would be a daunting task requiring a great deal of time, effort, and resources. Instead, you can take a sample of the population and survey just that sample.
We sample for three main reasons: efficiency, representation, and extrapolation.
- Efficiency – Surveying just a fraction of the registered voter population in the U.S. can provide a decent sense of how the entire population would vote. Because you don’t need to track down every person’s opinion, you save a substantial amount of time and money.
- Representation – When sampling a population, it is important to pick a group that reflects the different responses within the total population as accurately and proportionately as possible. Consider the example of Americans’ perceptions of the federal response to COVID-19. If you surveyed only in San Francisco, CA, or only in Tulsa, OK, the sample would likely be biased, because regional factors such as local political leanings would skew the responses.
- Extrapolation – With a properly selected sample and a well-designed survey, you can extrapolate your results to the broader population with a quantifiable degree of uncertainty, such as a margin of error.
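To make the idea of extrapolation concrete, here is a minimal Python sketch that draws a simple random sample from a simulated population and estimates the share holding a given view, along with an approximate 95% margin of error. Everything in it is an illustrative assumption rather than real survey data: the population size of 200,000, the 55% share, the sample size of 1,000, and the helper name `estimate_proportion` are all hypothetical, and the margin of error uses the standard normal approximation for a proportion.

```python
import math
import random


def estimate_proportion(sample, z=1.96):
    """Return (estimate, margin_of_error) for a list of 0/1 responses.

    Uses the normal approximation for a proportion; z=1.96 corresponds
    to an approximate 95% confidence level.
    """
    n = len(sample)
    p_hat = sum(sample) / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1 = holds a given view, 0 = does not.
# The 55% share is an assumption made purely for illustration.
population = [1 if rng.random() < 0.55 else 0 for _ in range(200_000)]

# Simple random sample without replacement: every member of the
# population has the same chance of being selected.
sample = rng.sample(population, 1_000)

p_hat, margin = estimate_proportion(sample)
true_share = sum(population) / len(population)

print(f"sample estimate:  {p_hat:.3f} +/- {margin:.3f}")
print(f"population share: {true_share:.3f}")
```

With 1,000 respondents the margin of error is roughly three percentage points, which is why a small, randomly chosen sample can stand in for a much larger population. The sketch only covers simple random sampling; it does not address the representation problem described above, where the way the sample is chosen skews the responses.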
Watch this video from 365 Data Science for more about differences between a population and a sample.