5.1: Samples and Populations
Like most people, you probably feel that it is important to "take control of your life." But what does this mean? Partly, it means being able to properly evaluate the data and claims that bombard you every day. If you cannot distinguish good from faulty reasoning, then you are vulnerable to manipulation and to decisions that are not in your best interest. Statistics provides tools that you need in order to react intelligently to information you hear or read. In this sense, statistics is one of the most important things that you can study.
Here are some claims that you may have heard on several occasions. (We are not saying that each one of these claims is true!)
- \(4\) out of \(5\) dentists recommend Dentyne.
- Almost \(85\%\) of lung cancers in men and \(45\%\) in women are tobacco-related.
- Condoms are effective \(94\%\) of the time.
- Native Americans are significantly more likely to be hit crossing the streets than are people of other ethnicities.
- People tend to be more persuasive when they look others directly in the eye and speak loudly and quickly.
- Women make \(75\) cents to every dollar a man makes when they work the same job.
- A surprising new study suggests that consuming egg whites may increase one's lifespan.
- People predict that it is very unlikely there will ever be another baseball player with a batting average over \(.400\).
- There is a \(70\%\) chance that in a room full of \(30\) people, at least two people will share the same birthday.
- \(79.48\%\) of all statistics are made up on the spot.
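The birthday claim in the list above is one that can be checked directly. The following is a minimal sketch, assuming 365 equally likely birthdays, no leap years, and independent birthdays; the function name `prob_shared_birthday` is our own, chosen for illustration. It computes the chance that at least two of \(n\) people share a birthday by first computing the chance that all \(n\) birthdays are different.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes every day is equally likely and birthdays are independent.
    """
    p_all_distinct = 1.0
    for i in range(n):
        # The (i + 1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


print(round(prob_shared_birthday(30), 3))  # 0.706
```

Under these assumptions, the result for \(30\) people is about \(0.706\), which agrees with the roughly \(70\%\) figure in the claim.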
All of these claims are statistical in character. We suspect that some of them sound familiar; if not, we bet that you have heard other claims like them. Notice how diverse the examples are; they come from psychology, health, law, sports, business, etc. Indeed, data and data-interpretation show up in discourse from virtually every facet of contemporary life.
Before going further, however, two distinct meanings of the word “statistics” should be clarified. First, “statistics” could mean the academic discipline, a field of study, involving the collection, presentation, interpretation, and use of data. When used in this sense, the word is considered singular (though it looks plural). Second, in a narrower sense, a number collected in a survey or study is referred to as a “statistic.” For instance, the average height of a sample of 100 people would be a (one) statistic. This is a singular word (of course, two or more of these numbers would be referred to as “statistics”). Be aware of the difference when you encounter this word.
Statistics are often presented in an effort to add credibility to an argument or advice. You can see this by paying attention to television advertisements. Many of the numbers thrown about in this way do not represent careful statistical analysis. They can be misleading and push you into decisions that you might find cause to regret. For these reasons, learning about statistics is a long step towards taking control of your life. (It is not, of course, the only step needed for this purpose.) This chapter will help you learn statistical essentials. It will make you into an intelligent consumer of statistical claims.
You can take the first step right away. To be an intelligent consumer of statistics, your first reflex must be to question the statistics that you encounter. The British Prime Minister Benjamin Disraeli famously said, "There are three kinds of lies -- lies, damned lies, and statistics." This quote reminds us why it is so important to understand statistics. So let us invite you to reform your statistical habits from now on. No longer will you blindly accept numbers or findings. Instead, you will begin to think about the numbers, their sources, and most importantly, the procedures used to generate them.
We have put the emphasis on defending ourselves against fraudulent claims wrapped up as statistics. Just as important as detecting the deceptive use of statistics is the appreciation of the proper use of statistics. You must also learn to recognize statistical evidence that supports a stated conclusion. When a research team is testing a new treatment for a disease, statistics allows them to conclude, based on a relatively small trial, that there is good evidence their drug is effective. Statistics allowed prosecutors in the 1950s and 1960s to demonstrate that racial bias existed in jury panels. Statistics are all around you, sometimes used well, sometimes not. We must learn how to distinguish the two cases.
Before we begin gathering and analyzing data, we need to characterize the population we are studying. If we want to study the amount of money spent on textbooks by a typical first-year college student, our population might be all first-year students at your college. Or it might be the following:
- All first-year community college students in the state of Washington.
- All first-year students at public colleges and universities in the state of Washington.
- All first-year students at all colleges and universities in the state of Washington.
- All first-year students at all colleges and universities in the entire United States.
- And so on.
In statistics, an individual is a single object or member of a population or sample that is being studied.
If a study records the heights of 30 students, each student is an individual, and height is a variable measured on those individuals.
In statistics, a population is the entire group of individuals or items that we want to study or draw conclusions about.
The population includes all individuals of interest, not just some of them.
Sometimes, the intended population is referred to as the target population. If we design our study poorly, the collected data may not accurately represent the intended population.
Why is it important to specify the population? We might get different answers to our question as we vary the population we are studying. First-year students at the University of Washington might take slightly more diverse courses than those at your college, and some of these courses may require less popular textbooks that cost more; or, on the other hand, the University Bookstore might have a larger pool of used textbooks, reducing the cost of these books to the students. Whichever the case (and it is likely that some combination of these and other factors is in play), the data we gather from your college will probably not be the same as that from the University of Washington. Particularly when conveying our results to others, we want to be clear about the population we are describing with our data.
A newspaper website contains a poll asking people their opinion on a recent news article. What is the population?
Answer: While the target (intended) population may have been all people, the real population of the survey is readers of the website.
If we were able to gather data on every member of our population, say the amount of money spent on textbooks by each first-year student at your college during the 2009-2010 academic year, and then computed the average (we will define "average" more carefully in a subsequent section), the resulting number would be called a parameter.
A parameter is a value (average, percentage, etc.) calculated using all the data from a population.
We seldom see parameters, however, since surveying an entire population is usually very time-consuming and expensive unless the population is very small or we already have the data collected.
A census is a survey of an entire population: we gather information from every member of the population. In a sample survey, we gather information only from a sample representing the population.
You are probably familiar with two common censuses: the official government Census that attempts to count the population of the U.S. every ten years, and voting, which asks the opinion of all eligible voters in a district. The first of these demonstrates one additional problem with a census: the difficulty in finding and getting participation from everyone in a large population, which can bias or skew the results.
There are occasional instances when a census is appropriate, typically when the population is relatively small. For example, if the manager of a Starbucks store wanted to know the average number of hours her employees worked last week, she should be able to pull up payroll records or ask each employee directly.
Since surveying an entire population is often impractical, we usually select a sample to study.
A sample is a smaller subset of the entire population, ideally one that is fairly representative of the whole population.
We discuss sampling methods in greater detail in a later section. For now, let us assume that samples are chosen in an appropriate manner. If we survey a sample, say 100 first-year students at your college, and find the average amount of money spent by these students on textbooks, the resulting number is called a statistic.
A statistic is a value (average, percentage, etc.) calculated using the data from a sample.
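Note: Compare the statistic with the parameter. A statistic is calculated from a sample, while a parameter is calculated from the entire population.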
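To make the contrast concrete, here is a minimal sketch using made-up data. The spending values, the population size of \(2000\), and the random seed are all assumptions chosen only for illustration; they do not come from any real survey. The parameter is the average over every student, and the statistic is the average over a sample of \(100\) students.

```python
import random
from statistics import mean

rng = random.Random(1)  # hypothetical seed, fixed so the run is repeatable

# Synthetic population: textbook spending (in dollars) for 2,000 students.
# These values are invented purely for illustration.
population = [round(rng.uniform(100, 900), 2) for _ in range(2000)]

# Parameter: calculated using all the data from the population.
parameter = mean(population)

# Statistic: calculated using the data from a sample of 100 students.
sample = rng.sample(population, k=100)
statistic = mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The two numbers will usually be close but not identical, and a different sample would give a different statistic while the parameter stays fixed.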
A researcher wanted to know how citizens of Tacoma felt about a voter initiative. To study this, she went to the Tacoma Mall, randomly selected \(500\) shoppers, and asked them their opinion. \(60\%\) indicated that they supported the initiative. What are the sample and the population? Is the \(60\%\) value a parameter or a statistic?
Answer: The sample is the \(500\) shoppers questioned. The population is less clear. While the intended population of this survey was Tacoma citizens, the effective population was mall shoppers. There is no reason to assume that the \(500\) shoppers questioned would be representative of all Tacoma citizens.
The \(60\%\) value was based on the sample, so it is a statistic.
Types of Sampling Methods
As stated before, if you want to know something about a population, it is often impossible or impractical to examine the whole population. It might be too expensive in terms of time or money. It might be impractical – you can’t test all batteries for their length of lifetime because there wouldn’t be any batteries left to sell. You need to look at a sample. Hopefully, the sample behaves the same as the population.
When you choose a sample, you want it to be as similar to the population as possible. If you want to test a new painkiller for adults, you would want the sample to include people who are fat, skinny, old, young, healthy, not healthy, male, female, etc.
There are many ways to collect a sample. None are perfect, and you are not guaranteed to collect a representative sample. That is, unfortunately, a limitation of sampling. However, several techniques can result in samples that give you a reasonably accurate picture of the population. Just remember that the sample may not be representative. As an example, you can take a random sample from a group that is half male and half female, yet by chance, everyone you choose is female. If this happens, it may be a good idea to collect a new sample if you have the time and money.
There are many sampling techniques, though only five will be presented here.
- Simple Random Sample
- Stratified Sample
- Systematic Sample
- Cluster Sample
- Convenience Sample
Simple Random Sample
A simple random sample (SRS) of size \(n\) is a sample that is selected from a population in a way that ensures that every different possible sample of size \(n\) has the same chance of being selected. Also, every individual in the population has the same chance of being selected.
- Put all names in a hat and draw a certain number of names out.
- Assign each individual a number and use a random number table, a calculator, or a computer to randomly select the individuals that will be measured.

Figure \(\PageIndex{1}\): "Simple Random Sample Diagram" by Toros Berberyan is licensed under CC BY-SA 4.0
In the simple random sample illustrated above, there is an original population of \(12\) balls. We draw the sample by randomly selecting three balls from the population. In this case, we have picked the ones that have the numbers \(2\), \(9\), and \(12\).
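The ball example can be sketched in a few lines of Python. This is only an illustration: the seed is an arbitrary assumption, and a different seed would produce a different sample. The sketch also counts how many different samples of size \(3\) are possible, since a simple random sample gives each of them the same chance of being selected.

```python
import math
import random

population = list(range(1, 13))  # balls numbered 1 through 12
n = 3                            # sample size

# Number of different possible samples of size 3 from 12 balls.
# In a simple random sample, each one is equally likely.
num_possible_samples = math.comb(len(population), n)
print(num_possible_samples)  # 220

rng = random.Random(7)  # hypothetical seed, fixed so the run is repeatable
sample = sorted(rng.sample(population, k=n))  # drawn without replacement
print(sample)
```

Here `random.sample` plays the role of drawing names from a hat: no ball can be picked twice, and no ball is favored over another.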
Stratified Sampling
Stratified sampling involves dividing the population into groups, called strata, and then selecting a simple random sample from each stratum.
- If you want to look at musical preference, you could divide the individuals into age groups and then conduct simple random samples within each group.
- To calculate the average price of textbooks, you could divide the individuals into groups by major and then conduct simple random samples within each group.

The illustration above is an example of a stratified sample. There is an original population of \(12\) balls. They are grouped based on a common characteristic, in this case, color. We have \(4\) dark grey balls in the first group, \(5\) blue balls in the second group, and \(3\) purple balls in the third group. From all groups, some members are randomly selected to be part of the sample. From the first group, the balls that are numbered \(1\) and \(12\) are selected to be part of the sample. From the second group, the balls that are numbered \(4\) and \(11\) are selected to be part of the sample. From the third group, the balls that are numbered \(2\) and \(10\) are selected to be part of the sample. Based on stratified sampling, we ended up with a sample of \(6\) balls numbered \(1\), \(2\), \(4\), \(10\), \(11\), and \(12\).
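A stratified sample of the balls might be sketched as follows. The text gives only the size of each color group and the balls that were selected, so the full membership of each stratum below is an assumption made for illustration, as are the helper name `stratified_sample` and the seed.

```python
import random


def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of size per_stratum from every stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, k=per_stratum))
    return sorted(sample)


# Assumed grouping of the 12 balls by color (4 dark grey, 5 blue, 3 purple).
strata = {
    "dark grey": [1, 3, 7, 12],
    "blue": [4, 5, 8, 9, 11],
    "purple": [2, 6, 10],
}

rng = random.Random(11)  # hypothetical seed, fixed so the run is repeatable
sample = stratified_sample(strata, per_stratum=2, rng=rng)
print(sample)
```

Every stratum contributes exactly two balls, so the sample always has six members no matter which balls happen to be drawn.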
Systematic Sampling
Systematic sampling is where you randomly choose a starting place and then select every \(k\)th individual to measure.
- You select every \(5\)th item on an assembly line.
- You select every \(10\)th name on the list.
- You select every \(3\)rd customer that comes into the store.

The diagram above represents a systematic sample. We have a population of \(12\) balls arranged along a winding path. The starting ball is picked randomly; in this example, it is the second one from the top. From there, every third ball along the path is selected to be part of the sample. Based on systematic sampling, the selected individuals in the sample are those numbered \(2, 5, 8\), and \(11\).
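The same selection can be reproduced with list slicing. This is a minimal sketch; the function name `systematic_sample` and the seed are assumptions for illustration. Starting at the second ball and taking every third ball gives the sample described above.

```python
import random


def systematic_sample(population, k, start_index):
    """Take the item at start_index, then every k-th item after it."""
    return population[start_index::k]


population = list(range(1, 13))  # balls numbered 1 through 12
k = 3

# Starting with the second ball (index 1) reproduces the example.
print(systematic_sample(population, k, start_index=1))  # [2, 5, 8, 11]

# In practice the starting place is chosen at random among the first k items.
rng = random.Random(3)  # hypothetical seed, fixed so the run is repeatable
random_start = rng.randrange(k)
print(systematic_sample(population, k, random_start))
```

Only the starting place is random; once it is fixed, the rest of the sample is completely determined.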
Cluster Sampling
Cluster sampling involves dividing the population into groups, known as clusters. Randomly pick some clusters, then poll all individuals in those clusters.
- A large city wants to poll all businesses in the city. They divide the city into sections (clusters), maybe a square block for each section, and use a random number generator to pick some of the clusters. Then, they poll all businesses in each chosen cluster.
- You want to measure whether trees in a forest are infected with bark beetles. Instead of having to walk all over the forest, you divide the forest into sectors and then randomly pick the sectors that you will travel to. Then you record whether each tree is infected or not for every tree in those sectors.

The illustration above is an example of a cluster sample. There is an original population of \(12\) balls. They are grouped into clusters (blocks). Each group has two balls. Then, some groups are selected, and all members from those groups are chosen to be part of the sample. In this example, the group containing the balls numbered \(3\) and \(4\) is selected to be part of the sample. Then, a second group containing the balls numbered \(9\) and \(10\) is selected. Finally, a third group containing the balls numbered \(11\) and \(12\) is selected. Based on cluster sampling, we ended up with three groups, with all of their members being part of the sample. The sample included balls with the numbers \(3, 4, 9, 10, 11,\) and \(12\).
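A cluster sample of the balls might be sketched as follows. The text names only the three selected clusters, so pairing the remaining balls consecutively is an assumption made for illustration, as is the seed.

```python
import random

# Assumed clusters: six blocks of two consecutively numbered balls.
clusters = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]

rng = random.Random(5)  # hypothetical seed, fixed so the run is repeatable

# Step 1: randomly pick some of the clusters.
chosen_clusters = rng.sample(clusters, k=3)

# Step 2: include every member of each chosen cluster.
sample = sorted(ball for cluster in chosen_clusters for ball in cluster)
print(sample)
```

Note that the randomness applies to whole clusters: once a cluster is chosen, all of its members enter the sample.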
Many people confuse stratified sampling and cluster sampling. In stratified sampling, you use all the groups and some members from each group. Cluster sampling is the opposite approach: you use only some of the groups, but all the members within each selected group.
Convenience Sample
The four sampling techniques presented so far all have advantages and disadvantages. There is another sampling technique that is sometimes used, either because the researcher is unfamiliar with better methods or because it is easier to implement. This sampling technique is known as a convenience sample.
A convenience sample is one where the researcher selects individuals to be included who are readily accessible to the researcher for collection.
If you want to know the opinion of people about the criminal justice system, you stand on a street corner near the county courthouse, and question the first \(10\) people who walk by. The people who walk by the county courthouse are most likely involved in some way with the criminal justice system, and their opinions would not necessarily represent the views of all individuals.
A convenience sample is unlikely to be representative of the population and should be avoided.
Banner Health is a multi-state nonprofit chain of hospitals. Management wants to assess the incidence of complications after surgery. They wish to use a sample of surgery patients. Several sampling techniques are described below. Categorize each technique as a simple random sample, stratified sample, systematic sample, cluster sample, or convenience sample.
- Obtain a list of patients who had surgery at all Banner Health facilities. Divide the patients according to the type of surgery. Draw simple random samples from each group.
- Obtain a list of patients who had surgery at all Banner Health facilities. Number these patients, and then use a random number table to obtain the sample.
- Randomly select some Banner Health facilities from each of the seven states, and then include all the patients on the surgery lists of the selected facilities.
- At the beginning of the year, instruct each Banner Health facility to record any complications from every \(100\)th surgery.
- Instruct each Banner Health facility to record any complications from \(20\) surgeries this week and send in the results.
Answer:
- This is a stratified sample since the patients were separated into different strata, and then random samples were taken from each stratum. The problem with this is that some types of surgeries may have a higher chance of complications than others. Of course, the stratified sample would show you this.
- This is a simple random sample since each patient has the same chance of being chosen. The problem with this one is that it will take a while to collect the data.
- This is a cluster sample since all patients are included from each of the selected hospitals. The problem with this is that you could have, by chance, selected hospitals that have no complications.
- This is a systematic sample since they selected every \(100\)th surgery. The problem with this is that if every \(90\)th surgery has complications, you wouldn’t see this come up in the data.
- This is a convenience sample since they left it up to each facility to decide which surgeries to record. The problem with convenience samples is that the person collecting the data will probably collect data from surgeries that had no complications.
Guidelines for Planning a Statistical Study
- Identify the individuals that you are interested in studying. Realize that you can only make conclusions for these individuals. As an example, if you use a fertilizer on a certain genus of plant, you cannot say how the fertilizer will work on any other types of plants. However, if you diversify too much, then you may not be able to tell if there really is an improvement since you have too many factors to consider.
- Specify the variable. You want to make sure the variable is something that you can measure, and make sure that you control for all other factors too. For example, if you are trying to determine if a fertilizer works by measuring the height of the plants on a particular day, you need to make sure you can control how much fertilizer you put on the plants (which is what we call a treatment), and make sure that all the plants receive the same amount of sunlight and water and are kept at the same temperature.
- Specify the population. This is important in order for you to know for whom and what conclusions you can make.
- Specify the method for taking measurements or making observations.
- Determine if you are taking a census or sample. If taking a sample, decide on the sampling method.
- Collect the data.
- Use appropriate descriptive statistics methods and make decisions using appropriate inferential statistics methods.
- Note any concerns you might have about your data collection methods and list any recommendations for future studies.
Attribution:
https://stats.libretexts.org/Bookshe...ion_Principles
"1.2: Sampling Methods" by Kathryn Kozak is licensed under CC BY-SA 4.0


