8.1: Gathering and Organizing Data

A hand is shown holding a pen as it ticks off a box on a survey sheet. — Figure $\PageIndex{1}$: Surveys are commonly used to gather data. (credit: “survey” by Donnell King/Flickr, CC0 1.0 Public Domain)

Learning Objectives

Distinguish among sampling techniques.
Organize data using an appropriate method.
Create frequency distributions.

When a polling organization wants to try to establish which candidate will win an upcoming election, the first steps are to write questions for the survey and to choose which people will be asked to respond to the survey. These can seem like simple steps, but they have far-reaching implications in the analysis the pollsters will later carry out. The process by which samples (or groups of units from which we collect data) are chosen can strongly affect the data that are collected. Units are anything that can be measured or surveyed (such as people, animals, objects, or experiments) and data are observations made on units.

One of the most famous failures of good sampling occurred in the first half of the twentieth century. The Literary Digest was among the most respected magazines of the early twentieth century. Despite the name, the Digest was a weekly newsmagazine. Starting in 1916, the Digest conducted a poll to try to predict the winner of each US Presidential election. For the most part, their results were good; they correctly predicted the outcome of all five elections between 1916 and 1932. In 1936, the incumbent President Franklin Delano Roosevelt faced Kansas governor Alf Landon, and once again the Digest ran their famous poll, with results published the week before the election. Their conclusion? Landon would win in a landslide, 57% to 43%. Once the actual votes had been counted, though, Roosevelt ended up with 61% of the popular vote, 18 percentage points more than the poll predicted. What went wrong?

The short answer is that the people who were chosen to receive the survey (over ten million of them!) were not a good representation of the population of voting adults. The sample was chosen using the Digest's own base of subscribers as well as publicly available lists of people that were likely adults (and therefore eligible to vote), mostly phone books and vehicle registration records. The pollsters then mailed every single person on these lists a survey. Around a quarter of those surveys were returned; this constituted the sample that was used to make the Digest’s disastrously incorrect prediction. However, the Digest made an error in failing to consider that the election was happening during the Great Depression, and only the wealthy had disposable income to spend on telephone lines, automobiles, and magazine subscriptions. Thus, only the wealthy were sent the Digest’s survey. Since Roosevelt was extremely popular among poorer voters, many of Roosevelt’s supporters were excluded from the Digest’s sample.

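Another, more complicated factor was the low response rate; only around 25% of the surveys were returned. This created what’s called non-response bias: the people who chose to return the survey may have differed in their opinions from the people who did not.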

Sampling and Gathering Data

The Digest's failure highlights the need for what is now considered the most important criterion for sampling: randomness. This randomness can be achieved in several ways. Here we cover some of the most common.

A simple random sample is chosen in a way that every unit in the population has an equal chance of being selected, and the chances of a unit being selected do not depend on the units already chosen. An example of this is choosing a group of people by drawing names out of a hat (assuming the names are well-mixed in the hat).
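The "names in a hat" idea can be sketched in a few lines of Python. This is only an illustration: the population of 200 numbered units and the seed value are assumptions made for the example, not part of the text above.

```python
import random

# Hypothetical population: 200 units identified by number (an assumption for illustration).
population = list(range(1, 201))

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable

# random.sample draws without replacement; every unit is equally likely to be chosen.
simple_sample = rng.sample(population, k=10)
print(sorted(simple_sample))
```

Drawing without replacement mirrors pulling names from a hat: once a unit is chosen, it cannot be chosen again.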

A systematic random sample is selected from an ordered list of the population (for example, names sorted alphabetically or students listed by student ID). First, we decide what proportion of the population will be in our sample. We want to express that proportion as a fraction with 1 in the numerator; let’s call the denominator of that fraction D. Next, we’ll choose a random whole number between 1 and D. The unit at that position will go into our sample. We’ll find the rest of our sample by choosing every Dth unit in the list, starting with our random number.

To walk through an example, let’s say we want to sample 2% of the population: $2\% = \frac{2}{100} = \frac{1}{50}$, so D = 50. (Note: If the number in the denominator isn’t a whole number, we can just round it off. This part of the process doesn’t have to be precise.) We can then use a random number generator to find a random number between 1 and 50; let's use 31. In our example, our sample would then be the units in the list at positions 31, 81 (31 + 50), 131 (81 + 50), and so forth.
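The procedure above can be sketched in Python as follows. The function name `systematic_sample` and the list of 300 numbered positions are hypothetical, chosen only to reproduce the worked example with D = 50 and a starting position of 31.

```python
import random

def systematic_sample(units, d, start=None, rng=random):
    """Return every d-th unit, beginning at a 1-based starting position between 1 and d."""
    if start is None:
        start = rng.randint(1, d)  # randint includes both endpoints
    # Convert the 1-based position to a 0-based index, then step through the list by d.
    return units[start - 1::d]

# Hypothetical ordered list of 300 units, labeled by their position in the list.
positions = list(range(1, 301))

# Reproduce the worked example: D = 50 and the random start happened to be 31.
example = systematic_sample(positions, d=50, start=31)
print(example)  # [31, 81, 131, 181, 231, 281]
```

Leaving out `start` lets the function pick the random starting position itself, which is how the method would be used in practice.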

A stratified sample is one chosen so that particular groups in the population are certain to be represented. Let’s say you are studying the population of students in a large high school (where the grades run from 9th to 12th), and you want to choose a sample of 12 students. If you use a simple or systematic random sample, there’s a pretty good chance that you’ll miss one grade completely. In a stratified sample, you would first divide the population into groups (the strata), then take a random sample within each stratum (that’s the singular form of “strata”). In the high school example, we could divide the population into grades, then take a random sample of three students within each grade. That would get us to the 12 students we need while ensuring coverage of each grade.
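As a rough sketch of the high school example, the code below takes a simple random sample of three students within each grade. The roster (40 made-up student labels per grade) and the seed are assumptions for illustration only.

```python
import random

rng = random.Random(7)  # fixed seed only so the sketch is repeatable

# Hypothetical roster: 40 students in each of grades 9-12, labeled by grade and number.
roster = {grade: [f"G{grade}-{n:02d}" for n in range(1, 41)] for grade in (9, 10, 11, 12)}

# Take a simple random sample of three students within each stratum (grade).
stratified_sample = {grade: rng.sample(students, k=3) for grade, students in roster.items()}

for grade, chosen in stratified_sample.items():
    print(grade, chosen)

total = sum(len(chosen) for chosen in stratified_sample.values())
print(total)  # 12
```

Because the sampling happens separately inside each grade, no grade can be missed, which is the point of stratifying.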

A cluster sample is a sample where clusters of units are chosen at random, instead of choosing individual units. For example, if we need a sample of college students, we may take a list of all the course sections being offered at the college, choose three of them at random (the sections are the clusters), and then survey all the students in those sections. A sample like this one has the advantage of convenience: If the survey needs to be administered in person, many of your sample units will be located in one place at the same time.
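The course-section example might look like this in Python. The section names and student names are invented for the sketch; the only random step is the choice of whole sections.

```python
import random

rng = random.Random(11)  # fixed seed only so the sketch is repeatable

# Hypothetical course sections (the clusters), each with its enrolled students.
sections = {
    "MATH-101-A": ["Ana", "Ben", "Chloe"],
    "MATH-101-B": ["Dev", "Elena"],
    "HIST-210-A": ["Farid", "Grace", "Hugo", "Ines"],
    "BIOL-150-A": ["Jun", "Kira", "Liam"],
    "ENGL-120-C": ["Mona", "Nico"],
}

# Randomness enters only here: three whole sections are chosen at random.
chosen_sections = rng.sample(sorted(sections), k=3)

# Every student in a chosen section is surveyed.
cluster_sample = [student for name in chosen_sections for student in sections[name]]
print(chosen_sections)
print(cluster_sample)
```

Note the contrast with a stratified sample: there, every group contributes a few randomly chosen units; here, a few randomly chosen groups contribute all of their units.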

Example $\PageIndex{1}$: Random Sampling

For each of the following situations, identify whether the sample is a simple random sample, a systematic random sample, a stratified random sample, a cluster random sample, or none of these.

1. A postal inspector wants to check on the performance of a new mail carrier, so she chooses four streets at random among those that the carrier serves. Each household on the selected streets receives a survey.
2. A hospital wants to survey past patients to see if they were satisfied with the care they received. The administrator sorts the patients into groups based on the department of the hospital where they were treated (ICU, pediatrics, or general), and selects patients at random from each of those groups.
3. A quality control engineer at a factory that makes smartphones wants to figure out the proportion of devices that are faulty before they are shipped out. The phones are currently packed in crates for shipping, each of which holds 20 devices. The engineer wants to sample 100 phones, so he selects five crates at random and tests every phone in those five crates.
4. A newspaper reporter wants to write a story on public perceptions of a project that will widen a congested street. She stands on the side of the street in question and interviews the first five people she sees there.
5. An executive at a streaming video service wants to know if her subscribers would support a second season of a new show. She gets a list of all the subscribers who have watched at least one episode of the show, and uses a random number generator to select a sample of 50 people from the list.
6. An agent for a state’s Department of Revenue is in charge of selecting 100 tax returns for audit. He has a list of all of the returns eligible for audit (about 12,000 in all), sorted by the taxpayer’s ID number. He asks a computer to give him a random number between 1 and 120; it gives him 15. The agent chooses the 15th, 135th, 255th, 375th, and every 120th return after that to be audited.

Answer

To decide which type of random sample is being used in each of these, we need to focus on how the randomization is being incorporated.

1. The surveys are being given to households, so households are the units in this case. But households aren’t being chosen randomly; instead, streets are being chosen at random. These form clusters of units, so this is a cluster random sample.
2. In this case, the administrator isn’t selecting patients at random from the entire list of patients. Instead, she is choosing at random from the patients who were in each of the departments (ICU, pediatrics, general) separately. The departments form strata, so this is a stratified random sample.
3. The engineer is testing whether the phones are faulty, so those are the units. But the random process is being used to select the crates of phones. Those crates form clusters, so this is a cluster random sample.
4. The reporter isn’t using a random process at all, so this sample doesn’t belong to any of the types we have been talking about. A sample like this one is sometimes described as a convenience sample, and shouldn’t be used in a statistical setting.
5. The executive is choosing her sample completely at random from the full population, so this is a simple random sample.
6. The agent is choosing from the full population, but is only choosing the first unit for the sample at random; the rest are chosen by skipping down the list systematically. Thus, this is a systematic random sample.

Your Turn $\PageIndex{1}$

For each of the following situations, identify whether the sample is a simple random sample, a systematic random sample, a stratified random sample, a cluster random sample, or none of these.

1. The chairperson of the University Chess Club is trying to decide on a time for the club’s regular meetings, so she emails all of the members of the club to find their preferences.
2. The registrar at a small college wants to use a survey to determine if their office could do a better job of serving students. They choose three students at random from each major to take the survey.
3. A civic club is organizing a raffle as a fundraiser. To determine the three winners, each of the tickets is put into a large drum, then the tickets are thoroughly mixed. A blindfolded club member pulls three tickets out of the drum.

People in Mathematics: George Gallup

George Gallup gives a speech as he stands at a podium. — Figure $\PageIndex{2}$: George Gallup was a founder of survey sampling techniques, and his legacy lives on to this day. (credit: "George Gallup at the National Press Club, Washington, D.C., 1969" by Bernard Gotfryd/Library of Congress Prints & Photographs Division, public domain)

George Gallup (1901–1984) rose to fame in 1936 when his prediction of the percentage of the vote going to each candidate in that year’s U.S. Presidential election was more accurate than the one published in Literary Digest, and he did so using a sample that was much smaller than the Digest’s. He even took it one step further, predicting with high accuracy the erroneous results of the poll that the Literary Digest would end up publishing! Gallup’s theories on public opinion polling essentially created that field. In 1948, Gallup’s reputation took a bit of a hit, when he famously, but incorrectly, predicted that Thomas Dewey would beat incumbent Harry Truman in that year’s Presidential election. Over the following decades, however, public trust in Gallup’s polls recovered and even steadily increased. The company Gallup founded continues to conduct daily public opinion polls and provides consulting services for businesses.

Organizing Data

Once data have been collected, we turn our attention to analysis. Before we analyze, though, it’s useful to reorganize the data into a format that makes the analysis easier. For example, if our data were collected using a paper survey, our raw data are all broken down by respondent (represented by an individual response sheet). To perform an analysis on all the responses to an individual question, we need to first group all the responses to each question together. The way we organize the data depends on the type of data we’ve collected.

There are two broad types of data: categorical and quantitative. Categorical data classifies the unit into a group (or category). Examples of categorical data include a response to a yes-or-no question, or the color of a person’s eyes. Quantitative data is a numerical measure of a property of a unit. Examples of quantitative data include the time it takes for a rat to run through a maze or a person’s daily calorie intake. We’ll look at each type of data in turn when considering how best to organize.

Categorical Data Organization

The best way to organize categorical data is using a categorical frequency distribution. A categorical frequency distribution is a table with two columns. The first contains all the categories present in the data, each listed once. The second contains the frequencies of each category, which are just a count of how often each category appears in the data.
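A categorical frequency distribution is easy to build with Python's `collections.Counter`. The sketch below uses a short list of eye-color responses that is made up purely for illustration.

```python
from collections import Counter

# Hypothetical eye-color responses (made up for illustration).
eye_colors = ["brown", "blue", "brown", "green", "brown", "hazel", "blue", "brown"]

# Counter tallies how often each category appears.
frequencies = Counter(eye_colors)

# Print a two-column table: category, then frequency.
print(f"{'Eye Color':<10}Frequency")
for color in sorted(frequencies):
    print(f"{color:<10}{frequencies[color]}")

# The frequencies should add up to the number of responses.
assert sum(frequencies.values()) == len(eye_colors)
```

A category that nobody chose does not appear in a `Counter`. To show it in the table with a frequency of 0, loop over the full list of possible categories instead; looking up a missing category in a `Counter` returns 0.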

Example $\PageIndex{2}$: Creating a Categorical Frequency Distribution

A teacher records the responses of the class (28 students) on the first question of a multiple choice quiz, with five possible responses (A, B, C, D, and E):

Create a categorical frequency distribution that organizes the responses.

Answer

Step 1: For each possible response, count the number of times that response appears in the data. In the responses for this class, “A” appears 14 times, “B” 4 times, “C” 6 times, “D” 0 times, and “E” 4 times.

Step 2: Make a table with two columns. The first column should be labeled so that the reader knows what the responses mean, and the second should be labeled “Frequency.”

Response to First Question	Frequency
A	14
B	4
C	6
D	0
E	4

Step 3: Check your work. If you add up your frequencies, you should get the same number as the total number of responses. Twenty-eight students answered that first question, and $14 + 4 + 6 + 0 + 4 = 28$ .

Your Turn $\PageIndex{2}$

Students in a statistics class who were asked to provide their majors provided the data below:

Undecided	Biology	Biology	Sociology
Political Science	Sociology	Undecided	Undecided
Undecided	Biology	Biology	Education
Biology	Biology	Political Science	Political Science

Create a categorical frequency distribution to organize these responses.

Quantitative Data

We have a couple of options available for organizing quantitative data. If there are just a few possible responses, we can create a frequency distribution just like the ones we made for categorical data above. For example, if we’re surveying a group of high school students and we ask for each student’s age, we’ll likely only get whole-number responses between 13 and 19. Since there are only around seven (and likely fewer) possible responses, we can treat the data as if they’re categorical and create a frequency distribution as before.

Example $\PageIndex{3}$: Creating a Quantitative Frequency Distribution

Attendees of a conflict resolution workshop are asked how many siblings they have. The responses are as follows:

1	0	1	1	2	0	3	1	1	4	1	2	0	1	3
1	2	1	2	4	1	0	1	3	0	1	2	2	1	5

Create a frequency distribution to organize the responses.

Answer

Step 1: Count the number of times you see each unique response: “0” appears 5 times, “1” appears 13 times, “2” appears 6 times, “3” appears 3 times, “4” appears twice, and “5” appears once.

Step 2: Make a table with two columns. The first column should be labeled so that the reader knows what the responses mean, and the second should be labeled “Frequency.” Then fill in the results of our count.

Number of Siblings	Frequency
0	5
1	13
2	6
3	3
4	2
5	1

Step 3: Check your work. If you add up your counts, you should get the same number as the total number of responses. Looking back at the raw data, there were 30 responses, and $5 + 13 + 6 + 3 + 2 + 1 = 30$ .
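The count in this example can be double-checked with a short Python sketch. It uses the 30 responses listed in the example; the variable names are arbitrary.

```python
from collections import Counter

# Responses from the workshop example above.
siblings = [
    1, 0, 1, 1, 2, 0, 3, 1, 1, 4, 1, 2, 0, 1, 3,
    1, 2, 1, 2, 4, 1, 0, 1, 3, 0, 1, 2, 2, 1, 5,
]

counts = Counter(siblings)

# List every whole number from the smallest response to the largest,
# so a value with no responses would still show up with frequency 0.
print("Number of Siblings\tFrequency")
for value in range(min(siblings), max(siblings) + 1):
    print(f"{value}\t{counts[value]}")

print(sum(counts.values()))  # 30
```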

Your Turn $\PageIndex{3}$

A question on a community survey asked each respondent to give the number of people who shared their residence, and the data from the responses were as follows:

1	3	2	2	1	3	3	4	2	2	2	4	1	1	2	3	1	1	5	2
1	4	3	2	1	2	2	1	3	1	3	3	4	1	4	2	2	2	1	4

Create a frequency distribution to organize the responses.

If there are many possible responses, a frequency distribution table like the ones we’ve seen so far isn’t really useful; there will likely be many responses with a frequency of one, which means the table will be no better than looking at the raw data. In these cases, we can create a binned frequency distribution. A binned frequency distribution groups the data into ranges of values called bins, then records the number of responses in each bin.

For example, if we have height data for individuals measured in centimeters, we might create bins like 150–155 cm, 155–160 cm, and so forth (making sure that every data value falls into a bin). We must be careful, though; in this scenario, it’s not clear which bin would contain a response of 155 cm. Usually, responses on the edge of a bin are placed in the higher bin, but it’s good practice to make that clear. In cases where responses are rounded off, you can avoid this issue by leaving a gap between the bins that couldn’t contain any responses. In our example, if the measurements were all rounded off to the nearest centimeter, we could make bins like 150–154 cm, 155–159 cm, etc. (since a response like 154.2 isn’t possible). We’ll use this method going forward. How do we decide what the boundaries of our bins should be? There’s no one right way to do that, but there are some guidelines that can be helpful.

Every data value should fall into exactly one bin. For example, if the lowest value in our data is 42, the lowest bin should not be 45–49.
Every bin should have the same width. Note that if we shift the upper limits of our bins down a bit to avoid ambiguity (as described above), we can’t simply subtract the lower limit from the upper limit to get the bin width; instead, we subtract the lower limit of the bin from the lower limit of the next bin. For example, if we’re looking at GPAs rounded to the nearest hundredth, we might choose bins like 2.00–2.24, 2.25–2.49, 2.50–2.74, etc. These bins all have a width of 0.25.
If the minimum or maximum value of the data falls right on the boundary between two bins, then it’s OK to bend the rule just a little in order to avoid having an additional bin containing just that one value. We’ll see an example of this in just a moment.
If we have too many or too few bins, it can be difficult to get a good sense of the distribution. Seven or eight bins is ideal, but that’s not a firm rule; anything between five and twelve is fine. We often choose the number of bins so that the widths are round numbers.
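To make the edge convention concrete, here is a minimal Python sketch of equal-width bins in which a value on a boundary goes into the higher bin. The helper name `bin_lower_limit` and the height values are hypothetical, and whole numbers are used so the arithmetic is exact.

```python
def bin_lower_limit(value, start, width):
    """Lower limit of the bin containing value, when edge values go in the higher bin."""
    # Floor division counts how many whole bin widths lie at or below the value.
    return start + ((value - start) // width) * width

# Hypothetical heights in centimeters (made up for illustration).
heights = [151, 155, 159, 160, 154]

for h in heights:
    low = bin_lower_limit(h, start=150, width=5)
    print(h, f"{low} to just under {low + 5}")
```

With bins starting at 150 and a width of 5, a height of 155 lands in the bin that begins at 155, not the one that begins at 150, and every value falls into exactly one bin.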

Example $\PageIndex{4}$: Creating a Binned Frequency Distribution

The GPAs of students enrolled in an advanced sociology class are listed in the following table. At this institution, 4.00 is the maximum possible GPA.

3.93	3.43	2.87	2.51	2.70	1.91	2.32	2.85	3.06	3.03	3.49	1.84	3.72	2.56
1.99	3.40	3.74	3.23	1.98	3.05	1.43	2.90	1.20	3.72	3.56	3.07	2.58	4.00
2.79	3.81	2.60	3.69	2.88	3.34	1.51	3.63	3.45	1.89	2.30	2.98	3.04	2.70

Create a binned frequency distribution for the data.

Answer

Step 1: Identify the maximum and minimum values in your data. Looking at the dataset, you can see that the lowest value is 1.20, and the highest is 4.00.

Step 2: Get a rough idea of bin widths. Aim for seven or eight bins, give or take a couple. For eight bins, the minimum width can be found by taking the difference between the largest and smallest data values and dividing by the number of bins:

$\frac{\text{maximum} - \text{minimum}}{\text{number of bins}} = \frac{4.00 - 1.20}{8} = 0.35$.

If we use 0.35 for our widths, starting at our minimum value of 1.20, we’ll get bins with these boundaries: 1.20, 1.55, 1.90, 2.25, 2.60, 2.95, 3.30, 3.65, 4.00.
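The width calculation and the list of boundaries can be reproduced with a few lines of Python. As a choice made for this sketch, the values are stored in hundredths of a grade point so that the arithmetic is exact.

```python
# Work in hundredths of a grade point so the arithmetic is exact (a choice made for this sketch).
minimum, maximum = 120, 400   # 1.20 and 4.00
number_of_bins = 8

# 280 divides evenly by 8, so integer division gives the exact width: 35, i.e. 0.35.
width = (maximum - minimum) // number_of_bins

# Eight bins need nine boundaries.
boundaries = [(minimum + i * width) / 100 for i in range(number_of_bins + 1)]
print(boundaries)
# [1.2, 1.55, 1.9, 2.25, 2.6, 2.95, 3.3, 3.65, 4.0]
```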

Step 3: Consider the context of the values. Because these are GPAs, there are natural breaks at 2.00 and 3.00 that are important. (People like whole numbers!) Since 0.35 is very close to $\frac{1}{3}$ , let’s use that for our bin width instead, and make sure that whole numbers fall on the boundaries. That means our first bin needs to start at 1.00 and go up to 1.33 to make sure our minimum value is included. The next bin will run from 1.34 to 1.66, and so forth.

Step 4: Create the distribution table. We start our distribution table by filling in the bins:

GPA Range	GPA Range	GPA Range
1.00–1.33	2.00–2.33	3.00–3.33
1.34–1.66	2.34–2.66	3.34–3.66
1.67–1.99	2.67–2.99	3.67–4.00

Notice that the last bin doesn’t follow the pattern; since our maximum data value is right on the upper boundary of that last bin, this is a case where we can bend that rule just a little to avoid creating a bin for 4.00–4.33 (which wouldn’t really make sense in the context of these GPAs anyway, since 4.00 is the maximum possible GPA).

Step 5: Complete the table with the frequencies. Finish the table by counting the number of data values that fall in each bin, and recording them in the frequency column:

GPA Range	Frequency	GPA Range	Frequency	GPA Range	Frequency
1.00–1.33	1	2.00–2.33	2	3.00–3.33	6
1.34–1.66	2	2.34–2.66	4	3.34–3.66	7
1.67–1.99	5	2.67–2.99	8	3.67–4.00	7

Step 6: Check your work. Add up the frequencies to make sure all the data values are included. We started with forty-two data values, and $1 + 2 + 5 + 2 + 4 + 8 + 6 + 7 + 7 = 42$ .
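The counting in Step 5 and the check in Step 6 can also be done with a short Python sketch. It uses the 42 GPAs and the nine bins from this example; both limits of each bin are treated as inclusive, matching the gaps left between bins.

```python
# GPAs from the example above.
gpas = [
    3.93, 3.43, 2.87, 2.51, 2.70, 1.91, 2.32, 2.85, 3.06, 3.03, 3.49, 1.84, 3.72, 2.56,
    1.99, 3.40, 3.74, 3.23, 1.98, 3.05, 1.43, 2.90, 1.20, 3.72, 3.56, 3.07, 2.58, 4.00,
    2.79, 3.81, 2.60, 3.69, 2.88, 3.34, 1.51, 3.63, 3.45, 1.89, 2.30, 2.98, 3.04, 2.70,
]

# (lower limit, upper limit) for each bin, both inclusive, as in the table above.
bins = [
    (1.00, 1.33), (1.34, 1.66), (1.67, 1.99),
    (2.00, 2.33), (2.34, 2.66), (2.67, 2.99),
    (3.00, 3.33), (3.34, 3.66), (3.67, 4.00),
]

# For each bin, count the GPAs that fall between its limits.
frequencies = [sum(low <= g <= high for g in gpas) for low, high in bins]

print("GPA Range\tFrequency")
for (low, high), count in zip(bins, frequencies):
    print(f"{low:.2f}-{high:.2f}\t{count}")

print(sum(frequencies))  # 42
```

Because the data are rounded to the nearest hundredth, no GPA can fall in the gap between one bin's upper limit and the next bin's lower limit, so the frequencies add up to the full 42 values.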

Your Turn $\PageIndex{4}$

The following table displays the ages of a sample of customers who have shopped at a new boutique.

56	39	35	32	26	53	55	47	70	43
33	33	43	41	26	40	31	34	33	53

Create a binned frequency distribution to summarize these data.

Check Your Understanding

For the following problems, decide whether randomization is being used in the selection of these samples. If it is, identify the type of random sample (simple, systematic, cluster, or stratified).

1. High school guidance counselors want to know the proportion of the school’s seniors who intend to apply for college. They choose four senior homerooms at random, then visit each one and ask every student in those homerooms whether they intend to apply.
2. A quality control technician wants to ensure that the sandals being made in their factory are up to specifications, so they check the first five pairs they see coming off the line.
3. A college athletic department wants to check up on the mental wellness of its student-athletes. The department wants to ensure every varsity sport is represented, so they survey three randomly selected members of each team.
4. The purchasing manager for a chain of bookstores wants to make sure they’re buying the right types of books to put on the shelves, so they take a sample of 20 books that customers bought in the last five days and record the genres. Use the raw data below to create a categorical frequency distribution.

Nonfiction	Young Adult	Romance	Cooking	Young Adult
Young Adult	Thriller	Young Adult	Nonfiction	True Crime
Romance	Nonfiction	Thriller	True Crime	Romance
True Crime	Thriller	Romance	Young Adult	Young Adult

5. A survey of college students asked how many courses those students were currently taking. Create a quantitative frequency distribution to summarize the raw data given below:

3	4	4	3	5	4	4	3	2	3	5	5	3	3	4	3
2	4	3	3	4	3	5	3	3	3	2	3	1	3	4	3

6. The World Bank provides data on every country in the world. The following is a sample of twenty-five countries, along with the number of cell phone subscriptions registered in that country per hundred residents. Create a binned frequency distribution for the cell phone data.

Country	Cell	Country	Cell
Cameroon	83.7	Benin	78.5
Vanuatu	82.5	Eritrea	13.7
Georgia	140.7	Mauritania	92.2
Kazakhstan	146.6	Czech Republic	119
Bermuda	105.9	Qatar	151.1
Russia	157.9	Pakistan	73.4
Hungary	113.5	Egypt	105.5
Costa Rica	180.2	Nepal	123.2
Algeria	111	Turkey	96.4
Somalia	48.3	Congo	43.5
Fiji	114.2	Venezuela	78.5
El Salvador	156.5	Germany	133.6
Angola	44.7
