4.2: Frequency Distributions and Statistical Graphs
Once we have collected data, the next step is to analyze it. One way to display and summarize data is to use statistical graphs. The type of graph we use depends on the type of data collected. Qualitative data are displayed with graphs such as bar graphs and pie graphs. Quantitative data are displayed with graphs such as histograms and frequency polygons.
In order to create graphs, we must first organize and summarize the individual data values in the form of a frequency distribution. A frequency distribution is a listing of all of the data values (or groups of data values) and how often those data values occur.
Frequency is the number of times a data value or group of data values (called a class) occurs in a data set.
A frequency distribution is a listing of each data value or class of data values along with their frequencies.
Relative frequency is the frequency divided by \(n\), the size of the sample. This gives the proportion of the entire data set represented by each value or class. Relative frequencies are expressed as fractions, decimals, or percentages.
A relative frequency distribution is a listing of each data value or class of data values along with their relative frequencies.
The method of creating a frequency distribution depends on whether we are working with qualitative data or quantitative data. We will now look at how to create each type of frequency distribution according to the type of data and the graphs that go with them.
Organizing Qualitative Data
Qualitative data are pieces of information that allow us to classify the items under investigation into various categories. We usually begin working with qualitative data by giving the frequency distribution as a frequency table.
A frequency table is a table with two columns. One column lists the categories, and another column gives the frequencies with which the items in the categories occur (how many data fit into each category).
An insurance company determines vehicle insurance premiums based on known risk factors. If a person is considered a higher risk, their premiums will be higher. One potential factor is the color of your car. The insurance company believes that people with some colors of cars are more likely to be involved in accidents. To research this, the insurance company examines police reports for recent total-loss collisions. The data are summarized in the frequency table below.
\(\begin{array}{|c|c|}
\hline \textbf { Color } & \textbf { Frequency } \\
\hline \text { Blue } & 25 \\
\hline \text { Green } & 52 \\
\hline \text { Red } & 41 \\
\hline \text { White } & 36 \\
\hline \text { Black } & 39 \\
\hline \text { Grey } & 23 \\
\hline
\end{array}\)
Graphing Qualitative Data in Bar Graphs and Pie Charts
Once we have organized and summarized qualitative data into a frequency table, we are ready to graph the data. There are various ways to visualize qualitative data. In this section we will consider two common graphs: bar graphs and pie graphs .
A bar graph displays a bar for each category. The length of each bar indicates the frequency of that category.
To construct a bar graph, we draw a vertical axis and a horizontal axis. The vertical axis has a scale and measures the frequency of each category. The horizontal axis has no scale in this instance but lists the categories. The construction of a bar graph is most easily described with an example.
Using the car color data from Example 1, note the highest frequency was 52, so the vertical axis needs to go from 0 to 52. We might as well use 0 to 55 so that we can put a hash mark every 5 units:
You should notice a few things about the correct construction of this bar graph.
- The height of each bar is determined by the frequency of the corresponding color.
- Both axes are labeled clearly.
- The bars do not touch and they are the same width.
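The choice of vertical scale and the bar lengths can also be checked with a short program. The following is a minimal Python sketch, not part of the original example: the helper names `axis_top` and `text_bar_graph` are hypothetical, and the bars are drawn horizontally with `#` characters only because that is easy to print as text.

```python
import math

# Car color frequencies from the frequency table in the example above.
color_counts = {"Blue": 25, "Green": 52, "Red": 41, "White": 36, "Black": 39, "Grey": 23}

def axis_top(max_frequency, tick):
    """Smallest multiple of `tick` that is at least `max_frequency`."""
    return math.ceil(max_frequency / tick) * tick

def text_bar_graph(counts):
    """Return one line per category; each '#' stands for one unit of frequency."""
    label_width = max(len(label) for label in counts)
    return [f"{label:<{label_width}} | {'#' * freq} {freq}" for label, freq in counts.items()]

tick = 5
top = axis_top(max(color_counts.values()), tick)  # highest frequency 52 rounds up to 55

print(f"Frequency axis: 0 to {top}, tick mark every {tick}")
for line in text_bar_graph(color_counts):
    print(line)
```

Every bar uses the same symbol and the same line height, which mirrors the requirement that bars have equal width and differ only in length.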
The horizontal grid lines are a nice touch, but not necessary. In practice, you will find it useful to draw bar graphs on graph paper so the grid lines will already be in place or use technology to create the graph. Instead of grid lines, we might also list the frequencies at the top of each bar, like this:
In a survey, adults were asked whether they personally worried about a variety of environmental concerns. The numbers (out of 1012 surveyed) who indicated that they worried “a great deal” about some selected concerns are summarized below.
\(\begin{array}{|c|c|}
\hline \textbf { Environmental Issue } & \textbf { Frequency } \\
\hline \text { Pollution of drinking water } & 597 \\
\hline \text { Contamination of soil and water by toxic waste } & 526 \\
\hline \text { Air pollution } & 455 \\
\hline \text { Global warming } & 354 \\
\hline
\end{array}\)
Display the data using a bar graph.
Solution
A class was asked for their favorite soft drink with the following results:
| Coke | Pepsi | Mt. Dew | Coke | Pepsi | Dr. Pepper | Sprite | Coke | Mt. Dew |
|---|---|---|---|---|---|---|---|---|
| Pepsi | Pepsi | Dr. Pepper | Coke | Sprite | Mt. Dew | Pepsi | Dr. Pepper | Coke |
| Pepsi | Mt. Dew | Coke | Pepsi | Pepsi | Dr. Pepper | Sprite | Pepsi | Coke |
| Dr. Pepper | Mt. Dew | Sprite | Coke | Coke | Pepsi |
- Create a frequency distribution table for the data.
- Create a relative frequency distribution table for the data.
- Draw a bar graph of the frequency distribution.
- Draw a bar graph of the relative frequency distribution.
Solution
- To make a frequency distribution table, list each drink type and then count how often each drink occurs in the data. Notice that Coke occurs 9 times in the data set, Pepsi occurs 10 times, and so on.
| Drink | Coke | Pepsi | Mt Dew | Dr. Pepper | Sprite |
|---|---|---|---|---|---|
| Frequency | 9 | 10 | 5 | 5 | 4 |
- To make a relative frequency distribution table, use the previous results and divide each frequency by 33, which is the total number of data responses.
| Drink | Coke | Pepsi | Mt Dew | Dr. Pepper | Sprite |
|---|---|---|---|---|---|
| Frequency | 9 | 10 | 5 | 5 | 4 |
| Relative Frequency | \(\frac{9}{33} \approx 0.273 \text{ or } 27.3\%\) | \(\frac{10}{33}\approx 0.303\text{ or } 30.3\%\) | \(\frac{5}{33} \approx 0.152\text{ or } 15.2\%\) | \(\frac{5}{33} \approx 0.152\text{ or } 15.2\%\) | \(\frac{4}{33} \approx 0.121\text{ or } 12.1\%\) |
- Along the horizontal axis you place the drinks. Space these apart equally, and allow space to draw bars above the axis. The vertical axis shows the frequencies. Make sure you create a scale along that axis in which all of the frequencies will fit. Notice that the highest frequency is 10, so you want to make sure the vertical axis goes to at least 10, and you may want to count by two for every tick mark. Here is what the graph looks like using Excel.
- A bar graph for the relative frequency distribution is similar to the bar graph for the frequency distribution except that the relative frequencies are used along the vertical axis instead. Notice that the graph does not actually change except the numbers on the vertical scale.
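The counting and dividing in the first two parts of the solution can be reproduced in a few lines of code. The sketch below is one possible way to do it in Python using the standard library's `collections.Counter`; the variable names are our own choices, and the list simply retypes the 33 responses from the table row by row.

```python
from collections import Counter

# The 33 responses from the soft drink example, entered row by row.
responses = [
    "Coke", "Pepsi", "Mt. Dew", "Coke", "Pepsi", "Dr. Pepper", "Sprite", "Coke", "Mt. Dew",
    "Pepsi", "Pepsi", "Dr. Pepper", "Coke", "Sprite", "Mt. Dew", "Pepsi", "Dr. Pepper", "Coke",
    "Pepsi", "Mt. Dew", "Coke", "Pepsi", "Pepsi", "Dr. Pepper", "Sprite", "Pepsi", "Coke",
    "Dr. Pepper", "Mt. Dew", "Sprite", "Coke", "Coke", "Pepsi",
]

frequency = Counter(responses)   # frequency distribution
n = len(responses)               # sample size
relative_frequency = {drink: count / n for drink, count in frequency.items()}

# Print drink, frequency, relative frequency as a decimal and as a percentage.
for drink, count in frequency.most_common():
    rel = relative_frequency[drink]
    print(f"{drink:<10} {count:>2}  {rel:.3f}  {rel:.1%}")
```

A useful check is that the frequencies add up to the sample size and the unrounded relative frequencies add up to 1.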
Let's use the last example to introduce another way of visualizing data: a pie chart, also known as a circle graph.
A pie chart is a graph where the "pie" represents the entire sample and the "slices" represent the categories or classes. The size of the slice of the pie corresponds to the relative frequency for that category.
To find the angle that each “slice” takes up, multiply the relative frequency of that slice by 360°.
Note: Theoretically, the percentages of all slices of a pie chart must add to 100%. In practice, the percentages may add to be slightly more or less than 100% if percentages are rounded.
To draw a pie chart, multiply the relative frequencies of each drink by 360°. Then, use a protractor to mark off the corresponding angle in a circle. Usually it is easier to use Excel or some other spreadsheet program to draw the graph.
| Drink | Coke | Pepsi | Mt Dew | Dr. Pepper | Sprite |
|---|---|---|---|---|---|
| Frequency | 9 | 10 | 5 | 5 | 4 |
| Relative Frequency | \(\frac{9}{33} \approx 0.273 \text{ or } 27.3\%\) | \(\frac{10}{33}\approx 0.303\text{ or } 30.3\%\) | \(\frac{5}{33} \approx 0.152\text{ or } 15.2\%\) | \(\frac{5}{33} \approx 0.152\text{ or } 15.2\%\) | \(\frac{4}{33} \approx 0.121\text{ or } 12.1\%\) |
| Angle Measures | \(\frac{9}{33} \times 360^\circ \approx 98.2^\circ \) | \(\frac{10}{33} \times 360^\circ \approx 109.1^\circ \) | \(\frac{5}{33} \times 360^\circ \approx 54.5^\circ \) | \(\frac{5}{33} \times 360^\circ \approx 54.5^\circ \) | \(\frac{4}{33} \times 360^\circ \approx 43.6^\circ \) |
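The angle calculation in the table can be sketched in Python as follows. This is an illustration only; the dictionary and variable names are our own, and the frequencies are taken from the table above.

```python
# Frequencies from the soft drink frequency table.
drink_counts = {"Coke": 9, "Pepsi": 10, "Mt Dew": 5, "Dr. Pepper": 5, "Sprite": 4}
n = sum(drink_counts.values())  # 33 responses in total

# Angle of each slice = relative frequency * 360 degrees.
angles = {drink: count / n * 360 for drink, count in drink_counts.items()}
rounded_angles = {drink: round(angle, 1) for drink, angle in angles.items()}

for drink, angle in rounded_angles.items():
    print(f"{drink:<10} {angle:>6.1f} degrees")

print(f"Sum of exact angles:   {sum(angles.values()):.1f}")
print(f"Sum of rounded angles: {sum(rounded_angles.values()):.1f}")
```

The exact angles add to 360°, while the rounded angles add to 359.9°. This is the same rounding effect described in the note about percentages not always adding to exactly 100%.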
The pie graph from Excel is shown below.
The Red Cross Blood Donor Clinic had a very successful morning collecting blood donations. Within 3 hours, many people had made donations. The table shows the frequency distribution of the blood types of the donations. Construct a pie chart to display the relative frequency distribution.
| Blood Type | A | B | O | AB |
|---|---|---|---|---|
| Number of Donors | 7 | 5 | 9 | 4 |
Organizing Quantitative Data
Quantitative data are the result of counting or measuring some aspect of the items under investigation. For this reason, this type of data is also known as numerical data. Quantitative data can also be summarized in a table to show its frequency distribution.
A teacher records scores on a 20-point quiz for the 30 students in his class. The scores are
19 20 18 18 17 18 19 17 20 18 20 16 20 15 17 12 18 19 18 19 17 20 18 16 15 18 20 5 0 0
These scores could be summarized into a frequency table by counting how many times each particular data value occurs.
\(\begin{array}{|c|c|}
\hline \textbf { Score } & \textbf { Frequency } \\
\hline 0 & 2 \\
\hline 5 & 1 \\
\hline 12 & 1 \\
\hline 15 & 2 \\
\hline 16 & 2 \\
\hline 17 & 4 \\
\hline 18 & 8 \\
\hline 19 & 4 \\
\hline 20 & 6 \\
\hline
\end{array}\)
In the previous example, the table listed every different data value that occurred and how often each value occurred. We call this type of frequency distribution presentation ungrouped. Sometimes it is helpful to group the data into classes to observe information about the distribution of data that otherwise wouldn't be noticeable. This is particularly true if there are many different values or each value only occurs once. You can think about classes as "bins" that we create to sort the data. When we group the data into classes, we call this type of frequency distribution presentation grouped.
When data are grouped, the following guidelines about the classes should be followed:
- Classes should have the same width.
- Classes should not overlap.
- Each piece of data should belong to only one class.
Let's use the data from the previous example to create a grouped frequency distribution.
A teacher records scores on a 20-point quiz for the 30 students in his class. The scores are
19 20 18 18 17 18 19 17 20 18 20 16 20 15 17 12 18 19 18 19 17 20 18 16 15 18 20 5 0 0
Create a grouped frequency table in two ways:
- with classes of width 5 beginning at a score of 0, and
- with classes of width 6 beginning at a score of 0.
Solution
- The first class contains the scores 0, 1, 2, 3, and 4 -- if any occur. Likewise, the second class will contain scores 5, 6, 7, 8, and 9 -- if any occur. This pattern continues until classes are no longer needed.
The first two columns of the table show the classes and the frequency of the data in each class.
| Class | Frequency | Class Mark (Midpoint) |
|---|---|---|
| 0-4 | 2 | \(\frac{0+4}{2}=2\) |
| 5-9 | 1 | \(\frac{5+9}{2}=7\) |
| 10-14 | 1 | \(\frac{10+14}{2}=12\) |
| 15-19 | 20 | \(\frac{15+19}{2}=17\) |
| 20-24 | 6 | \(\frac{20+24}{2}=22\) |
In the first column, the numbers 0, 5, 10, 15, and 20 are called the lower class limits and the numbers 4, 9, 14, 19, and 24 are the upper class limits. You can see these limits increase by 5. The class width can be determined as the difference between any two consecutive lower class limits or any two consecutive upper class limits. The class mark is the midpoint of the class and is determined by averaging the lower and upper limits of the class. The class marks are shown in the third column of the table.
The modal class of a frequency distribution is the class with the highest frequency. Here the modal class is 15-19 with a frequency of 20 students. This grouping of the data allows us to more clearly see the grade distribution. Always be sure that the sum of the frequencies is the number of data values.
- Here is another grouping for the same data but with a class width of 6.

Frequency Table for Quiz Scores

| Class | Frequency | Class Mark (Midpoint) |
|---|---|---|
| 0-5 | 3 | \(\frac{0+5}{2}=2.5\) |
| 6-11 | 0 | \(\frac{6+11}{2}=8.5\) |
| 12-17 | 9 | \(\frac{12+17}{2}=14.5\) |
| 18-23 | 18 | \(\frac{18+23}{2}=20.5\) |

When the data are grouped using this structure, the modal class is 18–23.
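Both groupings of the quiz scores can be reproduced with a short program. The following Python sketch assumes integer data and classes of the form lower limit to lower limit + width - 1; the function names `grouped_table` and `modal_class` are hypothetical helpers written for this illustration.

```python
# The 30 quiz scores from the example.
scores = [19, 20, 18, 18, 17, 18, 19, 17, 20, 18, 20, 16, 20, 15, 17,
          12, 18, 19, 18, 19, 17, 20, 18, 16, 15, 18, 20, 5, 0, 0]

def grouped_table(data, width, start=0):
    """Group integer data into equal-width classes.

    Returns rows of (lower limit, upper limit, frequency, class mark).
    """
    rows = []
    lower = start
    while lower <= max(data):            # stop once no data can fall in a new class
        upper = lower + width - 1
        freq = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, freq, (lower + upper) / 2))
        lower += width
    return rows

def modal_class(rows):
    """Row with the highest frequency."""
    return max(rows, key=lambda row: row[2])

for width in (5, 6):
    rows = grouped_table(scores, width)
    print(f"Class width {width}:")
    for lower, upper, freq, mark in rows:
        print(f"  {lower}-{upper}: frequency {freq}, class mark {mark}")
    # The frequencies must add up to the number of data values.
    assert sum(row[2] for row in rows) == len(scores)
    lower, upper, freq, _ = modal_class(rows)
    print(f"  modal class: {lower}-{upper} ({freq} students)")
```

Because each score is tested against non-overlapping classes of equal width, every score lands in exactly one class, which matches the three guidelines listed earlier.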
There is a "sort feature" on the TI calculator that sorts data in ascending or descending order for you. This makes organizing data and counting frequencies much easier. The steps for entering data and sorting it are shown here for the data presented in Try it 3.
Let's consider the reverse situation, in which we have a frequency table with grouped data and want to determine information about the original data. This scenario is important because you will often see data reported only in grouped form, for example to save storage space.
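Answer the questions using the frequency table.

| Class | Frequency |
|---|---|
| 9-15 | 4 |
| 16-22 | 7 |
| 23-29 | 1 |
| 30-36 | 0 |
| 37-43 | 3 |
| 44-50 | 5 |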
- What is the total number of data values in this data distribution?
Adding the frequencies of each class, we have \(4 + 7 + 1 + 0 +3 +5 = 20\).
- What class width is used to group the data?
Subtract any two consecutive lower class limits or any two consecutive upper class limits. For example, \(16 - 9 = 7\).
- What is the class mark of the second class?
The class mark is the midpoint of the class. Average the lower and upper class limit: \(\frac{16+22}{2} = 19\).
- What is the modal class?
The class with the highest frequency is 16-22.
- If an additional class were added to the end of the table, what would be the upper and lower class limits?
Add the class width 7 to the last lower and upper class limits to get 51-57.
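The same five answers can be computed directly from a grouped table. In the Python sketch below, the class limits are an assumption: they are inferred from the worked answers (class width 7, second class 16-22, next class 51-57), and the variable names are our own.

```python
# Assumed class limits, inferred from the worked answers above.
classes = [(9, 15), (16, 22), (23, 29), (30, 36), (37, 43), (44, 50)]
frequencies = [4, 7, 1, 0, 3, 5]

# Total number of data values: add the frequencies.
total = sum(frequencies)

# Class width: difference between consecutive lower class limits.
class_width = classes[1][0] - classes[0][0]

# Class mark of the second class: average of its lower and upper limits.
second_class_mark = sum(classes[1]) / 2

# Modal class: the class with the highest frequency.
modal = classes[frequencies.index(max(frequencies))]

# Next class: shift the last class's limits by one class width.
last_lower, last_upper = classes[-1]
next_class = (last_lower + class_width, last_upper + class_width)

print(total, class_width, second_class_mark, modal, next_class)
```

Note that a grouped table lets us recover the number of data values and the class structure, but not the individual data values themselves.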
Graphing Quantitative Data in Histograms and Frequency Polygons
A histogram is a statistical graph commonly used to visualize frequency distributions of quantitative data. A histogram is like a bar graph, except that the horizontal axis is a number line.
A histogram is a graph with observed values or classes of values along the horizontal axis and frequencies along the vertical axis. A bar with a height equal to the frequency (or relative frequency) is built above each observed value or class.
In a histogram, classes may be identified by their class marks (midpoints of the classes) or by their class limits. The horizontal scale may or may not begin at 0, but the vertical scale should always start at zero. The bars generally touch in a histogram, unless the frequency is 0 for a particular data value or class of values.
Let's illustrate how a histogram is constructed with the following example.
Each member of a class is asked how many plastic beverage bottles they use and discard in a week. Suppose the following (hypothetical) data are collected.
First, we organize the data by grouping it and presenting it in a frequency table. The classes have width 2 and begin at 1.
| Number of Bottles Used | Frequencies | Class Marks (Midpoints) |
|---|---|---|
| 1-2 | 2 | 1.5 |
| 3-4 | 7 | 3.5 |
| 5-6 | 14 | 5.5 |
| 7-8 | 9 | 7.5 |
Next, we draw a bar for each class so that its height represents the frequency of students using those numbers of bottles. We label the midpoints of each bar with the class marks along the horizontal axis.
Graphing data can get tedious and complicated, especially if there are lots of data to organize. Excel and other software can easily make graphs. So can a TI graphing calculator. The steps for creating a histogram for these data are given below.
Suppose that we have collected weights from 100 male subjects as part of a nutrition study. For our weight data, we have values ranging from a low of 121 pounds to a high of 263 pounds, giving a total span of 263 - 121 = 142 pounds. We could create 7 intervals with a width of around 20, 14 intervals with a width of around 10, or somewhere in between. Often we have to experiment with a few possibilities to find something that represents the data well. Let us try using an interval width of 15. We could start at 121, or at 120 since it is a nice round number.
\(\begin{array}{|c|c|}
\hline \textbf { Interval } & \textbf { Frequency } \\
\hline 120-134 & 4 \\
\hline 135-149 & 14 \\
\hline 150-164 & 16 \\
\hline 165-179 & 28 \\
\hline 180-194 & 12 \\
\hline 195-209 & 8 \\
\hline 210-224 & 7 \\
\hline 225-239 & 6 \\
\hline 240-254 & 2 \\
\hline 255-269 & 3 \\
\hline
\end{array}\)
A histogram of this data would look like
You can see the modal class is 165-179. You can also conclude there is a higher frequency of males in the lower part of the distribution of weights because the bars are taller there.
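The choice of intervals for the weight data can be explored with a short program. The Python sketch below is an illustration under our own naming (`class_limits` is a hypothetical helper); it takes the low, high, and frequencies from the example, since the 100 individual weights are not listed.

```python
low, high = 121, 263
span = high - low  # 142 pounds

# Rough number of intervals needed for a few candidate widths.
for width in (10, 15, 20):
    print(f"width {width}: about {span / width:.1f} intervals")

def class_limits(start, width, maximum):
    """Consecutive integer classes of equal width covering start..maximum."""
    limits = []
    lower = start
    while lower <= maximum:
        limits.append((lower, lower + width - 1))
        lower += width
    return limits

# Width 15 starting at the round number 120, as chosen in the example.
intervals = class_limits(start=120, width=15, maximum=high)
frequencies = [4, 14, 16, 28, 12, 8, 7, 6, 2, 3]  # from the frequency table

# Text histogram: one '#' per subject; adjacent rows play the role of touching bars.
for (lower, upper), freq in zip(intervals, frequencies):
    print(f"{lower}-{upper} | {'#' * freq} {freq}")

modal = intervals[frequencies.index(max(frequencies))]
print(f"modal class: {modal[0]}-{modal[1]}")
```

Changing `width` or `start` in the call to `class_limits` is a quick way to see how many classes a different choice would produce before regrouping the data.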
Another way to visualize frequency distribution data is to construct a frequency polygon .
A frequency polygon is an alternative to a histogram. It starts like a histogram, but instead of drawing a bar, a point is placed at the midpoint of each interval at a height equal to the frequency. Typically, the points are connected with straight lines to emphasize the shape of the data distribution.
The following example illustrates the relationship between a histogram and a frequency polygon for the same data.
Ms. Winter made a histogram and frequency polygon of the science test scores from 5th period.
From either the histogram or the frequency polygon, we can see the class width is 10 points. We can also see that the modal class is 80-89. Finally, you can conclude that there is a larger frequency of students who scored high on the test than low on the test because the bars of the histogram and peak on the frequency polygon are taller on the right side of the horizontal axis.
The data below came from a task in which the goal is to move a computer mouse to a target on the screen as fast as possible. On 20 of the trials, the target was a small rectangle; on the other 20, the target was a large rectangle. Time to reach the target was recorded on each trial.
\(\begin{array}{|c|c|c|}
\hline \begin{array}{c}
\textbf { Interval } \\
\textbf { (milliseconds) }
\end{array} & \begin{array}{c}
\textbf { Frequency } \\
\textbf { small target }
\end{array} & \begin{array}{c}
\textbf { Frequency } \\
\textbf { large target }
\end{array} \\
\hline 300-399 & 0 & 0 \\
\hline 400-499 & 1 & 5 \\
\hline 500-599 & 3 & 10 \\
\hline 600-699 & 6 & 5 \\
\hline 700-799 & 5 & 0 \\
\hline 800-899 & 4 & 0 \\
\hline 900-999 & 0 & 0 \\
\hline 1000-1099 & 1 & 0 \\
\hline 1100-1199 & 0 & 0 \\
\hline
\end{array}\)
One option to represent this data would be a comparative histogram or bar chart, in which bars for the small target group and large target group are placed next to each other.
A pair of frequency polygons in the same graph for the same two sets of data makes it easier to see that reaction times were generally shorter for the larger target, and that the reaction times for the smaller target were more spread out.
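The points that make up each frequency polygon, and the comparison of spread, can be sketched in Python. The helper names `polygon_points` and `occupied_range` are hypothetical, and the class mark is computed as the average of the integer class limits (for example, 349.5 for 300-399).

```python
# Lower class limits 300, 400, ..., 1100 and frequencies from the table.
lower_limits = list(range(300, 1200, 100))
small = [0, 1, 3, 6, 5, 4, 0, 1, 0]
large = [0, 5, 10, 5, 0, 0, 0, 0, 0]
width = 100

def polygon_points(lower_limits, width, frequencies):
    """(class mark, frequency) pairs; each class runs from lower to lower + width - 1."""
    return [(lower + (width - 1) / 2, f) for lower, f in zip(lower_limits, frequencies)]

def occupied_range(lower_limits, width, frequencies):
    """Lowest and highest class limits among classes with nonzero frequency."""
    used = [lower for lower, f in zip(lower_limits, frequencies) if f > 0]
    return min(used), max(used) + width - 1

for name, freqs in (("small target", small), ("large target", large)):
    points = polygon_points(lower_limits, width, freqs)
    peak_mark, peak_freq = max(points, key=lambda p: p[1])
    lo, hi = occupied_range(lower_limits, width, freqs)
    print(f"{name}: peak at class mark {peak_mark} (frequency {peak_freq}), "
          f"times fall between {lo} and {hi} ms")
```

For these data, the large-target times occupy only the classes from 400 to 699 milliseconds, while the small-target times occupy classes from 400 to 1099 milliseconds, which agrees with the observation that the small-target times are more spread out.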
In the next section, we will begin to analyze and describe data distributions numerically rather than graphically.