5.7: Presenting Data Graphically

Leah Griffith, Veronica Holbrook, Johnny Johnson & Nancy Garcia
Rio Hondo College

5.7 Learning Objectives

Create a frequency table from a data set
Create bar graphs, Pareto charts, pie charts, histograms, and frequency polygons from data sets
Describe the shape of a histogram

Categorical, or qualitative, data are pieces of information that allow us to classify the objects under investigation into various categories. We usually begin working with categorical data by summarizing the data into a frequency table.

Definition: Frequency Table

A frequency table is a table with two columns. One column lists the categories, and the other lists the frequencies with which the items in the categories occur (how many items fit into each category).

Example 1

An insurance company determines vehicle insurance premiums based on known risk factors. If a person is considered a higher risk, their premiums will be higher. One potential factor is the color of the person's car: the insurance company believes that people who drive cars of certain colors are more likely to get into accidents. To research this, they examine police reports for recent total-loss collisions. The data is summarized in the frequency table below.

$\begin{array}{|l|l|}
\hline \textbf { Color } & \textbf { Frequency } \\
\hline \text { Blue } & 25 \\
\hline \text { Green } & 52 \\
\hline \text { Red } & 41 \\
\hline \text { White } & 36 \\
\hline \text { Black } & 39 \\
\hline \text { Grey } & 23 \\
\hline
\end{array}$

Sometimes we need an even more intuitive way of displaying data. This is where charts and graphs come in. There are many, many ways of displaying data graphically; we begin with one very useful type of graph called a bar graph. We first use bar graphs to display categorical data; later in this section we turn to graphs that display quantitative data.

Definition: Bar Graph

A bar graph is a graph that displays a bar for each category with the length of each bar indicating the frequency of that category.

To construct a bar graph, we need to draw a vertical axis and a horizontal axis. The vertical axis has a scale and measures the frequency of each category; in this instance, the horizontal axis has no scale. The construction of a bar graph is most easily described by use of an example.

Example 2

Using our car data from above, note the highest frequency is 52, so our vertical axis needs to go from 0 to 52, but we might as well use 0 to 55, so that we can put a hash mark every 5 units:

[Figure: Bar graph of vehicle color involved in total-loss collisions. Vertical axis: Frequency, 0 to 55 with tick marks every 5. Horizontal axis: Vehicle color (Blue, Green, Red, White, Black, Gray).]

Notice that the height of each bar is determined by the frequency of the corresponding color. The horizontal gridlines are a nice touch, but not necessary. In practice, you will find it useful to draw bar graphs using graph paper, so the gridlines will already be in place, or using technology. Instead of gridlines, we might also list the frequencies at the top of each bar, like this:

[Figure: The same bar graph without gridlines, with each bar labeled by its frequency: Blue 25, Green 52, Red 41, White 36, Black 39, Gray 23.]
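The scale choice in Example 2 (round the largest frequency, 52, up to a multiple of the tick spacing, 5) and the idea of printing each frequency next to its bar can be sketched in a few lines of Python. This is a minimal illustration, not a plotting tool: the helper names `axis_top` and `text_bar_graph` are hypothetical, and the bars are drawn sideways as rows of `#` characters rather than as vertical bars.

```python
import math

# Vehicle color frequencies from Example 1 ("Grey" spelled as in the table).
colors = {"Blue": 25, "Green": 52, "Red": 41, "White": 36, "Black": 39, "Grey": 23}


def axis_top(max_frequency: int, step: int) -> int:
    """Round the largest frequency up to the next multiple of the tick spacing."""
    return math.ceil(max_frequency / step) * step


def text_bar_graph(table: dict[str, int]) -> str:
    """Draw one row per category: a bar of '#' characters (one per unit of
    frequency) followed by the frequency, like labels on top of the bars."""
    width = max(len(name) for name in table)
    rows = [f"{name:<{width}} | {'#' * freq} {freq}" for name, freq in table.items()]
    return "\n".join(rows)


print(axis_top(max(colors.values()), 5))  # 55: tick marks every 5 from 0 to 55
print(text_bar_graph(colors))
```

If the largest frequency is already a multiple of the step (for example 50), `axis_top` returns it unchanged; you may still prefer one extra tick mark of headroom when drawing by hand.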

With the frequencies labeled, this chart might also benefit from being reordered from largest to smallest frequency. This arrangement makes it easier to compare similar values in the chart, even without gridlines. When we arrange the categories in decreasing order of frequency like this, the result is called a Pareto chart.

Definition: Pareto Chart

A Pareto chart is a bar graph ordered from highest to lowest frequency.
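Building a Pareto chart only requires sorting the frequency table by frequency, largest first. A minimal Python sketch using the vehicle color data (the helper name `pareto_order` is hypothetical):

```python
# Vehicle color frequencies from Example 1.
colors = {"Blue": 25, "Green": 52, "Red": 41, "White": 36, "Black": 39, "Grey": 23}


def pareto_order(table: dict[str, int]) -> list[tuple[str, int]]:
    """Return (category, frequency) pairs from highest to lowest frequency."""
    # sorted() is stable even with reverse=True, so tied categories keep table order.
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


for color, freq in pareto_order(colors):
    print(f"{color:<6}{freq}")
```

The resulting order, Green, Red, Black, White, Blue, Grey, is the order of the bars in the Pareto chart of Example 3.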

Example 3

Transforming our bar graph from earlier into a Pareto chart, we get:

[Figure: Pareto chart of vehicle color. Vertical axis: Frequency, in steps of 5. Horizontal axis, from left to right: Green (52), Red (41), Black (39), White (36), Blue (25), Gray (23).]

Example 4

In a survey[1], adults were asked whether they personally worried about a variety of environmental concerns. The numbers (out of 1012 surveyed) who indicated that they worried “a great deal” about some selected concerns are summarized below.

$\begin{array}{|l|l|}
\hline \textbf { Environmental Issue } & \textbf { Frequency } \\
\hline \text { Pollution of drinking water } & 597 \\
\hline \text { Contamination of soil and water by toxic waste } & 526 \\
\hline \text { Air pollution } & 455 \\
\hline \text { Global warming } & 354 \\
\hline
\end{array}$

Solution

This data could be shown graphically in a bar graph:

[Figure: Bar graph of environmental worries. Vertical axis: Frequency, 0 to 600 in steps of 100. Horizontal axis: Environmental worries (Water pollution, Toxic waste, Air pollution, Global warming).]

To show relative sizes, it is common to use a pie chart.

Definition: Pie Chart

A pie chart is a circle divided into wedges of varying sizes, like slices of a pie or pizza. The relative sizes of the wedges correspond to the relative frequencies of the categories.

Example 5

For our vehicle color data, a pie chart might look like this:

[Figure: Pie chart of vehicle color, with wedges for Green, Red, Black, White, Blue, and Gray.]

Pie charts can often benefit from including frequencies or relative frequencies (percents) in the chart next to the pie slices. Often having the category names next to the pie slices also makes the chart clearer.

[Figure: The same pie chart with a legend and with each wedge labeled by its frequency: Green 52, Red 41, Black 39, White 36, Blue 25, Gray 23.]

Example 6

The pie chart below shows the percentage of voters supporting each candidate running for a local senate seat.

[Figure: Pie chart of voter preferences: Ellison 46%, Douglas 43%, Reeves 11%.]

If there are 20,000 voters in the district, the pie chart shows that about 11% of those, about 2,200 voters, support Reeves.

Pie charts look nice, but are harder to draw by hand than bar charts, since to draw them accurately we would need to compute the angle each wedge cuts out of the circle, then measure the angle with a protractor. Computers are much better suited to drawing pie charts. Common software programs like Microsoft Word or Excel, OpenOffice.org Writer or Calc, or Google Docs are able to create bar graphs, pie charts, and other graph types. There are also numerous online tools that can create graphs[2].
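The angle for each wedge is the category's relative frequency times 360 degrees. For the vehicle color data the total is 216, so Green's wedge is 52/216 ≈ 24.1% of the circle, or about 86.7 degrees. A small Python sketch of that computation (the helper `pie_wedges` is hypothetical):

```python
# Vehicle color frequencies from Example 1.
colors = {"Blue": 25, "Green": 52, "Red": 41, "White": 36, "Black": 39, "Grey": 23}


def pie_wedges(table: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Map each category to (percent of the total, wedge angle in degrees)."""
    total = sum(table.values())
    return {
        name: (100 * freq / total, 360 * freq / total)
        for name, freq in table.items()
    }


for name, (percent, angle) in pie_wedges(colors).items():
    print(f"{name:<6}{percent:5.1f}%  {angle:5.1f} degrees")
```

The angles add up to 360 degrees (up to rounding), so the wedges fill the circle exactly; these are the values you would measure with a protractor.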

Try it Now 1

Create a bar graph and a pie chart to illustrate the grades on a history exam below.

A: 12 students, B: 19 students, C: 14 students, D: 4 students, F: 5 students

Answer:

[Figure: Bar graph of the frequency of each letter grade: A 12, B 19, C 14, D 4, F 5.]

[Figure: Pie chart of the relative frequency of each letter grade: A 12, B 19, C 14, D 4, F 5.]

Quantitative, or numerical, data can also be summarized into frequency tables.

Example 7

A teacher records scores on a 20-point quiz for the 30 students in his class. The scores are:

19 20 18 18 17 18 19 17 20 18 20 16 20 15 17 12 18 19 18 19 17 20 18 16 15 18 20 5 0 0

These scores could be summarized into a frequency table by grouping like values:

$\begin{array}{|c|c|}
\hline \textbf { Score } & \textbf { Frequency } \\
\hline 0 & 2 \\
\hline 5 & 1 \\
\hline 12 & 1 \\
\hline 15 & 2 \\
\hline 16 & 2 \\
\hline 17 & 4 \\
\hline 18 & 8 \\
\hline 19 & 4 \\
\hline 20 & 6 \\
\hline
\end{array}$
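Tallying like values is exactly what Python's `collections.Counter` does, so the table above can be checked from the raw scores with a short sketch:

```python
from collections import Counter

# The 30 quiz scores from Example 7.
scores = [19, 20, 18, 18, 17, 18, 19, 17, 20, 18, 20, 16, 20, 15, 17,
          12, 18, 19, 18, 19, 17, 20, 18, 16, 15, 18, 20, 5, 0, 0]

# Counter groups like values and counts how many times each occurs.
frequency = Counter(scores)

# Print the table in increasing order of score.
for score in sorted(frequency):
    print(score, frequency[score])
```

The printed pairs match the frequency table, and the frequencies add up to 30, one per student.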

Using this table, we could create a standard bar chart, as we did for categorical data:

[Figure: Bar chart of the quiz scores. Vertical axis: Frequency, 0 to 8. Horizontal axis: Score, with equally spaced bars for 0, 5, 12, 15, 16, 17, 18, 19, and 20 (heights 2, 1, 1, 2, 2, 4, 8, 4, 6).]

However, since the scores are numerical values, this chart doesn’t really make sense; the first and second bars are five values apart, while the later bars are only one value apart. It would be more correct to treat the horizontal axis as a number line. This type of graph is called a histogram.

Definition: Histogram

A histogram is like a bar graph, but where the horizontal axis is a number line.

Example 8

For the values above, a histogram would look like:

[Figure: Histogram of the quiz scores. Vertical axis: Frequency. Horizontal axis: Score, drawn as a number line with tick marks one unit apart; the bars touch, and each bar runs from a score up to the next whole number (for example, from 18 to 19 with height 8).]

Notice that in the histogram, a bar represents values on the horizontal axis from the value at the left-hand side of the bar up to, but not including, the value at the right-hand side of the bar. Some people instead have bars start at ½ values to avoid this ambiguity; for example, the bar for a score of 18 would then run from 17.5 to 18.5.

[Figure: The same frequency histogram with the horizontal axis labeled every 2 units, from 0 to 20.]
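The "up to, but not including" convention can be written as a rule for finding the bar that contains a value: with bars of width w whose edges start at s, a value x belongs to the bar whose left edge is s + w·⌊(x − s)/w⌋. A Python sketch of that rule (the function name `bin_left_edge` is hypothetical), shown for bars starting at whole numbers and for bars starting at ½ values:

```python
import math


def bin_left_edge(value: float, start: float, width: float) -> float:
    """Return the left edge of the bar [left, left + width) that contains value.

    A value exactly on a boundary falls in the bar to its right, matching the
    "up to, but not including" convention for histogram bars.
    """
    k = math.floor((value - start) / width)
    return start + k * width


# Bars starting at whole numbers: a score of 18 is in the bar from 18 to 19.
print(bin_left_edge(18, start=0, width=1))     # 18
# Bars starting at 1/2 values: a score of 18 is in the bar from 17.5 to 18.5.
print(bin_left_edge(18, start=-0.5, width=1))  # 17.5
```

With edges at ½ values, every whole-number score sits in the middle of its bar, so no score ever lands exactly on a boundary.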

Unfortunately, not a lot of common software packages can correctly graph a histogram. About the best you can do in Excel or Word is a bar graph with no gap between the bars and spacing added to simulate a numerical horizontal axis.

If we have a large number of widely varying data values, creating a frequency table that lists every possible value as a category would lead to an exceptionally long frequency table, and probably would not reveal any patterns. For this reason, it is common with quantitative data to group data into class intervals or bins.

Definition: Class Intervals

Class intervals are groupings of the data. In general, we define class intervals so that:

Each interval is equal in size. For example, if the first class contains values from 120-129, the second class should include values from 130-139.
We have somewhere between 5 and 20 classes, typically, depending on the number of data values we’re working with.
The class interval width is the lowest value in one class minus the lowest value in the previous class.

Example 9

Suppose that we have collected weights from 100 male subjects as part of a nutrition study. For our weight data, we have values ranging from a low of 121 pounds to a high of 263 pounds, giving a range of 263 − 121 = 142. We could create 7 intervals with a width of around 20, 14 intervals with a width of around 10, or somewhere in between. Often we have to experiment with a few possibilities to find something that represents the data well.

Let us try using an interval width of 15. We could start at 121, or at 120 since it is a nice round number. With a class width of 15, the next class begins at 120 + 15 or 135, the following one at 135 + 15 or 150, and so on until there are classes for all the data.

$\begin{array}{|c|c|}
\hline \textbf { Interval } & \textbf { Frequency } \\
\hline 120-134 & 4 \\
\hline 135-149 & 14 \\
\hline 150-164 & 16 \\
\hline 165-179 & 28 \\
\hline 180-194 & 12 \\
\hline 195-209 & 8 \\
\hline 210-224 & 7 \\
\hline 225-239 & 6 \\
\hline 240-254 & 2 \\
\hline 255-269 & 3 \\
\hline
\end{array}$
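The class intervals in this table can be generated from a starting value, a width, and the largest data value. For whole-number data, each class runs from its lower limit to one less than the next class's lower limit, and the number of classes needed is ⌈(263 − 120 + 1)/15⌉ = ⌈9.6⌉ = 10. A Python sketch (the helper `class_intervals` is hypothetical):

```python
import math


def class_intervals(start: int, width: int, max_value: int) -> list[tuple[int, int]]:
    """Return whole-number classes (lower limit, upper limit) of equal width,
    starting at `start`, until `max_value` is covered."""
    # There are max_value - start + 1 whole numbers from start to max_value.
    count = math.ceil((max_value - start + 1) / width)
    return [(start + i * width, start + (i + 1) * width - 1) for i in range(count)]


# Weight data: low 121 (start at the round number 120), high 263, width 15.
for low, high in class_intervals(120, 15, 263):
    print(f"{low}-{high}")
```

The same helper with start 140, width 55, and largest value 460 produces the six cost intervals used in Try it Now 3.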

A histogram of this data would look like:

[Figure: Histogram of the weight data. Vertical axis: Frequency, 0 to 30 in steps of 5. Horizontal axis: Weight (pounds), marked from 120 to 270 in steps of 15, with one bar per class interval and bar heights matching the frequency table.]

In many software packages, you can create a graph similar to a histogram by putting the class intervals as the labels on a bar chart.

[Figure: Bar chart of the weight data, with each bar labeled by its full class interval: 120-134, 135-149, 150-164, 165-179, 180-194, 195-209, 210-224, 225-239, 240-254, 255-269.]

Histograms also display something very important called shape. In the weight example above, you can see that the majority of the data is to the left, and the bars get shorter and shorter the farther you go to the right. The long tail end extends to the right, so we call this distribution skewed to the right. If the majority of the data were on the right and the tail end extended to the left, we would call it skewed to the left. If the data were spread approximately equally to the left and right, so that each side is roughly a mirror image of the other, we would call it symmetric. If all rectangles in a histogram had the same height, we would call it uniform.

Other graph types, such as pie charts, are possible for quantitative data. The usefulness of different graph types will vary depending on the number of intervals and the type of data being represented. For example, a pie chart of our weight data is difficult to read because of the number of intervals we used.

[Figure: Pie chart of the weight data, with one color-coded wedge for each of the ten class intervals from 120-134 through 255-269.]

Try it Now 3

The total cost of textbooks for the term was collected from 36 students. Create a histogram for this data.

$140 $160 $160 $165 $180 $220 $235 $240 $250 $260 $280 $285

$285 $285 $290 $300 $300 $305 $310 $310 $315 $315 $320 $320

$330 $340 $345 $350 $355 $360 $360 $380 $395 $420 $460 $460

Answer

Using class intervals of width 55, starting at the lowest cost of 140, we can group the data into six intervals:

$\begin{array}{|l|r|}
\hline \textbf { Cost interval } & \textbf { Frequency } \\
\hline \$ 140-194 & 5 \\
\hline \$ 195-249 & 3 \\
\hline \$ 250-304 & 9 \\
\hline \$ 305-359 & 12 \\
\hline \$ 360-414 & 4 \\
\hline \$ 415-469 & 3 \\
\hline
\end{array}$

We can use the frequency distribution to generate the histogram.
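The frequency column can be checked directly from the 36 costs. One plausible reason for the width 55 is that the range is 460 − 140 = 320 and 320/6 ≈ 53.3, so a slightly larger round width covers all the data in six classes. The Python sketch below (the constants `START`, `WIDTH`, and `CLASSES` are names chosen here for illustration) puts each cost into its class by integer division:

```python
# Textbook costs (in dollars) for the 36 students in Try it Now 3.
costs = [140, 160, 160, 165, 180, 220, 235, 240, 250, 260, 280, 285,
         285, 285, 290, 300, 300, 305, 310, 310, 315, 315, 320, 320,
         330, 340, 345, 350, 355, 360, 360, 380, 395, 420, 460, 460]

START, WIDTH, CLASSES = 140, 55, 6  # first lower limit, class width, number of classes

counts = [0] * CLASSES
for cost in costs:
    # Integer division gives the index of the class containing this cost.
    counts[(cost - START) // WIDTH] += 1

for i, n in enumerate(counts):
    low = START + i * WIDTH
    print(f"${low}-{low + WIDTH - 1}: {n}")
```

The counts come out as 5, 3, 9, 12, 4, and 3, matching the table; these are the bar heights of the histogram.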

When data are collected to compare two groups, it is useful to create a graph that compares the groups directly.

Example 10

The data below came from a task in which the goal is to move a computer mouse to a target on the screen as fast as possible. On 20 of the trials, the target was a small rectangle; on the other 20, the target was a large rectangle. Time to reach the target was recorded on each trial.

$\begin{array}{|c|c|c|}
\hline \begin{array}{c}
\textbf { Interval } \\
\textbf { (milliseconds) }
\end{array} & \begin{array}{c}
\textbf { Frequency } \\
\textbf { small target }
\end{array} & \begin{array}{c}
\textbf { Frequency } \\
\textbf { large target }
\end{array} \\
\hline 300-399 & 0 & 0 \\
\hline 400-499 & 1 & 5 \\
\hline 500-599 & 3 & 10 \\
\hline 600-699 & 6 & 5 \\
\hline 700-799 & 5 & 0 \\
\hline 800-899 & 4 & 0 \\
\hline 900-999 & 0 & 0 \\
\hline 1000-1099 & 1 & 0 \\
\hline 1100-1199 & 0 & 0 \\
\hline
\end{array}$

One option to represent this data would be a comparative histogram or bar chart, in which bars for the small target group and large target group are placed next to each other.

[Figure: Comparative bar chart of reaction times. Within each interval, a bar for the small target (purple) is placed next to a bar for the large target (magenta). Vertical axis: Frequency, in steps of 2. Horizontal axis: Reaction time interval (milliseconds).]

Definition: Frequency Polygon

An alternative representation is a frequency polygon. A frequency polygon starts out like a histogram, but instead of drawing a bar, we place a point at the midpoint of each interval, at a height equal to the frequency. Typically the points are connected with straight lines to emphasize the distribution of the data.
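The points of a frequency polygon can be computed from the class intervals. This Python sketch treats each interval in Example 10 as running from its lower limit up to the next interval's lower limit (for example, 300 up to 400 milliseconds), which gives the midpoints 350, 450, ..., 1150 used in the graph; the helper `polygon_points` is hypothetical.

```python
# Frequencies from Example 10, one per 100 ms interval starting at 300-399 ms.
START, WIDTH = 300, 100
small_target = [0, 1, 3, 6, 5, 4, 0, 1, 0]
large_target = [0, 5, 10, 5, 0, 0, 0, 0, 0]


def polygon_points(start: int, width: int, freqs: list[int]) -> list[tuple[float, int]]:
    """Return (midpoint, frequency) points, one per interval.

    Interval i runs from start + i*width up to the next lower limit, so its
    midpoint is start + i*width + width/2.
    """
    return [(start + i * width + width / 2, f) for i, f in enumerate(freqs)]


# Connecting these points in order with straight lines draws each polygon.
print(polygon_points(START, WIDTH, small_target))
print(polygon_points(START, WIDTH, large_target))
```

Because both groups use the same intervals, their polygons share the same midpoints and can be drawn on one set of axes.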

Example 11

A frequency polygon of the mouse-movement data makes it easier to see that reaction times were generally shorter for the larger target, and that the reaction times for the smaller target were more spread out.

[Figure: Frequency polygons for the small target (dark blue) and the large target (pink). Vertical axis: Frequency, 0 to 10 in steps of 2. Horizontal axis: Reaction time (milliseconds), with points at the interval midpoints 350, 450, 550, 650, 750, 850, 950, 1050, and 1150, connected by straight lines.]

[1] Gallup Poll. March 5-8, 2009. http://www.pollingreport.com/enviro.htm

[2] For example: http://nces.ed.gov/nceskids/createAgraph/ or http://docs.google.com

[3] CNN/Opinion Research Corporation Poll. December 19-21, 2008. http://www.pollingreport.com/civil.htm
