Mathematics LibreTexts

5.4: Understanding Statistical Graphs and Tables

    A group of people are at a table, with their hands shown holding pens. They're all pointing to data on a piece of paper.
    Figure \(\PageIndex{1}\): Data visualizations can help people quickly understand important features of a dataset. (credit: "Group of diverse people having a business meeting" by Rawpixel Ltd/Flickr, CC BY 2.0)

    Visualizing Categorical Data

    A bar chart is a visualization of categorical data that consists of a series of rectangles arranged side-by-side (but not touching). Each rectangle corresponds to one of the categories. All of the rectangles have the same width. The height of each rectangle corresponds to either the number of units in the corresponding category or the proportion of the total units that fall into the category.

    Example \(\PageIndex{1}\): Construct Bar Graph

    The students in a statistics class were asked to provide their majors. A bar graph is drawn to visualize this data. Answer the following questions.

    1. How many people participated in the survey?
    2. How many more people plan to major in Biology than in Education?
    3. Is the data on the horizontal axis categorical or quantitative?
    4. Is this graph a bar chart, a histogram, or a Pareto chart?
    5. What is the proportion of students who plan to major in Education?
    6. What percentage of students are unsure about their major?
    Bar graph showing frequency of college majors: Biology has the highest frequency, followed by Undecided, Sociology, Political Science, and Education.
    Figure \(\PageIndex{2}\): Frequency of college majors in a statistics class.
    Answer
    1. \(6+1+3+2+4=16\)
    2. \(6-1=5\) more students majoring in Biology than in Education.
    3. Categorical (Qualitative)
    4. It is a bar chart. It is not a histogram, because the data are categorical, and it is not a Pareto chart.
    5. \(\frac{1}{16}=0.0625\)
    6. \(\frac{4}{16}=0.25=25\%\)
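The arithmetic in this answer can be checked with a few lines of Python. This is only a sketch: the frequencies for Biology (6), Education (1), and Undecided (4) appear in the answer above, while assigning 3 to Sociology and 2 to Political Science is an inference from the order described for the bars. The names `majors` and `summarize` are hypothetical.

```python
from fractions import Fraction

# Frequencies read from the bar graph. Sociology = 3 and Political Science = 2
# are inferred from the described ordering of the bars, not stated in the text.
majors = {
    "Biology": 6,
    "Education": 1,
    "Sociology": 3,
    "Political Science": 2,
    "Undecided": 4,
}

def summarize(freqs):
    """Return the total count and each category's proportion of the total."""
    total = sum(freqs.values())
    proportions = {name: Fraction(count, total) for name, count in freqs.items()}
    return total, proportions

total, proportions = summarize(majors)
print(total)                                      # number of participants
print(majors["Biology"] - majors["Education"])    # Biology minus Education
print(float(proportions["Education"]))            # proportion in Education
print(f"{float(proportions['Undecided']):.0%}")   # percentage undecided
```

Using `Fraction` keeps each proportion exact until it is converted to a decimal or a percentage for display.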

    If the data we’re visualizing is categorical, then we want a quick way to represent graphically the relative numbers of units that fall in each category. When we created the frequency distributions in the last section, all we did was count the number of units in each category and record that number (this was the frequency of that category). Frequencies are nice when we’re organizing and summarizing data; they’re easy to compute, and they’re always whole numbers. But they can be difficult to understand for an outsider who’s being introduced to your data.

    Let’s get some practice reading bar graphs.

    Example \(\PageIndex{2}\): Reading Bar Graph

    The bar graph shown gives data on \(2020\) model year cars available in the United States. Analyze the graph to answer the following questions.

    A bar graph titled, 2020 model year cars available in the US. The horizontal axis represents cars. The vertical axis representing percent ranges from 0 to 50, in increments of 5. The graph infers the following data. Hatchback: 4; Minivan: 6; Sedan: 34; Sports: 10; SUV: 44; Wagon: 4. Note: all values are approximate.
    Figure \(\PageIndex{3}\) (data source: consumerreports.org/cars)
    1. What proportion of available cars were sports cars?
    2. What proportion of available cars were sedans?
    3. Which categories of cars each made up less than \(5\%\) of the models available?
    Answer
    1. The bar for sports cars goes up to \(10\%\), so the proportion of models that are considered sports cars is \(10\%\), or \(0.10\).
    2. The bar corresponding to sedan goes up past \(30\%\) but not quite to \(35\%\). It looks like the proportion we want is between \(33\%\) and \(34\%\).
    3. We’re looking for the bars that don’t make it all the way to the \(5\%\) line. Those categories are hatchback and wagon.
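The third question amounts to comparing each bar height with a threshold. A minimal sketch, assuming the approximate percentages given in the graph description; the names `car_share` and `below_threshold` are hypothetical:

```python
# Approximate bar heights (percent) taken from the graph description.
car_share = {
    "Hatchback": 4,
    "Minivan": 6,
    "Sedan": 34,
    "Sports": 10,
    "SUV": 44,
    "Wagon": 4,
}

def below_threshold(shares, threshold):
    """Return, in alphabetical order, the categories whose share is strictly below threshold."""
    return sorted(name for name, pct in shares.items() if pct < threshold)

print(below_threshold(car_share, 5))
```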

    Pie Charts

    A pie chart consists of a circle divided into wedges, with each wedge corresponding to a category. The proportion of the area of the entire circle that each wedge represents corresponds to the proportion of the data in that category. Pie charts are difficult to create without technology because they require careful measurements of angles and precise circles, both of which are tasks better suited for computers.

    Example \(\PageIndex{4}\): Construct Pie Chart

    Use the data that follows to generate a pie chart.

    Type         Percent
    SUV          43.6%
    Sedan        33.6%
    Sports       10.0%
    Minivan      5.5%
    Hatchback    3.6%
    Wagon        3.6%
    Table \(\PageIndex{1}\) (data source: www.consumerreports.org/cars)
    Answer

    First, enter the table above into a new sheet in Google Sheets. Next, click and drag to select the full table (including the header row). Click on the “Insert” menu, then select “Chart.” The result may be a pie chart by default; if it isn’t, you can change it to a pie chart using the “Chart type” drop-down menu in the Chart Editor.

    A pie chart titled, 2020 model year cars available in the US. The circle graph infers the following data. SUV: 43.6 percentage; Sedan: 33.6 percentage; Sports: 10.0 percentage; Minivan: 5.5 percentage; Hatchback: 3.6 percentage; Wagon: 3.6 percentage.
    Figure \(\PageIndex{4}\) (data source: consumerreports.org/cars)

    You can choose to use a legend to identify the categories, as well as label the slices with the relevant percentages.
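To see what the software is doing behind the scenes, each wedge's central angle is its share of the full 360 degrees. The sketch below assumes the percentages from the table in this example; the function name `wedge_angles` is hypothetical, and dividing by the sum of the percentages (rather than by 100) is a design choice that absorbs the rounding in the table.

```python
# Percentages from the table; they sum to 99.9 because of rounding.
shares = {
    "SUV": 43.6,
    "Sedan": 33.6,
    "Sports": 10.0,
    "Minivan": 5.5,
    "Hatchback": 3.6,
    "Wagon": 3.6,
}

def wedge_angles(percentages):
    """Convert each category's share into a central angle in degrees."""
    total = sum(percentages.values())
    return {name: 360 * pct / total for name, pct in percentages.items()}

angles = wedge_angles(shares)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

Because every angle is scaled by the same total, the wedges close up into a full circle even though the listed percentages fall slightly short of 100.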

    People in Mathematics: Florence Nightingale

    Florence Nightingale (1820–1910) is best remembered today for her contributions in the medical field; after witnessing the horrors of field hospitals that tended to the wounded during the Crimean War, she championed reforms that encouraged sanitary conditions in hospitals. For those efforts, she is today considered the founder of modern nursing.

    A portrait of Florence Nightingale.
    Figure \(\PageIndex{5}\): Florence Nightingale's significant contribution to the field of statistical graphics cannot be overstated. (credit: "Florence Nightingale" by Library of Congress Prints and Photographs Division/http://hdl.loc.gov/loc.pnp/pp.print, public domain)

    Nightingale is also remembered for her contributions in statistics, especially in the ways we visualize data. She developed a version of the pie chart that is today known as a polar area diagram, which she used to visualize the causes of death among the soldiers in the war, highlighting the number of preventable deaths the British Army suffered in that conflict.

    In 1859, the Royal Statistical Society honored her for her contributions to the discipline by electing her to join the organization. She was the first woman to be so honored. She was later named an honorary member of the American Statistical Association. Nightingale's status as a revered pioneer in both nursing and statistics is a complex one, because some of her writings and opinions demonstrate a colonialist mindset and disregard for those who lost their lives and lands at the hands of the British. Her core statistical writings indicated that she felt superior to the Indigenous people she was treating. Members of both fields continue to debate her near-iconic role.

    Understanding Quantitative Data

    As we saw in the previous section, four common shapes describe the distribution of quantitative data: uniform (data are equally distributed across the range), symmetric (data are bunched up in the middle, then taper off in the same way above and below the middle), left-skewed (data are bunched up at the high end or larger values, and taper off toward the low end or smaller values), and right-skewed (data are bunched up at the low end, and taper off toward the high end). See the figures below.

    Four histograms. The first histogram is titled, Uniform. The horizontal axis ranges from 0 to 5, in increments of 1. The vertical axis ranges from 0 to 60, in increments of 10. The histogram infers the following data. 0 to 1: 38. 1 to 2: 35. 2 to 3: 51. 3 to 4: 39. 4 to 5: 37. The second histogram is titled, Right-skewed. The horizontal axis ranges from 1 to 21, in increments of 2. The vertical axis ranges from 0 to 50, in increments of 10. The histogram infers the following data. 1 to 3: 18. 3 to 5: 40. 5 to 7: 44. 7 to 9: 37. 9 to 11: 29. 11 to 13: 14. 13 to 15: 8. 15 to 17: 5. 17 to 19: 3. 19 to 21: 2. The third histogram is titled, Left-skewed. The horizontal axis ranges from 30 to 50, in increments of 2. The vertical axis ranges from 0 to 60, in increments of 10. The histogram infers the following data. 30 to 32: 3. 32 to 34: 1. 34 to 36: 6. 36 to 38: 11. 38 to 40: 13. 40 to 42: 31. 42 to 44: 39. 44 to 46: 52. 46 to 48: 35. 48 to 50: 8. The fourth histogram is titled, Symmetric. The horizontal axis ranges from 30 to 180, in increments of 15. The vertical axis ranges from 0 to 80, in increments of 10. The histogram infers the following data. 30 to 45: 2. 45 to 60: 7. 60 to 75: 20. 75 to 90: 31. 90 to 105: 70. 105 to 120: 30. 120 to 135: 29. 135 to 150: 10. 150 to 165: 2. 165 to 180: 1. Note: all values are approximate.

    Figure \(\PageIndex{6}\): Distribution Shapes of Histograms

    Looking back at the stem-and-leaf plot in the previous example, we can see that the data are bunched up at the low end and taper off toward the high end; that set of data is right-skewed. Knowing the distribution of a set of data gives us useful information about the property that the data are measuring.

    Example \(\PageIndex{6}\): Find Information from Histogram

    The data in “AvgSAT” contains the average SAT score for students attending every institution of higher learning in the US for which data is available. Create a histogram in Google Sheets of the average SAT scores. Use bins of width \(50\). Are the data uniformly distributed, symmetric, left-skewed, or right-skewed?

    A histogram titled, average SAT scores at US institutions. The horizontal axis representing average SAT ranges from 750 to 1600, in increments of 50. The vertical axis representing frequency ranges from 0 to 300, in increments of 100. The histogram infers the following data. 750 to 800: 4. 800 to 850: 5. 850 to 900: 15. 900 to 950: 25. 950 to 1000: 85. 1000 to 1050: 170. 1050 to 1100: 230. 1100 to 1150: 275. 1150 to 1200: 275. 1200 to 1250: 110. 1250 to 1300: 70. 1300 to 1350: 50. 1350 to 1400: 40. 1400 to 1450: 40. 1450 to 1500: 20. 1500 to 1550: 20. 1550 to 1600: 5. Note: all values are approximate.
    Figure \(\PageIndex{7}\): Fairly Symmetric Graph
    Answer

    The data are fairly symmetric, but slightly right-skewed.
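The binning step behind a histogram with bins of width \(50\) can be sketched as follows. The scores in `sample_scores` are made up for illustration and are not the AvgSAT data; the function name `bin_counts` is hypothetical, and the convention that each bin includes its lower edge but not its upper edge is an assumption that may differ from what a spreadsheet uses.

```python
from collections import Counter

def bin_counts(values, width, start):
    """Count how many values fall in each bin [lower, lower + width)."""
    counts = Counter()
    for v in values:
        # Integer division finds how many whole bins lie between start and v.
        lower = start + ((v - start) // width) * width
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical scores for illustration only; not the AvgSAT data.
sample_scores = [790, 960, 1010, 1040, 1075, 1090, 1120,
                 1130, 1149, 1150, 1180, 1230, 1310, 1460]

for lower, count in bin_counts(sample_scores, width=50, start=750).items():
    print(f"{lower} to {lower + 50}: {count}")
```

Reading the counts from the lowest bin to the highest gives the same information as the bar heights in the histogram: where the data bunch up and in which direction they taper off.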


    Misleading Graphs

    Graphical representations of data can be manipulated in ways that intentionally mislead the reader. There are two primary ways this can be done: by adjusting the scales on the axes and by manipulating or misrepresenting the areas of the bars. Let’s look at some examples of these.

    Example \(\PageIndex{9}\): Misleading Graph

    The graphs below show the top five teams by payroll in the English Premier League, the top soccer organization in the United Kingdom.

    A bar chart titled, top five EPL teams by payroll (1,000,000 pounds), January 2020. The horizontal axis represents teams. The vertical axis representing payroll (1,000,000 pounds) ranges from 0 to 200, in increments of 50. The bar graph infers the following data. Tottenham Hotspur: 130. Arsenal F.C.: 131. Chelsea F.C.: 133. Manchester City: 137. Manchester United: 176. Note: all values are approximate.
    A bar graph titled, top five EPL teams by payroll (1,000,000 pounds), January 2020. The horizontal axis represents teams. The vertical axis representing payroll (1,000,000 pounds) ranges from 120 to 180, in increments of 20. The bar graph infers the following data. Tottenham Hotspur: 130. Arsenal F.C.: 131. Chelsea F.C.: 133. Manchester City: 137. Manchester United: 176. Note: all values are approximate.
    Figure \(\PageIndex{8}\): Misleading Bar Graph

    How might someone present this data in a misleading way?

    Answer

    You should notice that despite using the same data, these two graphs look strikingly different. In the second graph, the gap between Manchester United and the other four teams looks significantly larger than in the first graph. The scale on the vertical axis has been manipulated here. The first graph's axis starts at zero, while the lowest value on the second graph's axis is \(120\). This trick has a strong impact on the viewer’s perception of the data.
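The size of this effect can be estimated by comparing drawn bar heights, which are measured from wherever the axis starts. A rough sketch, assuming the approximate payroll values from the graph descriptions; the names `payrolls` and `drawn_height_ratio` are hypothetical:

```python
# Approximate payrolls (millions of pounds) from the graph descriptions.
payrolls = {
    "Tottenham Hotspur": 130,
    "Arsenal F.C.": 131,
    "Chelsea F.C.": 133,
    "Manchester City": 137,
    "Manchester United": 176,
}

def drawn_height_ratio(a, b, baseline):
    """Ratio of drawn bar heights for values a and b when the axis starts at baseline."""
    return (a - baseline) / (b - baseline)

united = payrolls["Manchester United"]
spurs = payrolls["Tottenham Hotspur"]
print(round(drawn_height_ratio(united, spurs, baseline=0), 2))    # axis starts at 0
print(round(drawn_height_ratio(united, spurs, baseline=120), 2))  # axis starts at 120
```

With the axis starting at zero, the tallest bar is about 1.35 times the height of the shortest, which matches the actual ratio of the payrolls. With the axis starting at \(120\), the same two bars are drawn with a ratio of 5.6.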

    To further emphasize the difference this creates in our perception, let's examine the data again, but this time using graphics instead of colored areas on our bar graph.

    A bar graph titled, top five EPL teams by payroll (1,000,000 pounds), January 2020. Each vertical bar is represented by a 10 pound banknote. The horizontal axis represents teams. The vertical axis representing payroll (1,000,000 pounds) ranges from 120 to 180, in increments of 20. The bar graph infers the following data. Tottenham Hotspur: 130. Arsenal F.C.: 131. Chelsea F.C.: 133. Manchester City: 137. Manchester United: 176. Note: all values are approximate.
    Figure \(\PageIndex{9}\): (data source: www.spotrac.com)

    This graph uses an image of a £\(10\) banknote in place of the bars. Using an image that evokes the context of the data in place of a standard, “boring” bar is a common tool that people use when creating infographics. However, this is generally not a good practice because it distorts the data. Notice that our “bars” (the banknotes) are just as tall here as they were in the previous figure. But, to maintain the right proportions, the widths had to be adjusted as well, which changes the area (height × width) of each bar. A key point is that when looking at rectangles, the human eye tends to process areas more easily than heights.
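A rough sketch of how much the areas can exaggerate the gap. It assumes each banknote image is scaled uniformly, so that its width grows in proportion to its drawn height and the ratio of areas is the square of the ratio of heights. The function name `area_ratio` is hypothetical, and the payroll values are the approximate ones from the graph description.

```python
def area_ratio(value_a, value_b, baseline):
    """Area ratio of two uniformly scaled images whose heights are measured from baseline."""
    height_ratio = (value_a - baseline) / (value_b - baseline)
    # Uniform scaling multiplies width and height by the same factor.
    return height_ratio ** 2

# Manchester United (176) versus Tottenham Hotspur (130), axis starting at 120.
print(round(area_ratio(176, 130, baseline=120), 2))
```

Under these assumptions the largest banknote covers roughly 31 times the area of the smallest, even though the larger payroll is only about 1.35 times the smaller one.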

    Checkpoint

    Beware of infographics! Areas overemphasize a difference that should be measured with a height!

    Now, let’s look at all 20 teams. This histogram indicates that the data are right-skewed, with the highest number of teams having a payroll between £40 million and £80 million:

    A histogram titled, total payrolls teams in the EPL (1,000,000 pounds), January 2020. The horizontal axis representing payroll (1,000,000 pounds) ranges from 0 to 200, in increments of 40. The vertical axis representing frequency ranges from 0 to 8, in increments of 2. The histogram infers the following data. 0 to 40: 4. 40 to 80: 8. 80 to 120: 3. 120 to 160: 4. 160 to 200: 1.
    A histogram titled frequency versus payrolls (1,000,000 pounds). The horizontal axis represents payroll (1,000,000 pounds) ranges from 0 to 80, in increments of 40. The vertical axis representing frequency ranges from 0 to 8, in increments of 2. The histogram infers the following data. 0 to 40: 4. 40 to 80: 8. Over 80: 8.
    Figure \(\PageIndex{10}\): Misleading Bar Graphs

    Even though the second chart uses the same data as the histogram, the skew seems to be reversed. Why? Although the second chart looks like a histogram, it isn’t one. Look closely at the labels on the horizontal axis; they don't correspond to spots on the axis, but instead provide a range, meaning this is a bar graph based on a binned frequency distribution.
    When we review these ranges, we can see that the last range is misleading as it consists of all data “over \(80\).” If the bins all had the same width, that last bin would run from \(80\) to \(120\). However, we can see from the histogram that the maximum value for this data is between \(160\) and \(200\). If the last bin in this bar graph were labeled honestly, it would read “\(80–200\),” which would drive home the fact that the width of that bar is misleading.
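One way to compare bins of unequal width fairly is to divide each count by the width of its bin. This is a sketch of that idea rather than a method taken from the text: the name `frequency_density` is hypothetical, and the last bin is taken to run from 80 to 200, as in the honest label described above.

```python
# (lower, upper, count) for each bin, with the last bin labeled as 80 to 200.
bins = [(0, 40, 4), (40, 80, 8), (80, 200, 8)]

def frequency_density(lower, upper, count):
    """Number of teams per 1 million pounds of payroll range within a bin."""
    return count / (upper - lower)

for lower, upper, count in bins:
    density = frequency_density(lower, upper, count)
    print(f"{lower} to {upper}: {density:.3f} teams per million pounds")
```

Measured this way, the last bin has the lowest density of the three, which agrees with the right-skewed shape of the histogram rather than the reversed skew suggested by the bar graph.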

    Checkpoint

    Always check the horizontal axis on histograms! The widths of all the bars should be equal.

    Minard’s chart is remarkable in that it shows not just how the size of Napoleon’s army shrank drastically over time, but also the location on the map, the direction the army was traveling at the time, and the temperature during the retreat.

    Example \(\PageIndex{10}\): Misleading Bar Graph

    Compare the two graphs below showing support for same-sex marriage rights from a poll taken in December \(2008\) [3]. The difference in the vertical scale on the first graph suggests a different story than the true differences in percentages; the second graph makes it appear as though twice as many people oppose marriage rights as support them.

    Bar graphs comparing support and opposition for same-sex marriage in two separate surveys.


    5.4: Understanding Statistical Graphs and Tables is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.