Mathematics LibreTexts

6.6: Corequisite- Computing Measures of Center; Computing and Interpreting Z-scores

    SPECIFIC OBJECTIVES

    By the end of this lesson, you should understand that

    • distributions of data can be described based on their shape.
    • distributions of data can be used to explore real-world situations.
    • the Z-score of a data value can be used to assess whether a value is unusual within its distribution.

    By the end of this lesson, you will be able to

    • compute and interpret the mean, median and mode of a data set from its histogram.
    • compute and interpret the Z-score of a data value within a bell-shaped distribution.

    PROBLEM SITUATION 1: A SKEWED DISTRIBUTION

    In this lesson we will examine shapes of distributions of quantitative data. The shape of a distribution describes how the values in the distribution vary. By knowing the shape of a distribution, we can understand how the mean and median of the distribution compare. This is an important consideration when deciding whether the mean, median or mode best describes the center of a data set. The shape of a distribution also tells us how we can use the distribution to identify unusual values.

    County-Level Data

    There are approximately 3150 counties in the U.S. The histogram below summarizes data for a random sample of 1000 counties. The sample contains about one-third of all U.S. counties. Since the random sample is sufficiently large, we can think of it as being representative of all counties in the U.S.

    The following histogram is skewed. In a skewed distribution, the shape of the distribution is not symmetric, so one side has a longer tail of values, and the mean and median are different. The histogram below displays the percentage of adults in a county who had a bachelor’s degree or higher in 2010. This histogram is skewed to the right because its longer tail extends toward the larger values.

    The number above each bar is the frequency of the group of values. The midpoint of each bar is the position on the horizontal axis at the center of the bar. For example, the midpoint of the first bar is 2.5 and the midpoint of the second bar is 7.5. The first bar can be interpreted as indicating that there are 2 counties in which about 2.5% of the adults have a bachelor’s degree or higher. The second bar can be interpreted as indicating that there are 82 counties in which about 7.5% of the adults have a bachelor’s degree or higher.

    Figure: Histogram of the percentage of adults with a bachelor's degree or higher in 1000 counties. The x-axis shows the percentage with a bachelor's degree or higher (0 to 80, in increments of 10); the y-axis shows frequency (0 to 300, in increments of 50). Frequencies by bin: 0-5: 2; 5-10: 82; 10-15: 291; 15-20: 291; 20-25: 140; 25-30: 82; 30-35: 49; 35-40: 26; 40-45: 13; 45-50: 10; 50-55: 7; 55-60: 5; 60-65: 2.

    (1) Describe the values in this histogram. What values are typical? What values are unusual? Explain in the context of the distribution.

    (2) Since the histogram contains grouped data, we cannot use the individual data values to find the mean, median and mode. We must estimate these measures of center using the information provided in the histogram. The mean can be estimated by a weighted mean. Round the mean and median to two decimal places.

    Mean =

    Median =

    The two modes are:
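
    The estimates in question (2) can be sketched in Python. This is one possible approach, not the only one: it assumes every county in a bar sits at that bar's midpoint, and the variable names are illustrative. The frequencies are the ones shown above the bars of the histogram.

```python
from statistics import median, multimode

# Frequencies read from the histogram; each bin is 5 percentage points wide.
frequencies = [2, 82, 291, 291, 140, 82, 49, 26, 13, 10, 7, 5, 2]
# Bin midpoints: 2.5, 7.5, ..., 62.5
midpoints = [2.5 + 5 * i for i in range(len(frequencies))]

total = sum(frequencies)

# Weighted mean: each midpoint counts as many times as its frequency.
weighted_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / total

# Assumption: treat every county in a bin as if its value were the midpoint.
expanded = [m for m, f in zip(midpoints, frequencies) for _ in range(f)]
estimated_median = median(expanded)
estimated_modes = multimode(expanded)  # all midpoints tied for highest frequency

print(f"Estimated mean:   {weighted_mean:.2f}")
print(f"Estimated median: {estimated_median:.2f}")
print(f"Estimated modes:  {estimated_modes}")
```

    Because the individual county values are not available, these are estimates only; the true mean, median and modes of the underlying data may differ somewhat.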

    (3) Which measure: the mean, median, or mode, best represents the typical percentage of adults in a county who have a bachelor’s degree or higher? Explain.

    (4) In what percentage of the counties in this sample do less than 20% of adults have a bachelor’s degree or higher? Write your answer as a percentage rounded to one decimal place.

    (5) In what percentage of the counties in this sample do between 20% and 50% of adults have a bachelor’s degree or higher?

    (6) Is it unusual for a county to have a majority of adults with a bachelor's degree or higher? Explain.
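
    Questions (4) through (6) all ask what share of the sample falls in a range of values. A minimal Python sketch of that calculation follows. The helper name percent_between is hypothetical, and the sketch assumes that a county exactly on a bin edge belongs to the bin that starts at that edge, which the histogram does not confirm.

```python
# Frequencies read from the histogram; bins start at 0, 5, 10, ..., 60.
frequencies = [2, 82, 291, 291, 140, 82, 49, 26, 13, 10, 7, 5, 2]
lower_edges = [5 * i for i in range(len(frequencies))]
total = sum(frequencies)


def percent_between(low, high):
    """Percentage of counties in bins whose lower edge is in [low, high)."""
    count = sum(f for edge, f in zip(lower_edges, frequencies) if low <= edge < high)
    return 100 * count / total


below_20 = percent_between(0, 20)     # question (4)
from_20_to_50 = percent_between(20, 50)  # question (5)
majority = percent_between(50, 100)   # question (6): more than half of adults

print(f"Less than 20%:      {below_20:.1f}% of counties")
print(f"Between 20 and 50%: {from_20_to_50:.1f}% of counties")
print(f"50% or more:        {majority:.1f}% of counties")
```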

    (7) The mean of these data values is approximately 20 (meaning 20 percent) and the standard deviation is approximately 10 (meaning 10 percentage points). Each value in the data set has a deviation from the mean. The standard deviation tells us that the typical deviation from the mean is about 10 percentage points.

    (a) Let’s refer to the County-Level Data on education. The mean of these data values is approximately 20% and the standard deviation is approximately 10 percentage points. How many data values are between 10% and 30%? In other words, how many data values are within 1 standard deviation (10 percentage points) of the mean (20%)?

    (b) How many data values are within 2 standard deviations of the mean (20%), where each standard deviation is 10 percentage points? First, find the lower value. The mean minus 2 standard deviations is equal to ____________%.

    (c) How many data values are 2 standard deviations or more above the mean?

    (d) What percentage of the counties in this sample are 2 standard deviations or more above the mean? Round your answer to the nearest percent.
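
    The counting in question (7) can be sketched as follows. This is an illustrative approach that uses the approximate mean (20) and standard deviation (10) stated in the text and counts only bins that lie entirely inside each interval; the helper name count_between is an assumption, not part of the lesson.

```python
# (lower edge, upper edge, frequency) for each bar of the histogram.
bins = [
    (0, 5, 2), (5, 10, 82), (10, 15, 291), (15, 20, 291), (20, 25, 140),
    (25, 30, 82), (30, 35, 49), (35, 40, 26), (40, 45, 13), (45, 50, 10),
    (50, 55, 7), (55, 60, 5), (60, 65, 2),
]
mean, sd = 20, 10  # approximate values given in the text
total = sum(f for _, _, f in bins)


def count_between(low, high):
    """Count counties in bins that lie entirely inside [low, high]."""
    return sum(f for lo, hi, f in bins if lo >= low and hi <= high)


within_1_sd = count_between(mean - sd, mean + sd)          # 10% to 30%
within_2_sd = count_between(mean - 2 * sd, mean + 2 * sd)  # 0% to 40%
upper_tail = count_between(mean + 2 * sd, float("inf"))    # 40% and above
upper_tail_percent = round(100 * upper_tail / total)

print(f"Within 1 SD of the mean: {within_1_sd} counties")
print(f"Within 2 SD of the mean: {within_2_sd} counties")
print(f"2 SD or more above:      {upper_tail} counties ({upper_tail_percent}%)")
```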

    PROBLEM SITUATION 2: EXAMINING BELL-SHAPED DISTRIBUTIONS

    The highest level of education achieved is an example of a categorical variable. Categorical variables separate data into groups or categories, like “At Least a High School Degree” and “Bachelor’s Degree or Higher.” The median age of people living in a county is an example of a quantitative variable. Quantitative variables are variables whose values are obtained through counts or measurements. The histogram below displays the median age (in years) of people living in 1000 counties in the U.S. The data are based on the 2020 U.S. Census.¹⁹

    Figure: Approximately bell-shaped histogram of median age for 1000 counties. The x-axis shows median age (0 to 70, in increments of 20); the y-axis shows frequency (0 to 200, in increments of 40). The tallest bar, near a median age of 40, has a frequency of about 200. Moving away from 40 in either direction, the frequencies fall to about 150, then 120, then 60, then 40, and continue to decrease the farther the bars are from 40.

    This histogram is approximately bell-shaped. In a bell-shaped distribution, the mean and median are approximately equal, so the percentage of data values less than the mean is almost the same as the percentage of data values greater than the mean. When this occurs, the mean of the distribution is considered to be at the center of the distribution.

    (8) (a) Without performing a calculation, estimate the mean of the histogram above.

    (b) What values appear to be unusual? Explain.

    (9) The standard deviation of the median ages in the sample above is 5 years. Interpret this standard deviation.

    Z-scores

    Statisticians often want to examine whether a data value is unusual within its distribution. When a distribution is bell-shaped and the mean and standard deviation are known, we can find the relative position of a data value in the distribution by calculating its standardized score or Z-score. A Z-score describes the number of standard deviations that a data value is from the mean. The Z-score formula is shown below.

    \[\text{Z-score} = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}} \nonumber\]

    Every data value in a distribution has a Z-score. Z-scores can be negative, positive, or zero. When a Z-score is less than or equal to −2, or greater than or equal to 2, we say that the corresponding data value is unusual, since the data value is relatively far from the mean. In fact, only about 5% of data values in a bell-shaped distribution will have a Z-score less than or equal to −2, or greater than or equal to 2.

    The table below contains the median ages (in years) of four counties in the U.S.²⁰

    Counties in the U.S.
    County, State | Median Age (years)
    Webb County, TX | 29.6
    Columbus County, NC | 42.3
    Franklin County, MA | 47.2
    Pickett County, TN | 51.7

    (10) For the 1,000 counties in the random sample, the mean of the median ages is 41.1 years and the standard deviation of the median ages is 5 years.
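
    The Z-score formula can be applied to each of the four counties in the table using the sample mean (41.1 years) and standard deviation (5 years). The following is a minimal Python sketch; the function name z_score and the list name unusual_counties are illustrative, and the cutoff of 2 is the rule for unusual values stated earlier in this lesson.

```python
MEAN_AGE = 41.1  # mean of the median ages in the sample (years)
SD_AGE = 5.0     # standard deviation of the median ages (years)

median_ages = {
    "Webb County, TX": 29.6,
    "Columbus County, NC": 42.3,
    "Franklin County, MA": 47.2,
    "Pickett County, TN": 51.7,
}


def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


z_scores = {county: z_score(age, MEAN_AGE, SD_AGE) for county, age in median_ages.items()}

# A value is called unusual when its Z-score is <= -2 or >= 2.
unusual_counties = [county for county, z in z_scores.items() if abs(z) >= 2]

for county, z in z_scores.items():
    deviation = median_ages[county] - MEAN_AGE
    flag = "unusual" if county in unusual_counties else "not unusual"
    print(f"{county}: deviation {deviation:+.1f} years, Z-score {z:+.2f} ({flag})")
```

    A negative Z-score means the county's median age is below the sample mean, and a positive Z-score means it is above.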

    (11) Another variable that has a bell-shaped distribution is the percentage of adults in a county who smoke cigarettes. This distribution has a mean of 21.5 (meaning 21.5%) and a standard deviation of 6 (meaning 6 percentage points). In one county in Alaska, 40.5% of adults smoke cigarettes.

    Find the deviation from the mean and the Z-score of this data value. Round the Z-score to two decimal places.
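
    As a sketch of how the Z-score formula applies here, the deviation is the value minus the mean, and the Z-score is that deviation divided by the standard deviation:

    \[\text{deviation} = 40.5 - 21.5 = 19 \text{ percentage points} \nonumber\]

    \[\text{Z-score} = \dfrac{40.5 - 21.5}{6} = \dfrac{19}{6} \approx 3.17 \nonumber\]

    Since this Z-score is greater than 2, the rule stated earlier would classify this county's smoking percentage as unusual.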

    (12) Dekalb County in Georgia had characteristics in 2020 which made it below average in comparison to other counties in the U.S. The median age of people in Dekalb County was 36.7 years. In Dekalb County, 10.8% of adults smoked cigarettes.

    Characteristic | Mean | Standard Deviation | Value in Dekalb County, GA
    Median age | 41.1 years | 5 years | 36.7 years
    Percentage of adult smokers | 21.5% | 6 percentage points | 10.8%

    Which of these characteristics is more unusual when Dekalb County is compared to other counties in the U.S.? Explain.
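
    Because the two characteristics are measured in different units (years and percentage points), their raw deviations cannot be compared directly, but their Z-scores can. A minimal Python sketch of that comparison, using the values from the table; the dictionary layout and the names are illustrative assumptions.

```python
# Means, standard deviations and Dekalb County values from the table.
characteristics = {
    "median age (years)": {"mean": 41.1, "sd": 5.0, "value": 36.7},
    "adult smokers (%)": {"mean": 21.5, "sd": 6.0, "value": 10.8},
}


def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


z_scores = {
    name: z_score(c["value"], c["mean"], c["sd"])
    for name, c in characteristics.items()
}

# The characteristic farthest from its mean, in standard deviations,
# is the more unusual one.
most_unusual = max(z_scores, key=lambda name: abs(z_scores[name]))

for name, z in z_scores.items():
    print(f"{name}: Z-score {z:+.2f}")
print(f"Farther from its mean: {most_unusual}")
```

    Comparing absolute values of the Z-scores puts both characteristics on the same scale, which is why the comparison does not depend on the original units.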

    __________________________________________

    19 https://www.census.gov/data/tables/time-series/demo/popest/2020s-counties-detail.html

    20 https://www.census.gov/data/tables/time-series/demo/popest/2020s-counties-detail.html


    This page titled 6.6: Corequisite- Computing Measures of Center; Computing and Interpreting Z-scores is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by Carnegie Math Pathways (WestEd) via source content that was edited to the style and standards of the LibreTexts platform.