6.6: Corequisite- Computing Measures of Center; Computing and Interpreting Z-scores
SPECIFIC OBJECTIVES
By the end of this lesson, you should understand that
- distributions of data can be described based on their shape.
- distributions of data can be used to explore real-world situations.
- the Z-score of a data value can be used to assess whether a value is unusual within its distribution.
By the end of this lesson, you will be able to
- compute and interpret the mean, median and mode of a data set from its histogram.
- compute and interpret the Z-score of a data value within a bell-shaped distribution.
PROBLEM SITUATION 1: A SKEWED DISTRIBUTION
In this lesson we will examine shapes of distributions of quantitative data. The shape of a distribution describes how the values in the distribution vary. Knowing the shape of a distribution helps us understand how its mean and median compare. This is an important consideration when deciding whether the mean, median or mode best describes the center of a data set. The shape of a distribution also tells us how we can use the distribution to identify unusual values.
County-Level Data
There are approximately 3150 counties in the U.S. The histogram below summarizes data for a random sample of 1000 counties. The sample contains about one-third of all U.S. counties. Since the random sample is sufficiently large, we can think of it as being representative of all counties in the U.S.
The following histogram is skewed. In a skewed distribution, the shape is not symmetric: one side has a longer tail of values, and the mean and median are different. The histogram below displays the percentage of adults in a county who had a bachelor’s degree or higher in 2010. This histogram is skewed to the right.
The number above each bar is the frequency of the group of values. The midpoint of each bar is the position on the horizontal axis at the center of the bar. For example, the midpoint of the first bar is 2.5 and the midpoint of the second bar is 7.5. The first bar can be interpreted as indicating that there are 2 counties in which about 2.5% of the adults have a bachelor’s degree or higher. The second bar can be interpreted as indicating that there are 82 counties in which about 7.5% of the adults have a bachelor’s degree or higher.
(1) Describe the values in this histogram. What values are typical? What values are unusual? Explain in the context of the distribution.
(2) Since the histogram contains grouped data, we cannot use the individual data values to find the mean, median and mode. We must estimate these measures of center using the information provided in the histogram. The mean can be estimated by a weighted mean of the bar midpoints, with each bar's frequency as its weight. Round the mean and median to two decimal places.
Mean =
Median =
The two modes are:
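The estimation described in question (2) can be sketched in Python. The midpoints and frequencies below are made-up illustration values, not the counts from the county histogram, and the function names are hypothetical. The sketch assumes one simple convention for grouped data: the median is reported as the midpoint of the bar that contains the middle of the ordered data, and each mode is the midpoint of a tallest bar.

```python
def grouped_mean(midpoints, freqs):
    """Weighted mean: each bar midpoint is weighted by the bar's frequency."""
    total = sum(freqs)
    return sum(m * f for m, f in zip(midpoints, freqs)) / total


def grouped_median(midpoints, freqs):
    """Midpoint of the bar in which the running count first reaches half the data.

    This is a simplification (an assumption of this sketch); it does not
    interpolate within the bar.
    """
    half = sum(freqs) / 2
    running = 0
    for m, f in zip(midpoints, freqs):
        running += f
        if running >= half:
            return m
    raise ValueError("the histogram has no data")


def grouped_modes(midpoints, freqs):
    """Midpoints of all bars tied for the greatest frequency."""
    tallest = max(freqs)
    return [m for m, f in zip(midpoints, freqs) if f == tallest]


# Hypothetical histogram (NOT the county data): five bars of width 5.
midpoints = [2.5, 7.5, 12.5, 17.5, 22.5]
freqs = [1, 4, 4, 2, 1]

mean_estimate = round(grouped_mean(midpoints, freqs), 2)   # 140 / 12 -> 11.67
median_estimate = grouped_median(midpoints, freqs)         # 12.5
mode_estimates = grouped_modes(midpoints, freqs)           # [7.5, 12.5]
print(mean_estimate, median_estimate, mode_estimates)
```

To apply the same steps to the county histogram, the two lists would be replaced with the midpoints and frequencies read from its bars.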
(3) Which measure: the mean, median, or mode, best represents the typical percentage of adults in a county who have a bachelor’s degree or higher? Explain.
(4) In what percentage of the counties in this sample do less than 20% of adults have a bachelor’s degree or higher? Write your answer as a percentage rounded to one decimal place.
(5) In what percentage of the counties in this sample do between 20% and 50% of adults have a bachelor’s degree or higher?
(6) Is it unusual for a county to have a majority of adults with a bachelor's degree or higher? Explain.
(7) The mean of these data values is approximately 20 (meaning 20 percent) and the standard deviation is approximately 10 (meaning 10 percentage points). Each value in the data set has a deviation from the mean. The standard deviation tells us that the typical deviation from the mean is about 10 percentage points.
(a) Refer to the county-level data on education. How many data values are between 10% and 30%? In other words, how many data values are within 1 standard deviation (10 percentage points) of the mean (20%)?
(b) How many data values are within 2 standard deviations (each 10 percentage points) of the mean (20%)? First, find the lower value. The mean minus 2 standard deviations is equal to ____________%.
(c) How many data values are 2 standard deviations or more above the mean?
(d) What percentage of the counties in this sample are 2 standard deviations or more above the mean? Round your answer to the nearest percent.
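The idea of counting values within a given number of standard deviations of the mean can be sketched in Python. The mean of 20 and standard deviation of 10 come from question (7); the histogram used for counting is made up for illustration and is not the county data. The function names are hypothetical, and counting a whole bar by its midpoint is an approximation that assumes the bar edges line up with the ends of the interval.

```python
def sd_interval(mean, sd, k):
    """Return the (low, high) values that lie k standard deviations from the mean."""
    return (mean - k * sd, mean + k * sd)


def count_within(midpoints, freqs, low, high):
    """Add up the frequencies of the bars whose midpoints fall in [low, high]."""
    return sum(f for m, f in zip(midpoints, freqs) if low <= m <= high)


# Values from question (7): mean 20, standard deviation 10, one SD either side.
low, high = sd_interval(20, 10, 1)            # (10, 30)

# Hypothetical histogram (NOT the county data): bars of width 10.
midpoints = [5, 15, 25, 35, 45]
freqs = [3, 10, 8, 3, 1]

inside = count_within(midpoints, freqs, low, high)   # 10 + 8 = 18
share = 100 * inside / sum(freqs)                    # 18 of 25 -> 72.0 percent
print(low, high, inside, share)
```

Changing the last argument of `sd_interval` gives the interval for any other number of standard deviations.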
PROBLEM SITUATION 2: EXAMINING BELL-SHAPED DISTRIBUTIONS
The highest level of education achieved is an example of a categorical variable. Categorical variables separate data into groups or categories, like “At Least a High School Degree” and “Bachelor’s Degree or Higher.” The median age of people living in a county is an example of a quantitative variable. Quantitative variables are variables whose values are obtained through counts or measurements. The histogram below displays the median age (in years) of people living in 1000 counties in the U.S. The data is based on the 2020 U.S. Census.19
This histogram is approximately bell-shaped. In a bell-shaped distribution, the mean and median are approximately equal, so the percentage of data values less than the mean is almost the same as the percentage of data values greater than the mean. When this occurs, the mean of the distribution is considered to be at the center of the distribution.
(8) (a) Without performing a calculation, estimate the mean of the histogram above.
(b) What values appear to be unusual? Explain.
(9) The standard deviation of the median ages in the sample above is 5 years. Interpret this standard deviation.
Z-scores
Statisticians often want to examine whether a data value is unusual within its distribution. When a distribution is bell-shaped and the mean and standard deviation are known, we can find the relative position of a data value in the distribution by calculating its standardized score or Z-score. A Z-score describes the number of standard deviations that a data value is from the mean. The Z-score formula is shown below.
\[\text{Z-score} = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}} \nonumber\]
Every data value in a distribution has a Z-score. Z-scores can be negative, positive, or zero. When a Z-score is less than or equal to −2, or greater than or equal to 2, we say that the corresponding data value is unusual, since the data value is relatively far from the mean. In fact, only about 5% of data values in a bell-shaped distribution will have a Z-score less than or equal to −2, or greater than or equal to 2.
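The Z-score formula and the "unusual" rule can be sketched in Python. The function names and the example numbers (a mean of 50, a standard deviation of 4, and the values 60 and 53) are hypothetical and are not taken from the county data. The last function checks the "about 5%" figure under the added assumption that the bell-shaped distribution is exactly a normal distribution.

```python
import math


def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


def is_unusual(z, cutoff=2.0):
    """Apply the rule from the text: unusual when z <= -cutoff or z >= cutoff."""
    return abs(z) >= cutoff


def normal_tail_share(cutoff=2.0):
    """Share of a normal distribution with a Z-score at least cutoff from 0.

    Assumes an exactly normal distribution, which is a modelling assumption
    of this sketch rather than something the data guarantee.
    """
    return 1 - math.erf(cutoff / math.sqrt(2))


# Hypothetical values: mean 50, standard deviation 4.
z_far = z_score(60, 50, 4)     # (60 - 50) / 4 = 2.5
z_near = z_score(53, 50, 4)    # (53 - 50) / 4 = 0.75

print(z_far, is_unusual(z_far))       # 2.5 True
print(z_near, is_unusual(z_near))     # 0.75 False
print(round(normal_tail_share(), 3))  # 0.046, that is, about 5%
```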
The table below contains the median ages (in years) of four counties in the U.S.20
Counties in the U.S.
| County, State | Median Age (years) |
| Webb County, TX | 29.6 |
| Columbus County, NC | 42.3 |
| Franklin County, MA | 47.2 |
| Pickett County, TN | 51.7 |
(10) For the 1,000 counties in the random sample, the mean of the median ages is 41.1 years and the standard deviation of the median ages is 5 years.
(a) Use these values to find the Z-score of each median age in the table above. Round Z-scores to two decimal places.
(b) Are any of these counties’ median ages unusual? Explain.
(11) Another variable that has a bell-shaped distribution is the percentage of adults in a county who smoke cigarettes. This distribution has a mean of 21.5 (meaning 21.5%) and a standard deviation of 6 (meaning 6 percentage points). In one county in Alaska, 40.5% of adults smoke cigarettes.
Find the deviation from the mean and the Z-score of this data value. Round the Z-score to two decimal places.
(12) In 2020, Dekalb County in Georgia had characteristics that were below average in comparison to other counties in the U.S. The median age of people in Dekalb County was 36.7 years. In Dekalb County, 10.8% of adults smoked cigarettes.
| Characteristic | Mean | Standard Deviation | Values in Dekalb County, GA |
| Median age | 41.1 years | 5 years | 36.7 years |
| Percentage of adult smokers | 21.5% | 6 percentage points | 10.8% |
Which of these characteristics is more unusual when Dekalb County is compared to other counties in the U.S.? Explain.
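Because Z-scores have no units, they allow values measured on different scales to be compared. A minimal Python sketch of that comparison follows. The characteristic names and all numbers are hypothetical and are not the Dekalb County values, and the function names are assumptions of this sketch. The characteristic whose Z-score is farthest from zero is treated as the more unusual one.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


def more_unusual(characteristics):
    """Return the name with the Z-score farthest from 0, plus all Z-scores.

    characteristics maps a name to a (value, mean, standard deviation) tuple.
    """
    scores = {
        name: round(z_score(value, mean, sd), 2)
        for name, (value, mean, sd) in characteristics.items()
    }
    farthest = max(scores, key=lambda name: abs(scores[name]))
    return farthest, scores


# Hypothetical data: (value, mean, standard deviation) for two characteristics.
example = {
    "characteristic A": (31.0, 25.0, 4.0),    # z = 1.5
    "characteristic B": (2.0, 2.5, 0.25),     # z = -2.0
}

name, scores = more_unusual(example)
print(name, scores)
```

In this made-up example, characteristic B is the more unusual one even though its Z-score is negative, because the comparison uses the distance of the Z-score from zero.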
__________________________________________
19 https://www.census.gov/data/tables/time-series/demo/popest/2020s-counties-detail.html
20 https://www.census.gov/data/tables/time-series/demo/popest/2020s-counties-detail.html


