3.7.1: Preparation S.7
Normal Distributions
In the previous unit we saw that global temperature anomalies followed bell-shaped distributions. Temperature anomalies are just one of many real-world observations that have bell-shaped distributions. To examine properties of bell-shaped distributions and determine probabilities associated with their values, we often model bell-shaped distributions with normal distributions. When we model a data distribution with a normal distribution, we say that the data are normally distributed. Note that actual data typically will not be perfectly symmetrical and bell-shaped, so the normal distribution does not perfectly reflect the data; however, a normal distribution can still be used as a model that helps us understand and analyze the data.
Normal distributions have a wide range of applications. One application is modeling large distributions of observations in nature. For example, consider the distribution of heights of all adult men in New York City (NYC). This distribution is extremely large and contains over three million values. If we obtain a representative sample of values in this distribution, we can estimate the population mean and population standard deviation, and use these estimates of the population parameters to model the population with a normal distribution.
Suppose a representative sample of NYC adult male heights gives an estimated mean height of 68 inches and a standard deviation (SD) of 3.5 inches. We can construct a normal distribution based on these estimated parameters and draw a normal curve to represent the distribution. Normal curves are graphs of normal distributions. The graph of the normal distribution with a mean of 68 and SD of 3.5 is shown below.
Figure 1: Normal Curve Representing the Distribution of Heights of NYC Adult Males
Note that the normal curve is centered on the mean. This normal distribution is a theoretical model of the large distribution of heights of NYC adult males. Normal distributions are examples of continuous probability distributions because they enable us to find probabilities corresponding to any interval of data values.
Normal distributions have three key properties:
- The mean, median, and mode are all equal.
- The normal distribution is symmetric about the mean.
- The total area under a normal curve is 1 (that is, 1.00 or 100%), since all of the values must fall somewhere within the distribution. (Note: this will be discussed later in the lesson.)
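These three properties can be checked numerically for the height model described above. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as the modeling tool (the lesson itself does not prescribe any software); the variable names are illustrative only.

```python
from statistics import NormalDist

# Model from the text: mean 68 inches, SD 3.5 inches.
heights = NormalDist(mu=68, sigma=3.5)

# Property 1: the mean, median, and mode coincide.
center = (heights.mean, heights.median, heights.mode)

# Property 2: symmetry about the mean, checked 3 inches on either side.
density_below = heights.pdf(68 - 3)
density_above = heights.pdf(68 + 3)

# Property 3: total area is 1. The area within 10 SDs of the mean
# is already numerically indistinguishable from 1.
area = heights.cdf(68 + 10 * 3.5) - heights.cdf(68 - 10 * 3.5)

print("mean, median, mode:", center)
print("densities at 65 and 71:", density_below, density_above)
print("area within 10 SDs:", area)
```

The height of the curve at 65 inches matches the height at 71 inches because both values are the same distance from the mean.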
Before going further, answer the questions below about the normal distribution of heights of all adult males in NYC.
- The mean, median, and mode of heights of adult males in NYC are equal to 68 inches. What does this tell us?
- The distribution of heights is symmetric about the mean. What does this tell us?
- Heights of adult males in NYC have a standard deviation of 3.5 inches. What does this tell us about the heights in this distribution?
Normal Distributions
All normal distributions have the same bell shape, but they differ in their center (mean) and spread (standard deviation). Figure 2 below shows the graphs of three normal distributions (Curve A, Curve B, and Curve C).
- Match the graphs of the normal distributions with the appropriate means and standard deviations (SD) given in the questions below.
Figure 2: Curves A, B, and C
(a) Mean = 40, SD = 1.2.
(b) Mean = 50, SD = 1.5.
(c) Mean = 40, SD = 3.0.
Empirical Rule
The Empirical Rule describes the variability of data values in a normal distribution. It gives the approximate percentage of data values in any normal distribution that lie within one, two, and three standard deviations of the mean.
In any normal distribution:
- About 68% of data values lie within 1 standard deviation of the mean.
- About 95% of data values lie within 2 standard deviations of the mean.
- About 99.7% of data values lie within 3 standard deviations of the mean.
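The percentages in the rule are rounded values of areas under the normal curve. As a rough check, the sketch below computes those areas with Python's standard-library `statistics.NormalDist`; the helper name `share_within` is hypothetical and introduced only for this illustration.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, SD 1


def share_within(k, dist=standard):
    """Fraction of values within k standard deviations of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```

The computed areas are about 68.27%, 95.45%, and 99.73%, which the Empirical Rule rounds to 68%, 95%, and 99.7%. Because the result depends only on the number of standard deviations, passing any other normal distribution as `dist` gives the same fractions.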
To illustrate this rule, let’s apply it to the specific normal distribution with a mean of 68 and a standard deviation of 3.5. This is the distribution we saw in Figure 1 (heights of all NYC adult males). Figure 3 below applies the Empirical Rule to that distribution. Note that the values below the horizontal axis correspond to the mean and the positions that are one, two and three standard deviations above and below the mean. Below these values are the corresponding Z-scores.
Figure 3: Empirical Rule Applied to Normal Distribution with Mean = 68 and SD = 3.5
In addition to showing the intervals that contain approximately 68%, 95% and 99.7% of the data values in the distribution, Figure 3 also shows the approximate percentage of data values in specific subintervals. For example, in this normal distribution, approximately 68% of data values are between 64.5 and 71.5. This interval, 64.5–71.5, contains the data values that are within one standard deviation of the mean and is said to contain the middle 68% of data values. The area under the normal curve over this interval is therefore about 0.68. Note that the two intervals that each contain 0.15% of the data are all data below 57.5 and all data above 78.5, respectively.
It is important to note that the sum of the percentages in these eight regions is 100%. The total area below a normal curve is 1, representing the fact that 100% of values in the distribution are contained within it. We can use the percentages (areas) in these subintervals to determine the percentage of data values in other intervals.
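The arithmetic behind the eight regions can be sketched as follows. This is an illustrative calculation only, assuming the rounded Empirical Rule percentages (68, 95, 99.7) and the mean and SD from Figure 3; the names `boundaries`, `regions`, and `percent_between` are hypothetical.

```python
mean, sd = 68, 3.5

# Axis positions at -3, -2, -1, 0, +1, +2, +3 standard deviations.
boundaries = [mean + k * sd for k in range(-3, 4)]

# Empirical Rule percentages within 1, 2, and 3 SDs of the mean.
within = {1: 68.0, 2: 95.0, 3: 99.7}

# Split each band evenly between the two sides of the mean (symmetry).
band1 = within[1] / 2                # mean to 1 SD
band2 = (within[2] - within[1]) / 2  # 1 SD to 2 SDs
band3 = (within[3] - within[2]) / 2  # 2 SDs to 3 SDs
tail = (100 - within[3]) / 2         # beyond 3 SDs

# Eight regions, listed from left to right along the axis.
regions = [tail, band3, band2, band1, band1, band2, band3, tail]


def percent_between(low, high):
    """Add up the region percentages between two boundary values."""
    i, j = boundaries.index(low), boundaries.index(high)
    return round(sum(regions[i + 1 : j + 1]), 2)


print(boundaries)
print([round(r, 2) for r in regions])
print(percent_between(61, 71.5))
```

For instance, the interval from 61 to 71.5 covers the regions 13.5%, 34%, and 34%, so it contains about 81.5% of the data values.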
- Use Figure 3 to answer the questions below. What percentage of data values in this normal distribution are:
(a) Between 68 and 75?
(b) Less than 64.5?
(c) Greater than 75?
- Suppose we found that the heights of U.S. adult females are normally distributed with a mean of 63.4 inches and a standard deviation of 2.3 inches. In the normal curve below, fill in the boxes below the x-axis by specifying the mean of the normal distribution and the positions that are one, two and three standard deviations from the mean. Write each value in its corresponding box.
- What interval contains the middle 95% of U.S. adult female heights?
- What interval contains the middle 99.7% of U.S. adult female heights?
- What percentage of U.S. adult females have heights less than 65.7 inches?
- In Unit S.6, we saw that data values in a bell-shaped distribution that have Z-scores less than or equal to −2, or greater than or equal to 2, are unusual in their distribution. Using this same definition of an unusual observation, what heights would be considered unusual for U.S. adult females? (Hint: Identify the heights that are two or more standard deviations from the mean in both directions.)
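The definition of an unusual observation in the last question can be expressed as a short check. The sketch below assumes the standard Z-score formula, (value − mean) / SD, and uses the NYC adult male model (mean 68, SD 3.5) rather than the female heights in the exercise; the function names are hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / sd


def is_unusual(value, mean, sd):
    """Unusual means a Z-score of -2 or less, or 2 or more."""
    return abs(z_score(value, mean, sd)) >= 2


mean, sd = 68, 3.5  # NYC adult male height model from Figure 3
for height in (61, 70, 75):
    z = z_score(height, mean, sd)
    print(height, round(z, 2), is_unusual(height, mean, sd))
```

Here 61 and 75 inches sit exactly two standard deviations from the mean, so both count as unusual under this definition, while 70 inches does not.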
After Preparation S.7 (survey)
You should be able to do the following things for the next collaboration. Rate how confident you are on a scale of 1–5 (1 = not confident and 5 = very confident).
Before beginning Collaboration S.7, you should understand the concepts and demonstrate the skills listed below:
Skill or Concept: I can … | Rating from 1 to 5 |
identify the shape of a data distribution. | |
compute the Z-score of a data value in a data set given the mean and standard deviation of the data set. | |