5.6: Normal Distribution and Percentiles
When you take an exam, what is often as important as your actual score on the exam is the way your score compares to other students’ performance. If you made a \(70\) but the average score (whether the mean, median, or mode) was \(85\), you did relatively poorly. If you made a \(70\) but the average score was only \(55\) then you did relatively well. In general, the significance of one observed value in a data set strongly depends on how that value compares to the other observed values in a data set. Therefore, we wish to attach a number to each observed value that measures its relative position.
We introduced quartiles in the previous section. Now we will introduce the percentile.
Percentiles
Anyone who has taken a national standardized test is familiar with the idea of being given both a score on the exam and a “percentile ranking” of that score. You may be told that your score was \(625\) and that it is the \(85^{th}\) percentile. The first number tells how you actually did on the exam; the second says that \(85\%\) of the scores on the exam were less than or equal to your score, \(625\).
A percentile is a measure of ranking. It describes the location of a data value relative to the rest of the values. Many standardized tests give the results as a percentile. Doctors also use percentiles to track a child’s growth.
The kth percentile is the data value that has \(k\%\) of the data at or below that value.
The Pth percentile cuts the data set in two so that approximately \(P\%\) of the data lie below or equal to it and \((100−P)\% \) of the data lie above it. In particular, the three percentiles that cut the data into fourths are called the quartiles of a data set. The quartiles are the three numbers \(Q_1\), \(Q_2\), \(Q_3\) that divide the data set approximately into fourths.
Given an observed value \(x\) in a data set, \(x\) is the \(P^{th}\) percentile of the data if \(P\%\) of the data are less than or equal to \(x\). The number \(P\) is the percentile rank of \(x\).
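The definition above can be turned into a short computation. The following is a minimal Python sketch, not a standard library routine: the function name `percentile_rank` and the list `scores` are made up for illustration, and the function simply counts the share of values that are less than or equal to \(x\).

```python
def percentile_rank(data, x):
    """Return the percentage of values in `data` that are <= x."""
    if not data:
        raise ValueError("data must not be empty")
    # Count the observations at or below x, as in the definition above.
    at_or_below = sum(1 for value in data if value <= x)
    return 100 * at_or_below / len(data)


# Hypothetical exam scores, invented for this illustration only.
scores = [55, 60, 62, 70, 70, 75, 81, 85, 90, 98]

# 8 of the 10 scores are at or below 85, so 85 has percentile rank 80.
print(percentile_rank(scores, 85))  # 80.0
```

Statistical software often uses slightly different conventions for percentiles (for example, interpolating between values), so results from other tools may not match this simple count exactly.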
- What does a score at the \(90\)th percentile mean? If a sample consists of \(800\) test scores, how many of them would be at or below the \(90\)th percentile?
- How many scores are above the \(90\)th percentile?
- Answer
-
- This means that \(90\%\) of the scores were at or below this score. (A person did better than \(90\%\) of the test takers.)\[0.90\times800=720\nonumber\] So \(720\) scores would be at or below the \(90\)th percentile.
- \(800-720=80\) of the scores were above the \(90\)th percentile.
If the test was out of \(100\) points and you scored at the \(80\)th percentile, what was your score on the test?
- Answer
-
You don’t know! All you know is that you scored the same as or better than \(80\%\) of the people who took the test. If all the scores were low, you could have still failed the test. On the other hand, if many of the scores were high, you could have gotten a high score as well.
\(z\)-Scores and Normal Distribution
You have probably seen a "bell-shaped curve" before. It turns out that many data sets, particularly when the sample size is sufficiently large, are distributed symmetrically, with most data values around the center (mean) and few outliers, resulting in a bell-shaped curve. This is called a "normal distribution." For instance, the weights of all the people in a certain country are likely distributed normally, and so are their heights. Normal distributions can describe all sorts of data involving people's cholesterol levels, blood pressure, duration of pregnancy, IQ (intelligence quotient) scores, credit scores, and many more. For instance, the pregnancy (gestation) period for humans is known to have an average of 266 days (38 weeks) with a standard deviation of 16 days. The average IQ score is defined as \(100\) with a standard deviation of \(15\) . Your SAT and other standardized test scores are defined in similar ways.
But what does it mean, and how does this information help us?
When data values are distributed normally, given their mean and standard deviation, we can then standardize the normal distribution and calculate various percentages related to scores/values. For example, given that the IQ scores have a mean of \(100\) and a standard deviation of \(15\), one can conclude that only about \(2.3\%\) of the population has an IQ of \(130\) or higher.
Here is how this works. First, let us define the "Standard Normal Distribution."
The standard normal distribution is a normal distribution (bell-shaped curve) with a mean of 0 and a standard deviation of 1.
It turns out that there is a simple way to "convert" any normal distribution to the standard one. It involves what is known as the "\(z\)-score."
Another way to locate a particular observation \(x\) in a data set is to compute its distance from the mean in units of standard deviation. The \(z\)-score indicates how many standard deviations an individual observation \(x\) is from the center of the data set, its mean. It is used on distributions that have been standardized, which allows us to better understand its properties. If \(z\) is negative, then \(x\) is below average. If \(z\) is \(0\) then \(x\) is equal to the average. If \(z\) is positive, then \(x\) is above the average.
A z-score tells you how many standard deviations a value is away from the mean (average)
- Positive Z-score: The value is above average.
- Negative Z-score: The value is below average.
- Z-score of \(0\): The value is exactly the average.
The following formula is used to calculate the z-score (z) of a data value (x) when the mean, standard deviation, and data value (x) are provided
FORMULA
\[z\text{-score of data} = \dfrac{\text{data value }(x) - \text{mean (M)}}{\text{standard deviation (SD)}} \nonumber \]
Sometimes the \(z\)-score of a data value \(x\) is denoted by \(z_x\).
Another notation:
Mean: \(\mu\) or M
Standard Deviation: \(\sigma\) or SD
On a nationwide math test, the mean was \(65\) and the standard deviation was \(10\). If Robert scored \(81\), what was his \(z\)−score?
- Answer
-
\[\begin{array}{rl}
z&=\dfrac{x-\text{M}}{\text{SD}} \\
&=\dfrac{81-65}{10} \\
&=\dfrac{16}{10} \\
&=1.6
\end{array} \nonumber \]This means Robert’s score was \(1.6\) standard deviations above the mean.
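The same calculation can be written as a few lines of Python. This is only an illustrative sketch; the function name `z_score` is our own choice and not part of any library.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Robert's score of 81 on a test with mean 65 and standard deviation 10.
print(z_score(81, 65, 10))  # 1.6
```

A negative result would indicate a score below the mean, and a result of \(0\) a score exactly at the mean.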
The following formula is used to calculate the data value (x) when the mean, standard deviation, and z-score are provided.
FORMULA
\[\text{data value }(x) = \text{mean (M)}+z\text{-score}\times\text{standard deviation (SD)} \nonumber \]
A math test has mean \(70\) and standard deviation \( 8\). A student has a z-score of \(1.5\) on this test. What is the student’s actual score?
- Answer
-
Here, we need to find the student's score (data value), denoted as \(x\).
\[\begin{array}{rl}
\text{data value }(x)&=\text{M}+z\text{-score}\times\text{SD}\\
&=70+1.5\times8\\
&=82
\end{array} \nonumber \]This means the student’s score was \(82\), corresponding to a z-score of \(1.5\).
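Going from a z-score back to a data value is the reverse calculation. As a small sketch in Python (the function name `value_from_z` is hypothetical, chosen for this illustration):

```python
def value_from_z(z, mean, sd):
    """Return the data value that has z-score z for the given mean and SD."""
    return mean + z * sd


# A z-score of 1.5 on a test with mean 70 and standard deviation 8.
print(value_from_z(1.5, 70, 8))  # 82.0
```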
A server at a restaurant tracks their weekly tips earned and hours worked. Over several weeks, the average weekly tips earned is \(\$500\) with a standard deviation of \(\$100\). This week, the server earned \(\$650\) in tips. Meanwhile, the average number of hours worked per week is \(30\) hours with a standard deviation of \(5\) hours, and this week, the server worked \(40\) hours.
Using z-scores, determine whether the server's weekly tips or weekly hours worked deviate more from the average, indicating where their performance was stronger relative to past trends.
- Answer
-
Using the z-score formula:
z-score for weekly tips: \[z = \dfrac{650 - 500}{100} = \dfrac{150}{100} = 1.5\nonumber\]
z-score for weekly hours worked:
\[z = \dfrac{40 - 30}{5} = \dfrac{10}{5} = 2.0 \nonumber\]The z-score for hours worked, \(2.0\), is higher than the z-score for tips earned, \(1.5\). This means that, relative to their averages, the server's hours this week were further above the usual than their tips were. Therefore, their performance in terms of hours worked deviated more from the average than their tips.
Z-Score Percentile Table
A percentile table (or z-table) is a reference chart that converts a z-score into a percentile rank. A percentile tells you the percentage of people or data points that scored at or below your value. To find the percentile for a given z-score, you can use the following table.
Suppose you calculate the z-score for your data value and get \(0.20\). You then look it up in the table and find the corresponding percentile value, \(57.93\).
Meaning: This indicates that \(57.93\%\) of all observations are less than or equal to your value. In other words, your score is at about the \(58\)th percentile, and only \(100\%-57.93\%=42.07\%\) of the observations are greater than your score.
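If a printed table is not at hand, the same lookup can be sketched in Python using the standard library's `statistics.NormalDist` class (available in Python 3.8 and later), whose `cdf` method gives the fraction of a normal distribution at or below a value. The helper name `percentile_from_z` is our own.

```python
from statistics import NormalDist


def percentile_from_z(z):
    """Return the percentage of a standard normal distribution at or below z."""
    # NormalDist() with no arguments has mean 0 and standard deviation 1.
    return 100 * NormalDist().cdf(z)


below = percentile_from_z(0.20)
print(round(below, 2))        # 57.93
print(round(100 - below, 2))  # 42.07
```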

Find the percentage of data items in a normal distribution that lie above, below, or between the given z-scores.
- above \(1.20\) and below \(1.20\).
- below \(-1.80\) and above \(-1.80\).
- between \(0.20\) and \(1.40\).
- Answer
-
According to the above table
- \(88.49\%\) below and \((100\%−88.49\%)=11.51\%\) above.
- \(3.59\%\) below and \((100\%−3.59\%)=96.41\%\) above.
- For \(z=0.20\); \(57.93\%\) are below and for \(z=1.4\); \(91.92\%\) are below. So \(91.92\%−57.93\%=33.99\%\) in between.
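These three kinds of questions (below, above, between) can also be checked numerically. The sketch below assumes the usual relation between the standard normal distribution and the error function, \(\Phi(z)=\tfrac{1}{2}\left(1+\operatorname{erf}(z/\sqrt{2})\right)\), and uses Python's `math.erf`; the three function names are hypothetical helpers for this illustration.

```python
import math


def percent_below(z):
    """Percentage of a standard normal distribution at or below z."""
    return 50 * (1 + math.erf(z / math.sqrt(2)))


def percent_above(z):
    """Percentage of a standard normal distribution above z."""
    return 100 - percent_below(z)


def percent_between(z_low, z_high):
    """Percentage of a standard normal distribution between two z-scores."""
    return percent_below(z_high) - percent_below(z_low)


print(round(percent_below(1.20), 2), round(percent_above(1.20), 2))    # 88.49 11.51
print(round(percent_below(-1.80), 2), round(percent_above(-1.80), 2))  # 3.59 96.41
print(percent_between(0.20, 1.40))  # roughly 34
```

The last result can differ from the table-based answer in the final decimal place, because each table entry is rounded to two decimals before the subtraction.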
Suppose that cholesterol levels are normally distributed with a mean of \(160\) (mg/dl) and a standard deviation of \(40\) (mg/dl).
- What is the \(z\)-score corresponding to the cholesterol level of \(200\) (mg/dl)? How about \(260\) (mg/dl)?
- What percent of people have a cholesterol level between \(160\) and \(200\)? How about \(260\) or higher?
- Answer
-
a. Note that the data value is \(200\) mg/dl. So its z-score is
\[\text{\(z\)-score}=\dfrac{200-160}{40}=1\nonumber\]
Similarly, the z-score of \(260\) is
\[\text{\(z\)-score}=\dfrac{260-160}{40}=2.5\nonumber\]
b. The z-score of \(160\) is
\[\text{\(z\)-score}=\dfrac{160-160}{40}=0\nonumber\]
The percent between \(160\) and \(200\) corresponds to the z-scores between \(0\) and \(1\).
For \(z=0.0\); \(50\%\) are below and for \(z=1\); \(84.13\%\) are below. So \(84.13\%−50\%=34.13\%\) in between.
In other words, about \(34\%\) of all people have a cholesterol level between \(160\) and \(200\).
However, \(260\) mg/dl corresponds to \(z=2.5\). The table shows \(99.38\%\) below \(z=2.5\), and \(100\%-99.38\%=0.62 \%\). So only \(0.62\%\), i.e., less than \(1\%\) of the population, has a cholesterol level of \(260\) or higher. A person with a cholesterol level of \(260\) should go see the doctor immediately.
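As a cross-check of the cholesterol example, the percentages can be computed directly from the stated mean and standard deviation, without converting to z-scores by hand. This sketch again relies on Python's `statistics.NormalDist`; the variable names are our own, and the normal model itself is the assumption made in the example.

```python
from statistics import NormalDist

# Normal model from the example: mean 160 mg/dl, standard deviation 40 mg/dl.
cholesterol = NormalDist(mu=160, sigma=40)

# Percentage of people between 160 and 200 mg/dl.
between_160_and_200 = 100 * (cholesterol.cdf(200) - cholesterol.cdf(160))

# Percentage of people at 260 mg/dl or higher.
at_least_260 = 100 * (1 - cholesterol.cdf(260))

print(round(between_160_and_200, 2))  # 34.13
print(round(at_least_260, 2))         # 0.62
```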
You probably have a good intuitive grasp of what the average of a data set says about that data set. In the following section, we begin to learn what the standard deviation reveals about the nature of the data set.
The Empirical Rule
We start by examining a specific set of data. Table \(\PageIndex{1}\) shows the heights in inches of \(100\) randomly selected adult men. A relative frequency histogram for the data is shown in Figure \(\PageIndex{1}\). The mean and standard deviation of the data, rounded to two decimal places, are \(M = 69.92 \) and \(SD = 1.70\).
If we go through the data and count the number of observations that are within one standard deviation of the mean, that is, that are between \(69.92-1.70=68.22\) and \(69.92+1.70=71.62\) inches, there are \(69\) of them.
If we count the number of observations that are within two standard deviations of the mean, that is, that are between \(69.92-2(1.70)=66.52\) and \(69.92+2(1.70)=73.32\) inches, there are \(95\) of them.
All of the measurements are within three standard deviations of the mean, that is, between \(69.92-3(1.70)=64.82\) and \(69.92+3(1.70)=75.02\) inches.
These tallies are not coincidences but rather align with the following result, which has been found to be widely applicable.
The Empirical Rule (also called the \(68–95–99.7\) rule) describes how data are distributed in a normal (bell-shaped) distribution:
- approximately \(68\%\) of the data lie within one standard deviation of the mean, i.e., within \((\text{M} -1\times\text{SD},\ \text{M}+1\times\text{SD})\).
- approximately \(95\%\) of the data lie within two standard deviations of the mean, i.e., within \((\text{M} -2\times\text{SD},\ \text{M}+2\times\text{SD})\).
- approximately \(99.7\%\) of the data lie within three standard deviations of the mean, i.e., within \((\text{M} -3\times\text{SD},\ \text{M}+3\times\text{SD})\).
Where M is the mean, and SD is the standard deviation.
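The three percentages in the rule are rounded values for an exactly normal distribution. As a rough numerical check, the sketch below uses the fact that the share of a normal distribution within \(k\) standard deviations of its mean equals \(\operatorname{erf}(k/\sqrt{2})\); the function name `percent_within` is hypothetical.

```python
import math


def percent_within(k):
    """Percentage of a normal distribution within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(percent_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```

Rounding these to \(68\%\), \(95\%\), and \(99.7\%\) gives the rule as stated. Real data sets are only approximately normal, so observed counts will usually differ a little from these figures.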
Two key points in regard to the Empirical Rule are that the data distribution must be approximately bell-shaped and that the percentages are only approximately true. The Empirical Rule does not apply to data sets with severely asymmetric distributions, and the actual percentage of observations in any of the intervals specified by the rule could be either greater or less than those given in the rule.
We see this with the example of the heights of the men: the Empirical Rule suggested 68 observations between \(68.22\) and \(71.62\) inches, but we counted \(69\).
Scores on IQ tests have a bell-shaped distribution with a mean of \(100\) and a standard deviation of \(10\). Sketch a normal distribution curve and discuss what the Empirical Rule says about these scores. What are the percentiles of the IQ scores \(110\) and \(90\)?
- Answer
-
A sketch of the IQ distribution is given in Figure \(\PageIndex{4}\). The Empirical Rule states that
- approximately \(68\%\) of the IQ scores in the population lie between \(90\) and \(110\)
- approximately \(95\%\) of the IQ scores in the population lie between \(80\) and \(120\)
- approximately \(99.7\%\) of the IQ scores in the population lie between \(70\) and \(130\).
Figure \(\PageIndex{4}\): Distribution of IQ Scores
Because the distribution is symmetric, \(50\%\) of the scores lie below the mean of \(100\), and about half of \(68\%\), that is \(34\%\), lie between \(100\) and \(110\). Since about \(50\%+34\%=84\%\) of the data are less than or equal to \(110\), we conclude that the IQ score \(110\) is the \(84^{th}\) percentile. Similarly, since about \(50\%-34\%=16\%\) of the data are less than or equal to \(90\), we conclude that the IQ score \(90\) is the \(16^{th}\) percentile.
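Heights of \(18\)-year-old males have a bell-shaped distribution with mean \(69.6\) inches and standard deviation \(1.4\) inches.
- About what percentage of all such men are between \(68.2\) and \(71\) inches tall?
- What interval centered on the mean should contain about \(95\%\) of all such men?
- What percentage of all such men are less than \(66.8\) inches tall?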
- Answer
-
- A sketch of the distribution of heights is given in Figure \(\PageIndex{5}\). Since the interval from \(68.2\) to \(71.0\) has endpoints \(\text{Mean} -1\times\text{SD}\) and \(\text{Mean} +1\times\text{SD}\), by the Empirical Rule about \(68\%\) of all \(18\)-year-old males should have heights in this range.
- By the Empirical Rule the shortest such interval has endpoints \[\text{Mean} -2\times\text{SD}=69.6-2(1.4)=66.8 \nonumber \]
\[\text{Mean}+2\times\text{SD}=69.6+2(1.4)=72.4 \nonumber \] The interval in question is the interval from \(66.8\) inches to \(72.4\) inches.
- Since \(66.8\) inches is two standard deviations below the mean, about \(0.15\%+2.35\%=2.5\%\) of all such men are less than \(66.8\) inches tall.
Figure \(\PageIndex{5}\): Distribution of Heights

