5.5: Measures of Center and Spread

    Let’s begin by trying to find the most “typical” value of a data set.

    Note that we just used the word “typical” although in many cases you might think of using the word “average.” We need to be cautious with the word “average” as it has different meanings to different people in various contexts. One of the most common uses of the word “average” is what mathematicians and statisticians call the arithmetic mean, or just plain old mean for short. “Arithmetic mean” sounds rather fancy, but it is likely you have calculated a mean many times without realizing it; the mean is what most people think of when they use the word “average”.

    Definition: Mean

    The mean of a set of data is the sum of the data values divided by the number of values.

    Example \(\PageIndex{1}\): Find Mean

    Marci’s exam scores for her last math class were: \(79, 86, 82, 94\). The mean of these values would be:

    \( \dfrac{79 + 86 + 82 + 94}{4} = 85.25 \)

    Typically, we round the mean to one more decimal place than the original data. In this case, we would round \(85.25\) to \(85.3\). Thus, we can say Marci’s average score on her math exams was \(85.25\) or about \(85.3\).

    Let’s look at the following two examples and see how the mean changes when a single data value is added.

    The \(15\) families in a particular neighborhood are asked about their annual household income, rounded to the nearest \(\$1,000\).

    \[25~15~35~15~30~15~20~20~50~50~45~50~40~50~50\nonumber\]

    Calculating the mean annual household income

    \[\frac{25+15+35+15+30+15+20+20+50+50+45+50+40+50+50}{15}=34\nonumber\]

    The mean household income of our sample is \(34\) thousand dollars (\(\$34,000\)).

    Building on the last example, suppose a new family moves into the neighborhood, with a household income of \(\$2\) million (\(\$ 2,000,000\)). Adding this to our sample, our mean (after rounding to the nearest thousand dollars) is now:

    \[\frac{25+15+35+15+30+15+20+20+50+50+45+50+40+50+50+2,000}{16}\approx 157\nonumber\]

    The mean household income of our sample is now about \(157\) thousand dollars (\(\$157,000\)).
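    The two calculations above can be checked with a few lines of Python. This is only a sketch: it uses `mean` from the standard `statistics` module, and the variable names are chosen here for illustration.

```python
from statistics import mean

# Annual household incomes in thousands of dollars, copied from the example above.
incomes = [25, 15, 35, 15, 30, 15, 20, 20, 50, 50, 45, 50, 40, 50, 50]

# Mean of the original 15 families: the sum of the values divided by how many there are.
mean_before = mean(incomes)

# Add the new family's income of 2,000 thousand dollars and recompute.
mean_after = mean(incomes + [2000])

print(f"Mean without the new family: {mean_before} thousand dollars")
print(f"Mean with the new family: {mean_after:.3f} thousand dollars")
```

    A single added value moves the mean from \(34\) to roughly \(157\) thousand dollars, which is the shift described in the text.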

    Imagine the data values on a see-saw or balance scale. The mean is the value that balances the data, as shown in the picture below.

    A simple balance scale with two blue rectangular blocks on one side and one blue rectangular block on the other side.

    If we graph our household data, the \(\$2\) million data value is so far out to the right that the mean has to adjust up to keep things in balance. The \(\$2\) million data value is called an outlier.

    Three blue rectangles are positioned above a horizontal line, with a triangle below the line indicating a point.

    For this reason, when working with data that has outlier values far outside the primary grouping, it is common to use a different measure of center, the median.

    Outliers

    An outlier is a data value that is very different from the rest of the data and lies far from the center.

    Definition: Median

    The median of a set of data is the value in the middle when the data is in order.

    To find the median, follow these simple steps:

    Step \(1\): Arrange the data. Put all the data values in order from smallest to largest.

    Step \(2\): Count the number of values.

    Let \(n\) be the total number of data values.

    Step \(3\): Find the middle.

    • If \(n\) is odd:
      The median is the middle number of the ordered data set (there is only one middle value). It is the data value in the \(\left(\frac{n+1}{2}\right)\)th position of the ordered data set.

    • If \(n\) is even:
      The median is the average of the two middle numbers. These are the values in the \(\frac{n}{2}\)th and \(\left(\frac{n}{2}+1\right)\)th positions of the ordered data set.
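    The three steps above translate directly into code. The function below is a minimal sketch of that procedure; the name `median` and the small data sets in the usage lines are illustrative and do not come from the text.

```python
def median(values):
    """Return the median by following the three steps described above."""
    ordered = sorted(values)   # Step 1: arrange the data from smallest to largest
    n = len(ordered)           # Step 2: count the number of values
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2               # Step 3: find the middle
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2, which is index mid when counting from 0.
        return ordered[mid]
    # Even n: average the values in positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data sets, one with an odd count and one with an even count.
print(median([3, 1, 2]))
print(median([4, 1, 3, 2]))
```

    Python counts positions from \(0\), so the \(\left(\frac{n+1}{2}\right)\)th position in the text corresponds to index `n // 2` in the code.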

    We can interpret the median as “half of the data is less than the median, and the other half is more than the median.” Of course, we can rewrite this in the context of the problem.

    Look at the following \(31\) data values, ordered from largest to smallest. \[37~~33~~33~~32~~29~~28~~28~~23~~22~~22~~22~~21~~21~~21 ~~20 \nonumber\] \[20~~19~~ 19~~18 ~~18~~18~~18~~16~~15~~14~~14 ~~14~~12~~12 ~~9~~6\nonumber\]

    Since there are \(n= 31\) data values, an odd number, there is only one middle number: the \(\frac{31+1}{2}=16\)th data value, which is \(20\).

    So there are \(15\) values before it in the ordered list and \(15\) after it.

    Example \(\PageIndex{2}\): Find the Median

    Find the median of these quiz scores:

    \[5~~10~~8~~6~~4~~8~~2~~5~~7~~7\nonumber\]

    Answer

    We start by listing the data in order:

    \[2~~4~~5~~5~~6~~7~~7~~8~~8~~10\nonumber\]

    Since there are \(10\) data values, an even number, there are two middle numbers, so we find the mean of the two. They are in the \(\frac{10}{2}=5\)th and \(\frac{10}{2}+1=6\)th positions: \(6\) and \(7\).

    \[\dfrac{(6+7)}{2} = 6.5. \nonumber\]

    The median quiz score was \(6.5\).

    We can say that half of the quiz scores were lower than \(6.5\), and the other half were higher than \(6.5\).


    Let us return now to our original household income data. To find the median, we first order the data:

    \[15~~15~~15~~20~~20~~25~~30~~35~~40~~45~~50~~50~~50~~50~~ 50\nonumber\]

    Since there are \(15\) data values, an odd number, the median is the middle number, the \(\frac{15+1}{2}=8\)th data value, which is \(35\).

    The median income in this neighborhood sample is \(\$35\) thousand. Thus, half of the households earned less than \(\$35,000\) and the other half earned more than \(\$35,000\).

    If we add the new neighbor with a \(\$2\) million household income, then there will be \(16\) data values.

    \[15~~15~~15~~20~~20~~25~~30~~35~~40~~45~~50~~50~~50~~50~~ 50~~2,000\nonumber\]

    The two middle numbers are in the \(\frac{16}{2}=8\)th and \(\frac{16}{2}+1=9\)th positions: \(35\) and \(40\).

    \[\dfrac{(35+40)}{2} = 37.5 \nonumber\]

    The median is now \(\$37.5\) thousand, only slightly higher than the \(\$35\) thousand we found before the new family arrived. Notice that the new neighbor barely affected the median. The median is much less affected by outliers than the mean.

    Let’s think about the previous example. When we added the \(16\)th family’s income, the mean jumped from \(\$34,000\) to about \(\$157,000\). That’s a big difference in the average household income. The mean depends on the size of every data value, so a single extreme value can pull it up or down substantially. The median, however, moved only from \(\$35,000\) to \(\$37,500\) when the \(16\)th family’s income was included.

    In fact, the median is generally considered a better statistic for household income, as there is a wide range of income among families. Extreme data values strongly influence the mean but have little effect on the median.

    In addition to the mean and the median, there is one other common measurement of the “typical” value of a data set: the mode.

    Definition: Mode

    The mode is the observed value of the data set that occurs most frequently.

    The mode is most commonly used for categorical data, for which the median and mean cannot be computed. It is the only measure of central tendency that can be used for both categorical and quantitative data; the mean and median are only used with quantitative data.

    Example \(\PageIndex{3}\): Finding Mode
    1. The 15 families in a particular neighborhood are asked about their annual household income, rounded to the nearest \(\$1,000\). Find the mode.\[25~15~35~15~30~15~20~20~50~50~45~50~40~50~50\nonumber\]
    2. Five real estate exam scores are given below. Find the mode. \[430, 430, 480, 480, 495\nonumber\]
    Answer
    1. The mode is \(\$50,000\), which occurs five times.
    2. The data set is bimodal because the scores \(430\) and \(480\) each occur twice.

    A data set can have more than one mode if several categories have the same frequency, or no modes if every category occurs only once.
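    Counting frequencies is a natural job for a short program. The sketch below uses `Counter` from Python's standard `collections` module; the function name `modes` is hypothetical, and returning an empty list when no value repeats is a convention chosen here to match the statement above.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)          # maps each distinct value to how often it occurs
    if not counts:
        return []
    top = max(counts.values())        # the highest frequency in the data set
    if top == 1:
        return []                     # every value occurs once, so we report no mode
    return sorted(value for value, count in counts.items() if count == top)


# The two data sets from the mode example above.
incomes = [25, 15, 35, 15, 30, 15, 20, 20, 50, 50, 45, 50, 40, 50, 50]
exam_scores = [430, 430, 480, 480, 495]

print(modes(incomes))
print(modes(exam_scores))
```

    Because the function only counts occurrences, it works equally well on categorical data such as a list of color names.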

    Which Is Better: Mode, Median, or Mean?

    If the mode, median, and mean all purport to measure the same thing (centrality), why do we need all three? The answer is complicated, as each measure has its own strengths and weaknesses. The mode is simple to compute, but there may be more than one. Further, if no data value appears more than once, the mode is entirely unhelpful. As for the mean and median, the main difference between these two measures is how each is affected by extreme values.

    Consider this example: the mean and median of \(1, 2, 3, 4, 5 \) are both \(3\). But what if the dataset is instead \(1, 2, 3, 4, 10\)? The median is still \(3\), but the mean is now \(4\). What this example shows is that the mean is sensitive to extreme values, while the median isn’t. This knowledge can help us decide which of the two is more relevant for a given dataset. If it is important that the really high or really low values are reflected in the measure of centrality, then the mean is the better option. If very high or low values are not important, however, then we should stick with the median.

    The decision between mean and median only really matters if the data are skewed. If the data are symmetric, then the mean and median are going to be approximately equal, and the distinction between them is irrelevant. If the data are skewed, the mean is pulled in the direction of the skew (i.e., if the data are right-skewed, then the mean will be greater than the median; if the data are left-skewed, the opposite relation holds). The relationship between mean, mode, and median is summarized below.

    Symmetric Data

    \(4,5,6, 6, 6,7,7,7, 7,7, 7,8, 8, 8, 9,10\)

    Bar chart displaying data distribution, with peak values around 6 to 8, and lower frequencies at 4, 5, 9, and 10.

    The mean is \(7\), the median is \(7\), and the mode is \(7\).

    \(\text{mean}=\text{mode}=\text{median}\)

    Skewed Right Data

    \(6, 7, 7, 7, 7,8, 8,8, 9, 10\)

    Bar graph with purple bars representing values ranging from 6 to 10, with the highest bar at 8.

    The mean is \(7.7\), the median is \(7.5\), and the mode is \(7\).

    \(\text{mean}>\text{median}>\text{mode}\)

    Skewed Left Data

    \(4, 5, 6, 6, 6, 7, 7, 7, 7, 8\)

    Bar graph displaying data with five bars; the tallest bar reaches just above 7, representing the highest value.

    The mean is \(6.3\), the median is \(6.5\), and the mode is \(7\).

    \(\text{mean}<\text{median}<\text{mode}\)

    Example \(\PageIndex{4}\): Which is Better Measure?

    Suppose that in a small town of \(50\) people, one person earns \(\$5,000,000\) per year and the other \(49\) each earn \(\$30,000\). Which is the better measure of the "center": the mean or the median?

    Answer

    There are \(49\) people who earn \(\$30,000\). So we add \(\$30,000\), \(49\) times. And one person who earns \(\$5,000,000\).

    \[\text{Mean} = \dfrac{5,000,000+49(30,000)}{50} = 129,400\nonumber\]

    \[\text{Median} = 30,000\nonumber\]

    The median is a better measure of the "center" than the mean because \(49\) of the values are \(\$30,000\) and one is \(\$5,000,000\). The \(\$5,000,000\) is an outlier. The \(\$30,000\) gives us a better sense of the middle of the data.
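    A quick computation makes the point concrete. The sketch below uses `mean` and `median` from Python's standard `statistics` module on the town's incomes; the variable names are illustrative.

```python
from statistics import mean, median

# One resident earns $5,000,000 and the other 49 each earn $30,000.
town_incomes = [5_000_000] + [30_000] * 49

town_mean = mean(town_incomes)
town_median = median(town_incomes)

# Count how many residents earn less than the mean.
below_mean = sum(1 for income in town_incomes if income < town_mean)

print(f"Mean: {town_mean}, median: {town_median}")
print(f"{below_mean} of {len(town_incomes)} residents earn less than the mean")
```

    Forty-nine of the fifty residents earn less than the mean, which is why the mean gives a poor picture of a typical income in this town.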


    Measures of Dispersion

    Consider these three sets of quiz scores:

    Section A: \[5~~5~~5~~5~~5~~5~~5~~5~~5~~5\nonumber\]

    Section B: \[ 0~~0~~0~~0~~0~~10~~10~~10~~10~~10\nonumber\]

    Section C: \[4~~4~~4~~5~~5~~5~~5~~6~~6 ~~6\nonumber\]

    All three of these sets of data have a mean of \(5\) and a median of \(5\), yet the sets of scores are clearly quite different. In section A, everyone had the same score; in section B, half the class got no points and the other half got a perfect score, assuming this was a \(10\)-point quiz. Section C was not as consistent as Section A, but not as widely varied as Section B.

    In addition to the mean and median, which are measures of the “typical” or “middle” value, we also need a measure of how “spread out” or varied each data set is.

    There are several ways to measure this “spread” of the data. The first is the simplest and is called the range.

    Definition: Range

    The range is the difference between the maximum value and the minimum value of the data set

    Using the quiz scores from above,

    For section A, the range is \(0\) since both maximum and minimum are \(5\) and \(5 - 5 = 0\).

    For section B, the range is \(10\) since \(10 - 0 = 10\).

    For section C, the range is \(2\) since \(6 - 4 = 2\).

    In the last example, the range appears to reveal how spread out the data is. However, suppose we add a fourth section, Section D, with scores

    \[0~~5~~5~~5~~5~~5~~5~~5~~5~~10\nonumber\]

    This section also has a mean and median of \(5\). The range is \(10\), yet this data set is quite different from Section B. To better illuminate the differences, we’ll have to turn to more sophisticated measures of variation.
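    The four ranges can be reproduced with a one-line function. This is a small sketch; the name `data_range` and the dictionary layout are choices made here for illustration.

```python
def data_range(values):
    """Return the difference between the maximum and minimum values."""
    return max(values) - min(values)


# Quiz scores for the four sections discussed above.
sections = {
    "A": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "B": [0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
    "C": [4, 4, 4, 5, 5, 5, 5, 6, 6, 6],
    "D": [0, 5, 5, 5, 5, 5, 5, 5, 5, 10],
}

for name, scores in sections.items():
    print(f"Section {name}: range = {data_range(scores)}")
```

    Sections B and D produce the same range even though their scores are distributed very differently, because the range looks only at the two most extreme values.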

    Definition: Standard Deviation

    The standard deviation is a measure of variation that indicates how far, on average, the data values deviate from the mean. A few important characteristics:

    • Standard deviation is never negative. It is zero if all the data values are equal, and it increases as the data values spread out.
    • Standard deviation has the same units as the original data.
    • Standard deviation, like the mean, can be highly influenced by outliers.

    Let's use the data from Section D of the quiz score.

    \[0~~5~~5~~5~~5~~5~~5~~5~~5~~10\nonumber\]

    The mean of these scores is \(5\). We would like to get an idea of the “average” deviation from the mean, but if we average the deviations themselves (the second column of the table below), the negative and positive values cancel each other out (this will always happen). To prevent this, we square every deviation:

    Data Value    Deviation: Data Value - Mean    \((\text{Deviation})^2\)
    \(0\) \(0-5 = -5\) \( (-5)^2= 25\)
    \(5\) \(5-5 = 0\) \( (0)^2= 0\)
    \(5\) \(5-5 = 0\) \( (0)^2= 0\)
    \(5\) \(5-5 = 0\) \( (0)^2= 0\)
    \(5\) \(5-5 = 0\) \( (0)^2= 0\)
    \(5\) \(5-5 = 0\) \( (0)^2= 0\)
    \(5\) \(5-5 = 0\) \( (0)^2= 0\)
    \(5\) \(5-5 = 0\) \( (0)^2= 0\)
    \(5\) \(5-5 = 0\) \( (0)^2= 0\)
    \(10\) \(10-5 = 5\) \( (5)^2= 25\)

    We then add the squared deviations up to get \(25 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 25 = 50\). Ordinarily, we would then divide by the number of scores, \(n\), (in this case, \(10\)) to find the mean of the squared deviations. But we only do this if the data set represents a population; if the data set represents a sample (as it almost always does), we instead divide by \(n - 1\) (in this case, \(10 - 1 = 9\)) [4].

    So, in our example, we would have \(\dfrac{50}{10} = 5\) if section D represents a population and \(\dfrac{50}{9} =\) about \(5.56\) if section D represents a sample. These values (\(5\) and \(5.56\)) are called, respectively, the population variance and the sample variance for section D.

    Variance can be a useful statistical concept, but note that the units of variance in this instance would be points-squared since we squared all of the deviations. What are points-squared? Good question. We would rather deal with the units we started with (points in this case), so to convert back we take the square root and get:

    \(\text{Population Standard Deviation } = \sqrt{\dfrac{50}{10}} = \sqrt{5} ≈ 2.2\)

    \(\text{Sample Standard Deviation } = \sqrt{\dfrac{50}{9}} ≈ 2.4\)

    If we are unsure whether the data set represents a sample or a population, we typically assume it is a sample and round answers to one more decimal place than the original data, as we have done above.
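    In symbols, the two calculations above can be summarized as follows. The notation is an assumption introduced here rather than something the text has defined: \(x_1, x_2, \ldots, x_n\) are the data values, \(\bar{x}\) is their mean, \(\sigma\) is the population standard deviation, and \(s\) is the sample standard deviation.

    \[\sigma = \sqrt{\dfrac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n}} \qquad s = \sqrt{\dfrac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n-1}}\nonumber\]

    For section D, the sum of squared deviations in the numerator is \(50\) and \(n = 10\), which gives \(\sigma = \sqrt{50/10}\) and \(s = \sqrt{50/9}\).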

    To Compute Standard Deviation:
    1. Find the deviation of each data point from the mean. In other words, subtract the mean from the data value.
    2. Square each deviation.
    3. Add the squared deviations.
    4. Divide by \(n\), the number of data values, if the data represents a whole population; divide by \(n - 1\) if the data is from a sample.
    5. Compute the square root of the result. (This result is the standard deviation.)
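    The five steps above can be sketched in Python as follows. The function name `standard_deviation` and its `sample` flag are hypothetical choices; defaulting to the sample formula follows the earlier advice to assume a sample when unsure.

```python
import math


def standard_deviation(values, sample=True):
    """Compute the standard deviation by following the five steps listed above."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data values")
    data_mean = sum(values) / n
    deviations = [x - data_mean for x in values]   # Step 1: subtract the mean
    squared = [d ** 2 for d in deviations]         # Step 2: square each deviation
    total = sum(squared)                           # Step 3: add the squared deviations
    divisor = n - 1 if sample else n               # Step 4: n - 1 for a sample, n for a population
    return math.sqrt(total / divisor)              # Step 5: take the square root


# Section D quiz scores from the text.
section_d = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]
print(round(standard_deviation(section_d, sample=False), 1))
print(round(standard_deviation(section_d, sample=True), 1))
```

    Keeping the population and sample versions in one function makes it clear that the only difference between them is the divisor in step 4.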
    Example \(\PageIndex{5}\): Finding Standard Deviation

    Computing the standard deviation for Section B above, we first calculate that the mean is \(5\). Using a table can help keep track of your computations for the standard deviation:

    Data Value    Deviation: Data Value - Mean    \((\text{Deviation})^2\)
    \(0\) \(0-5 = -5\) \((-5)^2 = 25\)
    \(0\) \(0-5 = -5\) \((-5)^2 = 25\)
    \(0\) \(0-5 = -5\) \((-5)^2 = 25\)
    \(0\) \(0-5 = -5\) \((-5)^2 = 25\)
    \(0\) \(0-5 = -5\) \((-5)^2 = 25\)
    \(10\) \(10-5 = 5\) \((5)^2 = 25\)
    \(10\) \(10-5 = 5\) \((5)^2 = 25\)
    \(10\) \(10-5 = 5\) \((5)^2 = 25\)
    \(10\) \(10-5 = 5\) \((5)^2 = 25\)
    \(10\) \(10-5 = 5\) \((5)^2 = 25\)

    Assuming this data represents a population, we will add the squared deviations, divide by \(10\), the number of data values, and compute the square root:

    \(\sqrt{\dfrac{25 + 25 + 25 + 25 + 25 + 25 + 25 + 25 + 25 + 25}{10}} = \sqrt{\dfrac{250}{10}} = \sqrt{25} = 5\)

    Notice that the standard deviation of this data set is much larger than that of section D, since the data in this set is more spread out.

    For comparison, the standard deviations of all four sections, each treated as a population, are:

    Section A: \(5~~5~~5~~5~~5~~5~~5~~5~~5~~5\) Standard deviation: \(0\)
    Section B: \(0~~0~~0~~0~~0~~10~~10~~10~~10~~10\) Standard deviation: \(5\)
    Section C: \(4~~4~~4~~5~~5~~5~~5~~6~~6 ~~6\) Standard deviation: \(0.8\)
    Section D: \(0~~5~~5~~5~~5~~5~~5~~5~~5~~10\) Standard deviation: \(2.2\)

    Whereas standard deviation is a measure of variation based on the mean, quartiles are based on the median.

    Definition: Quartiles

    Quartiles are values that divide a dataset into four equal parts, with each part containing \(25\%\) of the data points.

    The first quartile \(Q_1\) is the value so that \(25\%\) of the data values are below or equal to it. The third quartile \(Q_3\) is the value so that \(75\%\) of the data values are less than or equal to it.

    You may have guessed that the second quartile, \(Q_2\), is the same as the median. The median is the value such that \(50\%\) of the data values are below or equal to it.

    Box plot displaying quartiles: First Quartile (Q1) at 25%, Median (Q2) at 50%, and Third Quartile (Q3) at 75%.
    Figure \(\PageIndex{1}\): Quartiles (QT)

    Quartiles divide the data into quarters, as shown in Figure \(\PageIndex{1}\).

    • \(25\%\) of the data is between the minimum and \(Q_1\).
    • \(25\%\) of the data is between \(Q_1\) and the median (\(Q_2\)).
    • \(50\%\) of the data is between \(Q_1\) and \(Q_3\).
    • \(25\%\) of the data is above \(Q_3\).

    First Quartile \(\equiv\) Lower Quartile \(\equiv\) \(Q_1\).

    Second Quartile \(\equiv\) Median, \(Q_2\) \(\equiv\) Middle Quartile.

    Third Quartile \(\equiv\) Upper Quartile \(\equiv\) \(Q_3\).

    While quartiles are not a single-number summary of variation like standard deviation, they are used in conjunction with the median, minimum, and maximum values to form a 5-number summary of the data.

    Definition: Five-Number Summary

    The five-number summary takes this form

    Minimum, \(Q_1\), Median, \(Q_3\), Maximum

    The following steps show how to find the quartiles of a data set.

    How to Find Q1 and Q3
    1. Begin by ordering the data from smallest to largest.
    2. Find the median of the data values, which is the middle quartile.
    3. Find the median of the data values that are smaller than the median; these values are referred to as the lower half. This median is the lower quartile, \(Q_1\).
    4. Find the median of the data values that are larger than the median; these values are referred to as the upper half. This median is the upper quartile, \(Q_3\).
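    The steps above can be sketched in Python as shown below. The function name `five_number_summary` is hypothetical, and the code makes one assumption explicit: when the number of values is odd, the middle value is left out of both halves, and when it is even, the data is split evenly. Library routines such as `statistics.quantiles` may use other conventions and can return slightly different quartiles.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    ordered = sorted(values)                 # Step 1: order the data
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower_half = ordered[:half]              # values before the middle of the list
    # When n is odd, skip the middle value so it belongs to neither half.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (
        ordered[0],
        median(lower_half),                  # Step 3: lower quartile, Q1
        median(ordered),                     # Step 2: median, Q2
        median(upper_half),                  # Step 4: upper quartile, Q3
        ordered[-1],
    )


# Hypothetical data set with an even number of values.
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))
```

    Returning the minimum and maximum alongside the quartiles gives the complete five-number summary in a single call.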

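    Let's find the quartiles for the ordered data given below.

    \[45~~47~~52~~52~~53~~55~~ 56~~58~~62~~80\nonumber\]

    There are \(10\) data values, so the median is the mean of the \(5\)th and \(6\)th values: \(\frac{53+55}{2}=54\). The lower half is \(45, 47, 52, 52, 53\), whose median is \(Q_1 = 52\). The upper half is \(55, 56, 58, 62, 80\), whose median is \(Q_3 = 58\).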

    Chart illustrating the median of a data set, showing the lower half (45-52) and upper half (58-80), with calculated median of 54.
    Figure \(\PageIndex{2}\): Finding Quartiles [FQ]

    Let’s look at some examples. We can also calculate the five-number summary using calculators or scientific software such as Excel, Minitab, or R. However, in this course, we only get our feet wet with statistics, so we can calculate these values quickly by hand.

    Example \(\PageIndex{6}\): Finding Quartiles

    Suppose we have measured the heights (in inches) of \(9\) females. The heights, sorted from smallest to largest, are given below. Find the quartiles.

    \[59~~60~~62~~64~~66~~67~~ 69~~70~~72\nonumber\]

    Answer

    The median is \(66\) inches. We can say that \(50\%~~\text{or}~~\frac{1}{2}\) of females are shorter than or equal to \(66\) inches, and the other \(50\%~~\text{or}~~\frac{1}{2}\) are taller than \(66\) inches.

    The data values that are smaller than the median are

    \[59~~60~~62~~64\nonumber\]

    \[\text{Lower Quartile}=\dfrac{(60+62)}{2} = 61\nonumber\] We can say that \(25\%~~\text{or}~~\frac{1}{4}\) of females are shorter than or equal to \(61\) inches, and the other \(75\%~~\text{or}~~\frac{3}{4}\) are taller than \(61\) inches.

    The data values that are larger than the median are

    \[67~~ 69~~70~~72\nonumber\]

    \[\text{Upper Quartile}=\dfrac{(69+70)}{2} = 69.5\nonumber\] We can say that \(75\%~~\text{or}~~\frac{3}{4}\) of females are shorter than or equal to \(69.5\) inches, and the other \(25\%~~\text{or}~~\frac{1}{4}\) are taller than \(69.5\) inches.


    The 5-number summary combines the first and third quartiles with the minimum, median, and maximum values.

    In the example with a sample of \(9\) females, the median is \(66\), the minimum is \(59\), and the maximum is \(72\). Hence, the \(5\)-number summary is:

    \[59~~61~~66~~69.5~~72\nonumber\]

    Note that the five-number summary divides the data into four intervals, each of which will contain about \(25\%\) of the data. For visualizing data, there is a graphical representation of a five-number summary called a box plot, or box and whisker graph.

    Definition: Box Plot

    A box plot is a graphical representation of a five-number summary.

    To create a box plot, a number line is first drawn with equidistant tick marks. A box is drawn from the first quartile to the third quartile, and a line is drawn through the box at the median. “Whiskers” are extended out to the minimum and maximum values.

    Example \(\PageIndex{7}\): Five Number Summary

    What is the box plot of the data in Example \(\PageIndex{6}\)?

    Answer

    The box plot below is based on the \(5\)-number summary from the sample of \(9\) female heights:

    \(59~~ 61~~ 66~~69.5 ~~ 72\)

    Box plot displaying data distribution, with central box indicating interquartile range and lines extending to the minimum and maximum values.
    Figure \(\PageIndex{3}\): Box Plot

    Box plots are particularly useful for comparing data from two populations or samples. In fact, when comparing two samples, side-by-side box plots are often the clearest choice.

    Example \(\PageIndex{8}\): Reading Box Plot

    The box plot of service times for two fast-food restaurants is shown below.

    Two box plots displaying service times in minutes: Scene 1 shows data from 0.1 to 6.7, Scene 2 from 0.2 to 6.4.
    Figure \(\PageIndex{4}\): Side by Side Two Box Plot
    1. Which fast-food restaurant has a smaller median service time but a more spread-out distribution of service times?
    2. In Store \(1\), what percentage of customers were served within \(2.9\) minutes?
    3. In Store \(2\), \(75\%\) of customers were served in less than what amount of time?
    4. In Store \(2\), what percentage of customers waited more than \(5.7\) minutes?
    5. If Alex waited \(4\) minutes, and this wait time is longer than that of \(90\%\) of customers, was he in Store \(1\) or Store \(2\)?
    Answer
    1. While Store \(2\) had a slightly shorter median service time (\(2.1\) minutes vs. \(2.\) minutes), Store \(2\) is less consistent, with a wider spread of the data.
    2. In Store \(1\), \(75\%\) of customers were served within \(2.9\) minutes.
    3. In Store \(2\), \(75\%\) of customers were served in less than \(5.7\) minutes.
    4. \(25\%\)
    5. If Alex waited \(4\) minutes, and this wait time is longer than that of \(90\%\) of customers, he is in Store \(1\).

    Weighted Average

    Have you considered how your grade point average (GPA) is calculated? Your business program requires the successful completion of many courses. Your grades in each course combine to determine your GPA; however, not every course necessarily has the same level of importance as measured by your course credits.

    Perhaps your math course takes one hour daily while your communications course is only delivered in one-hour sessions three times per week. Consequently, the college assigns the math course five credit hours and the communications course three credit hours. If you want an average, these different credit hours mean that the two courses do not share the same level of importance, and therefore a simple average cannot be calculated.

    In a weighted average, not all pieces of data share the same level of importance or they do not occur with the same frequency. The data cannot represent a percent change or a series of numbers intended to be multiplied with each other. To calculate a weighted average, you require two components:

    1. The data itself—you need the value for each piece of data.
    2. The weight of the data—you need to know how important each piece of data is to the average. This is either an assigned value or a reflection of the number of times each piece of data occurs (the frequency).

    To calculate a weighted average, add the products of the weights and data for the entire data set, and then divide this total by the total of the weights.
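    The rule above can be sketched as a short Python function. The name `weighted_average` is hypothetical, and the grade points in the usage lines are made up for illustration; only the credit hours (five for math, three for communications) come from the text.

```python
def weighted_average(values, weights):
    """Sum of (value x weight) over the data set, divided by the total of the weights."""
    if len(values) != len(weights):
        raise ValueError("each data value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not add up to zero")
    weighted_sum = sum(value * weight for value, weight in zip(values, weights))
    return weighted_sum / total_weight


# Hypothetical grade points for the two courses described above.
grade_points = [3.0, 4.0]      # math, communications (illustrative values)
credit_hours = [5, 3]          # weights taken from the text
print(weighted_average(grade_points, credit_hours))
```

    Dividing by the total of the weights means the same function works whether the weights are credit hours, frequencies, or percentages that add up to \(1\).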

    Example \(\PageIndex{9}\): Find GPA

    A student at a local college received the transcript below. The chart that follows it shows how each letter grade translates into a grade point.

    Calculate the student's grade point average (GPA). Round your final answer to two decimals.

    Course      Letter Grade    Credit hours (Weight)
    ENC 1100    B               \(4\)
    MAC 2302    A               \(5\)
    PSY 1203    B+              \(3\)
    PHY 2017    C               \(4\)
    HUM 1048    A+              \(3\)
    MGF 1130    D               \(4\)

    Grade    Grade Point Value
    A+       \(4.5\)
    A        \(4.0\)
    B+       \(3.5\)
    B        \(3.0\)
    C+       \(2.5\)
    C        \(2.0\)
    D        \(1.0\)
    F        \(0.0\)
    Answer

    The courses do not carry equal weights as they have different credit hours. Therefore, to calculate the GPA, you must find a weighted average.

    Course      Grade    Grade Point (Data Value)    Credit hours (Weight)    Value x Weight
    ENC 1100    B        \(3.0\)                     \(4\)                    \(3\times 4=12\)
    MAC 2302    A        \(4.0\)                     \(5\)                    \(4\times5=20\)
    PSY 1203    B+       \(3.5\)                     \(3\)                    \(3.5\times3=10.5\)
    PHY 2017    C        \(2.0\)                     \(4\)                    \(2\times4=8\)
    HUM 1048    A+       \(4.5\)                     \(3\)                    \(4.5\times3=13.5\)
    MGF 1130    D        \(1.0\)                     \(4\)                    \(1\times4=4\)
    TOTAL                                            \(23\)                   \(68\)

    \[\text{GPA}~=\frac{68}{23}\approx 2.96\nonumber\]

    Example \(\PageIndex{10}\): Weighted Average

    In your Mathematical Thinking class, your final grade is based on six categories: tests, a project, homework assignments, a final exam, a quiz, and a discussion. The final average for the course is the weighted average of scores earned in these categories with the following weights.

    Assignments    Tests       Project    Homework    Final Exam    Quiz       Discussion
    Weights        \(50\%\)    \(5\%\)    \(15\%\)    \(20\%\)      \(5\%\)    \(5\%\)

    Suppose you earned the following grades on average in each of the categories: \(85\%\) on tests, \(0\%\) on the project, \(90\%\) on homework assignments, \(50\%\) on the final exam, \(80\%\) on the quiz, and \(100\%\) on the discussion. Determine your weighted average in the course. Record the average below as a percentage accurate to two decimal places.

    Answer

    Multiply every group grade average by its associated weight.

    Category                    Weight      Average Score (Data Value)    Value x Weight
    Tests                       \(0.50\)    \(85\%\)                      \(0.50\times 85\%=42.5\%\)
    Project                     \(0.05\)    \(0\%\)                       \(0.05\times0\%=0\%\)
    Homework                    \(0.15\)    \(90\%\)                      \(0.15\times90\%=13.5\%\)
    Final Exam                  \(0.20\)    \(50\%\)                      \(0.20\times50\%=10\%\)
    Quiz                        \(0.05\)    \(80\%\)                      \(0.05\times80\%=4\%\)
    Discussion                  \(0.05\)    \(100\%\)                     \(0.05\times100\%=5\%\)
    Total (Weighted Average)    \(1\)                                     \(75\%\)

    \[\text{Weighted Average in the Course}~=\frac{42.5\%+0\%+13.5\%+10\%+4\%+5\%}{1}=75\%\nonumber\]


    5.5: Measures of Center and Spread is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.