Mathematics LibreTexts

8.1: Measures of Central Tendency and Dispersion (Ungrouped Data)


    Learning Objectives

    • Recognize, describe, and calculate the measures of the center of data.
    • Recognize, describe, and calculate the measures of the spread of data.
    • Use the Empirical rule to interpret the mean and standard deviation.

    Measures of the Center of the Data

    The "center" of a data set is one way of describing its location. The two most widely used measures of the "center" of the data are the mean (average) and the median. To calculate the mean weight of 50 people, add the 50 weights together and divide by 50. To find the median weight of the 50 people, order the data and find the number that splits the data into two equal parts. The median is generally a better measure of the center when there are extreme values or outliers because it is not affected by the precise numerical values of the outliers. The mean is the most common measure of the center.
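
    To see why the median resists outliers, here is a minimal Python sketch. The weights are hypothetical values chosen for illustration, not data from this section, and the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical weights in pounds (illustration only, not data from the text).
weights = [150, 155, 160, 165, 170]
weights_with_outlier = weights + [400]  # add one extreme value

# Without the outlier, the mean and the median agree.
print(mean(weights), median(weights))

# With the outlier, the mean shifts a lot; the median barely moves.
print(mean(weights_with_outlier), median(weights_with_outlier))
```

    Adding the single extreme value moves the mean from 160 to 200, while the median only moves from 160 to 162.5.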

    The letter used to represent the sample mean is an \(x\) with a bar over it (pronounced “\(x\) bar”): \(\overline{x}\). The Greek letter \(\mu\) (pronounced "mew") represents the population mean. One of the requirements for the sample mean to be a good estimate of the population mean is for the sample taken to be truly random.

     

    You can quickly find the location of the median by using the expression

    \[\dfrac{n+1}{2}\]

    The letter \(n\) is the total number of data values in the sample. If \(n\) is an odd number, the median is the middle value of the ordered data (ordered smallest to largest). If \(n\) is an even number, the median is equal to the average of the two middle values after the data has been ordered. For example, if the total number of data values is 97, then

    \[\dfrac{n+1}{2} = \dfrac{97+1}{2} = 49.\]

    The median is the 49th value in the ordered data. If the total number of data values is 100, then

    \[\dfrac{n+1}{2} = \dfrac{100+1}{2} = 50.5.\]

    The median occurs midway between the 50th and 51st values. The location of the median and the value of the median are not the same. The upper case letter \(M\) is often used to represent the median. The next example illustrates the location of the median and the value of the median.
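
    The distinction between the location and the value of the median can be sketched in Python as follows. The function names `median_location` and `median_value` are illustrative choices, and the sketch assumes the data set is not empty.

```python
def median_location(n):
    """Position of the median in the ordered data, counting from 1."""
    return (n + 1) / 2


def median_value(data):
    """Median found by locating position (n + 1) / 2 in the sorted data."""
    ordered = sorted(data)             # order smallest to largest
    location = median_location(len(ordered))
    lower = int(location)              # whole-number part of the location
    if location == lower:              # n is odd: a single middle value
        return ordered[lower - 1]      # subtract 1 because Python counts from 0
    # n is even: average the two values on either side of the location
    return (ordered[lower - 1] + ordered[lower]) / 2


print(median_location(97))         # 49.0
print(median_location(100))        # 50.5
print(median_value([7, 1, 5]))     # 5
print(median_value([7, 1, 5, 3]))  # 4.0
```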

    Example \(\PageIndex{1}\)

     The following dataset is in order from smallest to largest:

    3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; 29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47

    Calculate the mean and the median.

    Answer

    The calculation for the mean is:

    \[\bar{x} = \dfrac{3+4+(8)(2)+10+11+12+13+14+(15)(2)+(16)(2)+\cdots+35+37+40+(44)(2)+47}{40} \approx 23.6\]

    To find the median, \(M\), first use the formula for the location. The location is:

    \[\dfrac{n+1}{2} = \dfrac{40+1}{2} = 20.5\]

    Starting at the smallest value, the median is located between the 20th and 21st values (the two 24s):

    3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; 29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47

    \[M = \dfrac{24+24}{2} = 24\]
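
    The results of this example can be checked with software. The following is one possible check using Python's standard `statistics` module; the variable name `data` is an arbitrary choice.

```python
from statistics import mean, median

# The 40 ordered values from the example.
data = [3, 4, 8, 8, 10, 11, 12, 13, 14, 15,
        15, 16, 16, 17, 17, 18, 21, 22, 22, 24,
        24, 25, 26, 26, 27, 27, 29, 29, 31, 32,
        33, 33, 34, 34, 35, 37, 40, 44, 44, 47]

print(len(data))             # 40 values
print(sum(data))             # 943
print(round(mean(data), 1))  # 23.6
print(median(data))          # 24.0, the average of the 20th and 21st values
```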

    Exercise \(\PageIndex{1}\)

    The following dataset is ordered from smallest to largest. Calculate the mean and median.

    3; 4; 5; 7; 7; 7; 7; 8; 8; 9; 9; 10; 10; 10; 10; 10; 11; 12; 12; 13; 14; 14; 15; 15; 17; 17; 18; 19; 19; 19; 21; 21; 22; 22; 23; 24; 24; 24; 24

    Answer

    Mean: \(3 + 4 + 5 + 7 + 7 + 7 + 7 + 8 + 8 + 9 + 9 + 10 + 10 + 10 + 10 + 10 + 11 + 12 + 12 + 13 + 14 + 14 + 15 + 15 + 17 + 17 + 18 + 19 + 19 + 19 + 21 + 21 + 22 + 22 + 23 + 24 + 24 + 24 + 24 = 544\)

    \[\dfrac{544}{39} \approx 13.95\]

    Median: Starting at the smallest value, the median is the 20th term, which is 13.

    Another measure of the center is the mode. The mode is the most frequent value. There can be more than one mode in a data set as long as those values have the same frequency and that frequency is the highest. If there are no repeats in a dataset, meaning each value occurs exactly one time, there is no mode.

    Example \(\PageIndex{2}\)

    Statistics exam scores for 20 students are as follows:

    50; 53; 59; 59; 63; 63; 72; 72; 72; 72; 72; 76; 78; 81; 83; 84; 84; 84; 90; 93

    Find the mode.

    Answer

    The most frequent score is 72, which occurs five times. Mode = 72.

    Exercise \(\PageIndex{2}\)

    The numbers of books checked out from the library by 25 students are as follows:

    0; 0; 0; 1; 2; 3; 3; 4; 4; 5; 5; 7; 7; 7; 7; 8; 8; 8; 8; 10; 10; 11; 11; 12; 12

    Find the mode.

    Answer

    There is a tie for the most frequent value: 7 and 8 both occur four times. Mode = 7 and 8.
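
    The rule for finding the mode, including ties and the no-mode case, can be sketched in Python. The helper name `modes` is an illustrative choice, and the sketch assumes the data set is not empty.

```python
from collections import Counter


def modes(data):
    """Return every value tied for the highest frequency, or [] if nothing repeats."""
    counts = Counter(data)             # frequency of each distinct value
    highest = max(counts.values())
    if highest == 1:                   # every value occurs exactly once: no mode
        return []
    return sorted(value for value, count in counts.items() if count == highest)


scores = [50, 53, 59, 59, 63, 63, 72, 72, 72, 72,
          72, 76, 78, 81, 83, 84, 84, 84, 90, 93]
books = [0, 0, 0, 1, 2, 3, 3, 4, 4, 5, 5, 7, 7,
         7, 7, 8, 8, 8, 8, 10, 10, 11, 11, 12, 12]

print(modes(scores))     # [72]
print(modes(books))      # [7, 8]
print(modes([1, 2, 3]))  # []
```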

    Measures of Variation of the Data

    One feature of a data set that no measure of center captures is how much the data values vary. To describe the variation quantitatively, we use measures of variation or measures of spread. Just as there are several different measures of center, there are also several different measures of variation. In this section, we examine two of the most frequently used measures of variation: the range and standard deviation.

    Definition: Range

    The range of a data set is the difference between the maximum (largest) and minimum (smallest) observations.

    Example \(\PageIndex{4}\)

    Find the range of the data:

    8 12 13 11 10 9 14 8 6 14 7 8 13

    Solution

    The range of the data is the difference between the largest and the smallest values in the data set: 14−6=8
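
    The same calculation takes one line in software. Here is a short Python check of this example; the variable names are arbitrary.

```python
data = [8, 12, 13, 11, 10, 9, 14, 8, 6, 14, 7, 8, 13]

# Range = largest observation minus smallest observation.
data_range = max(data) - min(data)
print(data_range)  # 8
```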

    Definition: The Standard Deviation

    The range only measures the total variation and doesn't capture any variation between the minimum and maximum observed values. In contrast to the range, the standard deviation takes into account all the observations. It is the preferred measure of variation when the mean is used as the measure of center. Roughly speaking, the standard deviation measures variation by indicating how far, on average, the observations are from the mean. For a data set with a large amount of variation, the observations will, on average, be far from the mean; so the standard deviation will be large. For a data set with a small amount of variation, the observations will, on average, be close to the mean; so the standard deviation will be small.

    Calculating the Standard Deviation

    If \(x\) is a number, then the difference "\(x\) – mean" is called its deviation. In a data set, there are as many deviations as there are items in the data set. The deviations are used to calculate the standard deviation. If the numbers belong to a population, in symbols a deviation is \(x - \mu\). For sample data, in symbols a deviation is \(x - \bar{x}\).

    The procedure to calculate the standard deviation depends on whether the numbers are the entire population or are data from a sample. The calculations are similar, but not identical. Therefore the symbol used to represent the standard deviation depends on whether it is calculated from a population or a sample. The lower case letter \(s\) represents the sample standard deviation and the Greek letter \(\sigma\) (sigma, lower case) represents the population standard deviation. If the sample has the same characteristics as the population, then \(s\) should be a good estimate of \(\sigma\).

    To calculate the standard deviation, we need to calculate the variance first. The variance is the average of the squares of the deviations (the \(x - \bar{x}\) values for a sample, or the \(x - \mu\) values for a population). The symbol \(\sigma^{2}\) represents the population variance; the population standard deviation \(\sigma\) is the square root of the population variance. The symbol \(s^{2}\) represents the sample variance; the sample standard deviation \(s\) is the square root of the sample variance. You can think of the standard deviation as a special average of the deviations.

    If the numbers come from a census of the entire population and not a sample, when we calculate the average of the squared deviations to find the variance, we divide by \(N\), the number of items in the population. If the data are from a sample rather than a population, when we calculate the average of the squared deviations, we divide by \(n - 1\), one less than the number of items in the sample.

    Formulas for the Sample Standard Deviation

    \[s = \sqrt{\dfrac{\sum(x-\bar{x})^{2}}{n-1}} \label{eq1}\]

    or

    \[s = \sqrt{\dfrac{\sum f (x-\bar{x})^{2}}{n-1}} \label{eq2}\]

    For the sample standard deviation, the denominator is \(n - 1\), that is, one less than the sample size. In the second formula, \(f\) is the frequency with which each distinct value \(x\) occurs.
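
    For comparison, the population case described earlier, in which the sum of squared deviations is divided by \(N\), can be written in the same notation. This formula is assembled from the definitions in this section, with \(\mu\) the population mean and \(N\) the number of items in the population:

    \[\sigma = \sqrt{\dfrac{\sum(x-\mu)^{2}}{N}} \nonumber\]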

    Example \(\PageIndex{5}\)

    Calculate the sample standard deviation for the following dataset.

    5; 6; 10; 10; 14

    First we calculate the mean:

    \[\bar{x} = \dfrac{5+6+10+10+14}{5} = 9 \nonumber\]

    The mean is 9.

    The variance may be calculated by using a table. Then the standard deviation is calculated by taking the square root of the variance. We will explain the parts of the table after calculating s.

    Data     Deviations           Deviations squared
    \(x\)    \(x - \bar{x}\)      \((x - \bar{x})^{2}\)
    5        5 – 9 = –4           \((-4)^{2} = 16\)
    6        6 – 9 = –3           \((-3)^{2} = 9\)
    10       10 – 9 = 1           \((1)^{2} = 1\)
    10       10 – 9 = 1           \((1)^{2} = 1\)
    14       14 – 9 = 5           \((5)^{2} = 25\)
                                  The total is 52

    The sample variance, \(s^{2}\), is equal to the sum of the last column (52) divided by the total number of data values minus one (5 – 1):

    \[s^{2} = \dfrac{52}{5-1} = 13 \nonumber\]

    The sample standard deviation s is equal to the square root of the sample variance:

    \[s = \sqrt{13} \approx 3.605551275 \nonumber\]

    which, rounded to two decimal places, gives \(s \approx 3.61\).
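
    The steps of this example can be reproduced in Python. The sketch below follows the table column by column and then compares the result with `stdev` (divides by \(n - 1\)) and `pstdev` (divides by \(N\)) from the standard `statistics` module; the variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [5, 6, 10, 10, 14]
n = len(data)

x_bar = sum(data) / n                                   # mean: 9.0
squared_deviations = [(x - x_bar) ** 2 for x in data]   # 16, 9, 1, 1, 25
sample_variance = sum(squared_deviations) / (n - 1)     # 52 / 4 = 13
s = sqrt(sample_variance)                               # sample standard deviation

print(round(s, 2))             # 3.61
print(round(stdev(data), 2))   # 3.61, the library's sample version agrees
print(round(pstdev(data), 2))  # 3.22, what you get if the data were a whole population
```

    The last line shows why the context matters: dividing by \(N\) instead of \(n - 1\) gives a smaller value for the same five numbers.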

    Note

    In practice, use a calculator or computer software, such as this one, to calculate the standard deviation:

    Descriptive Statistics Calculator

    Regardless of the tool that you use, you still need to be aware of the context and use the appropriate notation for standard deviation \(\sigma\) or \(s\).

    Interpreting the Mean and Standard Deviation Together

    The Empirical Rule

    For data having a distribution that is BELL-SHAPED and SYMMETRIC:

    • Approximately 68% of the data is within one standard deviation of the mean.
    • Approximately 95% of the data is within two standard deviations of the mean.
    • Approximately 99.7% of the data is within three standard deviations of the mean.

    The empirical rule is also known as the 68-95-99.7 rule. We will learn more about this when studying the "Normal" or "Gaussian" probability distribution in later chapters.
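
    As a preview of where the three percentages come from, the following Python sketch computes them for an exactly normal distribution using `NormalDist` from the standard `statistics` module (Python 3.8 or later). The helper name `share_within` is an illustrative choice; real data that are only roughly bell-shaped will match these figures only approximately.

```python
from statistics import NormalDist

# A normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))  # percentages 68.3, 95.4, 99.7
```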

    Example \(\PageIndex{6}\)

    Suppose \(x\) comes from a population with a bell-shaped distribution, mean 50, and standard deviation 6.

    • About 68% of the \(x\) values lie within one standard deviation of the mean, that is, within \(1\sigma = (1)(6) = 6\) of the mean 50. Therefore, about 68% of the \(x\) values lie between 50 – 6 = 44 and 50 + 6 = 56.
    • About 95% of the \(x\) values lie within two standard deviations of the mean, that is, within \(2\sigma = (2)(6) = 12\) of the mean 50. Therefore, about 95% of the \(x\) values lie between 50 – 12 = 38 and 50 + 12 = 62.
    • About 99.7% of the \(x\) values lie within three standard deviations of the mean, that is, within \(3\sigma = (3)(6) = 18\) of the mean 50. Therefore, about 99.7% of the \(x\) values lie between 50 – 18 = 32 and 50 + 18 = 68.
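
    The interval calculations above follow one pattern, mean ± k standard deviations, which can be sketched in Python. The function name `empirical_rule_intervals` is an illustrative choice, and the results are meaningful only for bell-shaped, symmetric data.

```python
def empirical_rule_intervals(mean, sd):
    """Map each Empirical Rule percentage to its (low, high) interval."""
    return {
        label: (mean - k * sd, mean + k * sd)
        for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }


print(empirical_rule_intervals(50, 6))
# {'68%': (44, 56), '95%': (38, 62), '99.7%': (32, 68)}
```
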
    Exercise \(\PageIndex{6}\)

    The population of scores on a college entrance exam has an approximately bell-shaped distribution with mean \(\mu = 52\) points and standard deviation \(\sigma = 11\) points. Let \(y\) denote a score.

    1. About 68% of the \(y\) values lie between what two values? These values are ________________.
    2. About 95% of the \(y\) values lie between what two values? These values are ________________.
    3. About 99.7% of the \(y\) values lie between what two values? These values are ________________.

    Answer

    1. About 68% of the values lie between 41 and 63.

    2. About 95% of the values lie between 30 and 74.

    3. About 99.7% of the values lie between 19 and 85.

    Note

    It is important to note that the Empirical Rule applies only when the distribution of the data is bell-shaped and symmetric. In that case, the mean and the standard deviation alone are enough to sketch the shape of the distribution:

    the standard normal curve with standard deviations measured on the x-axis

    Figure \(\PageIndex{A}\)

    This page titled 8.1: Measures of Central Tendency and Dispersion (Ungrouped Data) is shared under a CC BY license and was authored, remixed, and/or curated by OpenStax.