Mathematics LibreTexts

8.4.4: Comparing Populations Using Samples

  • Page ID
    39030

    Lesson

    Let's compare different populations using samples.

    Exercise \(\PageIndex{1}\): Same Mean? Same MAD?

    Without calculating, tell whether each pair of data sets has the same mean and whether they have the same mean absolute deviation.

    1. set A: 1, 3, 3, 5, 6, 8, 10, 14
      set B: 21, 23, 23, 25, 26, 28, 30, 34
      Table \(\PageIndex{1}\)
    2. set X: 1, 2, 3, 4, 5
      set Y: 1, 2, 3, 4, 5, 6
      Table \(\PageIndex{2}\)
    3. set P: 47, 53, 58, 62
      set Q: 37, 43, 68, 72
      Table \(\PageIndex{3}\)
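After answering without calculating, you could check your reasoning with a short computation. This is a minimal Python sketch. It assumes the mean absolute deviation is the average distance of the values from their mean, as defined in this unit. The helper name `mad` is hypothetical and chosen only for illustration.

```python
from statistics import fmean

def mad(values):
    """Mean absolute deviation: the average distance of the values from their mean."""
    m = fmean(values)
    return fmean(abs(v - m) for v in values)

# The three pairs of data sets from the exercise.
pairs = {
    "A/B": ([1, 3, 3, 5, 6, 8, 10, 14], [21, 23, 23, 25, 26, 28, 30, 34]),
    "X/Y": ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6]),
    "P/Q": ([47, 53, 58, 62], [37, 43, 68, 72]),
}

for name, (first, second) in pairs.items():
    print(f"{name}: means {fmean(first):.2f} and {fmean(second):.2f}, "
          f"MADs {mad(first):.2f} and {mad(second):.2f}")
```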

    Exercise \(\PageIndex{2}\): With a Heavy Load

    Consider the question: Do tenth-grade students' backpacks generally weigh more than seventh-grade students' backpacks?

    Here are dot plots showing the weights of backpacks for a random sample of students from these two grades:

    Figure \(\PageIndex{1}\): Two dot plots for backpack weight in pounds are labeled grade 7 and grade 10, and the numbers 1 through 22 are indicated. The data are as follows: Grade 7: 2 pounds, 1 dot. 3 pounds, 2 dots. 4 pounds, 3 dots. 5 pounds, 2 dots. 6 pounds, 1 dot. 7 pounds, 2 dots. 9 pounds, 1 dot. 11 pounds, 1 dot. 12 pounds, 1 dot. 13 pounds, 1 dot. Grade 10: 10 pounds, 1 dot. 11 pounds, 2 dots. 12 pounds, 1 dot. 13 pounds, 2 dots. 14 pounds, 2 dots. 15 pounds, 2 dots. 16 pounds, 1 dot. 18 pounds, 2 dots. 20 pounds, 1 dot. 22 pounds, 1 dot.
    1. Did any seventh-grade backpacks in this sample weigh more than a tenth-grade backpack?
    2. The mean weight of this sample of seventh-grade backpacks is 6.3 pounds. Do you think the mean weight of backpacks for all seventh-grade students is exactly 6.3 pounds?
    3. The mean weight of this sample of tenth-grade backpacks is 14.8 pounds. Do you think there is a meaningful difference between the weights of all seventh-grade and tenth-grade students' backpacks? Explain or show your reasoning.
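The sample means quoted in this exercise can be reproduced from the data listed in the dot plot description. This is a minimal Python sketch. It assumes each dot is one backpack, and the helper name `mad` is hypothetical. The sketch also computes each sample's MAD, which comes out to about 2.8 pounds for grade 7 and about 2.7 pounds for grade 10. Finally, it expresses the gap between the means in units of the larger MAD.

```python
from statistics import fmean

# Backpack weights (pounds) read from the dot plots, one entry per dot.
grade7 = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 9, 11, 12, 13]
grade10 = [10, 11, 11, 12, 13, 13, 14, 14, 15, 15, 16, 18, 18, 20, 22]

def mad(values):
    """Mean absolute deviation: the average distance of the values from their mean."""
    m = fmean(values)
    return fmean(abs(v - m) for v in values)

for label, sample in [("grade 7", grade7), ("grade 10", grade10)]:
    print(f"{label}: n = {len(sample)}, mean = {fmean(sample):.1f} lb, "
          f"MAD = {mad(sample):.1f} lb")

# How far apart are the two sample means, measured in units of the larger MAD?
gap = fmean(grade10) - fmean(grade7)
larger_mad = max(mad(grade7), mad(grade10))
print(f"difference in means: {gap:.1f} lb = {gap / larger_mad:.1f} MADs")
```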

    Exercise \(\PageIndex{3}\): Do They Carry More?

    Here are 10 more random samples of seventh-grade students' backpack weights.

    Figure \(\PageIndex{2}\): Ten dot plots titled sample one through sample ten, each with the numbers 0 through 18 indicated.
    sample number | mean weight (pounds)
    1 | 5.8
    2 | 9.2
    3 | 5.5
    4 | 7.3
    5 | 7.2
    6 | 6.6
    7 | 5.2
    8 | 5.3
    9 | 6.3
    10 | 6.4
    Table \(\PageIndex{4}\)
    1. Which sample has the highest mean weight?
    2. Which sample has the lowest mean weight?
    3. What is the difference between these two sample means?
    4. All of the samples have a mean absolute deviation of about 2.8 pounds. Express the difference between the highest and lowest sample means as a multiple of the MAD.
    5. Are these samples very different? Explain or show your reasoning.
    6. Remember that our sample of tenth-grade students' backpacks had a mean weight of 14.8 pounds. The MAD for this sample is 2.7 pounds. Your teacher will assign you one of the samples of seventh-grade students' backpacks to use.
      1. What is the difference between the sample means for the tenth-grade students' backpacks and the seventh-grade students' backpacks?
      2. Express the difference between these two sample means as a multiple of the larger of the MADs.
    7. Do you think there is a meaningful difference between the weights of all seventh-grade and tenth-grade students' backpacks? Explain or show your reasoning.
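The comparisons in this activity each divide a difference of means by a MAD. This is a minimal Python sketch. It assumes every seventh-grade sample has a MAD of exactly 2.8 pounds, although the text says "about 2.8". The constant names are hypothetical. Rather than picking one assigned sample, the sketch compares the tenth-grade sample with all ten seventh-grade samples.

```python
# Sample means (pounds) from the table of ten seventh-grade samples.
sample_means = {1: 5.8, 2: 9.2, 3: 5.5, 4: 7.3, 5: 7.2,
                6: 6.6, 7: 5.2, 8: 5.3, 9: 6.3, 10: 6.4}
MAD_GRADE7 = 2.8    # assumed exact; the text says "about 2.8 pounds"
MEAN_GRADE10 = 14.8
MAD_GRADE10 = 2.7

# max/min with key=... return the sample number, not the mean itself.
highest = max(sample_means, key=sample_means.get)
lowest = min(sample_means, key=sample_means.get)
spread = sample_means[highest] - sample_means[lowest]
print(f"highest: sample {highest}, lowest: sample {lowest}")
print(f"difference: {spread:.1f} lb = {spread / MAD_GRADE7:.2f} MADs")

# Compare the tenth-grade sample with each seventh-grade sample,
# measuring the gap in units of the larger of the two MADs.
larger_mad = max(MAD_GRADE7, MAD_GRADE10)
for number, mean in sample_means.items():
    gap = MEAN_GRADE10 - mean
    print(f"sample {number}: gap {gap:.1f} lb = {gap / larger_mad:.2f} MADs")
```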

    Exercise \(\PageIndex{4}\): Steel From Different Regions

    When anthropologists find steel artifacts, they can test the amount of carbon in the steel to learn about the people who made the artifacts. Here are some box plots showing the percentage of carbon in samples of steel that were found in two different regions:

    Figure \(\PageIndex{3}\): Two box plots from 0 point 32 to 0 point 76 by 0 point 0 4s. Top box plot labeled region 1. Whisker from about 0 point 41 to about 0 point 62. Box from about 0 point 62 to about 0 point 67 with vertical line at 0 point 64. Whisker from about 0 point 67 to about 0 point 7. Bottom box plot labeled region 2. Whisker from about 0 point 37 to 0 point 45. Box from about 0 point 45 to 0 point 48 with vertical line at about 0 point 46. Whisker from 0 point 48 to about 0 point 57.
    1. Was there any steel found in region 1 that had:
      1. more carbon than some of the steel found in region 2?
      2. less carbon than some of the steel found in region 2?
    2. Do you think there is a meaningful difference in carbon content between all the steel artifacts found in regions 1 and 2?
    3. Which sample has a distribution that is not approximately symmetric?
    4. What is the difference between the sample medians for these two regions?
      sample | median (%) | IQR (%)
      region 1 | \(0.64\) | \(0.05\)
      region 2 | \(0.47\) | \(0.03\)
      Table \(\PageIndex{5}\)
    5. Express the difference between these two sample medians as a multiple of the larger interquartile range.
    6. The anthropologists who conducted the study concluded that there was a meaningful difference between the steel from these regions. Do you agree? Explain or show your reasoning.
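Questions 4 and 5 reduce to one subtraction and one division. This is a minimal Python sketch using only the medians and IQRs from the table. Because both medians are percentages, their difference is measured in percentage points.

```python
# Sample medians and IQRs (percent carbon) from the table.
regions = {
    "region 1": {"median": 0.64, "iqr": 0.05},
    "region 2": {"median": 0.47, "iqr": 0.03},
}

median_gap = regions["region 1"]["median"] - regions["region 2"]["median"]
larger_iqr = max(r["iqr"] for r in regions.values())
print(f"difference in medians: {median_gap:.2f} percentage points")
print(f"as a multiple of the larger IQR: {median_gap / larger_iqr:.1f}")
```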

    Summary

    Sometimes we want to compare two different populations. For example, is there a meaningful difference between the weights of pugs and beagles? Here are histograms showing the weights for a sample of dogs from each of these breeds:

    Figure \(\PageIndex{4}\): A histogram for two different populations: On the horizontal axis, the numbers 6 through 11, in increments of zero point 5, are indicated. The label pug weights in kilograms is indicated for the numbers 6 through 8 and beagle weights in kilograms is indicated for the numbers 9 through 11. On the vertical axis, the numbers 0 through 8 are indicated. The data represented by the bars are as follows: Pug weights in kilograms: Weight from 6 up to 6 point 5, 5. Weight from 6 point 5 up to 7, 5. Weight from 7 up to 7 point 5, 7. Weight from 7 point 5 up to 8, 3. A triangle is located at 6 point 9 kilograms. Beagle weights in kilograms: Weight from 9 up to 9 point 5, 3. Weight from 9 point 5 up to 10, 3. Weight from 10 up to 10 point 5, 8. Weight from 10 point 5 up to 11, 6. A triangle is located at 10 point 1.

    The red triangles show the mean weight of each sample, 6.9 kg for the pugs and 10.1 kg for the beagles. The red lines show the weights that are within 1 MAD of the mean. We can think of these as “typical” weights for the breed. These typical weights do not overlap. In fact, the distance between the means is \(10.1-6.9\) or 3.2 kg, over 6 times the larger MAD! So we can say there is a meaningful difference between the weights of pugs and beagles.
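The statement that 3.2 kg is over 6 times the larger MAD also sets an upper limit on that MAD. Here is a short worked inequality that uses only the numbers in the paragraph:

```latex
\frac{10.1 - 6.9}{\text{larger MAD}} > 6
\quad\Longrightarrow\quad
\text{larger MAD} < \frac{3.2}{6} \approx 0.53 \text{ kg}
```

So each band of typical weights extends less than about 0.53 kg on either side of its mean. Two such bands cannot meet across a 3.2 kg gap.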

    Is there a meaningful difference between the weights of male pugs and female pugs? Here are box plots showing the weights for a sample of male and female pugs:

    Figure \(\PageIndex{5}\): Two box plots labeled male pug weights in kilograms and female pug weights in kilograms are indicated. The numbers 4 through 8 point 5, in increments of zero point 5, are indicated. The five-number summaries for the box plots are as follows: Male pug weights in kilograms: Minimum value, 6 point 4. Maximum value, 8 point 3. Q1, 7 point 2. Q2, 7 point 6. Q3, 7 point 9. Female pug weights in kilograms: Minimum value, 6 point 2. Maximum value, 8. Q1, 6 point 4. Q2, 6 point 9. Q3, 7 point 3.

    We can see that the medians are different, but the weights between the first and third quartiles overlap. Based on these samples, we would say there is not a meaningful difference between the weights of male pugs and female pugs.
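The box plot comparison can be made numeric with the five-number summaries from the figure description. This is a minimal Python sketch, and the helper name `iqr` is hypothetical. The sketch checks whether the two boxes (the middle halves of the data) share any values. It also measures the gap between the medians in units of the larger IQR.

```python
# Five-number summaries (kg) read from the box plot description.
male = {"min": 6.4, "q1": 7.2, "median": 7.6, "q3": 7.9, "max": 8.3}
female = {"min": 6.2, "q1": 6.4, "median": 6.9, "q3": 7.3, "max": 8.0}

def iqr(summary):
    """Interquartile range: third quartile minus first quartile."""
    return summary["q3"] - summary["q1"]

median_gap = male["median"] - female["median"]
larger_iqr = max(iqr(male), iqr(female))
# Two intervals overlap when each one starts before the other one ends.
boxes_overlap = male["q1"] <= female["q3"] and female["q1"] <= male["q3"]

print(f"IQRs: male {iqr(male):.1f} kg, female {iqr(female):.1f} kg")
print(f"median gap: {median_gap:.1f} kg = {median_gap / larger_iqr:.2f} of the larger IQR")
print("middle halves overlap:", boxes_overlap)
```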

    In general, if the measures of center for two samples are at least two measures of variability apart, we say the difference in the measures of center is meaningful. Visually, this means the ranges of typical values do not overlap. If they are closer, then we don't consider the difference to be meaningful.
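The rule of thumb above can be written as a small function. This is a minimal Python sketch, and the name `is_meaningful` is hypothetical. It assumes that "two measures of variability" means twice the larger of the two samples' MADs or IQRs, as the activities in this lesson do.

```python
def is_meaningful(center_a, center_b, spread_a, spread_b, multiple=2):
    """Apply the lesson's rule of thumb for comparing two samples.

    The difference between the two measures of center (means or medians)
    counts as meaningful when it is at least `multiple` times the larger
    measure of variability (MADs for means, IQRs for medians).
    """
    gap = abs(center_a - center_b)
    return gap >= multiple * max(spread_a, spread_b)

# Steel: medians 0.64% and 0.47%, IQRs 0.05% and 0.03%.
print(is_meaningful(0.64, 0.47, 0.05, 0.03))
# Male vs. female pugs: medians 7.6 and 6.9 kg, IQRs 0.7 and 0.9 kg.
print(is_meaningful(7.6, 6.9, 0.7, 0.9))
```

Taking the absolute value means the order of the two samples does not matter. Only the size of the gap is compared with the variability.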

    Glossary Entries

    Definition: Interquartile range (IQR)

    The interquartile range is one way to measure how spread out a data set is. We sometimes call this the IQR. To find the interquartile range, we subtract the first quartile from the third quartile.

    For example, the IQR of this data set is 20 because \(50-30=20\).

    data: 22, 29, 30, 31, 32, 43, 44, 45, 50, 50, 59
    quartiles: Q1 = 30, Q2 = 43, Q3 = 50
    Table \(\PageIndex{6}\)
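In the example, the quartiles come from splitting the sorted data at the median and then taking the median of each half. For instance, Q1 = 30 is the median of the five values below 43. This is a minimal Python sketch of that method, and the helper names are hypothetical. Library routines may use other quartile conventions and can give a different IQR for the same data. For example, NumPy's `percentile` interpolates by default.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    With an odd number of values, the middle value is left out of both
    halves. Needs at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

data = [22, 29, 30, 31, 32, 43, 44, 45, 50, 50, 59]
print(quartiles(data), iqr(data))
```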

    Definition: Proportion

    A proportion of a data set is the fraction of the data in a given category.

    For example, a class has 18 students. There are 2 left-handed students and 16 right-handed students in the class. The proportion of students who are left-handed is \(\frac{2}{18}\), or about 0.11.
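A proportion is the count in one category divided by the total count. This minimal Python sketch uses the counts from the example: 2 left-handed and 16 right-handed students. `fractions.Fraction` keeps the exact value before it is converted to a decimal.

```python
from fractions import Fraction

left_handed = 2
right_handed = 16
class_size = left_handed + right_handed  # total number of students

proportion = Fraction(left_handed, class_size)  # reduced automatically
print(proportion, round(float(proportion), 2))
```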

    Practice

    Exercise \(\PageIndex{5}\)

    Lin wants to know if students in elementary school generally spend more time playing outdoors than students in middle school. She selects a random sample of size 20 from each population of students and asks them how many hours they played outdoors last week. Suppose that the MAD for each of her samples is about 3 hours.

    Select all pairs of sample means for which Lin could conclude there is a meaningful difference between the two populations.

    1. elementary school: 12 hours, middle school: 10 hours
    2. elementary school: 14 hours, middle school: 9 hours
    3. elementary school: 13 hours, middle school: 6 hours
    4. elementary school: 13 hours, middle school: 10 hours
    5. elementary school: 7 hours, middle school: 15 hours
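To check your selections, you can apply the rule of thumb from the Summary: a gap of at least 2 MADs. This is a minimal Python sketch. It assumes the MAD is exactly 3 hours for both samples, although the problem says "about 3 hours".

```python
MAD_HOURS = 3  # assumed exact; the problem says "about 3 hours"

# (elementary school mean, middle school mean) for each option, in hours.
options = [(12, 10), (14, 9), (13, 6), (13, 10), (7, 15)]

for number, (elementary, middle) in enumerate(options, start=1):
    gap = abs(elementary - middle)
    meaningful = gap >= 2 * MAD_HOURS
    print(f"option {number}: gap {gap} h = {gap / MAD_HOURS:.2f} MADs, "
          f"meaningful: {meaningful}")
```

The absolute value is used because the rule depends only on the size of the gap. A meaningful difference can also run in the opposite direction from the one Lin asked about.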

    Exercise \(\PageIndex{6}\)

    These two box plots show the distances of a standing jump, in inches, for a random sample of 10-year-olds and a random sample of 15-year-olds.

    Figure \(\PageIndex{6}\): Two box plots from 50 to 80 by 2s. Top box plot labeled 10 year olds. Whisker from 51 to 53. Box from 53 to 58 with vertical line at 56. Whisker from 58 to 59. Bottom box plot labeled 15 year olds. Whisker from 64 to 65. Box from 65 to 70 with vertical line at 69. Whisker from 70 to 80.

    Is there a meaningful difference in median distance for the two populations? Explain how you know.
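The values needed here can be read from the figure description: each box runs from Q1 to Q3, and the vertical line marks the median. This is a minimal Python sketch that measures the gap between the medians in units of the larger IQR.

```python
# Box plot values (inches) read from the figure description.
age10 = {"q1": 53, "median": 56, "q3": 58}
age15 = {"q1": 65, "median": 69, "q3": 70}

iqr10 = age10["q3"] - age10["q1"]
iqr15 = age15["q3"] - age15["q1"]
gap = age15["median"] - age10["median"]
ratio = gap / max(iqr10, iqr15)
print(f"IQRs: {iqr10} in and {iqr15} in; median gap: {gap} in = {ratio:.1f} IQRs")
```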

    Exercise \(\PageIndex{7}\)

    The median income for a sample of people from Chicago is about $60,000 and the median income for a sample of people from Kansas City is about $46,000, but researchers have determined there is not a meaningful difference in the medians. Explain why the researchers might be correct.
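The IQRs of the two income samples are not given. We can still use the rule of thumb from the Summary, with the IQR as the measure of variability for medians, to find the condition the researchers' conclusion requires:

```latex
\text{not meaningful}
\iff 60{,}000 - 46{,}000 < 2 \times \text{larger IQR}
\iff \text{larger IQR} > 7{,}000 \text{ dollars}
```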

    Exercise \(\PageIndex{8}\)

    A farmer grows 5,000 pumpkins each year. The pumpkins are priced according to their weight, so the farmer would like to estimate the mean weight of the pumpkins he grew this year. He randomly selects 8 pumpkins and weighs them. Here are the weights (in pounds) of these pumpkins:

    \(2.9\quad 6.8\quad 7.3\quad 7.7\quad 8.9\quad 10.6\quad 12.3\quad 15.3\)

    1. Estimate the mean weight of the pumpkins the farmer grew.

    This dot plot shows the mean weight of 100 samples of eight pumpkins, similar to the one above.

    Figure \(\PageIndex{7}\)

    2. What appears to be the mean weight of the 5,000 pumpkins?

    3. What does the dot plot of the sample means suggest about how accurate an estimate based on a single sample of 8 pumpkins might be?

    4. What do you think the farmer might do to get a more accurate estimate of the population mean?

    (From Unit 8.4.3)
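For question 1, the estimate is the sample mean: the sum of the eight weights divided by 8. This minimal Python sketch gives a value of about 9 pounds.

```python
from statistics import fmean

# Weights (pounds) of the 8 randomly selected pumpkins.
weights = [2.9, 6.8, 7.3, 7.7, 8.9, 10.6, 12.3, 15.3]

sample_mean = fmean(weights)  # used as the estimate of the population mean
print(f"sample mean: {sample_mean:.3f} lb")
```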


    This page titled 8.4.4: Comparing Populations Using Samples is shared under a CC BY license and was authored, remixed, and/or curated by Illustrative Mathematics.
