Skip to main content
\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)
Mathematics LibreTexts

25.2: O.02: Section 1 Part 1

  • Page ID
    51754
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)

    Section 1: Compound models—adding two model formulas to model combined effects

    Compound-model application 1: Modeling mixed populations

    In an earlier topic we discussed normal-distribution models, whose graph is a “bell-shaped” curve with a single peak. This model is appropriate in many situations where some characteristic of a group of generally-similar objects is distributed around an average value. But when two groups with different averages are mixed, the resulting distribution may have two partially-overlapping peaks, as illustrated below by a dataset on the height of a combined population of male and female students. A single normal distribution will not do a good job of modeling this data.

    Example 1:

    Height data from a co-educational population

                The data that produced the above graph is shown in a table to the right.   A plausible model for data of this kind is the sum of two normal distributions. This will entail six parameters: total, average, and width for each normal. The formula to be placed into C3 will thus be “=$G$3*NORMDIST(A3,$G$4,$G$5,FALSE)+
    $G$6*NORMDIST(A3,$G$7,$G$8,FALSE)”;   as always, it should be spread down column C beside all the data rows.As usual, we will set the initial values for the parameters to reasonable approximations before using Solver. In this case, we can do that by setting each “total” parameter to 5,000 (half the overall total), setting the “average” parameters to the approximately the x positions of the two peaks (64 and 70 are close enough), and setting both “width” parameters to a value, such as 2 or 3, that gives a model that is reasonably similar to the data. Then use Solver to minimize the sum of squared deviations, answering “G3:G8” in the “By changing cells” entry field so that all six parameters will be used. As shown below, a good fit is found.
    Parameters (two-normal sum)
    5529.128 Total 1
    63.72487 Average 1
    2.685645 Width 1
    4569.312 Total 2
    70.38119 Average 2
    2.728443 Width 2

     

    Goodness of fit of this model
    1710.994 Sum of squared deviations
    7.429222 Standard deviation

    This best-fit model does more than give a formula for describing the height distribution of the combined male-female population. Because it fits the data so well, we can confidently deduce the characteristics of the male and female distributions from the parameters for the two normal distributions, even though we do not have any data that identifies gender, and even though both males and females contribute to each height total. Clearly the first normal describes the height distribution of females and the second normal describes the height distribution of males. Thus we can use these results to see that 45.7% is the answer to the question “How much of this data came from males?”, and that 63.7 inches is the answer to the question “What was the average height of the female part of this sample?”.

    Height Students
    48 0
    49 0
    50 0
    51 0
    52 0
    53 0
    54 0
    55 2
    56 12
    57 29
    58 79
    59 180
    60 326
    61 481
    62 673
    63 813
    64 864
    65 825
    66 751
    67 717
    68 677
    69 700
    70 735
    71 658
    72 568
    73 424
    74 280
    75 167
    76 75
    77 25
    78 13
    79 7
    80 0
    81 1
    82 1
    83 0
    84 0

    If we individually graph the two normal distributions found here along with their sum we can see the components that combined to form the data distribution:

    This same sum-of-two-normal-distributions approach can be used in a variety of situations. Sometimes the component distributions have about the same average but greatly different widths, so that the resultant distribution has “fat tails”. Other times the distributions have substantially different totals but nearby averages, so that the result looks like the larger distribution with a bump added on one side. It is often the case that an investigator is interested in only one of the component distributions, with the other values being ignored after the fitting process. An example of each of these kinds of two-normal situations are given below, with the components shown (thin lines) as well as the sum (dots) that reflects the data that would be observed.

     
    Examples of the sum of two differing-parameter normal-distribution functions

     

    CC licensed content, Shared previously
    • Mathematics for Modeling. Authored by: Mary Parker and Hunter Ellinger. License: CC BY: Attribution