Mathematics LibreTexts

13.1: The Goodness-of-Fit Test

    Next, we will learn how to apply the goodness-of-fit test to test a statistical claim about the distribution of a categorical variable.

    The table below shows the distribution of ER visits across the days of the week in a random sample of 434 visits at a randomly selected local hospital in a large city:

    Days      Sun   Mon   Tue   Wed   Thu   Fri   Sat
    Visits    40    50    63    77    77    78    49

    Use a \(10\%\) significance level to test the claim that the ER visits are evenly distributed across the days of the week.

    Now, let’s identify the statistical claim that needs to be tested:

    “the ER visits are evenly distributed across the days of the week”

    What exactly does it mean to be evenly, or uniformly, distributed? If the claim were true, we would expect every day to have the same number of visits. The sample total is 434, so a uniform distribution across the 7 days would give \(434/7=62\) visits per day. As a result, we expect the following outcomes:

    Days      Sun   Mon   Tue   Wed   Thu   Fri   Sat   Total
    Visits    62    62    62    62    62    62    62    434

    This table is called the expected (E) distribution, and what we obtained from the sample is called the observed (O) distribution.
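
    As a minimal sketch, the expected distribution can be computed from the observed counts in a few lines of Python. The variable names are illustrative and not part of the original problem:

```python
# Observed ER visits per day, copied from the sample table.
observed = {"Sun": 40, "Mon": 50, "Tue": 63, "Wed": 77,
            "Thu": 77, "Fri": 78, "Sat": 49}

total = sum(observed.values())   # size of the sample
categories = len(observed)       # number of days (categories)

# Under a uniform distribution, every day receives an equal share of the total.
expected = {day: total / categories for day in observed}

for day in observed:
    print(day, observed[day], expected[day])
```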

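    Let’s check whether all the necessary assumptions are satisfied:

    • Simple random sample
    • All expected frequencies are 1 or greater
    • At most 20% of the expected frequencies are less than 5

    The data come from a random sample, and every expected frequency is 62, so all three conditions hold.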
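
    The two conditions on the expected frequencies can be checked mechanically. The sketch below assumes the uniform expected counts from this example and uses illustrative variable names; whether the sample is random has to be judged from how the data were collected and cannot be verified in code:

```python
# Expected frequencies under the uniform claim: 434 visits spread over 7 days.
expected = [434 / 7] * 7

# Condition: all expected frequencies are 1 or greater.
all_at_least_one = all(e >= 1 for e in expected)

# Condition: at most 20% of the expected frequencies are less than 5.
share_below_five = sum(e < 5 for e in expected) / len(expected)

conditions_met = all_at_least_one and share_below_five <= 0.20
print(all_at_least_one, share_below_five, conditions_met)
```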

    We will carry out the hypothesis test in the following six steps:

    In step 1, we will set up the hypotheses:

    The null and alternative hypotheses always take the same form in a goodness-of-fit test. The null hypothesis is that the variable follows a specific distribution, and the alternative hypothesis is that it does not. In our case, the variable is the day of the week on which an ER visit occurred, and the specified distribution is uniform. So the null hypothesis is that the ER visits are uniformly distributed across the days of the week, and the alternative hypothesis is that they are not. The goodness-of-fit test is always right-tailed, because any departure from the expected counts, in either direction, makes the test statistic larger.

    \(H_0\): the number of ER visits has a uniform distribution

    \(H_a\): the number of ER visits doesn’t have a uniform distribution

    Right-tailed test (RT)
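
    One way to write the same hypotheses symbolically is in terms of the proportion of visits that fall on each day. The symbols \(p_{\text{Sun}},\dots,p_{\text{Sat}}\) are notation introduced here for illustration and do not appear in the original problem:

```latex
\begin{align*}
H_0 &: p_{\text{Sun}} = p_{\text{Mon}} = \cdots = p_{\text{Sat}} = \tfrac{1}{7} \\
H_a &: \text{at least one of these proportions differs from } \tfrac{1}{7}
\end{align*}
```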

    In step 2, we will identify the significance level:

    The significance level can always be found in the text of the problem. In our case it is 10%, thus:

    \(\alpha=0.10\)

    In step 3, we will find the test statistic using the formula:

    \(\chi^2_0=\sum\frac{(O-E)^2}{E}\)

    \(O\)      \(E\)      \(O-E\)          \((O-E)^2\)        \(\frac{(O-E)^2}{E}\)
    \(40\)     \(62\)     \(40-62=-22\)    \((-22)^2=484\)    \(\frac{484}{62}=7.806\)
    \(50\)     \(62\)     \(50-62=-12\)    \((-12)^2=144\)    \(\frac{144}{62}=2.323\)
    \(63\)     \(62\)     \(63-62=1\)      \(1^2=1\)          \(\frac{1}{62}=0.016\)
    \(77\)     \(62\)     \(77-62=15\)     \(15^2=225\)       \(\frac{225}{62}=3.629\)
    \(77\)     \(62\)     \(77-62=15\)     \(15^2=225\)       \(\frac{225}{62}=3.629\)
    \(78\)     \(62\)     \(78-62=16\)     \(16^2=256\)       \(\frac{256}{62}=4.129\)
    \(49\)     \(62\)     \(49-62=-13\)    \((-13)^2=169\)    \(\frac{169}{62}=2.726\)
    \(434\)    \(434\)                     \(\chi^2_0=\)      \(24.26\)
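
    The arithmetic in the table can be reproduced with a short Python sketch. The variable names are illustrative, and only the standard library is used:

```python
# Observed counts (Sun..Sat) and the uniform expected count for each day.
observed = [40, 50, 63, 77, 77, 78, 49]
expected = [sum(observed) / len(observed)] * len(observed)

# One contribution (O - E)^2 / E per category, matching the last table column.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# The test statistic is the sum of the contributions.
chi2_stat = sum(contributions)

print([round(c, 3) for c in contributions])
print(round(chi2_stat, 2))
```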

    In step 4, we will use either the critical value approach or the p-value approach to test the claim:

    • In the critical value approach, we construct the rejection region using the \(\chi^2\)-curve with \(df=c-1=7-1=6\), where \(c=7\) is the number of categories:

    RR: greater than \(\chi^2_{0.10}=10.645\)

    • In the p-value approach, we compute the p-value:

    P-Value: \(P(\chi^2>24.26)\approx 0.0005\)
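
    Both numbers can be checked without a table. The sketch below is one possible approach, not the method used in the text: it relies on the closed-form right-tail area of the chi-square distribution, which is valid only when the degrees of freedom are even (here \(df=6\)), and finds the critical value by bisection. The function names are hypothetical helpers written for this example; a statistics library such as SciPy provides general-purpose equivalents.

```python
import math

def chi2_sf_even_df(x, df):
    """Right-tail area P(chi^2 > x); closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    if x <= 0:
        return 1.0
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))

def chi2_critical_even_df(alpha, df, tol=1e-9):
    """Value c with P(chi^2 > c) = alpha, located by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even_df(hi, df) > alpha:   # widen until the tail area drops below alpha
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

df = 7 - 1                                    # categories minus one
critical_value = chi2_critical_even_df(0.10, df)
p_value = chi2_sf_even_df(24.26, df)
print(critical_value, p_value)
```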

    In step 5, we will draw the conclusion:

    • In the critical value approach, we must check whether the test statistic is in the rejection region or not. Our test statistic is \(24.26\) and it is to the right of the critical value \(10.645\), thus it is in the rejection region.
    • In the p-value approach, we must check whether the p-value is less than the significance level or not. Our p-value is \(0.0005\) and it is less than \(\alpha=0.10\).

    Both approaches lead us to reject the null hypothesis in favor of the alternative.
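
    The two decision rules can be written side by side as a small sketch. The values are the rounded ones reported in this example, and the variable names are illustrative. Because both rules compare the same test statistic against the same distribution, they give the same decision:

```python
chi2_stat = 24.26        # test statistic from step 3
critical_value = 10.645  # chi-square critical value for alpha = 0.10, df = 6
p_value = 0.0005         # right-tail area beyond the test statistic
alpha = 0.10             # significance level

# Critical value approach: reject when the statistic falls in the rejection region.
reject_by_critical_value = chi2_stat > critical_value

# P-value approach: reject when the p-value is below the significance level.
reject_by_p_value = p_value < alpha

print(reject_by_critical_value, reject_by_p_value)
```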

    In step 6, we will interpret the results:

    At the \(10\%\) significance level, we have sufficient evidence to suggest that the ER visits are not uniformly distributed across the days of the week.

    We discussed how to apply the \(\chi^2\) goodness-of-fit procedure to test a statistical claim about whether a variable follows a specific distribution.


    13.1: The Goodness-of-Fit Test is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
