Mathematics LibreTexts

5.2: Data Type and Study Type


    Types of Data

    Once we have gathered data, we might wish to classify it. Roughly speaking, data can be classified into two categories: qualitative data (also known as categorical data) and quantitative data.

    Definition: Qualitative and Quantitative Data

    Categorical (qualitative) data are pieces of information that allow us to classify the objects under investigation into various categories.

    Quantitative data are responses that are numerical in nature and with which we can perform meaningful arithmetic calculations.

    Example \(\PageIndex{1}\): Identify the Data Type
    1. We might conduct a survey to determine the favorite movie that each person in a math class has seen in a movie theater.
    2. A survey could ask each person for the number of movies they have seen in a movie theater in the past \(12\) months (\(0, 1, 2, 3, 4, \ldots\)).
    3. Suppose we gather respondents' ZIP codes in a survey to track their geographical location.
    Answer
    1. When we conduct such a survey, the responses would look like: Finding Nemo, The Avengers, or Dunkirk. We might count the number of people who give each answer, but the answers themselves do not have any numerical values: we cannot perform computations with an answer like "Finding Nemo." This would be categorical data.
    2. This would be quantitative data. Other examples of quantitative data would be the running time of the movie you saw most recently (\(104\) minutes, \(137\) minutes, \(104\) minutes, ...) or the amount of money you paid for a movie ticket the last time you went to a movie theater (\(\$5.50, \$7.75, \$9, ...\)).
    3. ZIP codes are numbers, but we can't do any meaningful mathematical calculations with them. It doesn't make sense to say that \(98036\) is "twice" \(49018\); that would be like saying that Lynnwood, WA is "twice" Battle Creek, MI. ZIP codes are therefore categorical data.
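
    The distinction above can be sketched in a few lines of Python. This is only an illustration: the survey responses and the variable names (`favorite_movies`, `movies_seen`, `zip_codes`) are invented. Counting is the meaningful operation for categorical data, while averaging is meaningful only for quantitative data. ZIP codes are stored as text so that nobody is tempted to do arithmetic with them.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses, invented for illustration.
favorite_movies = ["Finding Nemo", "The Avengers", "Dunkirk", "Finding Nemo"]
movies_seen = [0, 3, 1, 4]                        # movies seen in the past 12 months
zip_codes = ["98036", "49018", "98036", "98036"]  # stored as text, not as numbers

# Categorical data: counting how often each category occurs is meaningful.
movie_counts = Counter(favorite_movies)
zip_counts = Counter(zip_codes)

# Quantitative data: arithmetic such as an average is meaningful.
average_seen = mean(movies_seen)

print(movie_counts.most_common(1))  # the most frequent favorite movie
print(zip_counts["98036"])          # how many respondents share this ZIP code
print(average_seen)                 # average number of movies seen
```

    Averaging the ZIP codes would run without error if they were stored as integers, which is exactly why the data type has to be decided by what the values mean rather than by how they look.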

    Bias in Statistical Study

    Consider the following scenario. Suppose we are hired by a politician to determine the amount of support he has among the electorate, should he decide to run for another term. What population should we study? Every person in the district? Not every person is eligible to vote, and regardless of how strongly someone likes or dislikes the candidate, they have little influence on his re-election if they are not able to vote.

    What about eligible voters in the district? That might be better, but if someone is eligible to vote but does not register by the deadline, they won’t have any say in the election either. What about registered voters? Many people are registered but choose not to vote. What about "likely voters?"

    This is the criterion used in much political polling; however, it can be challenging to define a "likely voter." Is it someone who voted in the last election? In the last general election? In the last presidential election? Should we consider someone who just turned \(18\) a "likely voter?" They weren’t eligible to vote in the past, so how do we judge the likelihood that they will vote in the next election?

    In November \(1998\), former professional wrestler Jesse "The Body" Ventura was elected governor of Minnesota. Up until right before the election, most polls showed he had little chance of winning. There were several contributing factors to the polls not reflecting the actual intent of the electorate:

    • Ventura was running on a third-party ticket, and most polling methods are better suited to a two-candidate race.
    • Many respondents to polls may have been embarrassed to tell pollsters that they were planning to vote for a professional wrestler.
    • The mere fact that the polls showed Ventura had little chance of winning might have prompted some people to vote for him in protest, sending a message to the major-party candidates.

    But one of the major contributing factors was that Ventura recruited a substantial amount of support from young people, particularly college students, who had never voted before and who registered specifically to vote in the gubernatorial election. The polls did not deem these young people likely voters (since, in most cases, young people have a lower rate of voter registration and a lower turnout rate for elections), and so the polling samples were subject to sampling bias: they omitted a portion of the electorate that was weighted in favor of the winning candidate.
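
    The effect of leaving a group out of the sampling frame can be illustrated with a small sketch. All numbers below are invented for illustration and are not the actual 1998 Minnesota figures; `support_rate` and `likely_voter_frame` are hypothetical names. The point is only that a screen which drops first-time voters underestimates a candidate whose support is concentrated among them.

```python
# Hypothetical electorate, invented for illustration. Each voter is a pair:
# (is_first_time_young_voter, supports_candidate).
young_voters = [(True, True)] * 700 + [(True, False)] * 300      # 70% support
other_voters = [(False, True)] * 1200 + [(False, False)] * 2800  # 30% support
electorate = young_voters + other_voters

def support_rate(voters):
    """Fraction of the given voters who support the candidate."""
    return sum(supports for _, supports in voters) / len(voters)

# A "likely voter" screen that drops everyone who has never voted before.
likely_voter_frame = [v for v in electorate if not v[0]]

true_support = support_rate(electorate)            # what the election reveals
polled_support = support_rate(likely_voter_frame)  # what the poll can see

print(f"Support among all actual voters: {true_support:.1%}")
print(f"Support among 'likely voters':   {polled_support:.1%}")
```

    Under these assumed numbers, the poll misses the candidate's true support by eight percentage points even if every person in the frame is surveyed, because the error comes from who is left out rather than from sample size.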

    Identifying the population can be a challenging task, but once it is identified, how do we choose an appropriate sample? Remember, although we would prefer to survey all members of the population, this is usually impractical unless the population is very small, so we choose a sample. There are many ways to sample a population, but one key goal we must keep in mind is that we want the sample to be representative of the population.

    There are a number of ways that a study can be ruined before you even start collecting data. The first one we have already explored – sampling or selection bias, which is when the sample is not representative of the population. One example of this is voluntary response bias, which occurs when data is collected only from individuals who volunteer to participate. This is not the only potential source of bias.

    Sources of Bias

    • Sampling bias – when the sample is not representative of the population
    • Voluntary response bias – the sampling bias that often occurs when the respondents in the sample volunteered to participate
    • Self-interest study – bias that can occur when the researchers have an interest in the outcome
    • Response bias – when the respondent gives inaccurate responses for any reason
    • Perceived lack of anonymity – when the respondent fears giving an honest answer might negatively affect them
    • Loaded questions – when the question wording influences the responses
    • Non-response bias – when people refusing to participate in the study can influence the validity of the outcome
    Example \(\PageIndex{2}\): Identify the Bias

    Identify a potential source of bias for each study.

    1. A recent study found that chewing gum may raise math grades in teenagers [1]. This study was conducted by the Wrigley Science Institute, a branch of the Wrigley chewing gum company.
    2. A survey asks people, “When was the last time you visited your doctor?”
    3. A survey asks participants a question about their interactions with individuals from other racial backgrounds.
    4. An employer puts out a survey asking their employees if they have a drug abuse problem and need treatment help.
    5. A survey asks, “Do you support funding research of alternative energy sources to reduce our reliance on high-polluting fossil fuels?”
    6. A telephone poll asks the question “Do you often have time to relax and read a book?”, and 50% of the people called refused to answer the survey.
    Answer
    1. This is an example of a self-interest study, one in which the researchers have a vested interest in the study's outcome. While this does not necessarily indicate that the study was biased, it certainly suggests that we should subject it to extra scrutiny.
    2. This survey may be subject to response bias, as many individuals may not recall the exact date of their last doctor's visit and provide inaccurate responses.
    3. Here, a perceived lack of anonymity could influence the outcome. A respondent who does not want to be perceived as racist might give an untruthful answer.
    4. Here, both response bias and perceived lack of anonymity may be sources of bias. Answering truthfully might have consequences; responses may not be accurate if employees do not feel their answers are anonymous or fear retribution from their employer.
    5. This is a loaded question: phrases such as "high-polluting" and "reduce our reliance" push respondents toward agreeing. Loaded questions can be created intentionally by pollsters with an agenda or arise accidentally from poor question wording. Question order is a related concern, since the sequence of questions can alter the results. A psychology researcher provides an example [2].
    6. It is unlikely that the results will be representative of the entire population. This is an example of non-response bias, which is introduced when people refuse to participate in a study or drop out of an experiment. When people refuse to participate, we can no longer be so certain that our sample is representative of the population.
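
    A short sketch shows how non-response can distort the telephone poll in item 6. The counts and response rates are invented for illustration; the key assumption, labeled in the code, is that people without time to relax are also less likely to stay on the phone.

```python
# Hypothetical numbers, invented for illustration.
called = {"yes": 400, "no": 600}           # true answers among 1,000 people called
response_rate = {"yes": 0.80, "no": 0.30}  # assumption: busy people refuse more often

# Number of people in each group who actually answer the survey.
responded = {ans: round(called[ans] * response_rate[ans]) for ans in called}

true_yes_share = called["yes"] / sum(called.values())
observed_yes_share = responded["yes"] / sum(responded.values())
refusal_share = 1 - sum(responded.values()) / sum(called.values())

print(f"Refused to answer:     {refusal_share:.0%}")
print(f"True share of 'yes':   {true_yes_share:.0%}")
print(f"Observed share of 'yes': {observed_yes_share:.0%}")
```

    With these assumed rates, half of the people called refuse, and the survey overstates how many people have time to relax, because the refusals are not spread evenly across the two groups.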

    Types of Studies

    Observational studies are studies in which conclusions are drawn from observations of a sample or the population. In some cases, these observations might be unsolicited, such as studying the percentage of cars that turn right at a red light even when there is a “no turn on red” sign. In other cases, the observations are solicited, as in a survey or a poll.

    In contrast, experiments are commonly used when exploring how subjects react to an external influence. In an experiment, a treatment is applied to the subjects, and the results are measured and recorded.

    Definition: Observational and Experimental

    An observational study is a type of statistical study in which researchers observe and measure variables without changing or influencing the subjects.

    An experimental study involves manipulating variables and assigning treatments to determine cause and effect.

    Example \(\PageIndex{3}\): Observational Study

    Specific research questions: Do the majority of college students listen to music while they study? Do the majority of college students believe that listening to music improves their learning?

    Answer

    To investigate these questions, the statistics students conduct a survey in their other classes. They ask these two questions:

    Do you listen to music while you study? Do you think listening to music improves your concentration and memory?

    This observational study aims to address two questions about a population of college students. Each question contains a claim about the population of college students. We can use the data from this study to determine if these claims are accurate. But data from this study cannot provide evidence of a cause-and-effect relationship between listening to music while studying and improvements in learning.

    Example \(\PageIndex{4}\): Experimental Study

    Specific research question: Does listening to music improve students’ ability to quickly identify information?

    Answer

    To investigate this question, the instructor uses word-search puzzles. She divides the class into two groups. Students on one side of the room do a word puzzle for 3 minutes while listening to music on an iPod. Students on the other side of the room do a word puzzle for 3 minutes without music. The instructor calculates the average number of words found by each group.

    This study is an experiment. The instructor manipulates music to investigate the impact on puzzle completion. Data from this study can provide evidence of a cause-and-effect relationship between listening to music while studying and improvements in learning. But the improvement in learning is more narrowly defined as the ability to quickly identify information, such as words in a puzzle.
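
    The instructor's comparison of the two groups could be sketched as follows. The word counts are invented for illustration, and the names `words_found_music` and `words_found_silence` are hypothetical.

```python
from statistics import mean

# Hypothetical results: words found in 3 minutes by each student.
words_found_music = [12, 9, 11, 14, 10]    # group that listened to music
words_found_silence = [10, 8, 11, 9, 12]   # group that worked without music

mean_music = mean(words_found_music)
mean_silence = mean(words_found_silence)
difference = mean_music - mean_silence

print(f"Average with music:    {mean_music}")
print(f"Average without music: {mean_silence}")
print(f"Difference:            {difference:.1f}")
```

    A difference in averages from such a small class is only a starting point; whether it reflects a real effect of music depends on how the groups were formed and how much the scores vary.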


    When conducting experiments, it is essential to isolate the treatment being tested.

    Suppose a middle school (junior high) finds that its students are not scoring well on the state’s standardized math test. They decide to run an experiment to see if an alternate curriculum would improve scores. To run the test, they hire a math specialist to come in and teach a class using the new curriculum. To their delight, they see an improvement in test scores.

    The difficulty with this scenario is that it is not clear whether the curriculum is responsible for the improvement or whether the improvement is due to a math specialist teaching the class. This is called confounding – when it is not clear which factor or factors caused the observed effect. Confounding is the downfall of many experiments, although it is sometimes hidden.

    Definition: Confounding

    Confounding occurs when two or more variables could have caused the outcome, and it is not possible to determine which one actually caused the result.

    A drug company's study of a weight loss pill might report that people lost an average of 8 pounds while using the new drug. However, the fine print states that participants were also encouraged to diet and exercise. It is then unclear whether the weight loss is due to the pill, the diet, the exercise, or a combination of these factors: confounding has occurred.
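
    The weight loss example can be made concrete with a small sketch. The effect sizes and the function `observed_losses` are invented for illustration. Two very different explanations, one in which the pill does all the work and one in which diet and exercise do all the work, produce exactly the same data when every participant receives both.

```python
from statistics import mean

def observed_losses(pill_effect, lifestyle_effect, individual_variation):
    """Pounds lost by participants who took the pill AND dieted and exercised."""
    return [v + pill_effect + lifestyle_effect for v in individual_variation]

# Hypothetical person-to-person variation in pounds, invented for illustration.
individual_variation = [-1, 0, 1, 0, -2, 2]

# Scenario A: the pill does all the work.
scenario_a = observed_losses(pill_effect=8, lifestyle_effect=0,
                             individual_variation=individual_variation)
# Scenario B: the pill does nothing; diet and exercise do all the work.
scenario_b = observed_losses(pill_effect=0, lifestyle_effect=8,
                             individual_variation=individual_variation)

print(mean(scenario_a), mean(scenario_b))  # the same average loss in both
print(scenario_a == scenario_b)            # the data cannot tell them apart
```

    Because the observed data are identical in both scenarios, no analysis of this study alone can say how much of the loss is due to the pill.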


    Several measures can be implemented to reduce the likelihood of confounding. The primary measure is to use a control group.

    Control Group and Treatment Group

    The treatment group is the group that receives the treatment or factor being tested.

    Example: If a study is testing a new medicine, the people who receive the medicine are in the treatment group.

    The control group is the group that does not receive the treatment. It is used as a baseline for comparison.

    Example: In the same study, people who receive a placebo (or no medicine) are in the control group.
    Flowchart illustrating an experimental study with two groups: Treatment Group (receives treatment) and Control Group (does not receive treatment).

    Figure \(\PageIndex{1}\): Type of Experimental Study

    Ideally, the groups are as similar as possible, isolating the treatment as the only potential source of difference between the groups. For this reason, the method of dividing groups is important. Some researchers attempt to ensure that the groups have similar characteristics (the same number of females, the same number of people over 50, etc.), but it is nearly impossible to control for every characteristic. Due to this, random assignment is widely used.
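
    Random assignment can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the participant IDs and the helper `randomly_assign` are hypothetical, and the seed is used only to make the example reproducible.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of near-equal size."""
    rng = random.Random(seed)      # a fixed seed makes the assignment reproducible
    shuffled = list(participants)  # copy so the original list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs, invented for illustration.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=42)

print(groups["treatment"])
print(groups["control"])
```

    Because chance alone decides who lands in each group, characteristics the researchers did not think to control for tend to be balanced between the groups on average.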

    Sometimes not giving the control group anything does not completely control for confounding variables. For example, suppose a medicine study is testing a new headache pill by giving the treatment group the pill and the control group nothing. If the treatment group showed improvement, we would not know whether it was due to the medicine in the pill, or a response to having taken any pill. This is called a placebo effect.

    In some cases, it is more appropriate to compare to a conventional treatment than a placebo. For example, in a cancer research study, it would not be ethical to deny any treatment to the control group or to give a placebo treatment. In this case, the currently accepted medicine would be given to the control group, sometimes referred to as a comparison group.

    If participants know they are receiving a placebo, it defeats the purpose of using one. For this reason, experimental studies often use blinding.

    Single Blind or Double Blind Study

    A single-blind study is one in which the participant is unaware whether they are receiving the treatment or a placebo. A double-blind study is one in which both the participants and the experimenter are unaware of who is in the treatment group and who is in the control group.
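
    One way blinding might be organized in practice is to replace group names with opaque codes. The sketch below is an assumption-laden illustration: the function `blind_labels`, the kit-code format, and the participant IDs are all hypothetical. Participants and evaluators see only the kit codes, while the key linking codes to groups is held by someone outside the study until it ends.

```python
import random

def blind_labels(assignments, seed=None):
    """Replace group names with opaque kit codes.

    Returns (kits, key). `kits` maps participant -> kit code and is all that
    participants and evaluators see. `key` maps kit code -> group and is held
    by a third party until the study ends.
    """
    rng = random.Random(seed)
    codes = [f"KIT-{n:03d}" for n in range(1, len(assignments) + 1)]
    rng.shuffle(codes)  # so the order of the codes reveals nothing about the group
    kits, key = {}, {}
    for (participant, group), code in zip(assignments.items(), codes):
        kits[participant] = code
        key[code] = group
    return kits, key

# Hypothetical group assignments, invented for illustration.
assignments = {"P01": "treatment", "P02": "control",
               "P03": "control", "P04": "treatment"}
kits, key = blind_labels(assignments, seed=7)

print(kits)  # what participants and evaluators see: codes only
```

    In a single-blind design only the participants are restricted to `kits`; in a double-blind design the experimenters are restricted to it as well.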

    A flowchart illustrating "Blinding" with two branches: "Single Blind" and "Double Blind," each with brief definitions.
    Figure \(\PageIndex{2}\): Type of Blinding
    Definition: Placebo

    A placebo is a dummy treatment given to control for the placebo effect. An experiment that gives the control group a placebo is called a placebo-controlled experiment.

    In a study for a new medicine that is dispensed in a pill form, a sugar pill could be used as a placebo.

    In a study on the effect of alcohol on memory, a non-alcoholic beer might be given to the control group as a placebo.

    In a study of a frozen meal diet plan, the treatment group would receive the diet food, and the control group could be given standard frozen meals stripped of their original packaging.

    Definition: Placebo Effect

    The placebo effect occurs when the effectiveness of a treatment is influenced by the patient’s perception of how effective they believe the treatment will be, resulting in a positive outcome even if the treatment is ineffective.

    1. To determine if a two-day prep course would help high school students improve their scores on the SAT, a group of students was randomly divided into two subgroups.

      The first group, the treatment group, received a two-day preparatory course. The second group, the control group, was not given the prep course. Afterwards, both groups were given the SAT.

    2. In a study about anti-depression medication, you would not want the psychological evaluator to know whether the patient is in the treatment or control group, either, as it might influence their evaluation; therefore, the experiment should be conducted as a double-blind study.
    3. A study of painful tooth extractions found that patients who were told they were receiving a strong painkiller, but who actually received a saltwater injection, reported as much pain relief as patients who received a dose of morphine.[1]
    4. If a researcher is testing whether a new fabric can withstand fire, she simply needs to torch multiple samples of the fabric – there is no need for a control group.
    Example \(\PageIndex{6}\): Blind or Double-Blind?

    To test a new lie detector, two groups of subjects are given the new test. One group is asked to answer all the questions truthfully, and the second group is asked to lie on one set of questions. The person administering the lie detector test does not know what group each subject is in.

    Does this experiment have a control group? Is it blind, double-blind, or neither?

    Answer

    The truth-telling group could be considered the control group, but in reality, both groups are treatment groups here, as it is essential for the lie detector to accurately identify lies and not misclassify truth-telling as lying. This study is blind but not double-blind: the person administering the test is unaware of which group each subject belongs to, but the subjects themselves know whether they were asked to lie.


    Notes:

    1. www.whitehouse.gov/sites/defa...s/hist07z1.xls
    2. [1] Levine JD, Gordon NC, Smith R, Fields HL. (1981) Analgesic responses to morphine and placebo in individuals with postoperative pain. Pain. 10:379-89.
    3. [1] Reuters. news.yahoo.com/s/nm/20090423/...k_gum_learning. Retrieved 4/27/09.
    4. [2] Swartz, Norbert. www.umich.edu/~newsinfo/MT/01...01/mt6f01.html. Retrieved 3/31/2009.

    Contributors and Attributions


    5.2: Data Type and Study Type is shared under a CC BY license and was authored, remixed, and/or curated by LibreTexts.