    Statistics a science that deals with any aspect of the collection, analysis, interpretation, and presentation of data        
    Data a collection of observations        
    Population the collection of all individuals or items under consideration in a statistical study        
    Census a study that involves the entire population        
    Sampling a process of obtaining a sample from the population        
    Sample a part of the population from which information is obtained        
    Representative sample a sample that reflects as closely as possible the relevant characteristics of the population under consideration        
    Descriptive statistics the methods for organizing and summarizing information        
    Inferential statistics the methods for drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population        
    Sampling error the natural variation that results from selecting a sample to represent a larger population        
    Non-sampling error an issue that affects the reliability of sampling data other than the natural variation        
    Observational study a study in which researchers simply observe characteristics and take measurements        
    Designed experiment a study in which the data do not exist until someone does "the experiment" that produces the data        
    Statistically significant result a result that is very unlikely to occur by chance        
    Practically significant result a result that is big enough to be meaningful in the real world regardless of its statistical significance        
    Lurking variable a variable that causes the changes in the two variables under consideration        
    Sampling bias a measure of how unrepresentative the sample is because not all members of the population are equally likely to be selected
    Simple random sampling sampling procedure for which each possible sample of a given size is equally likely to be the one obtained        
    Sampling with replacement sampling in which every selected member of the population is returned to the population before the next selection
    Sampling without replacement sampling in which every member of the population may be chosen for inclusion in a sample only once        
    Systematic sampling a method for selecting a random sample by randomly picking the first object and then every k-th object after that for some k approximately equal to the number of individuals in the population divided by the desired sample size        
    Cluster sampling a method for selecting a random sample by dividing the population into groups and using simple random sampling to select a set of groups from which every object is included in the sample        
    Stratified sampling a method for selecting a random sample by dividing the population into strata and then using simple random sampling to pick objects from each stratum so that each stratum is represented in the sample proportionally to its size        
    Voluntary response sampling a method of sampling in which the respondents themselves decide whether to be included        
    Convenience sampling a nonrandom method of selecting a sample; this method selects individuals that are easily accessible and may result in biased data        
    Placebo a fake drug used in the testing of medication        
    Control group in an experiment, the group that does not receive the experimental treatment        
    Experimental group in an experiment, the group that is exposed to the treatment        
    Explanatory variable a variable that we think explains or causes changes in the response variable        
    Response variable a variable that measures an outcome or result of a study        
    Treatment a specific condition applied to the individuals in an experiment        
    Blinding a technique where the subjects do not know whether they are receiving a treatment or a placebo        
    Double-blinding a technique where neither the subjects nor the people recording the data know which treatment each subject receives
    Statistical variable a characteristic that varies from one object of the population to another        
    Qualitative data data that consist of names and labels describing the attributes of a population        
    Quantitative data data that consist of numbers that are the result of counting or measuring attributes of a population        
    Categorical data qualitative data that do not have a natural ordering        
    Ordinal data qualitative data that have a natural ordering        
    Discrete data quantitative data whose possible values can all be listed
    Continuous data quantitative data whose possible values form an interval
    Geometric Experiment a statistical experiment with the following properties: (1) There are one or more Bernoulli trials with all failures except the last one, which is a success. (2) In theory, the number of trials could go on forever. There must be at least one trial. (3) The probability, \(p\), of a success and the probability, \(q\), of a failure do not change from trial to trial       OpenStax
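The entry describes the experiment but gives no formula. A minimal sketch, assuming the standard result that the first success lands on trial \(x\) with probability \(P(X = x) = q^{x-1}p\) (that is, \(x - 1\) failures followed by one success); the helper names `geometric_pmf` and `simulate_first_success` are hypothetical.

```python
import random

def geometric_pmf(x: int, p: float) -> float:
    """P(first success occurs on trial x): x - 1 failures, then one success."""
    if x < 1:
        raise ValueError("there must be at least one trial")
    q = 1 - p  # probability of failure, constant from trial to trial
    return q ** (x - 1) * p

def simulate_first_success(p: float, rng: random.Random) -> int:
    """Run Bernoulli trials until the first success and return how many were needed."""
    trials = 1
    while rng.random() >= p:  # this trial failed (probability q = 1 - p)
        trials += 1
    return trials

print(geometric_pmf(3, 0.25))  # 0.140625
print(simulate_first_success(0.25, random.Random(0)))
```

In theory the `while` loop could run forever, mirroring property (2), but for any \(p > 0\) it ends with probability one.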
    Hypergeometric Experiment a statistical experiment with the following properties: (1) You take samples from two groups. (2) You are concerned with a group of interest, called the first group. (3) You sample without replacement from the combined groups. (4) Each pick is not independent, since sampling is without replacement. (5) You are not dealing with Bernoulli Trials.       OpenStax
    Hypergeometric Probability a discrete random variable (RV) that is characterized by: (1) A fixed number of trials. (2) The probability of success is not the same from trial to trial. We sample from two groups of items when we are interested in only one group. \(X\) is defined as the number of successes out of the total number of items chosen. Notation: \(X \sim H(r, b, n)\), where \(r =\) the number of items in the group of interest, \(b =\) the number of items in the group not of interest, and \(n =\) the number of items chosen.       OpenStax
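Probabilities for \(X \sim H(r, b, n)\) come from counting combinations. The sketch below assumes the usual formula \(P(X = x) = \binom{r}{x}\binom{b}{n-x} / \binom{r+b}{n}\), which the entry does not state; the function name is hypothetical.

```python
from math import comb

def hypergeometric_pmf(x: int, r: int, b: int, n: int) -> float:
    """P(X = x) for X ~ H(r, b, n): x items of interest among n chosen
    without replacement from r items of interest and b other items."""
    if x < 0 or x > r or n - x < 0 or n - x > b:
        return 0.0
    # (ways to pick x of interest) * (ways to pick the rest) / (all ways to pick n)
    return comb(r, x) * comb(b, n - x) / comb(r + b, n)

# Example: 6 items of interest, 4 others, choose 3 without replacement.
print(hypergeometric_pmf(2, r=6, b=4, n=3))  # 0.5
```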
    Hypothesis a statement about the value of a population parameter; when two hypotheses are stated, the one assumed to be true is called the null hypothesis (notation \(H_{0}\)) and the contradictory statement is called the alternative hypothesis (notation \(H_{a}\)).       OpenStax
    Hypothesis Testing a procedure, based on sample evidence, for determining whether the stated hypothesis is reasonable and should not be rejected, or is unreasonable and should be rejected.       OpenStax
    Independent Events The occurrence of one event has no effect on the probability of the occurrence of another event. Events \(\text{A}\) and \(\text{B}\) are independent if one of the following is true: (1) \(P(\text{A|B}) = P(\text{A})\), (2) \(P(\text{B|A}) = P(\text{B})\), (3) \(P(\text{A AND B}) = P(\text{A})P(\text{B})\)       OpenStax
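Condition (3) can be checked directly on a small, equally likely sample space. This sketch, with hypothetical event names, uses two fair dice and exact fractions so the equality test is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """Probability of an event (a predicate on outcomes) when outcomes are equally likely."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

def independent(a, b) -> bool:
    """Condition (3): P(A AND B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

def first_even(o):
    return o[0] % 2 == 0

def sum_is_7(o):
    return o[0] + o[1] == 7

def sum_is_8(o):
    return o[0] + o[1] == 8

print(independent(first_even, sum_is_7))  # True:  1/12 == 1/2 * 1/6
print(independent(first_even, sum_is_8))  # False: 1/12 != 1/2 * 5/36
```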
    Inferential Statistics also called statistical inference or inductive statistics; this facet of statistics deals with estimating a population parameter based on a sample statistic. For example, if four out of the 100 calculators sampled are defective, we might infer that four percent of the production is defective.       OpenStax
    Informed Consent Any human subject in a research study must be cognizant of any risks or costs associated with the study. The subject has the right to know the nature of the treatments included in the study, their potential risks, and their potential benefits. Consent must be given freely by an informed, fit participant.       OpenStax
    Institutional Review Board a committee tasked with oversight of research programs that involve human subjects       OpenStax
    Interval also called a class interval; an interval represents a range of data and is used when displaying large data sets       OpenStax
    Level of Significance of the Test probability of a Type I error (reject the null hypothesis when it is true). Notation: \(\alpha\). In hypothesis testing, the Level of Significance is called the preconceived \(\alpha\) or the preset \(\alpha\).       OpenStax
    Lurking Variable a variable that has an effect on a study even though it is neither an explanatory variable nor a response variable       OpenStax
    Mean a number that measures the central tendency; a common name for mean is "average." The term "mean" is a shortened form of "arithmetic mean." By definition, the mean for a sample (denoted by \(\bar{x}\)) is \(\bar{x} = \dfrac{\text{Sum of all values in the sample}}{\text{Number of values in the sample}}\), and the mean for a population (denoted by \(\mu\)) is \(\mu = \dfrac{\text{Sum of all values in the population}}{\text{Number of values in the population}}\).       OpenStax
    Mean of a Probability Distribution the long-term average of many trials of a statistical experiment       OpenStax
    Median a number that separates ordered data into halves; half the values are the same number or smaller than the median and half the values are the same number or larger than the median. The median may or may not be part of the data.       OpenStax
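The entry does not say which number to use when the data have an even count. A minimal sketch, assuming the common convention (also used by Python's `statistics.median`) of averaging the two middle values, which also shows why the median may not be one of the data values:

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if the count is even."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 1, 4, 1, 5]))  # 3
print(median([3, 1, 4, 1]))     # 2.0, which is not one of the data values
```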
    memoryless property For an exponential random variable \(X\), the memoryless property is the statement that knowledge of what has occurred in the past has no effect on future probabilities. This means that the probability that \(X\) exceeds \(x + k\), given that it has exceeded \(x\), is the same as the probability that \(X\) would exceed \(k\) if we had no knowledge about it. In symbols we say that \(P(X > x + k | X > x) = P(X > k)\)       OpenStax
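The identity can be checked numerically. This sketch assumes the exponential survival function \(P(X > t) = e^{-mt}\) with a decay rate \(m\) (the rate and the values of \(x\) and \(k\) are hypothetical), and uses the fact that the event \(X > x + k\) lies inside \(X > x\), so the conditional probability is a simple ratio.

```python
import math

def exp_survival(t: float, m: float) -> float:
    """P(X > t) for an exponential random variable with decay rate m."""
    return math.exp(-m * t)

m = 0.25        # hypothetical decay rate (mean 1/m = 4)
x, k = 3.0, 2.0

# P(X > x + k | X > x) = P(X > x + k) / P(X > x)
conditional = exp_survival(x + k, m) / exp_survival(x, m)
unconditional = exp_survival(k, m)
print(conditional, unconditional)  # equal up to floating-point rounding
```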
    Midpoint the mean of an interval in a frequency table       OpenStax
    Mode the value that appears most frequently in a set of data       OpenStax
    Mutually Exclusive Two events are mutually exclusive if the probability that they both happen at the same time is zero. If events \(\text{A}\) and \(\text{B}\) are mutually exclusive, then \(P(\text{A AND B}) = 0\).       OpenStax
    Nonsampling Error an issue that affects the reliability of sampling data other than natural variation; it includes a variety of human errors including poor study design, biased sampling methods, inaccurate information provided by study participants, data entry errors, and poor analysis.       OpenStax
    Normal Distribution a continuous random variable (RV) with pdf \(f(x) = \dfrac{1}{\sigma \sqrt{2 \pi}}e^{\dfrac{-(x-\mu)^{2}}{2 \sigma^{2}}}\), where \(\mu\) is the mean of the distribution and \(\sigma\) is the standard deviation; notation: \(X \sim N(\mu, \sigma)\). If \(\mu = 0\) and \(\sigma = 1\), the RV is called a standard normal distribution.       OpenStax
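The density above translates directly into code. This sketch writes the formula out term by term (the function name is hypothetical) and cross-checks it against `statistics.NormalDist`, which is in the Python standard library from version 3.8.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma) at x, following the formula term by term."""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Cross-check against the standard library implementation.
print(normal_pdf(1.0, 0.0, 1.0), NormalDist(0.0, 1.0).pdf(1.0))
```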
    Numerical Variable a variable that takes on values that are numbers       OpenStax
    One-Way ANOVA a method of testing whether or not the means of three or more populations are equal; the method is applicable if: (1) all populations of interest are normally distributed, (2) the populations have equal standard deviations, and (3) samples (not necessarily of the same size) are randomly and independently selected from each population. The test statistic for analysis of variance is the \(F\)-ratio.       OpenStax
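The entry names the \(F\)-ratio without giving its formula. A minimal sketch, assuming the standard decomposition \(F = MS_{\text{between}} / MS_{\text{within}}\), with \(k - 1\) and \(N - k\) degrees of freedom for \(k\) groups and \(N\) observations in total; it computes only the statistic, not a \(p\)-value, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """F-ratio = MS_between / MS_within for k independent samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the sample means around the grand mean (df = k - 1).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own sample mean (df = n_total - k).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
```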
    Outcome a particular result of an experiment       OpenStax
    Outlier an observation that does not fit the rest of the data       OpenStax
    p-value the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one actually observed. The smaller the \(p\)-value, the stronger the evidence against the null hypothesis.       OpenStax
    Paired Data Set two data sets that have a one-to-one relationship so that: (1) both data sets are the same size, and (2) each data point in one data set is matched with exactly one point from the other set.       OpenStax
    Parameter a number that is used to represent a population characteristic and that generally cannot be determined easily       OpenStax
    Placebo an inactive treatment that has no real effect on the response variable       OpenStax
    Point Estimate a single number computed from a sample and used to estimate a population parameter       OpenStax
    Poisson distribution If there is a known average of \(\lambda\) events occurring per unit time, and these events are independent of each other, then the number of events \(X\) occurring in one unit of time has the Poisson distribution. The probability of \(k\) events occurring in one unit of time is \(P(X = k) = \dfrac{\lambda^{k}e^{-\lambda}}{k!}\).       OpenStax
    Poisson Probability Distribution a discrete random variable (RV) that counts the number of times a certain event will occur in a specific interval; characteristics of the variable: (1) The probability that the event occurs in a given interval is the same for all intervals. (2) The events occur with a known mean and independently of the time since the last event. The distribution is defined by the mean \(\mu\) of the number of events in the interval. Notation: \(X \sim P(\mu)\). The standard deviation is \(\sigma = \sqrt{\mu}\). The probability of exactly \(x\) events in the interval is \(P(X = x) = \left(e^{-\mu}\right)\frac{\mu^{x}}{x!}\). The Poisson distribution is often used to approximate the binomial distribution when \(n\) is “large” and \(p\) is “small” (a general rule is that \(n\) should be greater than or equal to 20 and \(p\) should be less than or equal to 0.05); in that case the mean is \(\mu = np\).       OpenStax
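The approximation rule can be checked numerically by comparing exact binomial probabilities with Poisson probabilities that have the same mean \(\mu = np\). The values \(n = 100\) and \(p = 0.02\) are hypothetical choices that satisfy the rule of thumb.

```python
import math

def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) = e^(-mu) * mu^x / x! for X ~ P(mu)."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability, used here only for comparison."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 100, 0.02      # n >= 20 and p <= 0.05, so the rule of thumb applies
mu = n * p            # mean of the approximating Poisson distribution
for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, mu), 4))
```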
    Pooled Proportion estimate of the common value of \(p_{1}\) and \(p_{2}\).       OpenStax
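The entry does not give the formula. A sketch assuming the usual pooled estimate \(p_c = \dfrac{x_A + x_B}{n_A + n_B}\), where \(x_A, x_B\) are the success counts and \(n_A, n_B\) the sample sizes; it applies when the null hypothesis says the two population proportions are equal. The counts below are hypothetical.

```python
def pooled_proportion(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Combine successes and sample sizes from both samples into one estimate."""
    return (x_a + x_b) / (n_a + n_b)

# Hypothetical counts: 30 successes out of 100, and 45 out of 150.
print(pooled_proportion(30, 100, 45, 150))  # 0.3
```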
    Population all individuals, objects, or measurements whose properties are being studied       OpenStax
    Probability a number between zero and one, inclusive, that gives the likelihood that a specific event will occur; the foundation of probability theory is given by the following three axioms (A. N. Kolmogorov, 1930s): Let \(S\) denote the sample space, and let \(A\) and \(B\) be two events in \(S\). Then: (1) \(0 \leq P(\text{A}) \leq 1\), (2) if \(\text{A}\) and \(\text{B}\) are any two mutually exclusive events, then \(P(\text{A OR B}) = P(\text{A}) + P(\text{B})\), and (3) \(P(\text{S}) = 1\).       OpenStax
    Probability Distribution Function (PDF) a mathematical description of a discrete random variable (RV), given either in the form of an equation (formula) or in the form of a table listing all the possible outcomes of an experiment and the probability associated with each outcome.       OpenStax
    Proportion the number of successes divided by the total number in the sample       OpenStax
    Qualitative Data See Data.       OpenStax
    Quantitative Data See Data.       OpenStax
    Random Assignment the act of organizing experimental units into treatment groups using random methods       OpenStax
    Random Sampling a method of selecting a sample that gives every member of the population an equal chance of being selected.       OpenStax
    Random Variable (RV) a characteristic of interest in a population being studied; the common notation for variables is upper-case Latin letters \(X, Y, Z\), ...; the common notation for a specific value from the domain (set of all possible values of a variable) is lower-case Latin letters \(x\), \(y\), and \(z\). For example, if \(X\) is the number of children in a family, then \(x\) represents a specific integer 0, 1, 2, 3, .... Variables in statistics differ from variables in intermediate algebra in the following two ways. (1) The domain of the random variable (RV) is not necessarily a numerical set; the domain may be expressed in words; for example, if \(X =\) hair color, then the domain is {black, blond, gray, green, orange}. (2) We can tell what specific value \(x\) the random variable \(X\) takes only after performing the experiment.       OpenStax
    Relative Frequency the ratio of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes       OpenStax
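A relative-frequency table is a count of each value divided by the total. This sketch uses `collections.Counter` for the counts and `fractions.Fraction` so the ratios stay exact; the data and function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def relative_frequencies(data):
    """Map each distinct value to (times it occurs) / (total number of outcomes)."""
    counts = Counter(data)
    total = len(data)
    return {value: Fraction(count, total) for value, count in counts.items()}

rel = relative_frequencies([1, 2, 2, 3, 3, 3])
print(rel)  # {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 2)}
```

Because every outcome is counted exactly once, the relative frequencies always sum to 1.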
    Representative Sample a subset of the population that has the same characteristics as the population       OpenStax
    Response Variable the dependent variable in an experiment; the value that is measured for change at the end of an experiment       OpenStax
    Sample a subset of the population studied       OpenStax
    Sample Space the set of all possible outcomes of an experiment       OpenStax
    Sampling Bias bias that results when not all members of the population are equally likely to be selected       OpenStax
    Sampling Distribution Given simple random samples of size \(n\) from a given population with a measured characteristic such as mean, proportion, or standard deviation for each sample, the probability distribution of all the measured characteristics is called a sampling distribution.       OpenStax
    Sampling Error the natural variation that results from selecting a sample to represent a larger population; this variation decreases as the sample size increases, so selecting larger samples reduces sampling error.       OpenStax
    Sampling with Replacement Once a member of the population is selected for inclusion in a sample, that member is returned to the population for the selection of the next individual.       OpenStax
    Sampling without Replacement A member of the population may be chosen for inclusion in a sample only once. If chosen, the member is not returned to the population before the next selection.       OpenStax
    Simple Random Sampling a straightforward method for selecting a random sample; give each member of the population a number. Use a random number generator to select a set of labels. These randomly selected labels identify the members of your sample.       OpenStax
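The labeling procedure can be sketched with Python's `random.Random.sample`, which draws distinct labels so that every subset of the requested size is equally likely. The population of student names and the function name are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Label members 0..N-1, then draw n distinct labels uniformly at random."""
    rng = random.Random(seed)
    labels = rng.sample(range(len(population)), n)  # every n-subset equally likely
    return [population[i] for i in labels]

students = [f"student_{i}" for i in range(1, 31)]  # hypothetical population of 30
print(simple_random_sample(students, 5, seed=42))
```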
    Skewed used to describe data that are not symmetrical; when the right side of a graph looks “chopped off” compared to the left side, we say the data are “skewed to the left.” When the left side of the graph looks “chopped off” compared to the right side, we say the data are “skewed to the right.” Alternatively: when the lower values of the data are more spread out, we say the data are skewed to the left. When the greater values are more spread out, the data are skewed to the right.       OpenStax
    Standard Deviation a number that is equal to the square root of the variance and measures how far data values are from their mean; notation: \(s\) for sample standard deviation and \(\sigma\) for population standard deviation.       OpenStax
    Standard Deviation of a Probability Distribution a number that measures how far the outcomes of a statistical experiment are from the mean of the distribution       OpenStax
    Standard Error of the Mean the standard deviation of the distribution of the sample means, or \(\dfrac{\sigma}{\sqrt{n}}\).       OpenStax
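A quick simulation illustrates the formula: draw many samples of size \(n\), record each sample mean, and compare the spread of those means with \(\sigma/\sqrt{n}\). The normal population with \(\mu = 50\), \(\sigma = 10\) and the sample size \(n = 25\) are hypothetical choices.

```python
import random
import statistics

mu, sigma, n = 50.0, 10.0, 25     # hypothetical population parameters and sample size
rng = random.Random(7)

# Draw many samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(5000)
]

print(sigma / n ** 0.5)                 # theoretical standard error: 2.0
print(statistics.stdev(sample_means))   # simulated value, close to 2.0
```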
    Standard Normal Distribution a continuous random variable (RV) \(X \sim N(0, 1)\); when \(X\) follows the standard normal distribution, it is often noted as \(Z \sim N(0, 1)\).       OpenStax
    Statistic a numerical characteristic of the sample; a statistic estimates the corresponding population parameter.       OpenStax
    Stratified Sampling a method for selecting a random sample used to ensure that subgroups of the population are represented adequately; divide the population into groups (strata). Use simple random sampling to identify a proportionate number of individuals from each stratum.       OpenStax
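A minimal sketch of proportional allocation, assuming each stratum \(h\) of size \(N_h\) receives \(n \cdot N_h / N\) members rounded to the nearest integer (with uneven sizes, rounding can make the total differ slightly from \(n\)). The strata and function name are hypothetical.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional allocation: stratum h gets about n * N_h / N members,
    each chosen by simple random sampling within that stratum."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / total)  # rounding may make the sum differ from n
        sample[name] = rng.sample(members, n_h)
    return sample

# Hypothetical population of 1,000 students split into three strata.
strata = {
    "freshmen": list(range(500)),
    "sophomores": list(range(500, 800)),
    "juniors": list(range(800, 1000)),
}
result = stratified_sample(strata, 50, seed=1)
print({name: len(chosen) for name, chosen in result.items()})
# {'freshmen': 25, 'sophomores': 15, 'juniors': 10}
```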
    Student's t-Distribution investigated and reported by William S. Gosset in 1908 and published under the pseudonym Student. The major characteristics of the random variable (RV) are: (1) It is continuous and can take any real value. (2) The pdf is symmetrical about its mean of zero; however, it is more spread out and flatter at the apex than the normal distribution. (3) It approaches the standard normal distribution as \(n\) gets larger. (4) There is a "family" of \(t\)-distributions: every member of the family is completely defined by its number of degrees of freedom, which for a single sample is one less than the number of data items.       OpenStax
    Systematic Sampling a method for selecting a random sample; list the members of the population. Use simple random sampling to select a starting point in the population. Let k = (number of individuals in the population)/(number of individuals needed in the sample). Choose every kth individual in the list starting with the one that was randomly selected. If necessary, return to the beginning of the population list to complete your sample.       OpenStax
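The steps above can be sketched as follows. Two details are assumptions not fixed by the entry: \(k\) is rounded down to an integer, and the random start may be anywhere in the list, which is what makes the wrap-around step necessary. The function name is hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Random start anywhere in the list, then every k-th member, wrapping around."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n                          # k = N / n, rounded down to an integer
    start = random.Random(seed).randrange(N)
    # (start + i*k) % N returns to the beginning of the list when needed;
    # because (n - 1) * k < N, the n positions are all different.
    return [population[(start + i * k) % N] for i in range(n)]

print(systematic_sample(list(range(1, 101)), 10, seed=3))
```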
    The AND Event An outcome is in the event \(\text{A AND B}\) if the outcome is in both \(\text{A}\) and \(\text{B}\) at the same time.       OpenStax
    The Complement Event The complement of event \(\text{A}\) consists of all outcomes that are NOT in \(\text{A}\).       OpenStax
    The Conditional Probability of A GIVEN B \(P(\text{A|B})\) is the probability that event \(\text{A}\) will occur given that the event \(\text{B}\) has already occurred.       OpenStax
    The Conditional Probability of One Event Given Another Event P(A|B) is the probability that event A will occur given that the event B has already occurred.       OpenStax
    The Law of Large Numbers As the number of trials in a probability experiment increases, the difference between the theoretical probability of an event and the relative frequency probability approaches zero.       OpenStax
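A coin-flip simulation shows the relative frequency of heads settling toward the theoretical probability 0.5 as the number of trials grows. The seed and checkpoints are arbitrary choices; the exact printed values depend on the generator.

```python
import random

rng = random.Random(2024)
heads = 0
for flips in range(1, 100_001):
    heads += rng.random() < 0.5      # one fair-coin flip; True counts as 1
    if flips in (10, 100, 1_000, 10_000, 100_000):
        rel_freq = heads / flips
        print(flips, rel_freq, abs(rel_freq - 0.5))
```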
    The Or Event An outcome is in the event \(\text{A OR B}\) if the outcome is in \(\text{A}\) or is in \(\text{B}\) or is in both \(\text{A}\) and \(\text{B}\).       OpenStax
    The OR of Two Events An outcome is in the event A OR B if the outcome is in A, is in B, or is in both A and B.       OpenStax
    Treatments different values or components of the explanatory variable applied in an experiment       OpenStax
    Tree Diagram a useful visual representation of a sample space and events in the form of a “tree” with branches marked by possible outcomes together with associated probabilities (frequencies, relative frequencies)       OpenStax
    Type 1 Error The decision is to reject the null hypothesis when, in fact, the null hypothesis is true.       OpenStax
    Type 2 Error The decision is not to reject the null hypothesis when, in fact, the null hypothesis is false.       OpenStax
    Uniform Distribution a continuous random variable (RV) that has equally likely outcomes over the domain, \(a < x < b\); it is often referred to as the rectangular distribution because the graph of the pdf has the form of a rectangle. Notation: \(X \sim U(a,b)\). The mean is \(\mu = \frac{a+b}{2}\) and the standard deviation is \(\sigma = \sqrt{\frac{(b-a)^{2}}{12}}\). The probability density function is \(f(x) = \frac{1}{b-a}\) for \(a < x < b\) or \(a \leq x \leq b\). The cumulative distribution is \(P(X \leq x) = \frac{x-a}{b-a}\) for \(a \leq x \leq b\).       OpenStax
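The formulas translate directly into code. This sketch uses hypothetical function names and extends the cumulative distribution outside \([a, b]\) with the values 0 and 1, an assumption since the entry gives only the formula inside the interval.

```python
import math

def uniform_summary(a: float, b: float):
    """Mean and standard deviation of X ~ U(a, b)."""
    mean = (a + b) / 2
    sd = math.sqrt((b - a) ** 2 / 12)
    return mean, sd

def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x): 0 below a, (x - a) / (b - a) on [a, b], 1 above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

print(uniform_summary(0, 12))   # mean 6.0, standard deviation about 3.464
print(uniform_cdf(3, 0, 12))    # 0.25
```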
    Variable a characteristic of interest for each person or object in a population       OpenStax
    Variable (Random Variable) a characteristic of interest in a population being studied. The common notation for variables is upper-case Latin letters \(X, Y, Z,\).... The common notation for a specific value from the domain (set of all possible values of a variable) is lower-case Latin letters \(x, y, z,\).... For example, if \(X\) is the number of children in a family, then \(x\) represents a specific integer 0, 1, 2, 3, .... Variables in statistics differ from variables in intermediate algebra in the following two ways. (1) The domain of the random variable (RV) is not necessarily a numerical set; the domain may be expressed in words; for example, if \(X =\) hair color, then the domain is {black, blond, gray, green, orange}. (2) We can tell what specific value \(x\) the random variable \(X\) takes only after performing the experiment.       OpenStax
    Variance mean of the squared deviations from the mean; the square of the standard deviation. For a set of data, a deviation can be represented as \(x - \bar{x}\), where \(x\) is a value of the data and \(\bar{x}\) is the sample mean. The sample variance is equal to the sum of the squares of the deviations divided by one less than the sample size, \(n - 1\).       OpenStax
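The sample variance and standard deviation follow directly from the deviations \(x - \bar{x}\). This sketch (with a hypothetical function name and data) divides by \(n - 1\) and cross-checks against `statistics.variance` and `statistics.stdev`, which use the same sample convention.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = sample_variance(data)
sd = math.sqrt(var)                         # standard deviation = sqrt(variance)
print(var, statistics.variance(data))       # both 32/7, about 4.571
print(sd, statistics.stdev(data))
```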
    Venn Diagram the visual representation of a sample space and events in the form of circles or ovals showing their intersections       OpenStax
    z-score the linear transformation of the form \(z = \dfrac{x-\mu}{\sigma}\); if this transformation is applied to any normal distribution \(X \sim N(\mu, \sigma)\), the result is the standard normal distribution \(Z \sim N(0,1)\). If this transformation is applied to any specific value \(x\) of the RV with mean \(\mu\) and standard deviation \(\sigma\), the result is called the \(z\)-score of \(x\). The \(z\)-score allows us to compare data that are normally distributed but scaled differently.       OpenStax
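The comparison described above can be sketched with two hypothetical exams that have different means and standard deviations; the exam numbers and function name are illustrative only.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical scores on two differently scaled exams.
exam_a = z_score(85, mu=70, sigma=10)    # 1.5
exam_b = z_score(80, mu=74, sigma=6)     # 1.0
print(exam_a, exam_b)  # 85 on exam A is relatively better than 80 on exam B
```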
