Glossary

Anton Butenko
Mt. San Jacinto College

Glossary Entries
Word(s)	Definition	Image	Caption	Link	Source
Statistics	a science that deals with any aspect of the collection, analysis, interpretation, and presentation of data
Data	a collection of observations
Population	the collection of all individuals or items under consideration in a statistical study
Census	a study that involves the entire population
Sampling	a process of obtaining a sample from the population
Sample	a part of the population from which information is obtained
Representative sample	a sample that reflects as closely as possible the relevant characteristics of the population under consideration
Descriptive statistics	the methods for organizing and summarizing information
Inferential statistics	the methods for drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population
Sampling error	the natural variation that results from selecting a sample to represent a larger population
Non-sampling error	an issue that affects the reliability of sampling data other than the natural variation
Observational study	a study in which researchers simply observe characteristics and take measurements
Designed experiment	a study in which the data do not exist until someone does "the experiment" that produces the data
Statistically significant result	a result that is very unlikely to occur by chance
Practically significant result	a result that is big enough to be meaningful in the real world regardless of its statistical significance
Lurking variable	a variable, not among those under consideration, that causes changes in both of the variables under consideration
Sampling bias	the tendency of a sample to be unrepresentative because not all members of the population are equally likely to be selected
Simple random sampling	a sampling procedure in which each possible sample of a given size is equally likely to be the one obtained
Sampling with replacement	sampling in which every selected member of the population is returned to the population and may be selected again
Sampling without replacement	sampling in which every member of the population may be chosen for inclusion in a sample only once
Systematic sampling	a method for selecting a random sample by randomly picking the first object and then every k-th object after that for some k approximately equal to the number of individuals in the population divided by the desired sample size
Cluster sampling	a method for selecting a random sample by dividing the population into groups and using simple random sampling to select a set of groups from which every object is included in the sample
Stratified sampling	a method for selecting a random sample by dividing the population into strata and then using simple random sampling to pick objects from each stratum so that each stratum is represented in the sample proportionally to its size
Voluntary response sampling	a method of sampling in which the respondents themselves decide whether to be included
Convenience sampling	a nonrandom method of selecting a sample; this method selects individuals that are easily accessible and may result in biased data
Placebo	a fake drug used in the testing of medication
Control group	in an experiment, the group that does not receive the experimental treatment
Experimental group	in an experiment, the group that is exposed to the treatment
Explanatory variable	a variable that we think explains or causes changes in the response variable
Response variable	a variable that measures an outcome or result of a study
Treatment	a specific condition applied to the individuals in an experiment
Blinding	a technique where the subjects do not know whether they are receiving a treatment or a placebo
Double-blinding	a technique where neither the subjects nor the data recorders know which treatment each subject receives
Statistical variable	a characteristic that varies from one object of the population to another
Qualitative data	data that consist of names and labels describing the attributes of a population
Quantitative data	data that consist of numbers that are the result of counting or measuring attributes of a population
Categorical data	qualitative data that do not have a natural ordering
Ordinal data	qualitative data that have a natural ordering
Discrete data	quantitative data whose possible values can all be listed
Continuous data	quantitative data whose possible values form an interval
Nominal level of measurement	a level of measurement that captures variation in kind or quality but not in amount
Ordinal level of measurement	a level of measurement that yields rank-ordered data
Interval level of measurement	a level of measurement at which differences between values are meaningful but there is no natural zero
Ratio level of measurement	a level of measurement at which differences between values are meaningful and there is a natural zero
Frequency	the number of times a particular distinct value occurs
Frequency distribution table	a listing of the distinct values and their frequencies
Relative frequency	the frequency divided by the total number of observations
Relative frequency distribution table	a listing of the distinct values and their relative frequencies
Bar chart	a chart that displays the distinct values of the qualitative data on a horizontal axis and the relative frequencies (or frequencies) of those values on a vertical axis
Pie chart	a disk divided into wedge-shaped pieces proportional to the relative frequencies of the qualitative data
Pareto chart	a chart that consists of bars that are sorted into order by category size (largest to smallest)
Single-value grouping	a way to group quantitative data into categories in which each category represents a single possible value
Interval grouping	a way to group quantitative data into categories in which each category is an interval of values called a class
Class	a category in which quantitative data can be arranged
Lower class limit	the smallest value that could go in a class
Upper class limit	the smallest value that could go into the next higher class
Class width	the difference between the lower limit of a class and the lower limit of the next-higher class
Class midpoint	the average of the lower limit of a class and the lower limit of the next-higher class
Histogram	a chart that displays the classes of the quantitative data on a horizontal axis and the frequencies (relative frequencies) of those classes on a vertical axis
Dot plot	a chart in which each observation is recorded by placing a dot over the appropriate value on the horizontal axis
Stem-and-leaf diagram	a chart in which each observation is recorded by placing its unit digit in the row with the appropriate quantity for its number of tens
Frequency polygon	a graph of a frequency distribution that shows the number of instances of each obtained score, usually with the data points connected by straight lines
Cumulative frequency of a class	the sum of the frequencies for that class and all previous classes
Ogive	a cumulative frequency polygon
Distribution of data	a table, graph, or formula that provides the values of the observations and how often they occur
Shape of the distribution	one of the features of the distribution that characterizes the visual appearance of the distribution
Scatterplot	a chart in which each observation is plotted as a point on the coordinate plane
Time-series	a time-ordered sequence of observations taken at regular intervals
Contingency table	a two-way frequency table used to group bivariate data
Center	the most "typical values" of the data set
Mean	the sum of the observations divided by the number of observations
Median	the number that divides the bottom 50% of the data from the top 50%; same as the 50th percentile
Mode	the most frequently occurring value in the data set
Resistant measure	a measure that is not sensitive to a few extreme observations
Non-resistant measure	a measure that is sensitive to a few extreme observations
Parameter	a descriptive measure of the population
Statistic	a descriptive measure of a sample
Estimator	a value that is used to estimate the parameter
Estimating	an attempt to estimate the parameter
Biased estimator	an estimator that, on average in the long run, doesn't correctly estimate the unknown parameter
Unbiased estimator	an estimator that, on average in the long run, correctly estimates the unknown parameter
Population mean	the mean of the population
Sample mean	the mean of a sample
Measures of variation	measures that indicate the amount of variation, or spread, in a data set
Range	the difference between the maximum (largest) and minimum (smallest) observations
Variance	the average squared deviation of the data in a dataset from its mean
Standard deviation	the square root of the variance
Population variance	the variance of the population
Sample variance	the variance of a sample
Population standard deviation	the standard deviation of the population
Sample standard deviation	the standard deviation of a sample
Outlier	an observation with an "extreme" deviation from the mean; in the context of linear regression - an observation that lies too far from the regression line, relative to other data points
Three standard deviations rule	the rule according to which almost all the observations in any data set lie within 3 standard deviations to either side of the mean
Chebyshev's Theorem	the rule that gives a lower bound, 1 - 1/k², on the proportion of observations that lie within k standard deviations (k>1) of the mean, regardless of the shape of the distribution
Empirical rule	the rule that gives the approximate distribution of observations within 1 standard deviation (68%), 2 standard deviations (95%), and 3 standard deviations (99.7%) of the mean when the shape of the distribution can be assumed normal
Zeroth quartile	the value for which 0% of the data is less than it; same as the minimum
First quartile	the value for which 25% of the data is less than it
Second quartile	the value for which 50% of the data is less than it; same as the median
Third quartile	the value for which 75% of the data is less than it
Fourth quartile	the value for which 100% of the data is less than it; same as the maximum
Five-number summary	a numerical summary that consists of Q0, Q1, Q2, Q3, and Q4
Interquartile range	a measure of variability, defined to be the difference between the third and first quartiles
Boxplot	a graphical representation of the five-number summary of a dataset, obtained by drawing a box that ranges from Q1 to Q3 with "whiskers" that extend to the most extreme observations that still lie within the adjacent values
Probability	a measure of how likely something is to happen
Experiment	an action whose result is not certain
Simple outcome	one possible result of an experiment
Sample space	the set of all outcomes of an experiment
Event	a collection of outcomes from the sample space
Impossible event	an event that never occurs
Certain event	an event that always occurs
(not A)	the event that occurs when A doesn't occur
(A and B)	the event that occurs when both A and B occur
(A or B)	the event that occurs when A occurs, B occurs, or both occur
Mutually exclusive events	the events that have no common outcomes
Marginal probability	the probability of an event obtained from the rightmost column or the bottom row of a contingency table
Joint probability	the probability of an event obtained from the cell at the intersection of a row and a column of a contingency table
Type 1 error	the event in which a false positive occurs; in the context of hypothesis testing, rejecting a true H0 claim
Type 2 error	the event in which a false negative occurs; in the context of hypothesis testing, failing to reject a false H0 claim
Conditional probability	the probability of event A given that event B has occurred
Independent events	events such that the occurrence of one doesn't affect the probability of another
Complement rule	for any event A, the probability of A not happening is P(not A) = 1 - P(A)
General addition rule	for any two events, A and B, the probability of A or B is P(A or B) = P(A) + P(B) - P(A and B)
Special addition rule	if events A and B are mutually exclusive, then P(A or B) = P(A) + P(B)
General multiplication rule	for any two events, A and B, the probability of A and B is P(A and B) = P(A|B)P(B)
Special multiplication rule	if events A and B are independent, then P(A and B) = P(A)P(B)
Fair coin	a coin for which the outcomes, heads and tails, are equally likely
Unfair coin	a coin for which the outcomes, heads and tails, are not equally likely
Permutation	an arrangement of distinct objects in order
Combination	a collection of distinct objects
Basic counting principle	the rule that states that when there are m ways to do one thing, and n ways to do another, then altogether there are m×n ways of doing both
Factorial	the product of the natural numbers less than or equal to the given number
Special permutation rule	the rule that states that the number of permutations of n objects is n!
nPr	the number of permutations of r objects chosen from a set of n objects
nCr	the number of combinations of r objects chosen from a set of n objects
Random variable	an unknown quantity whose value depends on chance
Probability distribution table	the table that summarizes the possible values of a discrete random variable and their corresponding probabilities
Probability histogram	a graph that displays the possible values of a discrete random variable on the horizontal axis and their corresponding probabilities on the vertical axis
Discrete random variable	a random variable whose possible values can all be listed
Continuous random variable	a random variable whose possible values form a continuous interval
Expected value	the average value of the random variable over a large number of experiments
Standard deviation	the square root of the average squared deviation of the values of the random variable from its expected value
Bernoulli trials	a sequence of the same binomial experiment performed independently n times
Binomial random variable	a discrete random variable that represents the number of successes among n Bernoulli trials
Binomial experiment	an experiment in which there are exactly two possible outcomes - success and failure, with the probability of success P(S)=p and the probability of failure P(F)=1-p
Probability density curve	a curve that (1) is always on or above the horizontal axis; (2) has the total area between itself and the horizontal axis equal to 1
Standard normal curve	a bell-shaped probability density curve that (1) has its peak at 0; (2) is symmetric about 0; (3) extends indefinitely in both directions, approaching but never touching the horizontal axis; (4) satisfies the empirical rule
Standard normal variable	a continuous random variable that has the standard normal probability density curve
T-curve	a probability density curve that (1) extends indefinitely in both directions, approaching, but never touching, the horizontal axis; (2) is symmetric about 0; (3) increasingly looks like the standard normal curve as the number of degrees of freedom becomes larger
Chi-square curve	a probability density curve that (1) starts at 0 on the horizontal axis and extends indefinitely to the right, approaching, but never touching, the horizontal axis; (2) is right-skewed; (3) as the number of degrees of freedom becomes larger, it increasingly looks like a normal curve
F-curve	a probability density curve that (1) starts at 0 on the horizontal axis and extends indefinitely to the right, approaching, but never touching, the horizontal axis; (2) is right-skewed; (3) possesses the reciprocal property
Uniform random variable	a random variable whose probability density function is portrayed as a horizontal line at height 1/(b-a) above the x-axis over the interval from a to b
Normal probability curve	a bell-shaped probability density curve that (1) has its peak at μ; (2) is symmetric about μ; (3) extends indefinitely in both directions, approaching but never touching the horizontal axis; (4) satisfies the empirical rule
Normal random variable	a variable that has a normal probability density curve
Normality plot	a plot used to assess, from a small sample, whether the population is normal
Normal approximation	a technique used to approximate the probabilities of a binomial random variable using the normal random variable with the same mean and standard deviation
Correction for continuity	an adjustment of 0.5 made to the endpoints of an interval when the discrete binomial distribution is approximated by the continuous normal distribution
Sample mean	a random variable whose values are the averages of the samples of size n from the population
Sample proportion	a random variable that represents the number of cases falling into one category of the variable divided by the number of cases in the sample
Sample sum	a random variable that represents the sum of the samples of size n from the population
Sample variance	a random variable whose possible values are the variances of the samples of size n from the population
The Central Limit Theorem for Sample Means	for sufficiently large samples (n>30), the distribution of the sample mean is approximately normal
The Central Limit Theorem for Sample Proportions	for sufficiently large samples (np>10 and n(1-p)>10), the distribution of the sample proportion is approximately normal
The Central Limit Theorem for Sample Sums	for sufficiently large samples (n>30), the distribution of the sample sum is approximately normal
Point estimate	the value of a statistic used to estimate the parameter
Biased estimate	a statistic such that the mean of all its possible values doesn't equal the parameter
Unbiased estimate	a statistic such that the mean of all its possible values equals the parameter
Confidence interval	an interval of numbers obtained from a point estimate of a parameter along with the percentage confidence that the parameter lies in the range
Confidence level	the confidence we have that the parameter lies in the confidence interval (i.e., that the confidence interval contains the parameter)
Confidence estimate	the confidence level and confidence interval
Margin of error	the distance from the center of a confidence interval to either endpoint
Null hypothesis	a statistical claim in the form of an equation that is to be tested
Alternative hypothesis	a statistical claim in the form of an inequality to be tested as an alternative to the null hypothesis
Hypothesis test	a test to decide whether the null hypothesis should be rejected in favor of the alternative hypothesis or not
Significance level	the largest tolerated probability of a Type 1 error
Test statistic	a value that measures how far a particular observation is away from its expected value
Critical value	a value that defines a rejection region for a hypothesis test
P-value	the probability, computed assuming the null hypothesis is true, that the test statistic is at least as far from the assumed value as the one actually observed
Hypothesis	a testable claim, often implied by a theory, which is either true or false
Statistical claim	a statement about a population parameter in the form of an equation or an inequality
Independent samples	a pair of samples in which the observations in one sample do not influence the observations in the other
Dependent samples	a pair of samples in which the observations in one sample somehow influence the observations in the other
Paired samples	a pair of samples in which the observations are paired in a distinct way
Pooled proportion	the proportion obtained by treating the two samples as one
Pooled standard deviation	the standard deviation computed by treating the two samples as one
Goodness-of-fit test	a hypothesis test in which the null hypothesis is "the variable has the specified distribution", and the alternative hypothesis is "the variable doesn't have the specified distribution"
Homogeneity test	a hypothesis test in which the null hypothesis is "the distribution of one variable is the same for each value of the other variable", and the alternative hypothesis is "the distribution of one variable is not the same for all values of the other variable"
Independence test	a hypothesis test in which the null hypothesis is "the variables are independent", and the alternative hypothesis is "the variables are dependent"
Homogeneity	the quality of being similar or comparable in kind or nature
Independent variables	statistical variables such that knowing the outcome of one doesn't affect the probability of another variable's outcome
Uniform discrete distribution	a probability distribution in which the frequencies are evenly spread out across the values of a discrete variable
Categorical variable	a qualitative variable associated with categories
One-Way ANOVA	a hypothesis test in which the null hypothesis is "all the population means are equal", and the alternative hypothesis is "not all the population means are equal"
Two-Way ANOVA	a hypothesis test that involves two nominal independent variables, regardless of their numbers of levels, and one scale (quantitative) dependent variable
Coefficient of determination	the proportion of the variation in the response variable that is explained by the regression line; for a least-squares regression line, the square of the correlation coefficient
Correlation coefficient	a numerical measure of the strength of the linear relationship between two variables
Extrapolation	using the regression line to make predictions for values of the explanatory variable outside the range of the observed data
Influential observation	an observation whose removal causes the regression equation to change considerably
Explanatory variable, predictor variable, independent variable	a variable used to explain the outcome variable
Outcome variable, response variable, dependent variable	a variable that changes in response to explanatory variable
Predicted value	the output computed using the regression line
Negative association	large values of one variable are associated with small values of the other, and vice versa
Positive association	large values of one variable are associated with large values of the other, and small values with small values
Linear relationship	data tend to cluster around a straight line when plotted on a scatterplot
Explained difference	the difference between the predicted and the average values
Unexplained difference	the difference between the predicted and the observed values
Residual plot	a plot in which residuals are plotted against the values of the explanatory variable
Least-squares regression line	the line for which the sum of the squared vertical distances from the data points to the line is as small as possible
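
As a rough illustration of the descriptive measures defined in the glossary (mean, median, mode, range, variance, standard deviation), the following Python sketch computes each of them with the standard-library statistics module. The data set is hypothetical and was chosen only so the arithmetic is easy to follow. The glossary does not state a formula for the sample variance; the sketch assumes the usual convention of dividing by n - 1, which is what statistics.variance does.

```python
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)        # sum of the observations / number of observations
median = statistics.median(data)    # divides the bottom 50% from the top 50%
mode = statistics.mode(data)        # most frequently occurring value
data_range = max(data) - min(data)  # maximum minus minimum

# Treating the data as a whole population: average squared deviation from the mean.
pop_variance = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)    # square root of the population variance

# Treating the data as a sample (assumed convention: divide by n - 1).
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(mean, median, mode, data_range)
print(pop_variance, pop_sd, sample_variance, sample_sd)
```

The two variance calls differ only in the divisor, so the sample variance is slightly larger than the population variance for the same numbers.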

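The probability rules and counting entries can be checked on a small example. The sketch below is only an illustration under stated assumptions: the experiment is one roll of a fair six-sided die with equally likely outcomes, the events A and B and the helper function prob are hypothetical names introduced here, and math.perm and math.comb require Python 3.8 or later.

```python
import math
from fractions import Fraction

# Assumed experiment: one roll of a fair six-sided die (equally likely outcomes).
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # hypothetical event: the roll is even
B = {4, 5, 6}  # hypothetical event: the roll is greater than 3


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


p_not_a = 1 - prob(A)                        # complement rule
p_a_or_b = prob(A) + prob(B) - prob(A & B)   # general addition rule
p_a_given_b = prob(A & B) / prob(B)          # conditional probability P(A|B)
p_a_and_b = p_a_given_b * prob(B)            # general multiplication rule

# Counting: arrangements and collections of r objects chosen from n.
n, r = 5, 2
n_factorial = math.factorial(n)  # number of permutations of n objects, n!
n_p_r = math.perm(n, r)          # nPr
n_c_r = math.comb(n, r)          # nCr

print(p_not_a, p_a_or_b, p_a_given_b, p_a_and_b)
print(n_factorial, n_p_r, n_c_r)
```

Using Fraction keeps the probabilities exact, so the addition and multiplication rules can be compared against direct counts of outcomes without rounding error.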