
# 6: Introduction to Statistics

• Contributed by Pamini Thangarajah
• Professor (Mathematics & Computing) at Mount Royal University

Course Goals and Anticipated Outcomes for This Chapter:

Develop the students':

• ability to study data
• statistical reasoning

Statistics is the science of collecting, analyzing, and drawing conclusions from data. There are two branches of statistics.

### Branches of Statistics

Definition: Descriptive statistics

Descriptive statistics is the branch of statistics used to describe data via graphs, tables, or other statistical measures. This is the type of statistics we do when we have a lot of data and want to summarize it appropriately.
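As a minimal sketch of what such a summary might look like, the snippet below computes a few common descriptive measures with Python's standard `statistics` module. The `describe` helper and the quiz scores are made up for illustration; they are not part of the course material.

```python
import statistics


def describe(data):
    """Summarize a list of numbers with a few common descriptive measures."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "min": min(data),
        "max": max(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }


# Hypothetical quiz scores out of 10 (invented for this example).
scores = [4, 7, 7, 8, 9, 10, 10, 10]
summary = describe(scores)

for name, value in summary.items():
    print(f"{name}: {value}")
```

A table of these values, or a histogram of the scores, would be a descriptive summary of the same data.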

Definition: Inferential statistics

Inferential statistics is the branch of statistics that deals with inferring/estimating population characteristics from sample data. If a sample represents a given population accurately, then analyzing the sample can lead to significant conclusions about the population as a whole.

### Terminology

Definition: Population

The population is the entire group that a statistical sample is drawn from. If an accurate sample is drawn, significant hypotheses can be developed with reference to the entire population.

Definition: Sample

A sample is a set of data collected from a specific population by a defined procedure.
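The relationship between a population and a sample can be sketched with simulated data. In the snippet below the population is an invented list of 10,000 numbers (not real measurements), and the sample is drawn by simple random sampling. The sample mean is then used as an estimate of the population mean, which is the basic idea behind inferential statistics.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated values (not real data).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# A sample: 200 members chosen at random, without replacement.
sample = rng.sample(population, 200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In practice the population mean is usually unknown, which is why the sample is needed in the first place; it is computed here only so the two values can be compared.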

### Common Sampling Techniques

• Simple random sampling
• Systematic sampling
• Convenience sampling, which is poorly named
• Stratified sampling

Note that one could use a combination of these methods. For example, stopping every 5th person to participate in a survey combines systematic and convenience sampling. All of these methods depend on randomness in some way or another!
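Three of the techniques listed above can be sketched in a few lines of Python. The function names, the list of 100 numbered people, and the two strata are assumptions made for this illustration, and other definitions of each technique are possible.

```python
import random


def simple_random_sample(population, n, rng):
    """Every group of n members is equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every step-th member."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(strata, fraction, rng):
    """Draw a random sample of the same fraction from each stratum.

    strata maps a stratum name to the list of its members.
    """
    chosen = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, k))
    return chosen


rng = random.Random(1)
people = list(range(1, 101))  # hypothetical population: people numbered 1..100

srs = simple_random_sample(people, 10, rng)
every_fifth = systematic_sample(people, 5, rng)

# Hypothetical strata: the first 60 people and the remaining 40.
strata = {"first_year": people[:60], "second_year": people[60:]}
stratified = stratified_sample(strata, 0.1, rng)

print("simple random:", sorted(srs))
print("systematic:   ", every_fifth)
print("stratified:   ", sorted(stratified))
```

Stratified sampling guarantees that each group is represented in proportion to its size (here 6 and 4 people), which a simple random sample does not.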

### Bias

A study suffers from bias if its design or conduct tends to favour certain results. It can happen as a result of failing to choose a truly representative sample (selection bias/participation bias). Bias might also be present if the person conducting the study is biased (by having a personal stake in the study, or by having strong beliefs or expectations on the subject), even if their bias is subconscious. It can even happen at the end of the study if the data is intentionally or unintentionally distorted to lead to a particular conclusion. Finally, there could be a flaw in how the study is conducted, resulting, for example, from a systematic measurement error.
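The effect of participation bias can be illustrated with a small simulation. Everything in it is an assumption made for the example: the population of satisfaction scores is invented, and the rule that happier people are more likely to respond is just one plausible way such bias could arise.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: satisfaction scores from 1 (low) to 5 (high).
population = [rng.randint(1, 5) for _ in range(10_000)]

# Assumed response behaviour: the chance of answering the survey grows
# with the score, so satisfied people are over-represented.
respondents = [s for s in population if rng.random() < s / 5]

# For comparison, a simple random sample of the same size.
random_sample = rng.sample(population, len(respondents))

true_mean = statistics.mean(population)
biased_mean = statistics.mean(respondents)
random_mean = statistics.mean(random_sample)

print(f"population mean:        {true_mean:.2f}")
print(f"self-selected mean:     {biased_mean:.2f}")
print(f"random-sample mean:     {random_mean:.2f}")
```

The self-selected group overstates satisfaction no matter how many people respond, whereas the random sample of the same size stays close to the population mean.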

### Placebos

In an experiment, a placebo is a ‘phony treatment’ that is often given to the control group. On the surface (to a patient and possibly the data collector) it appears identical to the treatment under study, but it is missing the ‘active ingredient’ under study. The placebo effect refers to the situation when patients improve just because they believe they are receiving a useful treatment.

### Confounding Variables

Variables that are not intended to be part of a study that confound (confuse) a study’s results are called confounding variables.

### Surveys and Opinion Polls

Surveys and opinion polls are a type of observational study with their own special issues that one should be aware of. Margins of error, confidence intervals, and confidence levels are often reported with opinion polls and survey results.

Definitions: margin of error, confidence interval, and confidence level

The margin of error is a number or percentage which should be added to and subtracted from the reported number in order to provide a range of numbers in which the actual number probably resides. This range of numbers is called the confidence interval. Margins of error and confidence intervals are always calculated with respect to a particular confidence level. Usually, the confidence level used is 95% (19 times out of 20). This means that we can be 95% confident that the confidence interval contains the correct value. The margin of error for 95% confidence is approximately equal to $$\frac{1}{\sqrt{N}}$$, where N is the size of the sample. As N increases, the margin of error decreases.
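The approximation above is easy to try out numerically. The sketch below applies it to a few sample sizes and to a hypothetical poll in which 52% of 400 respondents favour a proposal; the function names and the poll figures are invented for illustration. Results are expressed as proportions, so 0.05 means 5 percentage points.

```python
import math


def margin_of_error(n):
    """Approximate 95% margin of error for a sample of size n, using 1/sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)


def confidence_interval(reported, n):
    """Return (low, high): the reported value minus and plus the margin of error."""
    moe = margin_of_error(n)
    return (reported - moe, reported + moe)


# The margin of error shrinks as the sample size grows.
for n in (100, 400, 1600):
    print(f"N = {n:5d}  margin of error = {margin_of_error(n):.3f}")

# Hypothetical poll: 52% of 400 respondents favour a proposal.
low, high = confidence_interval(0.52, 400)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

Because the margin depends on the square root of N, quadrupling the sample size only halves the margin of error.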