There are two types of data that we can collect:
- Qualitative data describes a subject, and cannot be expressed as a number.
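- Quantitative data measures a subject and is expressed as a number (it can be quantified) that can be analyzed. There are two types of quantitative data: continuous (measured values that can fall anywhere within a range) and discrete (counted values that can only take separate, distinct values).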
1. Ratings of a TV show
2. Grades of an exam
3. Marks of an exam
4. Students' heights in a class
There are many types of graphs, including:
- Pie charts and bar graphs are used for qualitative data
- Histograms (similar to bar graphs) are used for quantitative data
- Line graphs are used for quantitative data
- Scatter graphs are used for quantitative data
Graphs should contain:
- A descriptive title below the graph or chart
- A caption below the title (optional)
- Axes labelled with the name of the variable, the units (if applicable) and the variable intervals; intervals must be spaced according to scale
- A legend to indicate which data points belong to which set of data, if more than one data set is displayed
There are various ways to summarize a data set:
- Distribution tables
- Graphs of raw data
- Sample statistics such as mean, median, mode, standard error and standard deviation
- Graphs based on average values with error bars to indicate a standard error or standard deviation
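As a rough illustration of the first item, a distribution table simply counts how often each value occurs. The sketch below assumes a small list of exam grades invented for this example (the name `grades` and its contents are hypothetical) and uses Python's standard `collections.Counter`.

```python
from collections import Counter

# Hypothetical exam grades, invented for illustration.
grades = ["A", "B", "B", "C", "B", "A", "D"]

# Count how many times each grade occurs.
counts = Counter(grades)

# Print one row per grade: the grade, its frequency, and its share of the total.
for grade in sorted(counts):
    share = counts[grade] / len(grades)
    print(f"{grade}  {counts[grade]}  {share:.0%}")
```

The same counts could then be drawn as a bar graph, since grades are qualitative categories.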
Simple Descriptive Statistics
Descriptive statistics are numbers and processes that describe a group of data. The most common descriptive statistics focus on determining the "average" of the data. However, there is more than one "average," so we must be specific about which one we are finding. Values which describe the "average" are:
- Mean - the sum of all the data divided by the number of data points in the group, the "average" that most people mean (see what we did there?).
- Median - the middle datum when all the data are arranged in order of size; with an even number of data points, it is the mean of the two middle values
- Mode - the most common datum in the set
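The three averages can give different answers for the same data. The following is a minimal sketch using Python's standard `statistics` module; the list `marks` is a hypothetical set of exam marks made up for this example.

```python
import statistics

# Hypothetical exam marks, invented for illustration.
marks = [52, 61, 61, 68, 70, 74, 97]

mean = statistics.mean(marks)      # sum of the values divided by how many there are
median = statistics.median(marks)  # middle value once the data are sorted
mode = statistics.mode(marks)      # most frequently occurring value

print("mean:", mean)      # 69
print("median:", median)  # 68
print("mode:", mode)      # 61
```

Here each "average" is a different number, which is why it matters to say which one is being reported.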
As you can imagine, a data set very rarely consists of a single repeated value. Thus, we need to describe how the data are spread around the mean. Values which describe this variation are:
- Standard deviation describes how widely the individual data are dispersed around the mean. It can be calculated for a whole population or estimated from a sample.
- Standard error describes how precisely the mean of a representative sample estimates the mean of the population. It is the sample's standard deviation divided by the square root of the sample size, so it gets smaller as the sample gets larger.
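To make the two measures concrete, here is a small sketch that assumes the usual definitions: the standard deviation measures the spread of the data around the mean, and the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The list `heights_cm` is a hypothetical sample invented for this example.

```python
import math
import statistics

# Hypothetical sample of student heights in centimetres.
heights_cm = [150, 160, 170, 180, 190]
n = len(heights_cm)

# Population standard deviation: treats the list as the whole population (divides by n).
population_sd = statistics.pstdev(heights_cm)

# Sample standard deviation: treats the list as a sample (divides by n - 1).
sample_sd = statistics.stdev(heights_cm)

# Standard error of the mean: sample standard deviation / sqrt(sample size).
standard_error = sample_sd / math.sqrt(n)

print(f"population SD: {population_sd:.2f}")
print(f"sample SD:     {sample_sd:.2f}")
print(f"standard error: {standard_error:.2f}")
```

Note that the standard error is smaller than either standard deviation, because it describes the uncertainty of the mean rather than the spread of the individual heights.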
The Normal Distribution and Other Shapes of Data Distribution
Analysis of many phenomena results in a normal distribution of data. A normal distribution approximates a bell-shaped curve when the data are plotted on a line graph. As the number of replicates in a data set increases, the graph approaches a perfect bell shape, so the mean, median and mode are all at the peak of the curve.
A distribution may be skewed due to a disproportionate number of extremely high or low values, especially if the sample size is small.
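One way to see the effect of an extreme value is to compare the mean and the median. The sketch below uses two small hypothetical data sets (`symmetric` and `skewed`, both invented for this example) that differ only in their largest value.

```python
import statistics

# Two hypothetical samples that differ only in the last value.
symmetric = [4, 5, 6, 7, 8]
skewed = [4, 5, 6, 7, 48]  # one extremely high value

symmetric_mean = statistics.mean(symmetric)
symmetric_median = statistics.median(symmetric)
skewed_mean = statistics.mean(skewed)
skewed_median = statistics.median(skewed)

print(symmetric_mean, symmetric_median)  # 6 6
print(skewed_mean, skewed_median)        # 14 6
```

The single high value pulls the mean well above the median, which is a common sign of a skewed distribution in a small sample.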
A distribution may show more or less variation around the mean. This is quantified by the size of the standard deviation of the distribution.
It is important to remember that not all types of data distributions have a single peak.
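A simple check for more than one peak is to look for more than one mode. The sketch below assumes Python 3.8 or later, where `statistics.multimode` is available, and uses a hypothetical data set `values` invented for this example.

```python
import statistics

# Hypothetical data with two equally common values.
values = [2, 2, 2, 3, 4, 5, 8, 8, 8, 9]

# multimode returns every value that shares the highest frequency.
modes = statistics.multimode(values)

print(modes)  # [2, 8]
```

Two modes this far apart suggest a distribution with two peaks, although a histogram of the data is the more reliable way to judge its shape.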