2.2: Location of Center
The center of a population describes where you expect to find typical values. If you expect to make $50,000 annually when you graduate from college and are employed in your field of study, then that is the location of the center. It does not mean everyone will make that amount; it means that you can expect to make around that amount. The location of center is also known as the average. There are three types of averages: mean, median, and mode.
Mean: The mean is the type of average that most people commonly call “the average.” You take all of the data values, find their sum, and then divide by the number of data values. Again, you will be using the sample statistic to estimate the population parameter, so we need formulas and symbols for each of these.
Population Mean:
\[\mu=\frac{x_{1}+x_{2}+\cdots+x_{N}}{N}=\frac{\sum x}{N} \nonumber \]
where \(N\) = size of the population
\(x_{1}, x_{2}, \ldots, x_{N}\) are data values
Note: \(\sum x = x_{1}+x_{2}+\cdots+x_{N}\) is shorthand for adding all of the data values together
Sample Mean:
\[\bar{x}=\frac{x_{1}+x_{2}+\cdots+x_{n}}{n}=\frac{\sum x}{n} \nonumber \]
where \(n\) = size of the sample
\(x_{1}, x_{2}, \ldots, x_{n}\) are data values
Note: \(\sum x = x_{1}+x_{2}+\cdots+x_{n}\) is shorthand for adding all of the data values together
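As a minimal sketch of the formula above, the mean can be computed in Python by summing the data values and dividing by how many there are. The function name `sample_mean` and the five data values are hypothetical, chosen only for illustration.

```python
def sample_mean(values):
    """Return the sum of the data values divided by the number of values."""
    n = len(values)  # n = size of the sample
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n  # (sum of x) / n


# Hypothetical sample of five data values (not from the text)
data = [4, 8, 15, 16, 23]
x_bar = sample_mean(data)
print(x_bar)  # 66 / 5 = 13.2
```

The same arithmetic gives the population mean when the list holds every value in the population; only the symbol changes.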
Median: This is the value that is found in the middle of the ordered data set.
Most books give a long explanation of how to find the median. The easiest thing to do is to put the numbers in order and then count in from both sides, one data value at a time, until you get to the middle. If there is one middle data value, then that is the median. If there are two middle data values, then the median is the mean of those two data values. If you have a really large data set, then you will use technology to find the value. There is no symbol or formula for the median, for either the population or the sample.
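The counting procedure described above can be sketched in Python. This is one possible implementation, not a standard one: the function name `find_median` and the small data sets are hypothetical. Sorting puts the values in order, and the middle position is found from the number of values rather than by counting in from both ends.

```python
def find_median(values):
    """Return the middle value of the ordered data.

    With an even number of values, return the mean of the two middle values.
    """
    ordered = sorted(values)  # put the numbers in order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the two middle values


# Hypothetical data sets (not from the text)
print(find_median([7, 1, 5]))     # ordered: 1, 5, 7 -> 5
print(find_median([7, 1, 5, 3]))  # ordered: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```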
Mode: This is the data value that occurs most often.
The mode is the only average that can be found for qualitative variables, since you are just looking for the data value with the highest frequency. The mode is not used very often otherwise. There is no symbol or formula for the mode, for either the population or the sample. Unlike the other two averages, there can be more than one mode, or there could be no mode. If there are two modes, the data set is called bimodal. If there are three modes, it is called trimodal. If there are more than three modes, then the data set is said to have no mode. A data set in which no value occurs more often than the others also has no mode.
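The rules above can be sketched in Python with a frequency count. The function name `find_modes` and its `max_modes` parameter are hypothetical. As an assumption of this sketch, a data set in which no value repeats is treated as having no mode, and the limit of three modes follows the convention stated in this section.

```python
from collections import Counter


def find_modes(values, max_modes=3):
    """Return the list of modes, or an empty list if there is no mode."""
    counts = Counter(values)  # frequency of each distinct value
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumption: no value repeats, so no mode
    modes = [value for value, count in counts.items() if count == highest]
    if len(modes) > max_modes:
        return []  # more than three modes is treated as no mode
    return modes


# Hypothetical data sets (not from the text)
print(find_modes(["red", "blue", "red", "green"]))  # ['red']
print(find_modes([1, 1, 2, 2, 3]))                  # [1, 2] (bimodal)
print(find_modes([1, 2, 3, 4]))                     # [] (no mode)
```

Because the function only counts how often each value occurs, it works for qualitative data as well as numbers.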
Example \(\PageIndex{1}\): Finding the Mean, Median, and Mode (Odd Number of Data Values)
The first 11 days of May 2013 in Flagstaff, AZ, had the following high temperatures (in °F):
| 71 | 59 | 69 | 68 | 63 | 57 |
| 57 | 57 | 57 | 65 | 67 |
(Weather Underground, n.d.)
Find the mean, median, and mode for the high temperature.
Solution
Since there are only 11 days, this is a sample.
Mean:
\(\overline{x} = \dfrac{71 + 59 + 69 + 68 + 63 + 57 + 57 + 57 + 57 + 65 + 67} {11} \)
\(= \dfrac{690}{11}\)
\(\approx 62.7 ^{\circ} F\)
Median:
First put the data in order from smallest to largest.
57, 57, 57, 57, 59, 63, 65, 67, 68, 69, 71
Now work from the outside in, until you get to the middle number.
The sixth of the 11 ordered values is the middle number, so the median is 63°F.
Mode:
From the ordered list it is easy to see that 57 occurs four times and no other data values occur that often. So the mode is 57°F.
We can now say that the expected high temperature in early May in Flagstaff, Arizona is around 63°F.
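As a check on the hand calculations in this example, the same three averages can be computed with Python's standard `statistics` module. The variable names are illustrative only.

```python
import statistics

# High temperatures (in °F) for the first 11 days of May 2013 in Flagstaff, AZ
highs = [71, 59, 69, 68, 63, 57, 57, 57, 57, 65, 67]

mean_temp = statistics.mean(highs)      # 690 / 11
median_temp = statistics.median(highs)  # middle of the 11 ordered values
mode_temp = statistics.mode(highs)      # most frequent value

print(round(mean_temp, 1))  # 62.7
print(median_temp)          # 63
print(mode_temp)            # 57
```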
Example \(\PageIndex{2}\): Finding the Mean, Median, and Mode (Even Number of Data Values)
Now let’s look at the first 12 days of May 2013 in Flagstaff, AZ. The following are the high temperatures (in °F):
| 71 | 59 | 69 | 68 | 63 | 57 |
| 57 | 57 | 57 | 65 | 67 | 73 |
(Weather Underground, n.d.)
Find the mean, median, and mode for the high temperature.
Solution
Since there are only 12 days, this is a sample.
Mean:
\(\overline{x} = \dfrac{71 + 59 + 69 + 68 + 63 + 57 + 57 + 57 + 57 + 65 + 67 + 73} {12} \)
\(= \dfrac{763}{12}\)
\(\approx 63.6 ^{\circ} F\)
Median:
First put the data in order from smallest to largest.
57, 57, 57, 57, 59, 63, 65, 67, 68, 69, 71, 73
Now work from the outside in, until you get to the middle number.
This time there are two numbers that are in the middle. So the median is
\(median = \dfrac{63+65}{2} = 64 ^{\circ}F\).
Mode:
From the ordered list it is easy to see that 57 occurs four times and no other data value occurs that often. So the mode is 57°F.
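With an even number of data values, it can help to see the two middle values separately. The sketch below uses `median_low` and `median_high` from Python's standard `statistics` module to pick out the lower and upper middle values, and `median` to average them. The variable names are illustrative only.

```python
import statistics

# High temperatures (in °F) for the first 12 days of May 2013 in Flagstaff, AZ
highs = [71, 59, 69, 68, 63, 57, 57, 57, 57, 65, 67, 73]

lower_middle = statistics.median_low(highs)   # lower of the two middle values
upper_middle = statistics.median_high(highs)  # upper of the two middle values
median_temp = statistics.median(highs)        # mean of the two middle values

print(lower_middle, upper_middle, median_temp)  # 63 65 64.0
print(round(statistics.mean(highs), 1))         # 763 / 12, about 63.6
```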
Example \(\PageIndex{3}\): Effect of Extreme Values on the Mean and Median
A random sample of unemployment rates for 10 countries in the European Union (EU) for March 2013 is given:
| 11.0 | 7.2 | 13.1 | 26.7 | 5.7 | 9.9 | 11.5 | 8.1 | 4.7 | 14.5 |
(Eurostat, n.d.)
Find the mean, median, and mode.
Solution
Since the problem says that it is a random sample, we know this is a sample. Also, there are more than 10 countries in the EU.
Mean:
\(\overline{x} = \dfrac{11.0 + 7.2 + 13.1 + 26.7 + \cdots + 14.5} {10} \)
\(= \dfrac{112.4}{10}\)
\(= 11.24\)
The mean is 11.24%.
Median:
4.7, 5.7, 7.2, 8.1, 9.9, 11.0, 11.5, 13.1, 14.5, 26.7
Both 9.9 and 11.0 are the middle numbers, so the median is
\(median = \dfrac{9.9+11.0}{2} = 10.45\).
The median is 10.45%.
Note: This data set has no mode since there is no number that occurs most often.
Now suppose that you remove the 26.7 from your sample since it is such a large number (an outlier). Find the mean, median, and mode.
| 11.0 | 7.2 | 13.1 | 5.7 | 9.9 | 11.5 | 8.1 | 4.7 | 14.5 |
\(\overline{x} = \dfrac{11.0 + 7.2 + 13.1 + 5.7 + \cdots + 14.5} {9} \)
\(= \dfrac{85.7}{9}\)
\(\approx 9.52\)
The mean is 9.52%.
The median is 9.9%.
There is still no mode.
Notice that the mean and median with the 26.7 were a bit different from each other. When the 26.7 value was removed, the mean dropped significantly (from 11.24% to 9.52%), while the median dropped less (from 10.45% to 9.9%). This is because the mean is affected by extreme values, called outliers, while the median is not affected by outliers as much.
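The comparison can be reproduced with a short Python sketch that computes both averages with and without the outlier and then reports how far each one moved. The variable names are illustrative, and results are rounded to two decimal places to match the hand calculations.

```python
import statistics

# Unemployment rates (%) for the sample of 10 EU countries, March 2013
rates = [11.0, 7.2, 13.1, 26.7, 5.7, 9.9, 11.5, 8.1, 4.7, 14.5]
trimmed = [r for r in rates if r != 26.7]  # same sample without the outlier

for label, data in [("with 26.7:", rates), ("without 26.7:", trimmed)]:
    print(label, round(statistics.mean(data), 2), round(statistics.median(data), 2))
# with 26.7: 11.24 10.45
# without 26.7: 9.52 9.9

# How far each average moved when the outlier was removed
mean_drop = statistics.mean(rates) - statistics.mean(trimmed)
median_drop = statistics.median(rates) - statistics.median(trimmed)
print(round(mean_drop, 2), round(median_drop, 2))  # 1.72 0.55
```

The mean moves about three times as far as the median, which is the sensitivity to outliers described above.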
In Section 1.5, there was a discussion of histogram shapes. If you look back at Graphs 1.5.11, 1.5.12, and 1.5.13, you will see examples of symmetric, skewed right, and skewed left graphs. Since symmetric graphs have their extremes equally on both sides, the mean is not pulled in either direction, so the mean and the median are essentially the same value. With a skewed right graph, there are extreme values on the right, and they pull the mean up but do not affect the median much. So the mean will be higher than the median in skewed right graphs. Skewed left graphs have their extremes on the left, so the mean will be lower than the median in skewed left graphs.
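A small Python sketch can illustrate the relationship between shape and the two averages. The three data sets below are hypothetical, made up only to have the described shapes; they are not the data behind the graphs in Section 1.5.

```python
import statistics

# Hypothetical data sets chosen only to illustrate each shape
symmetric = [1, 2, 3, 4, 5]
skewed_right = [1, 2, 2, 3, 3, 3, 4, 20]      # one extreme value on the right
skewed_left = [21 - x for x in skewed_right]  # mirror image: extreme value on the left

for name, data in [
    ("symmetric", symmetric),
    ("skewed right", skewed_right),
    ("skewed left", skewed_left),
]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        relation = "mean is higher than median"
    elif mean < median:
        relation = "mean is lower than median"
    else:
        relation = "mean equals median"
    print(name, mean, median, relation)
```

For these made-up values, the symmetric set has mean and median both equal to 3, the skewed right set has mean 4.75 above median 3, and the skewed left set has mean 16.25 below median 18.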
Example \(\PageIndex{4}\): Finding the Average of a Qualitative Variable
Suppose a class was asked what their favorite soft drink is, and the following are the results:
| Coke | Pepsi | Mt. Dew | Coke | Pepsi | Dr. Pepper | Sprite | Coke | Mt. Dew |
| Pepsi | Pepsi | Dr. Pepper | Coke | Sprite | Mt. Dew | Pepsi | Dr. Pepper | Coke |
| Pepsi | Mt. Dew | Coke | Pepsi | Pepsi | Dr. Pepper | Sprite | Pepsi | Coke |
| Dr. Pepper | Mt. Dew | Sprite | Coke | Coke | Pepsi |
Find the average.
Remember, mean, median, and mode are all examples of averages. However, since the data are qualitative, you cannot find the mean or the median. The only average you can find is the mode. Notice that Coke was preferred by 9 people, Pepsi by 10 people, Mt. Dew by 5 people, Dr. Pepper by 5 people, and Sprite by 4 people. Pepsi has the highest frequency, so Pepsi is the mode. If one more person came into the room and said that they preferred Coke, then Pepsi and Coke would both have a frequency of 10. So both Pepsi and Coke would be modes, and we would call this bimodal.
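The frequency count in this example can be checked with a short Python sketch built on `collections.Counter`. The helper name `modes_of` is hypothetical. It returns every value tied for the highest frequency, which also shows the bimodal case when one more Coke response is added.

```python
from collections import Counter

# Survey responses, entered row by row from the table
responses = [
    "Coke", "Pepsi", "Mt. Dew", "Coke", "Pepsi", "Dr. Pepper", "Sprite", "Coke", "Mt. Dew",
    "Pepsi", "Pepsi", "Dr. Pepper", "Coke", "Sprite", "Mt. Dew", "Pepsi", "Dr. Pepper", "Coke",
    "Pepsi", "Mt. Dew", "Coke", "Pepsi", "Pepsi", "Dr. Pepper", "Sprite", "Pepsi", "Coke",
    "Dr. Pepper", "Mt. Dew", "Sprite", "Coke", "Coke", "Pepsi",
]


def modes_of(values):
    """Return every value tied for the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


counts = Counter(responses)
print(counts["Coke"], counts["Pepsi"])  # 9 10
print(modes_of(responses))              # ['Pepsi']
print(modes_of(responses + ["Coke"]))   # ['Coke', 'Pepsi'] (bimodal)
```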