1.1: Statistical Basics
Data are all around us. Researchers collect data on the effectiveness of a medication for lowering cholesterol. Pollsters report on the percentage of Americans who support gun control. Economists report on the average salary of college graduates. Data are collected in many other areas as well. To understand data and how to summarize them, we need to understand statistics.
Suppose you want to know the average net worth of a current U.S. Senator. There are only 100 Senators, so it is not hard to collect all 100 values and then summarize the data. If instead you want the average net worth of all current members of the U.S. Congress, Senators and Representatives together, there are 535 members (100 Senators and 435 Representatives). This takes a little more work, but it is still not difficult to find the average net worth of every member. Now suppose you want the average net worth of everyone in the United States. This would be very difficult, if not impossible: collecting hundreds of millions of values would take so much time and money that many of the values would change before the collection was finished. So instead of getting the net worth of every American, we need an easier way to find this information. The net worth is what you want to measure, and it is called a variable. The net worth of every American is called the population. What we need to do is collect a smaller part of the population, called a sample. To see how this works, let’s formalize the definitions.
Variable: Any characteristic that is measured from an object or individual.
Population: A set of measurements or observations from all objects under study.
Sample: A set of measurements or observations from some objects under study (a subset of a population).
Example 1.1.1: Stating Populations and Samples
Determine the population and sample for each situation.
- A researcher wants to determine the length of the lifecycle of a bark beetle. In order to do this, he breeds 1000 bark beetles and measures the length of time from birth to death for each bark beetle.
Population: The set of lifecycle lengths of all bark beetles
Sample: The set of lifecycle lengths of the 1000 bark beetles bred by the researcher
- The National Rifle Association wants to know what percent of Americans support the right to bear arms. They ask 2500 Americans whether they support the right to bear arms.
Population: The set of responses from all Americans to the question, “Do you support the right to bear arms?”
Sample: The set of responses from 2500 Americans to the question, “Do you support the right to bear arms?”
- The Pew Research Center asked 1000 mothers in the U.S. what their highest attained education level was.
Population: The set of highest education levels of all mothers in the U.S.
Sample: The set of highest education levels of 1000 mothers in the U.S.
It is very important that you understand what you are trying to measure before you actually measure it. Also note that the population is a set of measurements or observations, not a set of people. If you say the population is all Americans, you have told only part of the story; what matters more is what you are measuring from all Americans. Do you want to measure their race, eye color, income, education level, number of children, or some other variable? Therefore, always state what you measured or observed, and from whom or what the measurements or observations were taken. Once you know what you want to measure or observe, and the source of those measurements or observations, you need to collect the data.
A data set is a collection of values called data points or data values. N represents the number of data points in a population, while n represents the number of data points in a sample. A data value that is much higher or lower than all of the other data values is called an outlier. Sometimes outliers are just unusual data values that are very interesting and should be studied further, and sometimes they are mistakes. You will need to figure out which is which.
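To make the population/sample distinction and the symbols N and n concrete, here is a minimal Python sketch. It assumes a made-up population of 10,000 values (randomly generated, not real net-worth data), draws a simple random sample of n = 100 of them with `random.sample`, and compares the sample mean with the population mean. The names `population` and `sample` are illustrative only.

```python
import random
import statistics

# Hypothetical population: made-up "net worth" values in thousands of dollars,
# generated only for illustration. These are NOT real data.
random.seed(1)
population = [round(random.lognormvariate(5, 1), 1) for _ in range(10_000)]

N = len(population)  # number of data points in the population
n = 100              # number of data points we choose to sample

# Draw a simple random sample of n values, without replacement.
sample = random.sample(population, n)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"N = {N}, population mean = {pop_mean:.1f}")
print(f"n = {n}, sample mean     = {sample_mean:.1f}")
```

Changing the seed gives a different sample and usually a slightly different sample mean, while the population mean stays fixed.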
To collect the data, we have to understand the types of variables we can collect. There are two types of variables: qualitative and quantitative.
Qualitative (Categorical) Variable: A variable that represents a characteristic. Qualitative variables are not inherently numbers, so they cannot be added, multiplied, or averaged, but they can be displayed with graphs such as bar graphs.
Examples: gender, hair color, race, nationality, religion, course grade, year in college, etc.
Quantitative (Numerical) Variable: A variable that represents a measurable quantity. Quantitative variables are inherently numbers, so they can be added, multiplied, averaged, and displayed graphically.
Examples: Height, weight, number of cats owned, score of a football game, etc.
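The difference between what you can do with qualitative and quantitative data can be sketched in a few lines of Python. The hair colors and heights below are invented for illustration. Heights can be averaged with `statistics.mean`, while hair colors can only be tallied, and those tallies are exactly what a bar graph displays. (Python itself refuses to average strings and raises a `TypeError`.)

```python
import statistics
from collections import Counter

# Invented data for illustration only.
hair_colors = ["brown", "black", "blonde", "brown", "red", "brown", "black"]  # qualitative
heights_cm = [162.5, 175.0, 168.2, 181.3, 159.9, 170.4, 177.8]               # quantitative

# Quantitative: the values are numbers, so an average is meaningful.
print(f"mean height: {statistics.mean(heights_cm):.1f} cm")

# Qualitative: the values are category labels, and Python will not average them...
try:
    statistics.mean(hair_colors)
except TypeError:
    print("hair colors cannot be averaged")

# ...but we can count each category; these counts are the bar heights of a bar graph.
counts = Counter(hair_colors)
for color, count in counts.most_common():
    print(f"{color:<7} {'#' * count} ({count})")
```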
Quantitative variables can be further subdivided into two categories: continuous and discrete.
Continuous Variable: A variable that can take on an uncountable number of values in a range. In other words, the variable can be any number in a range of values. Continuous variables are usually things that are measured.
Examples: Height, weight, foot size, time to take a test, length, etc.
Discrete Variable: A variable that can take on only specific values in a range. Discrete variables are usually things that you count.
Examples: IQ, shoe size, family size, number of cats owned, score in a football game, etc.
Example 1.1.2: Determining Variable Types
Determine whether each variable is quantitative or qualitative. If it is quantitative, then also determine if it is continuous or discrete.
- Length of a race
Quantitative and continuous, since this variable is a number and can take on any value in an interval.
- Opinion of a person about the President
Qualitative, since this variable is not a number.
- House color in a neighborhood
Qualitative, since this variable is not a number.
- Number of houses that are in foreclosure in a state
Quantitative and discrete, since this variable is a number that is counted, so it can take only certain values (whole numbers) in an interval.
- Weight of a baby at birth
Quantitative and continuous, since this variable is a number and can take on any value in an interval.
- Highest education level of a mother
Qualitative, since the variable is not a number.
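The two questions used in the previous example (is the variable a number, and if so, is it counted or measured?) can be written down as a small decision function. This is only a sketch: `VariableType` and `classify` are hypothetical names, and deciding whether a variable is numeric or counted is still a judgment you make before calling the function. The code simply records the rule and reproduces the answers above.

```python
from enum import Enum


class VariableType(Enum):
    """Hypothetical labels for the variable types defined in this section."""
    QUALITATIVE = "qualitative"
    DISCRETE = "quantitative and discrete"
    CONTINUOUS = "quantitative and continuous"


def classify(is_number: bool, is_counted: bool = False) -> VariableType:
    """Classify a variable from two yes/no judgments made by the reader.

    is_number:  does the variable represent a measurable quantity?
    is_counted: if it is a number, is it counted (only specific values)
                rather than measured (any value in a range)?
    """
    if not is_number:
        return VariableType.QUALITATIVE
    return VariableType.DISCRETE if is_counted else VariableType.CONTINUOUS


# The six variables from the previous example, described by the two judgments.
answers = {
    "Length of a race": classify(is_number=True, is_counted=False),
    "Opinion of a person about the President": classify(is_number=False),
    "House color in a neighborhood": classify(is_number=False),
    "Number of houses in foreclosure in a state": classify(is_number=True, is_counted=True),
    "Weight of a baby at birth": classify(is_number=True, is_counted=False),
    "Highest education level of a mother": classify(is_number=False),
}

for variable, vtype in answers.items():
    print(f"{variable}: {vtype.value}")
```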