Measures of Spread
How far are points from one another?
Measures of Spread give us an idea of how spread out our data points are from one another. Common measures of spread include:
- Range
- Inter-Quartile Range (IQR)
- Standard Deviation
- Variance
Histogram
The most common visual for quantitative data.
It is easiest to understand the spread of our data visually. The most common visual for quantitative data is the histogram. Histograms are very useful for understanding several aspects of quantitative data:
- Center
- Spread
- Shape
- Outliers
To understand how histograms are constructed, consider the following dataset:
1, 2, 2, 4, 5, 7, 8, 9, 12, 15
First, we need to bin our data. As the histogram creator, you ultimately choose how the binning occurs. As an example, we can choose our bins as 1-4, 5-8, 9-12, and 13-16. Because the first four values (1, 2, 2, 4) are between 1 and 4, they go into the first bin. The next three values (5, 7, 8) are between 5 and 8, so they fall in the next bin; the following two values (9, 12) fall in the third bin; and 15 falls into our last bin.
View the histogram through this link. To view the bins we have chosen, click the Options (gear) icon, click Set Classes Manually, set the Start value to 1 and the Width to 4, then press Enter.
The number of values in each bin determines the height of each histogram bar. Changing the bins will result in a slightly different visual. There really isn't a right answer when choosing bins, and in most cases software will choose appropriate bins for us, but it is something to be aware of.
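As a rough sketch, the binning above can be reproduced in Python by counting how many values land in each bin. The helper name `bin_counts` and its `start`/`width` parameters are hypothetical, chosen to mirror the Start value and Width settings mentioned above. Each bin covers `start + k*width` up to, but not including, `start + (k+1)*width`, which for whole numbers matches the 1-4, 5-8, 9-12, 13-16 bins.

```python
def bin_counts(data, start, width, num_bins):
    """Count values per bin; bin k covers [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_bins
    for x in data:
        k = (x - start) // width  # index of the bin that x falls into
        if 0 <= k < num_bins:     # ignore values outside the chosen bins
            counts[k] += 1
    return counts


data = [1, 2, 2, 4, 5, 7, 8, 9, 12, 15]

# Bins 1-4, 5-8, 9-12, 13-16 (start at 1, width 4), drawn as a sideways text histogram
for k, count in enumerate(bin_counts(data, start=1, width=4, num_bins=4)):
    low = 1 + 4 * k
    print(f"{low:>2}-{low + 3:<2} | {'#' * count}")

# A different binning of the same data: 0-4, 5-9, 10-14, 15-19
print(bin_counts(data, start=0, width=5, num_bins=4))  # [4, 4, 1, 1]
```

With the first binning the bar heights are 4, 3, 2, 1; regrouping the same ten values into bins of width 5 gives 4, 4, 1, 1, which is the kind of change in the visual that different bins produce.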
Five Number Summary
Gives values for calculating the range and interquartile range.
One of the most common ways to measure the spread of our data is by looking at the Five Number Summary, which gives us values for calculating the range and interquartile range.
The Five Number Summary consists of five values:
- Minimum
- First Quartile
- Second Quartile (Median)
- Third Quartile
- Maximum
Consider the following dataset:
5, 8, 3, 2, 1, 3, 10
To calculate the Five Number Summary, the first thing we need to do is order our values:
1, 2, 3, 3, 5, 8, 10
Once ordered, the minimum (1) and the maximum (10) are easy to identify as the smallest and largest values. The median is the middle value in our dataset (3). We also call it Q2, or the second quartile, because it splits the data in half: 50% of our data, or two quarters, fall below this value.
The remaining two values needed to complete the Five Number Summary are Q1 (2) and Q3 (8). These values can be thought of as the medians of the data on either side of Q2: Q1 is the median of 1, 2, 3 and Q3 is the median of 5, 8, 10. Notice that Q2 was not included in either set of points used to calculate Q1 or Q3. This gives us the following Five Number Summary:
- Minimum = 1
- Q1 = 2
- Q2 (Median) = 3
- Q3 = 8
- Maximum = 10
Example with an Even Number of Values
Let's consider another example, in which our dataset has an even number of values.
5, 8, 3, 2, 1, 3, 10, 105
Again, we first need to order the values.
1, 2, 3, 3, 5, 8, 10, 105
We can quickly identify the minimum (1) and the maximum (105). Remember, with an even number of values, the median or Q2 is the mean of the two middle values (3 and 5), which gives us Q2 = 4.
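A minimal Python sketch of the median rule used so far: sort the values, take the middle one when the count is odd, and average the two middle ones when it is even. The function name `median` here is just for illustration; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median(values):
    """Middle value of the data; mean of the two middle values for an even count."""
    if not values:
        raise ValueError("median of empty data")
    data = sorted(values)          # the values must be ordered first
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]           # odd count: the single middle value
    return (data[mid - 1] + data[mid]) / 2  # even count: mean of the two middle values


print(median([5, 8, 3, 2, 1, 3, 10]))       # 3   (first dataset)
print(median([5, 8, 3, 2, 1, 3, 10, 105]))  # 4.0 (even dataset: mean of 3 and 5)
```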
To find Q1 and Q3, we split our dataset between the two values we used to find the median. This gives us two halves, 1, 2, 3, 3 and 5, 8, 10, 105. The median of each half gives Q1 and Q3.
For 1, 2, 3, 3, Q1 is the mean of 2 and 3, which is 2.5. For 5, 8, 10, 105, Q3 is the mean of 8 and 10, which is 9. This gives us the following Five Number Summary:
- Minimum = 1
- Q1 = 2.5
- Q2 (Median) = 4
- Q3 = 9
- Maximum = 105
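The steps from both worked examples can be combined into a single function. This is a sketch that assumes the convention used in this lesson (with an odd number of values, the median is left out of both halves); the name `five_number_summary` is hypothetical, while `statistics.median` comes from Python's standard library. Other software may use a different quartile convention and report slightly different Q1 and Q3 values for the same data.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method.

    With an odd number of values the median itself is left out of both halves;
    with an even number the data split evenly between the two middle values.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]                                      # values below the middle
    upper = data[half + 1:] if n % 2 == 1 else data[half:]   # values above the middle
    return data[0], median(lower), median(data), median(upper), data[-1]


print(five_number_summary([5, 8, 3, 2, 1, 3, 10]))       # (1, 2, 3, 8, 10)
print(five_number_summary([5, 8, 3, 2, 1, 3, 10, 105]))  # (1, 2.5, 4.0, 9.0, 105)
```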
The Five Number Summary in short
- Minimum: The smallest number in the dataset.
- Q1: The value such that 25% of the data fall below.
- Q2: The value such that 50% of the data fall below.
- Q3: The value such that 75% of the data fall below.
- Maximum: The largest value in the dataset.
Range and Inter-Quartile Range
Once we have calculated all the values in the Five Number Summary, finding the range and interquartile range is no problem.
For the first dataset:
1, 2, 3, 3, 5, 8, 10
The range is the maximum minus the minimum, and the interquartile range is Q3 minus Q1.
- Range = Maximum - Minimum = 10 - 1 = 9
- Interquartile Range (IQR) = Q3 - Q1 = 8 - 2 = 6
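The two subtractions above can be wrapped in a small helper. The name `range_and_iqr` is hypothetical; it assumes the five-number summary is passed as a tuple in the order minimum, Q1, median, Q3, maximum. The second call uses the summary of the even dataset (with 105) worked out earlier.

```python
def range_and_iqr(summary):
    """Return (range, IQR) from a five-number summary (min, Q1, median, Q3, max)."""
    minimum, q1, _median, q3, maximum = summary
    return maximum - minimum, q3 - q1


print(range_and_iqr((1, 2, 3, 8, 10)))     # (9, 6): first dataset
print(range_and_iqr((1, 2.5, 4, 9, 105)))  # (104, 6.5): even dataset with 105
```

For the even dataset, the single large value 105 stretches the range to 104, while the IQR is only 6.5, because the IQR depends only on the middle half of the data.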
Box Plot
Useful for quickly comparing the spread of two datasets.
Box plots are useful for quickly comparing the spread of two datasets across key metrics such as the quartiles, the minimum, and the maximum.
Using the link provided earlier, click the drop-down menu in the upper-left corner and select Boxplot to view its shape.
In a box plot, the distance between the two outer tips (the ends of the whiskers) represents the range, the distance between the two inner tips (the edges of the box) represents the interquartile range, and the line between the two inner tips marks the median.
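As a rough, text-only illustration, the sketch below draws a one-line box plot directly from a five-number summary. `text_boxplot` is a hypothetical helper, and it draws the simple style described here, with whiskers reaching the minimum and maximum. Many plotting libraries instead stop the whiskers at 1.5 × IQR beyond the box and mark more distant points as outliers, in which case the outer tips no longer show the full range.

```python
def text_boxplot(summary, width=37):
    """Draw a one-line box plot from (minimum, Q1, median, Q3, maximum).

    Whiskers run from the minimum to the maximum, so the outer tips show the
    range, the box [ ... ] shows the IQR, and the inner | marks the median.
    """
    minimum, q1, med, q3, maximum = summary
    if maximum == minimum:
        raise ValueError("need maximum > minimum to scale the plot")
    scale = (width - 1) / (maximum - minimum)    # characters per data unit

    def pos(value):
        return round((value - minimum) * scale)  # column for a data value

    a, b, m, c, d = (pos(v) for v in (minimum, q1, med, q3, maximum))
    chars = [" "] * width
    for i in range(a, d + 1):
        chars[i] = "-"              # whiskers span the range
    for i in range(b, c + 1):
        chars[i] = "="              # box spans the interquartile range
    chars[a] = chars[d] = "|"       # outer tips: minimum and maximum
    chars[b], chars[c] = "[", "]"   # inner tips: Q1 and Q3
    chars[m] = "|"                  # median line inside the box
    return "".join(chars)


print(text_boxplot((1, 2, 3, 8, 10)))  # five-number summary of the first dataset
```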