Outliers

10 min

Narrative

The purpose of this Warm-up is to elicit the idea that outliers are often present in data, which will be useful when students investigate the source of outliers and what to do with them in a later activity. 

As students work, monitor for students who

  • Estimate the IQR from values in the box plot.
  • Use a measurement tool to determine the IQR from the box plot.

Students are given the formulas for outliers: A value is considered an outlier for a data set if it is greater than Q3 + 1.5 \boldcdot IQR or less than Q1 - 1.5 \boldcdot IQR. To find extreme values, we are comparing very large or small values to the bulk of the data. This means using the quartiles and interquartile range to compare the value to typical distances to the center of the data.

Launch

Display the histogram and box plot for all to see. Tell students to think of one thing they notice and one thing they wonder about the images. Give students 1 minute of quiet think time, and then 1 minute to discuss with a partner the things they notice. Listen for students who notice that there is a value that seems greatly different from the rest of the data. Select a few students to share things they notice and wonder, making sure to select identified students who notice an extreme value.

Student Task

The histogram and box plot show the average amount of money, in thousands of dollars, spent on each person in the country (per capita spending) for health care in 34 countries.

<p>Histogram from 1 to 10 by 1’s. Per capita health spending by country (thousands of dollars). Beginning at 1 up to but not including 2, height of bar at each interval is 7, 8, 3, 8, 6, 1, 0, 0, 1.</p>

 

<p>Box plot</p>
Box plot from 1 to 10 by 1’s. Per capita health spending by country in thousands of dollars. Approximate box plot values as follows: Whisker from 1 to 2 point 1. Box from 2 point 1 to 4 point 8 with vertical line at 3 point 5. Whisker from 4 point 8 to 9 point 9.

  1. One value in the set is an outlier. Which one is it? What is its approximate value?
  2. By one rule for deciding, a value is an outlier if it is more than 1.5 times the IQR greater than Q3. Show on the box plot whether or not your value meets this definition of outlier.

Sample Response

Sample response:

  1. The value between 9 and 10 thousand dollars is the outlier.
  2. <p>Box plot. Per capita health spending.</p>
Activity Synthesis (Teacher Notes)

Select previously identified students in the order listed in the lesson narrative to share their method for creating this visualization of outliers in the box plot.

Tell students:

  • Values in a data set that are greatly different from the rest of the data are called outliers. The precise meaning of greatly different will be different for different situations. For example, a possible $4,000 difference in this graph does seem like a lot, but if the data represented the entire budgets of these countries in the billions or trillions of dollars (rather than spending on each member of the population for healthcare), it would not be a great difference.
  • Using the IQR to determine outliers helps to adjust the difference to the variability of the bulk of the middle data. Using 1.5 times the IQR allows for some variability on the ends of the distribution to be considered usual.
  • It is also possible for there to be values that are unusually low compared to the rest of the data set. Consider this box plot that displays Q11.5IQR\text{Q1} - 1.5 \boldcdot \text{IQR}. The minimum value for this data set should be considered an outlier.
    <p>Box plot</p>
    Box plot from 1 to 25 by 1’s. Whisker from 1 to 9. Box from 9 to 13 with vertical line at 10. Whisker from 13 to 24. Above the box plot, 2 horizontal segments from 3 to 9 and from 13 to 19, each labeled 1.5 dot IQR.
  • For the purposes of this unit, a value will be considered an outlier for a data set if it is greater than Q3 + 1.5 \boldcdot IQR or less than Q1 - 1.5 \boldcdot IQR. These formulas compare extreme values to the middle half of the data to determine if the value should be considered an outlier.

Add "outlier" to the classroom display created in earlier lessons. The blackline master provides an example of what this display may look like after all items are added.

Standards
Addressing
  • HSS-ID.A.1·Represent data with plots on the real number line (dot plots, histograms, and box plots).
  • S-ID.1·Represent data with plots on the real number line (dot plots, histograms, and box plots).
  • S-ID.1·Represent data with plots on the real number line (dot plots, histograms, and box plots).
  • S-ID.1·Represent data with plots on the real number line (dot plots, histograms, and box plots).
Building Toward
  • HSS-ID.A.3·Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).
  • S-ID.3·Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).
  • S-ID.3·Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).
  • S-ID.3·Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).

15 min

10 min