Unit 1 Plan - Algebra 1

Title	Takeaways	Student Summary	Mastery Check	Regents
Lesson 2 Data Representations HSS-ID.A.1 Represent data with plots on the real number line (dot plots, histograms, and box plots).	—	The table shows a list of the number of minutes people could intensely focus on a task before needing a break. Fifty people of different ages are represented. 19 7 1 16 20 2 7 19 9 13 3 9 18 13 20 8 3 14 13 2 8 5 17 7 18 17 8 8 7 6 2 20 7 7 10 7 6 19 3 18 8 19 7 13 20 14 6 3 19 4 In a situation like this, it is helpful to represent the data graphically to better notice any patterns or other interesting features in the data. A dot plot can be used to see the shape and distribution of the data. There were quite a few people who lost focus at around 3, 7, 13, and 19 minutes, and nobody lost focus at 11, 12, or 15 minutes. Dot plots are useful when the data set is not too large, and they show all of the individual values in the data set. In this example, a dot plot can easily show all of the data. If the data set is very large (more than 100 values, for example), or if there are many values that are close together but not exactly the same, it may be hard to see all of the dots on a dot plot. A histogram is another representation that shows the shape and distribution of the same data. Most people lost focus between 5 and 10 minutes or between 15 and 20 minutes, while only 4 of the 50 people got distracted between 20 and 25 minutes. When creating histograms, each interval includes the number at the lower end of the interval but not the number at the upper end. For example, the tallest bar displays values that are greater than or equal to 5 minutes but less than 10 minutes. In a histogram, values that are in an interval are grouped together. Although the individual values get lost with the grouping, a histogram can still show the shape of the distribution. Here is a box plot that represents the same data. Box plots are created using a five-number summary. 
For a set of data, the five-number summary consists of these five statistics: the minimum value, the first quartile, the median, the third quartile, and the maximum value. These values split the data into four sections, each representing approximately one-fourth of the data. The median of these data is 8 minutes, and about 25% of the data fall in the short second quarter, between 6 and 8 minutes. Another quarter of the data is spread across the much wider interval between 8 and 17 minutes. Like the histogram, the box plot does not show individual data values, but other features such as quartiles, range, and median are seen more easily. Dot plots, histograms, and box plots provide three different ways to look at the shape and distribution while highlighting different aspects of the data.	Reasoning about Representations (1 problem) The dot plot, histogram, and box plot represent the distribution of the same data in 3 different ways. What information can be seen most easily in the dot plot? What information can be seen most easily in the histogram? What information can be seen most easily in the box plot? Solution: Sample response: The actual values, the shape of the distribution, and the most common value are easily seen in the dot plot. The shape of the distribution and the most common interval of data are easily seen in the histogram. The five-number summary (minimum, first quartile, median, third quartile, and maximum) is easily seen in the box plot.	august 2024 #9(2pt) january 2025 #16(2pt) january 2026 #28(2pt)
Section A Check Section A Checkpoint

Lesson 5 Calculating Measures of Center and Variability	—	The mean absolute deviation, or MAD, and the interquartile range, or IQR, are measures of variability. Measures of variability tell you how much the values in a data set tend to differ from one another. A greater measure of variability means that the data are more spread out, while a smaller measure of variability means that the data are more consistent and are closer to the measure of center. To calculate the MAD of a data set: Find the mean of the values in the data set. Find the distance between each data value and the mean (on the number line): |data value – mean| Find the mean of the distances. This value is the MAD. To calculate the IQR, subtract the value of the first quartile from the value of the third quartile. Recall that the first and third quartiles are included in the five-number summary.	Calculating MAD and IQR (1 problem) 5 18 6 18 13 mean: 12 Find the mean absolute deviation for the data. Find the interquartile range for the data. Solution: MAD = 5.2, IQR = 12.5	—
Section B Check Section B Checkpoint

Lesson 10 The Effect of Extremes HSS-ID.A.2 Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets. HSS-ID.A.1 Represent data with plots on the real number line (dot plots, histograms, and box plots). HSS-ID.A.3 Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).	—	Is it better to use the mean or median to describe the center of a data set? The mean gives equal importance to each value when finding the center. The mean usually represents the typical values well when the data have a symmetric distribution. On the other hand, the mean can be greatly affected by changes to even a single value. The median tells you the middle value in the data set, so changes to a single value usually do not affect the median much. So, the median is more appropriate for data whose distribution is clearly not symmetric. We can look at the distribution of a data set and draw conclusions about the mean and the median. Here is a dot plot showing the time, in seconds, that a dart takes to hit a target. The data produce a symmetric distribution. When a distribution is symmetric, the median and mean are both found in the middle of the distribution. Since the median is the middle value (or the mean of the two middle values) of a data set, you can use the symmetry of the distribution to find it easily at the center. For the mean, use the fact that the values above the mean balance the values below it: the sum of the distances from the mean for the values greater than the mean equals the sum of the distances from the mean for the values less than the mean. 
Using the symmetry of the distribution, you can see that there are four values 0.1 second above the mean, two values 0.2 seconds above the mean, one value 0.3 seconds above the mean, and one value 0.4 seconds above the mean. Likewise, the same number of values lie at each of these distances below the mean, so the mean is 1.4 seconds, the same as the median. Here is a dot plot using the same data, but with two of the values changed, resulting in a skewed distribution. When you have a skewed distribution, the distribution is not symmetric, so you cannot use symmetry to find the median and the mean. The median is still 1.4 seconds, since the two middle values are still 1.4 seconds. The mean, on the other hand, is now about 1.273 seconds. The mean is less than the median because the two low values (0.3 and 0.4) pull the mean down. The median is usually more resistant to extreme values than is the mean. For this reason, the median is the preferred measure of center when a distribution is skewed or if there are extreme values. When using the median, you would also use the IQR as the preferred measure of variability. In a more symmetric distribution, the mean is the preferred measure of center, and the MAD is the preferred measure of variability.	Shape and Statistics (1 problem) Is the mean greater than, less than, or equal to the median? Explain your reasoning. Is the mean greater than, less than, or equal to the median? Explain your reasoning. Solution: Sample response: The mean is greater than the median because the larger values to the right pull the mean above where it would be if the distribution were symmetric. Sample response: The mean is equal to the median because the data are symmetric.	august 2024 #9(2pt) june 2024 #13(2pt) june 2024 #16(2pt) june 2025 #4(2pt) june 2025 #12(2pt) august 2025 #14(2pt) january 2025 #16(2pt) january 2025 #20(2pt) january 2026 #28(2pt)
Lesson 11 Comparing and Contrasting Data Distributions HSS-ID.A.2 Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets.	—	The mean absolute deviation, or MAD, is a measure of variability that is calculated by finding the mean of the distances between each data point and the mean. Here are two dot plots, each with a mean of 15 centimeters, displaying the lengths of sea scallop shells in centimeters. Notice that both dot plots show a symmetric distribution, so the mean and the MAD are appropriate choices for describing center and variability. The data in the first dot plot appear to be more spread apart than the data in the second dot plot, so you can say that the first data set appears to have greater variability than does the second data set. This is confirmed by the MAD. The MAD of the first data set is 1.18 cm, and the MAD of the second data set is approximately 0.94 cm. This means that the values in the first data set are, on average, about 1.18 cm away from the mean, and the values in the second data set are, on average, about 0.94 cm away from the mean. The greater the MAD of the data, the greater the variability of the data. The interquartile range, IQR, is a measure of variability that is calculated by subtracting the value for the first quartile, Q1, from the value for the third quartile, Q3. These two box plots represent the distributions of the lengths in centimeters of a different group of sea scallop shells, each with a median of 15 centimeters. Notice that neither of the box plots has a symmetric distribution, so the median and the IQR are appropriate choices for describing center and variability for these data sets. The middle half of the data displayed in the first box plot appear to be more spread apart, or show greater variability, than the middle half of the data displayed in the second box plot. 
The IQR of the first distribution is 14 cm, and the IQR of the second distribution is 10 cm. Because the IQR is the difference between Q3 (the median of the upper half of the data) and Q1 (the median of the lower half of the data), it is not affected by the minimum or the maximum value in the data set. It measures the spread of the middle 50% of the data. The MAD is calculated using every value in the data set, and the IQR is calculated using only the values for Q1 and Q3.	Which Menu? (1 problem) A restaurant owner believes that it is beneficial to have menu items with a lot of variability in price so that people can have a choice of expensive and inexpensive food. Several chefs offer menus and suggested prices for the food they create. The owner creates dot plots for the prices of the menu items and finds some summary statistics. Which menu best matches what the restaurant owner is looking for? Explain your reasoning. Italian: mean: $9.03 median: $9 MAD: $2.45 IQR: $3.50 Diner: mean: $3.36 median: $2 MAD: $2.12 IQR: $4 Japanese: mean: $10.35 median: $10 MAD: $5.55 IQR: $9.50 Steakhouse: mean: $11.51 median: $10.50 MAD: $3.69 IQR: $4.50 Solution: Japanese. The variability, whether measured with IQR or MAD, is greater than that of any of the other menus.	june 2024 #13(2pt) june 2024 #16(2pt) june 2025 #4(2pt) june 2025 #12(2pt) august 2025 #14(2pt) january 2025 #20(2pt)
Lesson 12 Standard Deviation HSS-ID.A.2 Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets. HSS-ID.A.3 Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).	—	We can describe the variability of a distribution using the standard deviation. The standard deviation is a measure of variability that is calculated using a method that is similar to the one used to calculate the MAD, or mean absolute deviation. A deeper understanding of the importance of standard deviation as a measure of variability will come with a deeper study of statistics. For now, know that the standard deviation is mathematically important and will be used as the appropriate measure of variability when the mean is an appropriate measure of center. Like the MAD, the standard deviation is large when the data set is more spread out, and it is small when the variability is small. The intuition you gained about MAD also applies to the standard deviation.	True or False: Reasoning with Standard Deviation (1 problem) The low temperatures, in degrees Celsius, for some cities on the same days in March are recorded in the dot plots. Dot plot from 5 to 15 by 1’s. Christchurch, New Zealand low temperature in degrees Celsius. Beginning at 5, number of dots above each increment is 0, 1, 1, 4, 2, 6, 2, 4, 1, 1, 0. Dot plot from negative 5 to 9 by 1’s. Saint Louis, Missouri low temperature in degrees Celsius. Beginning at negative 5, number of dots above each increment is 0, 0, 0, 0, 1, 3, 1, 1, 5, 0, 1, 5, 1, 1, 0. Dot plot from negative 5 to 9 by 1’s. Chicago, Illinois low temperature in degrees Celsius. Beginning at negative 5, number of dots above each increment is 0, 1, 3, 1, 1, 5, 0, 1, 5, 1, 1, 0, 0, 0, 0. Dot plot from negative 5 to 9 by 1’s. 
London, United Kingdom low temperature in degrees Celsius. Beginning at negative 5, number of dots above each increment is 0, 1, 3, 1, 1, 5, 0, 1, 5, 1, 0, 0, 0, 1, 0. Decide if each statement is true or false. Explain your reasoning. The standard deviation of Christchurch’s temperatures is zero because the data distribution is symmetric. The standard deviation of St. Louis’s temperatures is equal to the standard deviation of Chicago’s temperatures. The standard deviation of Chicago’s temperatures is less than the standard deviation of London’s temperatures. Solution: False. Sample response: The standard deviation is a measure of variability, and there is some variability in the data set, so the standard deviation cannot be zero. True. Sample response: Chicago’s distribution of temperatures is the same as St. Louis’s, but 3 degrees cooler. The two cities have the same variability in temperature, and so they have the same standard deviation. True. Sample response: London has the same low temperatures as Chicago except on the hottest day, when London’s temperature is 3 degrees warmer than Chicago’s. Therefore, the temperatures in London have more variability than Chicago’s temperatures on these days.	june 2024 #13(2pt) june 2024 #16(2pt) june 2025 #4(2pt) june 2025 #12(2pt) august 2025 #14(2pt) january 2025 #20(2pt)
Lesson 14 Outliers HSS-ID.A.1 Represent data with plots on the real number line (dot plots, histograms, and box plots). HSS-ID.A.2 Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets. HSS-ID.A.3 Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).	—	In statistics, an outlier is a data value that is unusual in that it differs quite a bit from the other values in the data set. Outliers occur in data sets for a variety of reasons including, but not limited to: errors in the data that result from the data collection or data entry process, and unusual values that genuinely occur in the population. Outliers can reveal cases worth studying in detail or errors in the data collection process. In general, they should be included in any analysis done with the data. A value is an outlier if it is more than 1.5 times the interquartile range greater than Q3 (if $x > \text{Q3} + 1.5 \cdot \text{IQR}$) or more than 1.5 times the interquartile range less than Q1 (if $x < \text{Q1} - 1.5 \cdot \text{IQR}$). In this box plot, the minimum and the maximum are both outliers: with an IQR of 4, the fences are at 3 and 19, and the minimum (1) and maximum (24) lie beyond them. Box plot from 1 to 25 by 1’s. Whisker from 1 to 9. Box from 9 to 13 with vertical line at 10. Whisker from 13 to 24. Above the box plot, 2 horizontal segments from 3 to 9 and from 13 to 19, each labeled 1.5 dot IQR. It is important to identify the source of outliers because outliers can affect measures of center and variability in significant ways. The box plot displays the resting heart rates, in beats per minute (bpm), of 50 athletes, measured five minutes after a workout. 
Some summary statistics are: mean: 69.78 bpm; standard deviation: 10.71 bpm; minimum: 55 bpm; Q1: 62 bpm; median: 70 bpm; Q3: 76 bpm; maximum: 112 bpm. It appears that the maximum value of 112 bpm may be an outlier. Because the interquartile range is 14 bpm ($76 - 62 = 14$) and $\text{Q3} + 1.5 \cdot \text{IQR} = 76 + 21 = 97$, which is less than 112, we should label the maximum value as an outlier. Searching through the actual data set confirms that this is the only outlier. After reviewing the data collection process, it is discovered that the heart rate of 112 bpm was measured one minute after a workout instead of five minutes after. The outlier should be deleted from the data set because it was not obtained under the right conditions. Once the outlier is removed, the box plot and summary statistics are: mean: 68.92 bpm; standard deviation: 8.9 bpm; minimum: 55 bpm; Q1: 61 bpm; median: 70 bpm; Q3: 75.5 bpm; maximum: 85 bpm. The mean decreased by 0.86 bpm, and the median remained the same. The standard deviation decreased by 1.81 bpm, which is about 17% of its previous value. Based on the standard deviation, the data set with the outlier removed shows much less variability than the original data set containing the outlier. Because the mean and standard deviation use all of the numerical values, removing one very large data point can affect these statistics in important ways. The median remained the same after the removal of the outlier, and the IQR increased slightly, from 14 bpm to 14.5 bpm. These measures of center and variability are much more resistant to change than the mean and standard deviation are. The median and IQR measure the middle of the data based on the number of values rather than the actual numerical values themselves, so the loss of a single value usually does not have a great effect on these statistics. The source of any possible errors should always be investigated. 
If the measurement of 112 beats per minute was found to have been taken under the right conditions and simply came from an athlete whose heart rate did not slow as much as the other athletes' heart rates did, it should not be deleted, so that the data reflect the actual measurements. If the situation cannot be revisited to determine the source of the outlier, it should not be removed. To avoid tampering with the data and to report accurate results, data values should not be deleted unless they can be confirmed to be an error in the data collection or data entry process.	Expecting Outliers (1 problem) A group of 20 students is asked to report the number of pets they keep in their house. The results are: 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 21 mean: 2.4 pets standard deviation: 4.47 pets Q1: 0.5 pets median: 1 pet Q3: 2.5 pets Would any of these values be considered outliers? Explain your reasoning. After being told that they should not count any fish in the report, the value of 3 becomes a 2 and the value of 21 becomes 1. Would these changes affect the median, mean, standard deviation, or interquartile range? If so, would each measure decrease or increase from its original value? Solution: Yes, 21 pets is an outlier since it is greater than $2.5 + 1.5 \cdot 2 = 5.5$. The mean and standard deviation would decrease with the changes. The median would stay the same, and the IQR would decrease slightly (from 2 to 1.5).	august 2024 #9(2pt) june 2024 #13(2pt) june 2024 #16(2pt) june 2025 #4(2pt) june 2025 #12(2pt) august 2025 #14(2pt) january 2025 #16(2pt) january 2025 #20(2pt) january 2026 #28(2pt)
Lesson 15 Comparing Data Sets HSS-ID.A.1 Represent data with plots on the real number line (dot plots, histograms, and box plots). HSS-ID.A.2 Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets.	—	To compare data sets, it is helpful to look at the measures of center and measures of variability. The shape of the distribution can help you choose the most useful measure of center and measure of variability. When distributions are symmetric or approximately symmetric, the mean is the preferred measure of center and should be paired with the standard deviation as the preferred measure of variability. When distributions are skewed or when outliers are present, the median is usually a better measure of center and should be paired with the interquartile range (IQR) as the preferred measure of variability. Once the appropriate measure of center and measure of variability are selected, these measures can be compared for data sets with similar shapes. For example, let’s compare the number of seconds it takes football players to complete a 40-yard dash at two different positions. First, we can look at a dot plot of the data to see that the tight-end times do not seem to be distributed symmetrically, so we should probably find the median and IQR for both sets of data to compare them. The median and IQR could be computed from the values, but can also be determined from a box plot. Box plot from 4 point 25 to 5 point 75 by point 25’s. Wide receiver time in seconds. Whisker from 4 point 31 to 4 point 405. Box from 4 point 405 to 4 point 665 with vertical line at 4 point 5. Whisker from 4 point 665 to 4 point 8. Box plot from 4 point 25 to 5 point 75 by point 25’s. Tight end times in seconds. Whisker from 4 point 56 to 4 point 685. Box from 4 point 685 to 5 point 225 with vertical line at 4 point 87. Whisker from 5 point 225 to 5 point 7. 
This shows that the tight-end times have a greater median (about 4.9 seconds) compared to the median of wide-receiver times (about 4.5 seconds). The IQR is also greater for the tight-end times (about 0.5 seconds) compared to the IQR for the wide-receiver times (about 0.25 seconds). This means that the tight ends tend to be slower in the 40-yard dash when compared to the wide receivers. The tight ends also have greater variability in their times. Together, this can be taken to mean that, in general, a typical wide receiver is faster than a typical tight end is, and the wide receivers tend to have more similar times to one another than the tight ends do to one another.	Comparing Mascots (1 problem) A new pet food company wants to sell its product online and use social media to promote itself. To determine whether to use a dog or a cat as its mascot, the company researches the number of clicks on links with an image of a dog or a cat. Dog images: mean: 1,263.5 clicks median: 1,282 clicks standard deviation: 357.4 clicks IQR: 409 clicks Cat images: mean: 1,105.4 clicks median: 1,125.5 clicks standard deviation: 239.3 clicks IQR: 312.5 clicks Based on the shape of the distributions, what measure of center and measure of variability would you use to compare the distributions? Explain your reasoning. Based on the data shown here, should the company use a dog or cat mascot? Explain your reasoning. Solution: Mean and standard deviation. Since the distributions are approximately symmetric, the mean and standard deviation are the best choices to represent the data. Sample responses: The company should use a dog mascot since the mean is greater. The company should use a cat mascot since the standard deviation shows that the images are more consistently clicked over 1,000 times while the dog images sometimes get fewer than 200 clicks.	
august 2024 #9(2pt) june 2024 #13(2pt) june 2024 #16(2pt) june 2025 #4(2pt) june 2025 #12(2pt) august 2025 #14(2pt) january 2025 #16(2pt) january 2025 #20(2pt) january 2026 #28(2pt)
Section D Check Section D Checkpoint
Unit 1 Assessment End-of-Unit Assessment
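The 1.5 · IQR rule from Lesson 14 can be checked against the "Expecting Outliers" problem in the plan above. The Python sketch below is an illustration only, not part of the original lesson materials. It assumes that quartiles are found as medians of the lower and upper halves of the sorted data (the convention that reproduces Q1 = 0.5 and Q3 = 2.5 here) and uses `statistics.pstdev`, the population standard deviation, because that version matches the reported 4.47 pets. The helper names `quartiles` and `summarize` are hypothetical.

```python
from statistics import mean, median, pstdev


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded when n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[half + len(data) % 2:])


def summarize(values):
    """Summary statistics plus the 1.5 * IQR fences and any values outside them."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in values if x < low_fence or x > high_fence]
    return {
        "mean": mean(values),
        "sd": pstdev(values),  # population standard deviation
        "median": median(values),
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,
        "fences": (low_fence, high_fence),
        "outliers": outliers,
    }


pets = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 21]
# Not counting fish: the 3 becomes a 2 and the 21 becomes a 1.
no_fish = [2 if x == 3 else 1 if x == 21 else x for x in pets]

for label, data in [("original", pets), ("no fish", no_fish)]:
    s = summarize(data)
    print(label, round(s["mean"], 2), round(s["sd"], 2), s["median"], s["IQR"], s["outliers"])
```

Running it should report 21 as the only outlier in the original responses and no outliers after the correction, along with a lower mean and standard deviation, the same median, and an IQR that shrinks from 2 to 1.5, which agrees with the posted solution.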

Lesson 2

Data Representations

HSS-ID.A.1

Represent data with plots on the real number line (dot plots, histograms, and box plots).

—

The table shows a list of the number of minutes people could intensely focus on a task before needing a break. Fifty people of different ages are represented.

19	7	1	16	20	2	7	19	9	13
3	9	18	13	20	8	3	14	13	2
8	5	17	7	18	17	8	8	7	6
2	20	7	7	10	7	6	19	3	18
8	19	7	13	20	14	6	3	19	4

Dot plot from 0 to 25 by 1’s. Time in minutes. Beginning at 0, number of dots above each increment is 0, 1, 3, 4, 1, 1, 3, 8, 5, 2, 1, 0, 0, 4, 2, 0, 1, 2, 3, 5, 4, 0, 0, 0, 0, 0.

In a situation like this, it is helpful to represent the data graphically to better notice any patterns or other interesting features in the data. A dot plot can be used to see the shape and distribution of the data.

There were quite a few people who lost focus at around 3, 7, 13, and 19 minutes, and nobody lost focus at 11, 12, or 15 minutes. Dot plots are useful when the data set is not too large, and they show all of the individual values in the data set. In this example, a dot plot can easily show all of the data. If the data set is very large (more than 100 values, for example), or if there are many values that are close together but not exactly the same, it may be hard to see all of the dots on a dot plot.
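A dot plot is a tally of how often each value occurs. As a minimal Python sketch (an illustration, not part of the lesson), the counts for the 50 focus times in the table can be produced with `collections.Counter` and drawn as rows of text, one row per minute from 0 to 25:

```python
from collections import Counter

# Minutes of focus for the 50 people in the table, in the order listed.
focus_minutes = [
    19, 7, 1, 16, 20, 2, 7, 19, 9, 13,
    3, 9, 18, 13, 20, 8, 3, 14, 13, 2,
    8, 5, 17, 7, 18, 17, 8, 8, 7, 6,
    2, 20, 7, 7, 10, 7, 6, 19, 3, 18,
    8, 19, 7, 13, 20, 14, 6, 3, 19, 4,
]

# Each dot is one person, so the height of a column is how many times
# that value occurs. Counter returns 0 for values that never occur.
counts = Counter(focus_minutes)

# One text row per tick mark from 0 to 25, with each dot drawn as "o".
for minute in range(0, 26):
    print(f"{minute:2d} | {'o' * counts[minute]}")
```

The empty rows at 11, 12, and 15 minutes and the tall row at 7 minutes correspond to the features described above.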

A histogram is another representation that shows the shape and distribution of the same data.

Histogram from 0 to 25 by 5’s. Time in minutes. Beginning at 0 up to but not including 5, height of bar at each interval is 9, 19, 7, 11, 4.

Most people lost focus between 5 and 10 minutes or between 15 and 20 minutes, while only 4 of the 50 people got distracted between 20 and 25 minutes. When creating histograms, each interval includes the number at the lower end of the interval but not the number at the upper end.

For example, the tallest bar displays values that are greater than or equal to 5 minutes but less than 10 minutes. In a histogram, values that are in an interval are grouped together. Although the individual values get lost with the grouping, a histogram can still show the shape of the distribution.
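The rule that each interval includes its lower endpoint but not its upper endpoint is straightforward to express in code. In this sketch (illustrative only; the function name `histogram_counts` is hypothetical), floor division assigns each value to the half-open bin [lower, lower + width) that contains it, so a value of 20 lands in the 20 to 25 bar rather than the 15 to 20 bar:

```python
# The 50 focus times from the table, in the order listed.
focus_minutes = [
    19, 7, 1, 16, 20, 2, 7, 19, 9, 13,
    3, 9, 18, 13, 20, 8, 3, 14, 13, 2,
    8, 5, 17, 7, 18, 17, 8, 8, 7, 6,
    2, 20, 7, 7, 10, 7, 6, 19, 3, 18,
    8, 19, 7, 13, 20, 14, 6, 3, 19, 4,
]


def histogram_counts(values, start, width, num_bins):
    """Count values in half-open bins [start + k*width, start + (k+1)*width).

    Values outside the covered range are ignored.
    """
    counts = [0] * num_bins
    for v in values:
        k = int((v - start) // width)  # bin whose lower edge is <= v
        if 0 <= k < num_bins:
            counts[k] += 1
    return counts


bins = histogram_counts(focus_minutes, start=0, width=5, num_bins=5)
for k, height in enumerate(bins):
    lower = 5 * k
    print(f"[{lower}, {lower + 5}): {height}")
```

With bins of width 5 starting at 0, the counts come out as 9, 19, 7, 11, and 4, matching the bar heights described for the histogram.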

Here is a box plot that represents the same data.

Box plot from 0 to 25 by 1’s. Time in minutes. Whisker from 1 to 6. Box from 6 to 17 with a vertical line at 8. Whisker from 17 to 20.

Box plots are created using a five-number summary. For a set of data, the five-number summary consists of these five statistics: the minimum value, the first quartile, the median, the third quartile, and the maximum value. These values split the data into four sections, each representing approximately one-fourth of the data. The median of these data is 8 minutes, and about 25% of the data fall in the short second quarter, between 6 and 8 minutes. Another quarter of the data is spread across the much wider interval between 8 and 17 minutes. Like the histogram, the box plot does not show individual data values, but other features such as quartiles, range, and median are seen more easily. Dot plots, histograms, and box plots provide three different ways to look at the shape and distribution while highlighting different aspects of the data.
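The five-number summary can be computed directly from the sorted data. The sketch below assumes the convention that reproduces this unit's worked examples: Q1 and Q3 are the medians of the lower and upper halves, and when the number of values is odd the overall median is left out of both halves. Other conventions, such as the interpolation used by some software, can give slightly different quartiles. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

# The 50 focus times from the table, in the order listed.
focus_minutes = [
    19, 7, 1, 16, 20, 2, 7, 19, 9, 13,
    3, 9, 18, 13, 20, 8, 3, 14, 13, 2,
    8, 5, 17, 7, 18, 17, 8, 8, 7, 6,
    2, 20, 7, 7, 10, 7, 6, 19, 3, 18,
    8, 19, 7, 13, 20, 14, 6, 3, 19, 4,
]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when the count is odd, the middle value belongs to neither half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]  # skip the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])


print(five_number_summary(focus_minutes))
```

For the 50 focus times this returns a minimum of 1, Q1 of 6, median of 8 (printed as 8.0 because it averages the two middle values), Q3 of 17, and maximum of 20, the values shown on the box plot.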

Reasoning about Representations (1 problem)

The dot plot, histogram, and box plot represent the distribution of the same data in 3 different ways.

What information can be seen most easily in the dot plot?
What information can be seen most easily in the histogram?
What information can be seen most easily in the box plot?

Dot plot from 1 to 8 by 0.5’s. Battery life in hours. Beginning at 1, number of dots above each increment is 0, 0, 0, 2, 2, 4, 2, 4, 2, 6, 2, 2, 0, 0, 0.

Histogram from 1 to 8 by 1’s. Battery life in hours. Beginning at 1 up to but not including 2, height of bar at each interval is 0, 2, 6, 6, 8, 4, 0.

Box plot from 1 to 8 by 0.5’s. Battery life in hours. Whisker from 2.5 to 3.5. Box from 3.5 to 5.5 with a vertical line at 4.5. Whisker from 5.5 to 6.5.

Solution

Sample response:

The actual values, the shape of the distribution, and the most common value are easily seen in the dot plot.
The shape of the distribution and the most common interval of data are easily seen in the histogram.
The five-number summary (minimum, first quartile, median, third quartile, and maximum) is easily seen in the box plot.

august 2024 #9(2pt)

january 2025 #16(2pt)

january 2026 #28(2pt)

Section A Check

Section A Checkpoint

Lesson 5

Calculating Measures of Center and Variability

—

The mean absolute deviation, or MAD, and the interquartile range, or IQR, are measures of variability. Measures of variability tell you how much the values in a data set tend to differ from one another. A greater measure of variability means that the data are more spread out, while a smaller measure of variability means that the data are more consistent and are closer to the measure of center.

To calculate the MAD of a data set:

Find the mean of the values in the data set.
Find the distance between each data value and the mean (on the number line):
|data value – mean|
Find the mean of the distances. This value is the MAD.
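The three steps above translate directly into code. This is a minimal Python sketch (the function names are hypothetical), applied to the five values from this lesson's practice problem:

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def mean_absolute_deviation(values):
    m = mean(values)                          # step 1: find the mean
    distances = [abs(x - m) for x in values]  # step 2: |data value - mean|
    return mean(distances)                    # step 3: mean of the distances


data = [5, 18, 6, 18, 13]
print(mean(data), mean_absolute_deviation(data))
```

Here the mean is 12, the distances are 7, 6, 6, 6, and 1, and their mean is 26 ÷ 5 = 5.2.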

To calculate the IQR, subtract the value of the first quartile from the value of the third quartile. Recall that the first and third quartiles are included in the five-number summary.
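A sketch of the IQR calculation, under the assumption that Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the overall median left out when the number of values is odd. Library routines such as Python's `statistics.quantiles` interpolate between values and can give different quartiles on some data sets. The names `quartiles` and `iqr` are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves of the sorted data."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + (n % 2):])  # skip the middle value when n is odd
    return q1, q3


def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1


data = [5, 18, 6, 18, 13]
print(quartiles(data), iqr(data))
```

Sorted, the practice data are 5, 6, 13, 18, 18. The lower half is 5, 6 and the upper half is 18, 18, so Q1 = 5.5, Q3 = 18, and the IQR is 12.5.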

Calculating MAD and IQR (1 problem)

5	18	6	18	13

mean: 12

Find the mean absolute deviation for the data.
Find the interquartile range for the data.

Solution

5.2
12.5

—

Section B Check

Section B Checkpoint

Lesson 10

The Effect of Extremes

HSS-ID.A.2

Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets.

HSS-ID.A.1

Represent data with plots on the real number line (dot plots, histograms, and box plots).

HSS-ID.A.3

Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).

—

Is it better to use the mean or median to describe the center of a data set?

The mean gives equal importance to each value when finding the center. The mean usually represents the typical values well when the data have a symmetric distribution. On the other hand, the mean can be greatly affected by changes to even a single value.

The median tells you the middle value in the data set, so changes to a single value usually do not affect the median much. So, the median is more appropriate for data whose distribution is clearly not symmetric.

We can look at the distribution of a data set and draw conclusions about the mean and the median.

Here is a dot plot showing the time, in seconds, that a dart takes to hit a target. The data produce a symmetric distribution.

Dot plot from 0.9 to 1.9 by 0.1’s. Time to hit dartboard in seconds. Beginning at 0.9, number of dots above each increment is 0, 1, 1, 2, 4, 6, 4, 2, 1, 1, 0.

When a distribution is symmetric, the median and mean are both found in the middle of the distribution. Since the median is the middle value (or the mean of the two middle values) of a data set, you can use the symmetry of the distribution to find it easily at the center. For the mean, use the fact that the values above the mean balance the values below it: the sum of the distances from the mean for the values greater than the mean equals the sum of the distances from the mean for the values less than the mean. Using the symmetry of the distribution, you can see that there are four values 0.1 second above the mean, two values 0.2 seconds above the mean, one value 0.3 seconds above the mean, and one value 0.4 seconds above the mean. Likewise, the same number of values lie at each of these distances below the mean, so the mean is 1.4 seconds, the same as the median.
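The balance property can be checked numerically. This sketch (illustrative only) rebuilds the 22 dart times from the dot plot counts and compares the total distance of the values above the mean with the total distance of the values below it:

```python
from statistics import mean, median

# Symmetric dart data read from the dot plot: {time in seconds: number of dots}.
counts = {1.0: 1, 1.1: 1, 1.2: 2, 1.3: 4, 1.4: 6, 1.5: 4, 1.6: 2, 1.7: 1, 1.8: 1}
times = [t for t, c in counts.items() for _ in range(c)]

m = mean(times)
above = sum(t - m for t in times if t > m)  # total distance above the mean
below = sum(m - t for t in times if t < m)  # total distance below the mean

print(len(times), round(m, 3), median(times), round(above, 3), round(below, 3))
```

Both totals come out to 1.5 seconds (up to floating-point rounding), and the mean and the median both equal 1.4 seconds.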

Here is a dot plot using the same data, but with two of the values changed, resulting in a skewed distribution.

<p>Dot plot from 0.2 to 1.7 by 0.1’s. Time to hit dartboard in seconds. Beginning at 0.2, number of dots above each increment is 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 2, 4, 6, 4, 2, 0.</p>

When a distribution is skewed, it is not symmetric, so you cannot use symmetry to find the median and the mean. The median is still 1.4 seconds since it is still the middle value. The mean, on the other hand, is now about 1.273 seconds. The mean is less than the median because the two low values (0.3 and 0.4) pull the mean down.
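The same kind of sketch (again an added illustration, with counts read off the skewed dot plot) shows that moving two values far to the left leaves the median in place but lowers the mean.

```python
from statistics import mean, median

# Skewed dart data: the 1.7 s and 1.8 s values were changed to 0.3 s and 0.4 s
counts = {0.3: 1, 0.4: 1, 1.0: 1, 1.1: 1, 1.2: 2, 1.3: 4,
          1.4: 6, 1.5: 4, 1.6: 2}
times = [t for t, n in counts.items() for _ in range(n)]

print(len(times), median(times), round(mean(times), 3))  # 22 1.4 1.273
```

The total of the data drops by 2.8 seconds (1.4 seconds for each changed value), so the mean drops from 1.4 to 28/22, about 1.273 seconds, while the two middle values are still 1.4.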

The median is usually more resistant to extreme values than is the mean. For this reason, the median is the preferred measure of center when a distribution is skewed or if there are extreme values. When using the median, you would also use the IQR as the preferred measure of variability. In a more symmetric distribution, the mean is the preferred measure of center, and the MAD is the preferred measure of variability.

Shape and Statistics (1 problem)

Is the mean greater than, less than, or equal to the median? Explain your reasoning.
Is the mean greater than, less than, or equal to the median? Explain your reasoning.

Show Solution

Sample response: The mean is greater than the median because the larger values to the right make the mean higher than it would be if the distribution were uniform.
Sample response: The mean is equal to the median because the data are symmetric.

august 2024 #9(2pt)

june 2024 #13(2pt)

june 2024 #16(2pt)

june 2025 #4(2pt)

june 2025 #12(2pt)

august 2025 #14(2pt)

january 2025 #16(2pt)

january 2025 #20(2pt)

january 2026 #28(2pt)

Lesson 11

Comparing and Contrasting Data Distributions

HSS-ID.A.2

Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets.

—

The mean absolute deviation, or MAD, is a measure of variability calculated by finding the mean of the distances between each data point and the mean. Here are two dot plots, each with a mean of 15 centimeters, displaying the lengths of sea scallop shells in centimeters.

<p>Dot plot from 11 to 19 by 1’s. Length in centimeters. Beginning at 11, number of dots above each increment is 0, 1, 2, 3, 5, 3, 2, 1, 0</p>

<p>Dot plot from 11 to 19 by 1’s. Length in centimeters. Beginning at 11, number of dots above each increment is 0, 0, 2, 4, 5, 4, 2, 0, 0.</p>

Notice that both dot plots show a symmetric distribution, so the mean and the MAD are appropriate choices for describing center and variability. The data in the first dot plot appear to be more spread out than the data in the second dot plot, so the first data set appears to have greater variability than the second. The MAD confirms this: the MAD of the first data set is approximately 1.18 cm, and the MAD of the second data set is approximately 0.94 cm. This means that the values in the first data set are, on average, about 1.18 cm away from the mean, and the values in the second data set are, on average, about 0.94 cm away from the mean. The greater the MAD, the greater the variability of the data.
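To see where 1.18 and 0.94 come from, here is a minimal Python sketch (an added illustration; the helper names `mad` and `expand` are made up for this example). It turns each dot plot into a list of lengths and averages the distances from the mean. For the first plot the distances add up to 20 cm over 17 shells; for the second they add up to 16 cm.

```python
from statistics import mean

def mad(values):
    """Mean absolute deviation: the average distance of the values from their mean."""
    m = mean(values)
    return mean(abs(x - m) for x in values)

def expand(counts):
    """Turn a {value: number of dots} dot plot into a flat list of values."""
    return [v for v, n in counts.items() for _ in range(n)]

first = expand({12: 1, 13: 2, 14: 3, 15: 5, 16: 3, 17: 2, 18: 1})
second = expand({13: 2, 14: 4, 15: 5, 16: 4, 17: 2})

print(mean(first), mean(second))                    # 15 15
print(round(mad(first), 2), round(mad(second), 2))  # 1.18 0.94
```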

The interquartile range, IQR, is a measure of variability that is calculated by subtracting the value for the first quartile, Q1, from the value for the third quartile, Q3. These two box plots represent the distributions of the lengths in centimeters of a different group of sea scallop shells, each with a median of 15 centimeters.

<p>Box plot from 2 to 20 by 1’s. Length in centimeters. Whisker from 3 to 5. Box from 5 to 19 with vertical line at 15. Whisker from 19 to 20.</p>

<p>Box plot from 2 to 20 by 1’s. Length in centimeters. Whisker from 3 to 9. Box from 9 to 19 with vertical line at 15. Whisker from 19 to 20.</p>

Notice that neither box plot shows a symmetric distribution, so the median and the IQR are appropriate choices for describing center and variability for these data sets. The middle half of the data in the first box plot appears more spread out, or shows greater variability, than the middle half of the data in the second box plot. The IQR of the first distribution is 14 cm (19 - 5), and the IQR of the second is 10 cm (19 - 9). Because the IQR is the difference between the median of the upper half of the data, Q3, and the median of the lower half, Q1, it is not affected by the minimum or the maximum value in the data set. It measures the spread of the middle 50% of the data.

The MAD is calculated using every value in the data set, and the IQR is calculated using only the values for Q1 and Q3.
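The difference between the two measures can be sketched in Python (an added illustration, not from the lesson). Textbooks and software use slightly different quartile conventions; this sketch assumes the common one where Q1 and Q3 are the medians of the lower and upper halves, leaving the median out when the number of values is odd. It uses the first sea scallop data set and a hypothetical change in which the largest shell is recorded as 25 cm instead of 18 cm.

```python
from statistics import mean, median

def quartiles(values):
    """Q1, median, Q3 using the median-of-halves convention
    (the median itself is left out of both halves when n is odd)."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return median(lower), median(data), median(upper)

def iqr(values):
    q1, _, q3 = quartiles(values)
    return q3 - q1

def mad(values):
    m = mean(values)
    return mean(abs(x - m) for x in values)

# First sea scallop data set from the dot plot above
shells = [12, 13, 13, 14, 14, 14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 17, 18]
# Hypothetical change: the largest shell is recorded as 25 cm instead of 18 cm
changed = shells[:-1] + [25]

print(iqr(shells), iqr(changed))                      # 2.0 2.0 (IQR unchanged)
print(round(mad(shells), 2), round(mad(changed), 2))  # 1.18 1.71 (MAD grows)
```

Because the maximum is not Q1 or Q3, the IQR stays at 2 cm, but the MAD uses every value, so it rises.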

Which Menu? (1 problem)

A restaurant owner believes that it is beneficial to have different menu items with a lot of variability so that people can have a choice of expensive and inexpensive food. Several chefs offer menus and suggested prices for the food they create. The owner creates dot plots for the prices of the menu items and finds some summary statistics. Which menu best matches what the restaurant is looking for? Explain your reasoning.

Italian:

mean: $9.03

median: $9

MAD: $2.45

IQR: $3.50

<p>Dot plot from 0 to 34 by 2’s. Price in dollars. Numbers of dots above 2 is 1, 4.50 is 1, 5 is 5, 6 is 2, 7 is 1, 8 is 9, 9 is 4, 10 is 3, 10.50 is 3, 11 is 3, 12.50 is 6, 14.50 is 2.</p>

Diner:

mean: $3.36

median: $2

MAD: $2.12

IQR: $4

<p>Dot plot from 0 to 34 by 2’s. Price in dollars. Numbers of dots above 1 is 12, 2 is 9, 3 is 5, 4 is 2, 5 is 7, 6.50 is 1, 12 is 1, 16 is 1.</p>

Japanese:

mean: $10.35

median: $10

MAD: $5.55

IQR: $9.50

<p>Dot plot from 0 to 34 by 2’s. Price in dollars. Numbers of dots above 2 is 3, 3 is 2, 4 is 5, 5 is 3, 6 is 4, 7 is 1, 9 is 1, 10 is 4, 12 is 4, 13 is 2, 14 is 3, 15 is 1, 17 is 1, 20 is 3, 21 is 1, 25 is 1, 33 is 1.</p>

Steakhouse:

mean: $11.51

median: $10.50

MAD: $3.69

IQR: $4.50

<p>Dot plot from 0 to 34 by 2’s. Price in dollars. Numbers of dots above 5 is 4, 6 is 4, 8 is 1, 9 is 3, 9.50 is 1, 10 is 7, 11 is 4, 12 is 3, 13 is 3, 14 is 1, 16 is 4, 17 is 1, 18 is 1, 22 is 1, 23 is 1, 25 is 1.</p>

Show Solution

Japanese. The variability, whether measured with the IQR or the MAD, is greater than that of any of the other menus.

june 2024 #13(2pt)

june 2024 #16(2pt)

june 2025 #4(2pt)

june 2025 #12(2pt)

august 2025 #14(2pt)

january 2025 #20(2pt)

Lesson 12

Standard Deviation

HSS-ID.A.2

Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets.

HSS-ID.A.3

Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).

—

We can describe the variability of a distribution using the standard deviation. The standard deviation is a measure of variability that is calculated using a method that is similar to the one used to calculate the MAD, or mean absolute deviation.
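Here is a minimal Python sketch of the similarity (an added illustration using the first sea scallop data set from Lesson 11). The MAD averages the absolute distances from the mean. The standard deviation averages the squared distances and then takes a square root. There are two common versions: the population standard deviation divides by the number of values, n, and the sample standard deviation divides by n - 1. Python's `statistics.pstdev` and `statistics.stdev` compute these two versions.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

shells = [12, 13, 13, 14, 14, 14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 17, 18]
m = mean(shells)

# MAD: average of the absolute distances from the mean
mad = mean(abs(x - m) for x in shells)

# Standard deviation: average the squared distances, then take the square root
population_sd = sqrt(mean((x - m) ** 2 for x in shells))
# Sample standard deviation: divide the sum of squared distances by n - 1 instead of n
sample_sd = sqrt(sum((x - m) ** 2 for x in shells) / (len(shells) - 1))

print(round(mad, 2), round(population_sd, 2), round(sample_sd, 2))  # 1.18 1.53 1.58
print(round(pstdev(shells), 2), round(stdev(shells), 2))            # 1.53 1.58
```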

A deeper understanding of the importance of standard deviation as a measure of variability will come with a deeper study of statistics. For now, know that the standard deviation is mathematically important and will be used as the appropriate measure of variability when the mean is an appropriate measure of center.

Like the MAD, the standard deviation is large when the data set is more spread out, and the standard deviation is small when the variability is small. The intuition you gained about MAD will also work for the standard deviation.

True or False: Reasoning with Standard Deviation (1 problem)

The low temperatures, in degrees Celsius, for some cities on the same days in March are recorded in the dot plots.

Decide if each statement is true or false. Explain your reasoning.

The standard deviation of Christchurch’s temperatures is zero because the data distribution is symmetric.
The standard deviation of St. Louis’s temperatures is equal to the standard deviation of Chicago’s temperatures.
The standard deviation of Chicago’s temperatures is less than the standard deviation of London’s temperatures.

Show Solution

False. Sample response: The standard deviation is a measure of variability and there is some variability in the data set.
True. Sample response: Chicago’s distribution of temperatures is the same as St. Louis’s, but 3 degrees cooler. The two cities have the same variability in temperature, and so they have the same standard deviation.
True. Sample response: London has the same low temperatures as Chicago except on the warmest day, when London’s temperature is 3 degrees warmer than Chicago’s. Therefore, the temperatures in London have more variability than Chicago’s temperatures on these days.
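The actual dot plot values are not reproduced here, so this Python sketch uses a hypothetical list of low temperatures to illustrate the reasoning in the last two answers. Shifting every value by the same amount leaves the standard deviation unchanged, while moving the largest value farther from the mean increases it.

```python
from math import isclose
from statistics import pstdev

# Hypothetical low temperatures in degrees Celsius (not the data from the dot plots)
st_louis = [4, 6, 6, 7, 8, 8, 9]
chicago = [t - 3 for t in st_louis]  # same shape, 3 degrees cooler
london = sorted(chicago)
london[-1] += 3                      # same as Chicago except the warmest day is 3 degrees warmer

print(isclose(pstdev(st_louis), pstdev(chicago)))  # True
print(pstdev(chicago) < pstdev(london))            # True
```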

june 2024 #13(2pt)

june 2024 #16(2pt)

june 2025 #4(2pt)

june 2025 #12(2pt)

august 2025 #14(2pt)

january 2025 #20(2pt)

Lesson 14

Outliers

HSS-ID.A.1

Represent data with plots on the real number line (dot plots, histograms, and box plots).

HSS-ID.A.2

Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets.

HSS-ID.A.3

Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).

—

In statistics, an outlier is a data value that is unusual in that it differs quite a bit from the other values in the data set.

Outliers occur in data sets for a variety of reasons including, but not limited to:

Errors in the data that result from the data collection or data entry process.
Unusual values that genuinely occur in the population.

Outliers can reveal cases worth studying in detail or errors in the data collection process. In general, they should be included in any analysis done with the data.

A value is an outlier if it is

More than 1.5 times the interquartile range greater than Q3 (if $x > \text{Q3} + 1.5 \cdot \text{IQR}$).
More than 1.5 times the interquartile range less than Q1 (if $x < \text{Q1} - 1.5 \cdot \text{IQR}$).
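The rule above can be written as a short Python sketch (an added illustration; the function names `outlier_fences` and `is_outlier` are hypothetical). It computes the two cutoff values, often called fences, from Q1 and Q3, and here it is applied to the heart-rate summary statistics given in the example below (Q1 = 62 bpm, Q3 = 76 bpm).

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper): values below lower or above upper are outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(x, q1, q3):
    lower, upper = outlier_fences(q1, q3)
    return x < lower or x > upper

print(outlier_fences(62, 76))                            # (41.0, 97.0)
print(is_outlier(112, 62, 76), is_outlier(55, 62, 76))   # True False
```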

In this box plot, the minimum and the maximum are both outliers, so the data set has at least two outliers.

It is important to identify the source of outliers because outliers can affect measures of center and variability in significant ways. The box plot displays the resting heart rate, in beats per minute (bpm), of 50 athletes taken five minutes after a workout.

<p>Box plot from 50 to 120 by 10’s. Heartbeats per minute. Whisker from 55 to 62. Box from 62 to 76 with vertical line at 70. Whisker from 76 to 112. Dotted line, labeled 1.5 times IQR, from 76 to 97.</p>

Some summary statistics include:

mean: 69.78 bpm
standard deviation: 10.71 bpm
minimum: 55 bpm
Q1: 62 bpm
median: 70 bpm
Q3: 76 bpm
maximum: 112 bpm

It appears that the maximum value of 112 bpm may be an outlier. Because the interquartile range is 14 bpm ($76 - 62 = 14$) and $\text{Q3} + 1.5 \cdot \text{IQR} = 76 + 21 = 97$, the maximum value should be labeled an outlier. On the low end, $\text{Q1} - 1.5 \cdot \text{IQR} = 62 - 21 = 41$, which is below the minimum of 55 bpm, so there are no low outliers. Searching through the actual data set would confirm that 112 bpm is the only outlier.

After reviewing the data collection process, it is discovered that the heart rate of 112 bpm was measured one minute after the athlete's workout instead of five minutes after. The outlier should be deleted from the data set because it was not obtained under the right conditions.

Once the outlier is removed, the box plot and summary statistics are:

<p>Box plot from 50 to 120 by 10’s. Heartbeats per minute. Whisker from 55 to 61. Box from 61 to 75.5 with vertical line at 70. Whisker from 75.5 to 85.</p>

mean: 68.92 bpm
standard deviation: 8.9 bpm
minimum: 55 bpm
Q1: 61 bpm
median: 70 bpm
Q3: 75.5 bpm
maximum: 85 bpm

The mean decreased by 0.86 bpm, and the median remained the same. The standard deviation decreased by 1.81 bpm, which is about 17% of its previous value. Based on the standard deviation, the data set with the outlier removed shows much less variability than the original data set containing the outlier. Because the mean and standard deviation use all of the numerical values, removing one very large data point can affect these statistics in important ways.

The median remained the same after the removal of the outlier, and the IQR increased slightly (from 14 bpm to 14.5 bpm). These measures of center and variability are much more resistant to change than the mean and standard deviation are. The median and IQR measure the middle of the data based on the number of values rather than the actual numerical values themselves, so the loss of a single value will not often have a great effect on these statistics.
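The changes described above can be checked from the summary statistics alone. In this Python sketch (an added illustration), the new mean is recomputed as the old total minus the removed value, divided by the new count. Because the reported mean of 69.78 bpm is itself rounded, the recomputed value only needs to agree to two decimal places.

```python
n = 50
old_mean, new_mean = 69.78, 68.92
old_sd, new_sd = 10.71, 8.9
removed = 112

# Removing one value: new mean = (old total - removed value) / (n - 1)
recomputed_mean = (n * old_mean - removed) / (n - 1)

mean_change = old_mean - new_mean
sd_change = old_sd - new_sd
sd_percent = sd_change / old_sd * 100
iqr_before, iqr_after = 76 - 62, 75.5 - 61

print(round(recomputed_mean, 2))                     # 68.92
print(round(mean_change, 2))                         # 0.86
print(round(sd_change, 2), round(sd_percent))        # 1.81 17
print(iqr_before, iqr_after)                         # 14 14.5
```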

The source of any possible errors should always be investigated. If the 112 bpm measurement had been taken under the right conditions and simply came from an athlete whose heart rate did not slow as much as the other athletes' heart rates, it should not be deleted, so that the data reflect the actual measurements. If the situation cannot be revisited to determine the source of the outlier, the outlier should not be removed. To avoid tampering with the data and to report accurate results, data values should not be deleted unless they can be confirmed to be errors in the data collection or data entry process.

Expecting Outliers (1 problem)

A group of 20 students is asked to report the number of pets they keep in their homes. The results are:

0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 21

mean: 2.4 pets
standard deviation: 4.47 pets
Q1: 0.5 pets
median: 1 pet
Q3: 2.5 pets

Would any of these values be considered outliers? Explain your reasoning.
After being told that they should not count any fish in the report, the value of 3 becomes a 2 and the value of 21 becomes a 1. Would these changes affect the median, mean, standard deviation, or interquartile range? If so, would each measure increase or decrease from its original value?

Show Solution

Yes, 21 pets is an outlier since it is greater than $\text{Q3} + 1.5 \cdot \text{IQR} = 2.5 + 1.5 \cdot 2 = 5.5$.
The mean and standard deviation would decrease with the changes. The median would stay the same and the IQR would decrease slightly.
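Both answers can be checked with a short Python sketch (an added illustration; `quartiles` and `summarize` are hypothetical helper names). It assumes the median-of-halves quartile convention, which reproduces the Q1 of 0.5 and Q3 of 2.5 given in the problem. The reported standard deviation of 4.47 matches the population standard deviation (dividing by n), so the sketch uses `statistics.pstdev`; the sample version, `statistics.stdev`, gives about 4.58 for the original data.

```python
from statistics import mean, median, pstdev

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves
    (the median is excluded from both halves when n is odd)."""
    data = sorted(values)
    n = len(data)
    return median(data[: n // 2]), median(data[(n + 1) // 2 :])

def summarize(values):
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return {"mean": mean(values), "median": median(values),
            "sd": round(pstdev(values), 2), "Q1": q1, "Q3": q3,
            "IQR": iqr, "outliers": outliers}

pets = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 21]
# Not counting fish: the 3 becomes a 2 and the 21 becomes a 1
no_fish = sorted(2 if x == 3 else 1 if x == 21 else x for x in pets)

print(summarize(pets))
print(summarize(no_fish))
```

For the original data this gives a mean of 2.4, Q1 = 0.5, Q3 = 2.5, and 21 as the only outlier. After the changes, the median stays at 1, the mean drops to 1.35, the standard deviation drops to about 1.28, and the IQR drops from 2 to 1.5.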

august 2024 #9(2pt)

june 2024 #13(2pt)

june 2024 #16(2pt)

june 2025 #4(2pt)

june 2025 #12(2pt)

august 2025 #14(2pt)

january 2025 #16(2pt)

january 2025 #20(2pt)

january 2026 #28(2pt)

Lesson 15

Comparing Data Sets

HSS-ID.A.1

Represent data with plots on the real number line (dot plots, histograms, and box plots).

HSS-ID.A.2

Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets.

—

To compare data sets, it is helpful to look at measures of center and measures of variability. The shape of the distribution can help you choose the most useful measure of center and measure of variability.

When distributions are symmetric or approximately symmetric, the mean is the preferred measure of center and should be paired with the standard deviation as the preferred measure of variability. When distributions are skewed or when outliers are present, the median is usually a better measure of center and should be paired with the interquartile range (IQR) as the preferred measure of variability.

Once the appropriate measure of center and measure of variability are selected, these measures can be compared for data sets with similar shapes.

For example, let’s compare the number of seconds it takes football players at two different positions to complete a 40-yard dash. A dot plot of the data shows that the tight-end times do not seem to be distributed symmetrically, so we should probably find the median and IQR for both sets of data to compare them.

<p>Dot plot from 4.25 to 5.75 by 0.25’s. Wide receiver times in seconds. Beginning at 4.25 up to but not including 4.5, number of dots in each interval is 12, 11, 2, 0, 0, 0.</p>

<p>Dot plot from 4.25 to 5.75 by 0.25’s. Tight end times in seconds. Beginning at 4.25 up to but not including 4.5, number of dots in each interval is 0, 10, 6, 4, 3, 1.</p>

The median and IQR could be computed from the values, but can also be determined from a box plot.

This shows that the tight-end times have a greater median (about 4.9 seconds) compared to the median of wide-receiver times (about 4.5 seconds). The IQR is also greater for the tight-end times (about 0.5 seconds) compared to the IQR for the wide-receiver times (about 0.25 seconds).

This means that the tight ends tend to be slower in the 40-yard dash when compared to the wide receivers. The tight ends also have greater variability in their times. Together, this can be taken to mean that, in general, a typical wide receiver is faster than a typical tight end is, and the wide receivers tend to have more similar times to one another than the tight ends do to one another.

Comparing Mascots (1 problem)

A new pet food company wants to sell its product online and use social media to promote itself. To decide whether to use a dog or a cat as its mascot, the company researches the number of clicks on links with an image of a dog or a cat.

Dog images:

mean: 1,263.5 clicks

median: 1,282 clicks

standard deviation: 357.4 clicks

IQR: 409 clicks

<p>Histogram from 0 to 2,400 by 200’s. Clicks for dog images. Beginning at 0, up to but not including 200, height of bar at each interval is 1, 0, 5, 2, 11, 21, 25, 20, 9, 4, 2, 0.</p>

Cat images:

mean: 1,105.4 clicks

median: 1,125.5 clicks

standard deviation: 239.3 clicks

IQR: 312.5 clicks

<p>Histogram from 0 to 2,400 by 200’s. Clicks for cat images. Beginning at 0, up to but not including 200, height of bar at each interval is 0, 0, 3, 6, 23, 32, 28, 6, 1, 0, 0, 0.</p>

Based on the shape of the distributions, what measure of center and measure of variability would you use to compare the distributions? Explain your reasoning.
Based on the data shown here, should the company use a dog or cat mascot? Explain your reasoning.

Show Solution

Mean and standard deviation. Since the distributions are approximately symmetric, the mean and standard deviation are the best choice to represent the data.
Sample responses:
- The company should use a dog mascot since the mean is greater.
- The company should use a cat mascot since the standard deviation shows that the images are more consistently clicked over 1,000 times while the dog images sometimes get fewer than 200 clicks.

august 2024 #9(2pt)

june 2024 #13(2pt)

june 2024 #16(2pt)

june 2025 #4(2pt)

june 2025 #12(2pt)

august 2025 #14(2pt)

january 2025 #16(2pt)

january 2025 #20(2pt)

january 2026 #28(2pt)

Section D Check

Section D Checkpoint

Unit 1 One Variable Statistics — Unit Plan