Comparing Data Sets

Student Summary

To compare data sets, it is helpful to look at the measures of center and measures of variability. The shape of the distribution can help choose the most useful measure of center and measure of variability.

When distributions are symmetric or approximately symmetric, the mean is the preferred measure of center and should be paired with the standard deviation as the preferred measure of variability. When distributions are skewed or when outliers are present, the median is usually a better measure of center and should be paired with the interquartile range (IQR) as the preferred measure of variability.
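As a rough sketch of this pairing in Python: the `summarize` helper and both data lists below are hypothetical, made up for illustration. Using the population standard deviation (`pstdev`) and the default quartile method of `statistics.quantiles` are also assumptions; other conventions give slightly different values.

```python
import statistics

def summarize(data, symmetric):
    """Return (center, spread) using the pairing described above."""
    if symmetric:
        # Symmetric shape: mean paired with standard deviation
        return statistics.mean(data), statistics.pstdev(data)
    # Skewed shape or outliers: median paired with IQR
    q1, _, q3 = statistics.quantiles(data, n=4)
    return statistics.median(data), q3 - q1

# Hypothetical data sets, not taken from the lesson
symmetric_data = [2, 3, 4, 5, 6]
skewed_data = [1, 1, 2, 2, 3, 10]  # the 10 acts as an outlier

print(summarize(symmetric_data, symmetric=True))
print(summarize(skewed_data, symmetric=False))
```

In the skewed list, the single large value pulls the mean above the median, which is one reason the median is usually the better description of a typical value there.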

Once the appropriate measure of center and measure of variability are selected, these measures can be compared for data sets with similar shapes.

For example, let’s compare the number of seconds it takes football players at two different positions to complete a 40-yard dash. First, we can look at dot plots of the data to see that the tight-end times do not seem to be distributed symmetrically, so we should probably find the median and IQR for both sets of data and use them to compare the two groups.

Dot plot from 4.25 to 5.75 by 0.25’s. Wide receiver times in seconds. Beginning with the interval from 4.25 up to but not including 4.5, the number of dots in each interval is 12, 11, 2, 0, 0, 0.

Dot plot from 4.25 to 5.75 by 0.25’s. Tight end times in seconds. Beginning with the interval from 4.25 up to but not including 4.5, the number of dots in each interval is 0, 10, 6, 4, 3, 1.
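The interval counts in the two dot plot descriptions are enough to locate the interval that holds each median. A minimal sketch, assuming the intervals start at 4.25 and are 0.25 wide as described; the `median_interval` helper is a hypothetical name, and it tracks only the lower of the two middle values when the count is even.

```python
def median_interval(counts, start=4.25, width=0.25):
    """Return the (low, high) interval that contains the median value."""
    total = sum(counts)
    # Position of the middle value (the lower middle one if total is even)
    middle = (total + 1) // 2
    running = 0
    for i, count in enumerate(counts):
        running += count
        if running >= middle:
            low = start + i * width
            return low, low + width
    raise ValueError("counts must contain at least one dot")

# Counts read from the dot plot descriptions
wide_receiver_counts = [12, 11, 2, 0, 0, 0]
tight_end_counts = [0, 10, 6, 4, 3, 1]

print(median_interval(wide_receiver_counts))
print(median_interval(tight_end_counts))
```

For the tight ends there are 24 times, so the median is the average of the 12th and 13th values; both fall in the same interval here, so tracking one of them is enough.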

The median and IQR could be computed from the values, but can also be determined from a box plot.

Box plot from 4.25 to 5.75 by 0.25’s. Wide receiver times in seconds. Whisker from 4.31 to 4.405. Box from 4.405 to 4.665 with vertical line at 4.5. Whisker from 4.665 to 4.8.

Box plot from 4.25 to 5.75 by 0.25’s. Tight end times in seconds. Whisker from 4.56 to 4.685. Box from 4.685 to 5.225 with vertical line at 4.87. Whisker from 5.255 to 5.7.

This shows that the tight-end times have a greater median (about 4.9 seconds) compared to the median of wide-receiver times (about 4.5 seconds). The IQR is also greater for the tight-end times (about 0.5 seconds) compared to the IQR for the wide-receiver times (about 0.25 seconds).
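The arithmetic behind these estimates can be checked with a short sketch. The quartiles and medians are read from the box plot descriptions; the dictionary layout and the `iqr` helper are hypothetical, and the tight-end upper quartile is assumed to be 5.225, the stated right edge of the box.

```python
# Values read from the box plot descriptions (times in seconds)
wide_receiver = {"q1": 4.405, "median": 4.5, "q3": 4.665}
tight_end = {"q1": 4.685, "median": 4.87, "q3": 5.225}  # q3 assumed from the box edge

def iqr(summary):
    """Interquartile range: third quartile minus first quartile."""
    return summary["q3"] - summary["q1"]

for name, summary in [("wide receivers", wide_receiver), ("tight ends", tight_end)]:
    print(f"{name}: median {summary['median']:.2f} s, IQR {iqr(summary):.2f} s")

# Differences between the two positions
median_gap = tight_end["median"] - wide_receiver["median"]
iqr_gap = iqr(tight_end) - iqr(wide_receiver)
print(f"median gap {median_gap:.2f} s, IQR gap {iqr_gap:.2f} s")
```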

This means that the tight ends tend to be slower in the 40-yard dash when compared to the wide receivers. The tight ends also have greater variability in their times. Together, this can be taken to mean that, in general, a typical wide receiver is faster than a typical tight end is, and the wide receivers tend to have more similar times to one another than the tight ends do to one another.

Standards

Building On
HSS-ID.A.1

S-ID.1

Addressing
HSS-ID.A.1

S-ID.1

HSS-ID.A.2

S-ID.2

Building Toward
HSS-ID.A.2

S-ID.2