Given a box plot, it is always possible to calculate the mean of the data.
B.
Given a box plot, it is always possible to find the median of the data.
C.
Given a box plot, it is always possible to construct a corresponding dot plot.
D.
Given a dot plot, it is always possible to construct a corresponding box plot.
E.
Given a histogram, it is always possible to construct a corresponding box plot.
Answer: B, D
Teaching Notes
A vital fact to know here is that only the dot plot can be used to reconstruct the entire data set. Given a histogram or box plot, only certain information is readily known.
Students who select Choice A and do not select Choice B may be confusing the median with the mean as it applies to the construction of a box plot. Students who select Choice C may think that the corresponding dot plot needs to show only the five-number summary instead of the full original data set. Students who don’t select Choice D may not be thinking about how a box plot is constructed: although a dot plot does not directly display the five-number summary, every data value can be read from it, so the box plot can still be constructed. Students who select Choice E may have a significant misconception about what a histogram represents: it shows only how many values fall in each range, not the specific values.
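A short Python sketch can make the point about Choices A and C concrete. The two data sets below are made up for illustration (they are not from the assessment), and the quartiles are assumed to be the medians of the lower and upper halves, with the middle value left out when the count is odd. Both sets produce the same box plot, yet they are different data sets with different means.

```python
from statistics import mean, median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the count is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Two hypothetical data sets, invented for this illustration.
set_1 = [1, 2, 3, 4, 5, 6, 7]
set_2 = [1, 2, 2, 4, 4, 6, 7]

summary_1 = five_number_summary(set_1)
summary_2 = five_number_summary(set_2)

print(summary_1, mean(set_1))
print(summary_2, mean(set_2))
```

Because the summaries match while the data sets and their means differ, a box plot alone cannot determine the mean or the dot plot.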
2.
Here’s a dot plot of a data set.
Which statement is true about the mean of the data set?
A.
The mean is less than 5.
B.
The mean is equal to 5.
C.
The mean is greater than 5.
D.
There is not enough information to determine the mean.
Answer:
The mean is greater than 5.
Teaching Notes
The "balancing" interpretation of mean is very useful here. There are 4 points that are each 1 less than 5 and 1 point that is 8 more than 5. This shows that the line should balance at a point greater than 5, so the mean should be greater than 5.
Students who select Choice A may think that, because there are more values less than 5, the mean must be less than 5. Students who select Choice B may not be thinking quantitatively enough to understand where to balance the distribution. Students who select Choice D may think that additional information is needed to compute the mean.
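The balancing argument can be checked with a small Python sketch. The dot plot itself is not reproduced here, so the values below are reconstructed from the description in the note (four points at 4 and one point at 13). The number of points sitting exactly at 5 is unknown, so it is treated as a hypothetical parameter.

```python
from statistics import mean

CENTER = 5

# Reconstructed from the note: four points 1 below 5, one point 8 above 5.
described = [4, 4, 4, 4, 13]

# Net "pull" around 5: four deviations of -1 and one deviation of +8.
net_deviation = sum(x - CENTER for x in described)

def mean_with_extra_fives(extra_fives):
    """Mean of the described points plus a hypothetical count of points at 5."""
    return mean(described + [CENTER] * extra_fives)

# Points at exactly 5 add no deviation, so the mean stays above 5.
means = [mean_with_extra_fives(k) for k in range(10)]
print(net_deviation, means[0])
```

The net deviation is positive, so the balance point lies to the right of 5 no matter how many points are stacked at 5.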
3.
The air quality was tested in many office buildings in two cities. The results of the testing are shown in these box plots.
Double box plot from 0 to 65 by 5's. Parts per million. Top box plot labeled city P. Bottom box plot labeled city Q. Top box plot whisker from 10 to 15. Box from 15 to 35 with vertical line at 30. Whisker from 35 to 40. Bottom box plot whisker from 5 to 20. Box from 20 to 30 with vertical line at 25. Whisker from 30 to 60.
A level of less than 50 parts per million is considered healthy. A level of 50 or more parts per million is considered unhealthy.
Select all the statements that must be true.
A.
The lowest recorded measurement was in city Q.
B.
All buildings tested in city P are in the healthy range.
C.
The mean for city P is greater than the mean for city Q.
D.
The range for city Q is greater than the range for city P.
E.
The median for city P is greater than the median for city Q.
Answer: A, B, D, E
Teaching Notes
Students who don’t select Choice A may be looking at the boxes instead of the whiskers. Students who don’t select Choice B may have misinterpreted “50 parts per million” as a division, ignoring the label of the box plot. Students who select Choice C may be attempting to judge the mean by the overall location of the box plots, but a box plot does not show the mean, so no such judgment can be made. Students who don’t select Choice D may be looking at the IQR instead of the range. Students who select Choice C instead of Choice E may think box plots indicate means instead of medians.
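As a rough check of the answer key, the five-number summaries read from the two box plots can be compared directly. The `BoxPlot` tuple and the helper names below are illustrative choices, not part of the assessment. Choice C is left out on purpose, since nothing in a five-number summary fixes the mean.

```python
from collections import namedtuple

BoxPlot = namedtuple("BoxPlot", "minimum q1 median q3 maximum")

# Values read from the box plot descriptions (parts per million).
city_p = BoxPlot(10, 15, 30, 35, 40)
city_q = BoxPlot(5, 20, 25, 30, 60)

HEALTHY_LIMIT = 50  # levels below this are considered healthy

def data_range(box):
    return box.maximum - box.minimum

def iqr(box):
    return box.q3 - box.q1

statements = {
    "A": city_q.minimum < city_p.minimum,           # lowest value is in city Q
    "B": city_p.maximum < HEALTHY_LIMIT,            # all of city P is healthy
    "D": data_range(city_q) > data_range(city_p),   # 55 versus 30
    "E": city_p.median > city_q.median,             # 30 versus 25
}
print(statements)
print(iqr(city_p), iqr(city_q))
```

The IQR comparison goes the other way (20 for city P, 10 for city Q), which is why confusing IQR with range leads students away from Choice D.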
4.
This box plot displays information about the number of text messages that some students sent to their parents in one day.
A box plot for “number of texts.” The numbers 0 through 50, in increments of 5, are indicated. The five number summary for the box plot is as follows: Minimum value, 0. Maximum value, 50. Q1, 10. Q2, 14. Q3, 20.
What is the median number of texts sent by students?
What is the IQR (interquartile range)?
Is this data set symmetric? Explain how you know.
Answer:
14 text messages (also accept anything between 13 and 14.5).
10 text messages (the IQR is 20−10, which is 10).
No. The top quartile (or top whisker) is much wider than the bottom quartile.
Tier 1 response:
Accurate, correct work.
Correct answers to all three questions, including a correct explanation of why the data set is not symmetric.
Acceptable errors: Claim that the data set is not symmetric because the right side of the box is wider than the left side, without reference to whiskers.
Tier 2 response:
Work shows general conceptual understanding and mastery, with some errors.
Sample errors: Incorrect median; incorrect IQR; incorrect answer or explanation on data symmetry question, including a general statement that the box plot is not symmetric (not specific enough).
Tier 3 response:
Significant errors in work demonstrate lack of conceptual understanding or mastery.
Sample errors: Two or more error types from Tier 2 response.
Teaching Notes
Students use a box plot to make conclusions about a data set, including the median, the IQR, and the shape of the distribution. Students may struggle to identify the shape of the distribution without the actual data, but the box plot gives enough information to answer the last question.
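One way to see why the box plot is enough for the symmetry question is to compare the widths of its four sections. The sketch below uses the five-number summary from the plot description. The variable names, and the simple "matching widths" test for symmetry, are assumptions made for illustration.

```python
# Five-number summary read from the box plot (number of texts).
minimum, q1, med, q3, maximum = 0, 10, 14, 20, 50

iqr = q3 - q1                  # width of the box
left_whisker = q1 - minimum    # width of the bottom quarter
right_whisker = maximum - q3   # width of the top quarter
left_box = med - q1
right_box = q3 - med

# Simple check: a symmetric plot would have matching widths on both sides.
looks_symmetric = left_whisker == right_whisker and left_box == right_box

print(med, iqr, left_whisker, right_whisker, looks_symmetric)
```

The right whisker is three times as wide as the left one, which supports the Tier 1 explanation.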
5.
Two groups went bowling. Here are the scores from each group.
Group A: 70, 80, 90, 100, 110, 130, 190
Group B: 50, 100, 107, 110, 120, 140, 150
Draw two box plots, one for the data in each group.
Which group shows greater variability?
Answer:
Group A shows greater variability. It has a wider range (120 to Group B’s 100), and a wider IQR (50 to Group B’s 40).
Tier 1 response:
Accurate, correct work.
Both box plots are drawn correctly, correctly stating that Group A shows greater variability.
Tier 2 response:
Work shows general conceptual understanding and mastery, with some errors.
Sample errors: 1 or 2 types of minor errors in creating box plots (incorrect placement of median, quartiles, max or min, badly drawn box); incorrectly stating Group B shows greater variability or omitting question.
Tier 3 response:
Significant errors in work demonstrate lack of conceptual understanding or mastery.
Sample errors: More than 2 types of minor errors in creating box plots; major errors in creating box plots, such as not using 5 numbers to generate box plot; creating only one box plot.
Teaching Notes
Because box plots are being constructed, students are very likely to use the IQR or range as measures of variability, but Group A also has the greater MAD, should students choose to compute it.
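The three measures of variability mentioned here can be verified with a short Python sketch. It assumes the quartiles are the medians of the lower and upper halves (the middle value is left out because each group has 7 scores) and takes MAD to mean the mean absolute deviation. The function names are illustrative.

```python
from statistics import mean, median

group_a = [70, 80, 90, 100, 110, 130, 190]
group_b = [50, 100, 107, 110, 120, 140, 150]

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (assumed method)."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[len(data) - half:])

def spread(values):
    """Range, IQR, and mean absolute deviation of a data set."""
    q1, q3 = quartiles(values)
    center = mean(values)
    mad = mean([abs(x - center) for x in values])
    return {"range": max(values) - min(values), "iqr": q3 - q1, "mad": mad}

spread_a = spread(group_a)
spread_b = spread(group_b)
print(spread_a)
print(spread_b)
```

All three measures agree: Group A has the larger range, IQR, and MAD.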
6.
Ten students each attempted 10 free throws. This list shows how many free throws each student made.
8, 5, 6, 6, 4, 9, 7, 6, 5, 9
What is the median number of free throws made?
What is the IQR (interquartile range)?
Answer:
6 free throws. (The ordered list is 4, 5, 5, 6, 6, 6, 7, 8, 9, 9. The two middle terms in the ordered list are both 6.)
3 free throws. (The first half of the data is 4, 5, 5, 6, 6; its median is 5. The second half of the data is 6, 7, 8, 9, 9; its median is 8. The IQR is 3, since 8 − 5 = 3.)
Teaching Notes
Watch for students attempting to answer the question without first sorting the data (those students will give a median of 6.5, the average of the unsorted middle values 4 and 9). Also watch for students excluding the two middle values (both 6) from the quartile calculation, which leads to an incorrect (larger) IQR of 3.5.
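The correct answers and both mistakes can be reproduced in a few lines of Python. The second mistake is interpreted here as dropping the two middle values of the ordered list before splitting it into halves, which is an assumption about what students might do. The variable names are illustrative.

```python
from statistics import median

made = [8, 5, 6, 6, 4, 9, 7, 6, 5, 9]
ordered = sorted(made)
half = len(ordered) // 2

# Correct work: median of the ordered list, quartiles from the two halves.
med = median(ordered)
q1 = median(ordered[:half])   # lower half: 4, 5, 5, 6, 6
q3 = median(ordered[half:])   # upper half: 6, 7, 8, 9, 9
iqr = q3 - q1

# Mistake 1: averaging the middle two values of the unsorted list (4 and 9).
unsorted_middle = (made[half - 1] + made[half]) / 2

# Mistake 2 (assumed reading): dropping both middle 6s before taking halves.
trimmed_lower = ordered[:half - 1]   # 4, 5, 5, 6
trimmed_upper = ordered[half + 1:]   # 7, 8, 9, 9
wrong_iqr = median(trimmed_upper) - median(trimmed_lower)

print(med, iqr, unsorted_middle, wrong_iqr)
```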
7.
Jada asked some students at her school how many hours they spent watching television last week, to the nearest hour. Here are a box plot and a histogram for the data she collected.
Box plot:
A box plot for “time in hours.” The numbers 0 through 26, in increments of two, are indicated. The five-number summary for the box plot is as follows: Minimum value, 0. Maximum value, 26. Q1, 2. Q2, 5. Q3, 10.
Histogram:
A histogram: the horizontal axis is labeled “time in hours,” and the numbers 0 through 30, in increments of 5, are indicated. On the vertical axis, the numbers 0 through 40, in increments of 5, are indicated. The data represented by the bars are as follows: From 0 up to 5 hours, 40; From 5 up to 10 hours, 30; From 10 up to 15 hours, 20; From 15 up to 20 hours, 5; From 20 up to 25 hours, 3; From 25 up to 30 hours, 2
About how many students did Jada ask?
Is the mean or the median a more appropriate measure of center for this data set? Explain your reasoning.
Can Jada use these data displays to find the exact median? Explain how you know.
Can Jada use these data displays to find the exact mean?
What would be an appropriate measure of variability for this data set? Find or estimate its value.
Answer:
Jada asked about 100 students.
The median is more appropriate because the data is not symmetric.
Yes, the box plot gives the exact median, 5 hours.
No. Neither display shows the individual data values, so the mean can only be estimated.
Sample response: The IQR (interquartile range) is appropriate because the median is being used as a measure of center. The box plot gives the IQR of 8 hours because 10−2=8.
Tier 1 response:
Accurate, correct work.
Correct answer to each question, description of why IQR is an appropriate measure of spread, correct IQR.
Acceptable errors: Mistake in determining median or IQR caused by a misreading of the box plot.
Tier 2 response:
Work shows good conceptual understanding and mastery, with minor errors.
Sample errors: Incorrect response for histogram total, larger than 6; stating that the data is symmetric; attempt to calculate precise mean; incorrect or missing IQR calculation.
Acceptable errors: Incorrect MAD estimation, given (incorrect) statement that data is symmetric.
Tier 3 response:
Work shows a developing but incomplete conceptual understanding, with significant errors.
Sample errors: Two or more error types from Tier 2 response; incorrect response for histogram total, 6 or fewer; incorrect median; invalid use of box plot to determine mean.
Tier 4 response:
Work includes major errors or omissions that demonstrate a lack of conceptual understanding and mastery.
Sample errors: Three or more error types from Tier 2 response; two or more error types from Tier 3 response; multiple omitted parts.
Teaching Notes
This question is about the limitations of the histogram and box plot, which provide only partial information about a distribution. Notably, one display may be more useful than another depending on the question asked about the data.
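A brief Python sketch shows what the histogram does and does not pin down. The bar counts come from the histogram description. The bounds on the mean are an illustration rather than part of the answer key, and rest on the assumption that every value in a bar is at least the bar's left edge and less than its right edge.

```python
# (left edge, right edge, count) for each bar of the histogram.
bins = [
    (0, 5, 40),
    (5, 10, 30),
    (10, 15, 20),
    (15, 20, 5),
    (20, 25, 3),
    (25, 30, 2),
]

# The total count is exact: it is the sum of the bar heights.
total = sum(count for _, _, count in bins)

# The mean is not exact. It can only be bounded: each value is at least
# the left edge of its bar and less than the right edge.
lowest_possible_mean = sum(lo * count for lo, _, count in bins) / total
upper_limit_mean = sum(hi * count for _, hi, count in bins) / total

print(total, lowest_possible_mean, upper_limit_mean)
```

Under these assumptions the mean lies somewhere from 5.35 up to (but not including) 10.35 hours, so it cannot be found exactly. It is also greater than the median of 5 hours, which is consistent with a distribution that is not symmetric.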