Unit 8 Data Sets And Distributions — Unit Plan

Lesson 4
Dot Plots

We often collect and analyze data because we are interested in learning what is “typical,” or what is common and can be expected in a group.

Sometimes it is easy to tell what a typical member of the group is. For example, we can say that a typical shape in this set is a large circle.

A set that consists of 17 shapes. There are 10 large circles, 1 medium circle, 3 small circles, 1 large square, and 2 small squares.

Just looking at the members of a group doesn’t always tell us what is typical, however. For example, if we are interested in the side length typical of squares in this set, it isn’t easy to do so just by studying the set visually.

A set that consists of 18 squares of varying side lengths.

In a situation like this, it is helpful to gather the side lengths of the squares in the set and look at their distribution, as shown in this dot plot.

A dot plot for "side lengths in centimeters". The numbers 1 through 8 are indicated. The data are as follows: 2 centimeters, 4 dots. 3 centimeters, 5 dots. 4 centimeters, 3 dots. 5 centimeters, 3 dots. 6 centimeters, 2 dots. 7 centimeters, 1 dot.

We can see that squares with 3 centimeter sides are the most common and many others are about the same size. That means we could say that side lengths of about 3 centimeters are typical of squares in this set.
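
As a rough illustration, the dot plot can be rebuilt from a list of side lengths by counting how often each value occurs. The sketch below assumes the 18 side lengths are exactly the ones described for the dot plot; the variable names are made up for this example.

```python
from collections import Counter

# Side lengths (cm) read off the dot plot: 4 dots at 2, 5 dots at 3, and so on.
side_lengths = [2] * 4 + [3] * 5 + [4] * 3 + [5] * 3 + [6] * 2 + [7] * 1

# Count how many squares have each side length.
counts = Counter(side_lengths)

# Print a text version of the dot plot, one row per value on the axis.
for length in range(1, 9):
    print(f"{length} cm | " + "o" * counts[length])

# The most common side length and its number of dots.
most_common_length, dots = counts.most_common(1)[0]
print(most_common_length, dots)
```

The longest row of dots sits at 3 centimeters, which matches the description of a typical side length.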

Family Size (1 problem)

A group of students was asked, “How many children are in your family?” The responses are displayed in the dot plot.

A dot plot, number of children, 0 through 6 by ones. Starting at 0, the number of dots above each increment is 0, 5, 7, 5, 2, 1, 0.

  1. How many students responded to the question?
  2. What percentage of the students have more than one child in the family?
  3. Write a sentence that describes the distribution of the data shown on the dot plot. Use a description of the center and spread in your description.
Solution
  1. There are 20 dots and each corresponds to one student in the group.
  2. 75%. 15 out of 20 students answered that there are 2 or more children in the family.
  3. Sample response: A typical number of children for this group of families is around 2 because the center is around 2.5 or so, but some families had many more children than others. The distribution is not very spread out, with most families having 1–3 children and only a few of them having more.
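
The counts in the first two answers can be checked with a short calculation. This is a sketch that assumes the dot counts given for the dot plot; the dictionary name is invented for the example.

```python
# Number of dots above each value 0 through 6 on the dot plot.
dots_per_value = {0: 0, 1: 5, 2: 7, 3: 5, 4: 2, 5: 1, 6: 0}

# Each dot is one student, so the total number of dots is the number of responses.
total_students = sum(dots_per_value.values())

# Students who reported more than one child in the family.
more_than_one = sum(n for children, n in dots_per_value.items() if children > 1)

percentage = 100 * more_than_one / total_students
print(total_students, more_than_one, percentage)
```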
Lesson 6
Interpreting Histograms

In addition to using dot plots, we can also represent distributions of numerical data using histograms.

Here is a dot plot that shows the weights, in kilograms, of 30 dogs, followed by a histogram that shows the same distribution.

A dot plot for dog weights in kilograms. The numbers 10 through 35, in increments of 5, are indicated. The 30 data values are as follows: 10 kilograms, 1 dot. 11 kilograms, 1 dot. 12 kilograms, 2 dots. 13 kilograms, 1 dot. 15 kilograms, 1 dot. 16 kilograms, 2 dots. 17 kilograms, 1 dot. 18 kilograms, 2 dots. 19 kilograms, 1 dot. 20 kilograms, 3 dots. 21 kilograms, 1 dot. 22 kilograms, 3 dots. 23 kilograms, 1 dot. 24 kilograms, 2 dots. 26 kilograms, 2 dots. 28 kilograms, 1 dot. 30 kilograms, 1 dot. 32 kilograms, 2 dots. 34 kilograms, 2 dots.

A histogram for dog weights in kilograms. The horizontal axis is labeled “dog weights in kilograms” and the numbers 10 through 35, in increments of 5, are indicated. On the vertical axis the numbers 0 through 10, in increments of 2, are indicated. The data represented by the bars are as follows: Weight from 10 up to 15, 5. Weight from 15 up to 20, 7. Weight from 20 up to 25, 10. Weight from 25 up to 30, 3. Weight from 30 up to 35, 5.

In a histogram, data values are placed in groups, or “bins,” of a certain size, and each group is represented with a bar. The height of the bar tells us the frequency for that group.

For example, the height of the tallest bar is 10, and the bar represents weights from 20 to less than 25 kilograms, so there are 10 dogs whose weights fall in that group. Similarly, there are 3 dogs that weigh anywhere from 25 to less than 30 kilograms.
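
One way to see where the bar heights come from is to sort the 30 dog weights into bins by hand. The sketch below assumes the weights listed for the dot plot and bins that include their left endpoint but not their right one (for example, 20 up to but not including 25). The function `bin_counts` is a hypothetical helper, not part of any library.

```python
# The 30 dog weights (kg) read off the dot plot.
dog_weights = (
    [10, 11, 12, 12, 13, 15, 16, 16, 17, 18, 18, 19]
    + [20] * 3 + [21] + [22] * 3
    + [23, 24, 24, 26, 26, 28, 30, 32, 32, 34, 34]
)

def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the histogram's range
            counts[index] += 1
    return counts

frequencies = bin_counts(dog_weights, start=10, width=5, n_bins=5)

for i, frequency in enumerate(frequencies):
    low = 10 + 5 * i
    print(f"{low} up to {low + 5}: {frequency}")
```

The printed frequencies are the heights of the five bars in the histogram.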

Notice that the histogram and the dot plot have a similar shape. The dot plot has the advantage of showing all of the data values, but the histogram is easier to draw and to interpret when there are a lot of values or when the values are all different.

Here is a dot plot showing the weight distribution of 40 dogs. The weights were measured to the nearest 0.1 kilogram instead of the nearest kilogram.

A dot plot for “dog weights in kilograms.” The numbers 8 through 36, in increments of 2, are indicated. There is 1 dot on each of the following values: 10 kilograms, 11 kilograms, 11.3 kilograms, 12 kilograms, 12.1 kilograms, 13 kilograms, 14.7 kilograms, 15 kilograms, 15.1 kilograms, 16 kilograms, 16.5 kilograms, 17 kilograms, 18 kilograms, 18.5 kilograms, 19 kilograms, 19.1 kilograms, 20 kilograms, 20.2 kilograms, 20.4 kilograms, 21 kilograms, 21.5 kilograms, 22.6 kilograms, 22.7 kilograms, 22.8 kilograms, 23.2 kilograms, 23.4 kilograms, 24 kilograms, 24.9 kilograms, 26 kilograms, 26.1 kilograms, 26.7 kilograms, 28 kilograms, 28.4 kilograms, 30 kilograms, 31.5 kilograms, 32 kilograms, 32.1 kilograms, 33.5 kilograms, 34 kilograms, 34.4 kilograms.

Here is a histogram showing the same distribution.

Histogram from 10 to 35 by 5's. Dog weights in kilograms.

In this case, it is difficult to make sense of the distribution from the dot plot because the precision of the measurement means the dots are distinct and so close together. The histogram of the same data set does a much better job showing the distribution of weights by grouping similar values to show an overall trend, even though we can’t see the individual data values.

Rain in Miami (1 problem)

Here is the average amount of rainfall, in inches, for each month in Miami, Florida.

month        rainfall (inches)    month        rainfall (inches)
January      1.61                 July         6.5
February     2.24                 August       8.9
March        2.99                 September    9.84
April        3.14                 October      6.34
May          5.35                 November     3.27
June         9.69                 December     2.05
  1. Complete the frequency table and use it to make a histogram.

    rainfall (inches)    frequency
    0–2                  1
    2–4                  5
    4–6
    6–8
    8–10

    A blank grid, horizontal axis labeled rainfall in inches, boxes from 0 to 11 by ones, labeled 0 to 10 by twos. Vertical axis 0 to 7 by ones.

  2. What can you say about the center of this distribution using the histogram?
Solution
  1. Completed frequency table:

    rainfall (inches)    frequency
    0–2                  1
    2–4                  5
    4–6                  1
    6–8                  2
    8–10                 3

    A histogram.

  2. Sample response: The center of the distribution appears to be between 4 and 6 inches of rain.
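
The completed frequency table can be checked by counting the monthly values that fall in each interval. This sketch assumes that each interval includes its lower endpoint but not its upper one, as in the dog-weight histogram; none of the Miami values lands exactly on a boundary, so that choice does not change the counts here.

```python
# Average monthly rainfall in Miami, in inches, from the table.
rainfall = {
    "January": 1.61, "February": 2.24, "March": 2.99, "April": 3.14,
    "May": 5.35, "June": 9.69, "July": 6.5, "August": 8.9,
    "September": 9.84, "October": 6.34, "November": 3.27, "December": 2.05,
}

# Intervals for the frequency table: 0-2, 2-4, 4-6, 6-8, 8-10.
intervals = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]

# Count the months whose rainfall is at least `low` and less than `high`.
frequency = {
    f"{low}-{high}": sum(1 for inches in rainfall.values() if low <= inches < high)
    for low, high in intervals
}

for label, count in frequency.items():
    print(label, count)
```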
Section B Check
Section B Checkpoint
Lesson 9
Mean

Sometimes a general description of a distribution does not give enough information, and a more precise way to talk about center or spread would be more useful. The mean, or average, is a number we can use for the center to summarize a distribution.

We can think about the mean in terms of “fair share” or “leveling out.” That is, a mean can be thought of as a number that each member of a group would have if all the data values were combined and distributed equally among the members.

For example, suppose there are 5 containers, each of which has a different amount of water: 1 liter, 4 liters, 2 liters, 3 liters, and 0 liters.

There are 5 identical tape diagrams that are each partitioned into 4 equal parts. The first diagram has 1 part shaded. The second diagram has 4 parts shaded. The third diagram has 2 parts shaded. The fourth diagram has 3 parts shaded. The fifth diagram has no parts shaded.

To find the mean, first we add up all of the values. We can think of this as putting all of the water together: 1 + 4 + 2 + 3 + 0 = 10.

A tape diagram partitioned into 10 equal parts. All 10 parts are shaded.

To find the “fair share,” we divide the 10 liters equally into the 5 containers: 10 ÷ 5 = 2.

There are 5 identical tape diagrams each partitioned into 4 equal parts. Each diagram has 2 parts shaded.

The mean is useful when each unit of measurement has equal importance. For example, it may make sense to find the mean score of assignments of the same importance, such as all quizzes. If some grades are more important, it may not make sense to find the mean. For example, it may not make sense to find the mean score when there are 6 short homework assignments and one major essay.

Suppose the quiz scores of a student are 70, 90, 86, and 94. We can find the mean (or average) score by finding the sum of the scores (70 + 90 + 86 + 94 = 340) and dividing the sum by four (340 ÷ 4 = 85). We can then say that the student scored, on average, 85 points on the quizzes.

In general, to find the mean of a data set with n values, we add all of the values and divide the sum by n.
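
The "add the values, then divide by how many there are" rule translates directly into a few lines of code. This is a minimal sketch; the function name `mean` is chosen for this example, and the two data sets are the water containers and quiz scores from this lesson.

```python
def mean(values):
    """Add all of the values, then divide the sum by the number of values."""
    return sum(values) / len(values)

water_liters = [1, 4, 2, 3, 0]    # the 5 containers
quiz_scores = [70, 90, 86, 94]    # the 4 quizzes

print(mean(water_liters))  # fair share of water per container
print(mean(quiz_scores))   # average quiz score
```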

Finding Means (1 problem)
  1. Last week, the daily low temperatures for a city, in degrees Celsius, were 5, 8, 6, 5, 10, 7, and 1. What was the average low temperature? Show your reasoning.
  2. The mean of four numbers is 7. Three of the numbers are 5, 7, and 7. What is the fourth number? Explain your reasoning.
Solution
  1. 6 degrees Celsius. The sum of the temperatures divided by the total number of recorded temperatures is (5 + 8 + 6 + 5 + 10 + 7 + 1) ÷ 7 = 6.
  2. 9. Sample reasoning: The 4 numbers must be distributed evenly around 7. Because 2 of the numbers are 7, and the third number is two less than 7, the fourth number must be 2 more than 7.
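
Both answers can also be reached by calculation. For the second problem, the sketch below uses a different route from the sample reasoning: since the mean is the sum divided by the count, the four numbers must add up to 4 times 7, and the missing number is whatever remains after subtracting the three known ones. The variable names are invented for this example.

```python
# Problem 1: mean of the seven daily low temperatures (degrees Celsius).
temperatures = [5, 8, 6, 5, 10, 7, 1]
average_low = sum(temperatures) / len(temperatures)

# Problem 2: the mean of four numbers is 7, so their sum must be 4 * 7.
target_mean = 7
count = 4
known_numbers = [5, 7, 7]
fourth_number = target_mean * count - sum(known_numbers)

print(average_low, fourth_number)
```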
Section C Check
Section C Checkpoint
Lesson 13
Median

The median is another measure of center for a distribution. It is the middle value in a data set when values are listed in order. The number of values less than or equal to the median is the same as the number of values that are greater than or equal to the median.

To find the median, we order the data values from least to greatest and find the number in the middle.

Suppose we have 5 dogs whose weights, in pounds, are shown in the table. The median weight for this group of dogs is 32 pounds because three dogs weigh less than or equal to 32 pounds and three dogs weigh greater than or equal to 32 pounds.

20    25    32    40    55

Now suppose we have 6 cats whose weights, in pounds, are listed here. Notice that there are 2 values in the middle: 7 and 8.

4    6    7    8    10    10

The median weight must be between 7 and 8 pounds, because half of the cats weigh less than or equal to 7 pounds, and half of the cats weigh greater than or equal to 8 pounds.

When there is an even number of values, we take the number exactly in between the two middle values. In this case, the median cat weight is 7.5 pounds because (7 + 8) ÷ 2 = 7.5.
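
The two cases (an odd or an even number of values) can be written as one short procedure. This is a sketch of the method described above, with a function name chosen for this example; it uses the dog and cat weights from this lesson.

```python
def median(values):
    """Order the values, then take the middle one (or halfway between the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is exactly one middle value.
        return ordered[middle]
    # Even number of values: take the number halfway between the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

dog_weights = [20, 25, 32, 40, 55]
cat_weights = [4, 6, 7, 8, 10, 10]

print(median(dog_weights))
print(median(cat_weights))
```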

Practicing the Piano (1 problem)

Jada and Diego are practicing the piano for an upcoming rehearsal. The numbers of minutes each of them practiced in the past few weeks are listed.

Jada's practice times:

  • 10
  • 10
  • 20
  • 15
  • 25
  • 25
  • 8
  • 15
  • 20
  • 20
  • 35
  • 25
  • 40

Diego's practice times:

  • 25
  • 10
  • 15
  • 30
  • 15
  • 20
  • 20
  • 25
  • 30
  • 45
  1. Find the median of each data set.
  2. Explain what the medians tell you about Jada's and Diego's piano practice.
Solution
  1. Jada's median: 20 minutes. Diego's median: 22.5 minutes.
  2. Sample response: Half of Jada's practices are 20 minutes or shorter and the other half of her practices are 20 minutes or longer. Half of Diego's practices are 22.5 minutes or shorter, and the other half are 22.5 minutes or longer.
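
The two medians can be checked with Python's standard `statistics` module, whose `median` function sorts the data and handles the odd and even cases. The lists below are the practice times as given in the problem; the variable names are made up for this sketch.

```python
from statistics import median

jada_minutes = [10, 10, 20, 15, 25, 25, 8, 15, 20, 20, 35, 25, 40]
diego_minutes = [25, 10, 15, 30, 15, 20, 20, 25, 30, 45]

# Jada has 13 values, so the median is the 7th value in order.
jada_median = median(jada_minutes)

# Diego has 10 values, so the median is halfway between the 5th and 6th values.
diego_median = median(diego_minutes)

print(sorted(jada_minutes), jada_median)
print(sorted(diego_minutes), diego_median)
```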
Lesson 14
Comparing Mean and Median

Both the mean and the median are ways of measuring the center of a distribution. They tell us slightly different things, however.

The dot plot shows the number of stickers on 30 pages. The mean number of stickers is 21 (marked with a triangle). The median number of stickers is 20.5 (marked with a diamond).

A dot plot for “stickers on a page.” The numbers 8 through 34, in increments of 2, are indicated. A diamond is indicated at 20.5 stickers and a triangle is indicated at 21 stickers. Data are as follows: 9 stickers, 1 dot; 10 stickers, 1 dot; 11 stickers, 2 dots; 12 stickers, 1 dot; 14 stickers, 1 dot; 16 stickers, 2 dots; 17 stickers, 1 dot; 18 stickers, 2 dots; 19 stickers, 1 dot; 20 stickers, 3 dots; 21 stickers, 1 dot; 22 stickers, 3 dots; 23 stickers, 1 dot; 24 stickers, 2 dots; 26 stickers, 2 dots; 28 stickers, 1 dot; 30 stickers, 1 dot; 32 stickers, 2 dots; 33 stickers, 1 dot; 34 stickers, 1 dot.

The mean tells us that if the number of stickers were distributed so that each page has the same number, then each page would have 21. We could also think of 21 stickers as a balance point for the number of stickers on all of the pages in the set. 

The median tells us that half of the pages have more than 20.5 stickers and half have fewer than 20.5 stickers. In this case, both the mean and the median could describe a typical number of stickers on a page because they are fairly close to each other and to most of the data points.

Here is a different set of 30 pages with stickers. It has the same mean as the first set, but the median is 23 stickers.

A dot plot for “stickers on a page.” The numbers 8 through 34, in increments of 2, are indicated. A triangle is indicated at 21 stickers, and a diamond is indicated at 23 stickers. The data are as follows: 9 stickers, 1 dot; 10 stickers, 1 dot; 13 stickers, 1 dot; 14 stickers, 1 dot; 16 stickers, 1 dot; 17 stickers, 1 dot; 19 stickers, 1 dot; 20 stickers, 2 dots; 21 stickers, 2 dots; 22 stickers, 3 dots; 23 stickers, 6 dots; 24 stickers, 5 dots; 25 stickers, 4 dots; 26 stickers, 1 dot.

In this case, the median is closer to where most of the data points are clustered and is therefore a better measure of center for this distribution. That is, it is a better description of the typical number of stickers on a page. The mean number of stickers is influenced (in this case, pulled down) by a handful of pages with very few stickers, so it is farther away from most data points.

In general, when a distribution is symmetrical or approximately symmetrical, the mean and median values are close. But when a distribution is not roughly symmetrical, the two values tend to be farther apart.
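
To see the effect with the two sticker data sets from this lesson, the sketch below rebuilds each set from its dot counts and computes both measures with Python's standard `statistics` module. The dictionaries and the helper `expand` are invented for this example; the counts are the ones described for the two dot plots.

```python
from statistics import mean, median

# Dots above each value in the two dot plots (stickers: number of pages).
first_set = {9: 1, 10: 1, 11: 2, 12: 1, 14: 1, 16: 2, 17: 1, 18: 2, 19: 1, 20: 3,
             21: 1, 22: 3, 23: 1, 24: 2, 26: 2, 28: 1, 30: 1, 32: 2, 33: 1, 34: 1}
second_set = {9: 1, 10: 1, 13: 1, 14: 1, 16: 1, 17: 1, 19: 1, 20: 2, 21: 2,
              22: 3, 23: 6, 24: 5, 25: 4, 26: 1}

def expand(dot_counts):
    """Turn {value: number of dots} into a flat list with one entry per dot."""
    return [value for value, dots in dot_counts.items() for _ in range(dots)]

first_pages = expand(first_set)
second_pages = expand(second_set)

# Roughly symmetric set: mean and median are close together.
print(mean(first_pages), median(first_pages))

# Set with a few low values: same mean, but the median sits higher, in the main cluster.
print(mean(second_pages), median(second_pages))
```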

Which Measure of Center to Use? (1 problem)

For each dot plot or histogram:

  1. Predict if the mean is greater than, less than, or approximately equal to the median. Explain your reasoning.
  2. Which measure of center, the mean or the median, better describes a typical value for the distribution?

Heights of 50 basketball players
Histogram from 66 to 80 by 2’s. Height in inches. Beginning at 66 up to but not including 68, height of bar at each interval is 12, 3, 14, 18, 6, 4, 1.

Backpack weights of 55 sixth-grade students
Dot plot from 0 to 16 by 2’s. Backpack weight in kilograms. Beginning at 0, number of dots above each increment from 0 to 9 is 0, 7, 9, 12, 7, 6, 3, 3, 2, 1. 1 dot above 16.

Ages of 30 people at a family dinner party
Histogram from 5 to 50 by 5’s. Age in years. Beginning at 5 up to but not including 10, height of bar at each interval is 2, 3, 1, 1, 2, 3, 2, 5, 11.

Solution

Sample responses:

  1. Player heights
    1. The mean would be approximately equal to the median, because the data are roughly symmetric.
    2. Since I think the values would be pretty close, either the mean or the median would describe a typical height pretty well.
  2. Backpack weights
    1. The mean would be higher than the median. The value of 16 kilograms would bring the mean up and move it away from the center of the data.
    2. The median would better describe a typical backpack weight, since that value would lie in the center of the large cluster of data points.
  3. People's ages
    1. The mean would be lower than the median, because even though a large fraction of the people at the dinner party are 40 or older, the ages of the people that span from 5 to 40 would bring the average age down.
    2. The median, at around 40–45 years old, would better describe the center of the distribution.
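
A histogram does not show individual values, but it does show which bar contains the median. The sketch below applies this to the ages histogram, assuming the bar heights listed for it. The helper `bin_of_position` is hypothetical: it adds up bar heights from the left until the running total reaches a given position in the ordered data.

```python
# Bar heights for the ages histogram: bins 5-10, 10-15, ..., 45-50.
bar_heights = [2, 3, 1, 1, 2, 3, 2, 5, 11]
start, width = 5, 5

def bin_of_position(heights, position):
    """Return the index of the bar that holds the value at a 1-based position."""
    running_total = 0
    for index, height in enumerate(heights):
        running_total += height
        if position <= running_total:
            return index
    raise ValueError("position is larger than the number of values")

n = sum(bar_heights)  # number of people
# With an even number of values, the median lies between positions n/2 and n/2 + 1.
middle_positions = (n // 2, n // 2 + 1)
median_bins = [bin_of_position(bar_heights, p) for p in middle_positions]
median_ranges = [(start + i * width, start + (i + 1) * width) for i in median_bins]

print(n, middle_positions, median_ranges)
```

Both middle positions land in the same bar, so the median age is at least 40 and less than 45, which agrees with the sample response.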
Section D Check
Section D Checkpoint
Unit 8 Assessment
End-of-Unit Assessment