Unit 6 Associations In Data — Unit Plan

Lesson 1
Organizing Data

Consider the data collected from pulling back a toy car and then letting it go forward. In the first table, the data may not seem to have an obvious pattern. The second table has the same data and shows that both values are increasing together.

Unorganized table:

distance pulled back (in) | distance traveled (in)
6 | 23.57
4 | 18.48
10 | 38.66
8 | 31.12
2 | 13.86
1 | 8.95

Organized table:

distance pulled back (in) | distance traveled (in)
1 | 8.95
2 | 13.86
4 | 18.48
6 | 23.57
8 | 31.12
10 | 38.66

A scatter plot of the data makes the pattern clear enough that we can estimate how far the car will travel when it is pulled back 5 in.

Patterns in data can sometimes become more obvious when reorganized in a table or when represented in scatter plots or other diagrams. If a pattern is observed, it can sometimes be used to make predictions. This is a scatter plot for this scenario:

Scatter plot: horizontal axis, distance pulled back (inches), from 0 to 12 by 2’s; vertical axis, distance traveled (inches), from 0 to 40 by 10’s. The 6 data points trend linearly upward and to the right.
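The reorganizing step, and one way to make the 5-inch estimate, can be sketched in Python. This is a minimal sketch, assuming the six rows from the table above; the `estimate` helper is hypothetical and uses linear interpolation between the two neighboring rows, which is only one reasonable way to estimate from the pattern, not a method the lesson prescribes.

```python
# Toy car data from the unorganized table: (distance pulled back, distance traveled), in inches.
data = [(6, 23.57), (4, 18.48), (10, 38.66), (8, 31.12), (2, 13.86), (1, 8.95)]

# Organize the table: sorting the pairs orders them by distance pulled back.
organized = sorted(data)


def estimate(pulled_back, rows):
    """Hypothetical helper: estimate distance traveled by linear interpolation
    between the two rows of the sorted table that bracket `pulled_back`."""
    for (x0, y0), (x1, y1) in zip(rows, rows[1:]):
        if x0 <= pulled_back <= x1:
            return y0 + (y1 - y0) * (pulled_back - x0) / (x1 - x0)
    raise ValueError("pulled_back is outside the range of the data")


for pulled, traveled in organized:
    print(pulled, traveled)

# 5 in lies between the 4 in and 6 in rows, so the estimate is about 21 in.
print(round(estimate(5, organized), 1))
```

Interpolation like this only makes sense inside the range of the data, which is why the helper refuses values below 1 in or above 10 in.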

Beach Cleaning (1 problem)

20 volunteers are cleaning the litter from a beach. The number of minutes each volunteer has worked and the number of meters left to clean on their section are recorded.

Here is a scatter plot that shows the data for each volunteer.

scatter plot of beach cleaning

  1. Label the vertical axis of the scatter plot.
  2. If a volunteer has worked 45 minutes, should they have closer to 60 meters or 120 meters of beach left to clean? Explain your reasoning.
Solution
  1. Sample response: beach left to clean (meters)
  2. 60 meters. Sample reasoning: When the time spent cleaning increases, the amount of beach left to clean tends to decrease. To keep in line with the rest of the data, the length left to clean should be closer to 60 meters than 120 meters.
Lesson 2
Plotting Data

Histograms show us how measurements of a single attribute are distributed. For example, a veterinarian saw 25 dogs in her clinic one week. She measured the height and weight of each dog.

This histogram shows how the weights of the dogs are distributed.

Histogram of dog weight (pounds), from 0 to 112 by 16’s. Starting with the interval from 0 up to but not including 16, the bar heights are 6, 8, 2, 5, 2, 1, 1.

This histogram shows how the heights of the dogs are distributed.

Histogram of dog height (inches), from 6 to 30 by 3’s. Starting with the interval from 6 up to but not including 9, the bar heights are 1, 4, 5, 2, 6, 3, 2, 2.
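As a quick arithmetic check, the bar heights of each histogram should add up to the number of dogs. A minimal Python sketch, assuming the bar heights exactly as listed in the two descriptions above; the hypothetical `bin_labels` helper builds each interval from the histogram's start value and width, including the left edge but not the right edge.

```python
# Bar heights as listed in the two histogram descriptions.
weight_counts = [6, 8, 2, 5, 2, 1, 1]      # bins of width 16 lb starting at 0
height_counts = [1, 4, 5, 2, 6, 3, 2, 2]   # bins of width 3 in starting at 6


def bin_labels(start, width, n):
    """Label each bin as [low, high): it includes low and excludes high."""
    return [f"[{start + i * width}, {start + (i + 1) * width})" for i in range(n)]


for label, count in zip(bin_labels(0, 16, len(weight_counts)), weight_counts):
    print("weight", label, count)
for label, count in zip(bin_labels(6, 3, len(height_counts)), height_counts):
    print("height", label, count)

# Each histogram accounts for every dog exactly once.
print(sum(weight_counts), sum(height_counts))  # 25 25
```

Nothing in these counts records which height goes with which weight; keeping that pairing is what a scatter plot adds.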

These histograms tell us how the dogs’ weights and heights are distributed, but they do not give any evidence of a connection between a dog’s height and its weight.

Scatter plots allow us to investigate possible connections between two attributes. In this example, each plotted point corresponds to 1 of the 25 dogs, and its coordinates give that dog’s height and weight. Examining the scatter plot, we can see a connection between height and weight for these dogs.

Scatter plot: horizontal axis, dog height (inches), from 6 to 30 by 3’s; vertical axis, dog weight (pounds), from 0 to 112 by 16’s. 24 data points trend upward and to the right.

Right Side Measurements (1 problem)

The table shows measurements of right hand length and right foot length for 5 people.

person | right hand length (cm) | right foot length (cm)
A | 19 | 27
B | 21 | 30
C | 17 | 23
D | 18 | 24
E | 19 | 26

  1. Draw a scatter plot for the data.

    Blank grid. Horizontal axis, right hand length in centimeters, scale 0 to 30, by 5's. Vertical axis, right foot length in centimeters, scale 0 to 30, by 5's.

  2. Circle the point in the scatter plot that represents Person D’s measurements.

Solution

Scatter plot of the five points, with Person D’s point, (18, 24), circled.
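Each row of the table becomes one point, with hand length as the horizontal coordinate and foot length as the vertical coordinate. A minimal Python sketch, assuming the axis assignment of the blank grid above; the dictionary layout is just one convenient choice for keeping each person’s two measurements together.

```python
# Right hand and right foot lengths (cm) from the table, one entry per person.
hand = {"A": 19, "B": 21, "C": 17, "D": 18, "E": 19}
foot = {"A": 27, "B": 30, "C": 23, "D": 24, "E": 26}

# Each person becomes one point: (hand length, foot length).
points = {person: (hand[person], foot[person]) for person in hand}

for person, (x, y) in points.items():
    print(f"person {person}: ({x}, {y})")

print(points["D"])  # (18, 24)
```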

Lesson 3
What a Point in a Scatter Plot Means

Scatter plots show two measurements for each individual from a group. For example, this scatter plot shows the weight and height for each dog from a group of 25 dogs. 

Scatter plot: horizontal axis, dog height (inches), from 6 to 30 by 3’s; vertical axis, dog weight (pounds), from 0 to 112 by 16’s. 24 data points trend upward and to the right.

We can see that the tallest dogs are 27 inches, and that one of those tallest dogs weighs about 75 pounds while the other weighs about 110 pounds. This shows us that dog weight is not a function of dog height because there would be two different outputs for the same input. But we can see a general trend: taller dogs tend to weigh more than shorter dogs. There are exceptions. For example, there is a dog that is 18 inches tall and weighs over 50 pounds, and there is another dog that is 21 inches tall but weighs less than 30 pounds.

When we collect data by measuring attributes like height, weight, area, or volume, we call the data numerical data (or measurement data), and we say that height, weight, area, or volume is a numerical variable.

Quarterbacks (1 problem)

In football, a quarterback can be rated by a formula that assigns a number to how well they play. A higher number generally means they played better.

Here are a table and scatter plot that show ratings and wins for quarterbacks who started every game in a season.

player | quarterback rating | number of wins
A | 93.8 | 4
B | 102.2 | 12
C | 93.6 | 6
D | 89 | 8
E | 88.2 | 5
F | 97 | 7
G | 88.7 | 6
H | 91.1 | 7
I | 92.7 | 10
J | 88 | 10
K | 101.6 | 9
L | 104.6 | 13
M | 84.2 | 6
N | 99.4 | 15
O | 110.1 | 10
P | 95.4 | 11
Q | 88.7 | 11

Scatter plot: horizontal axis, quarterback rating, from 80 to 120 by 10’s; vertical axis, number of wins, from 0 to 20 by 5’s. The points are (84.2, 6), (88, 10), (88.2, 5), (88.7, 6), (88.7, 11), (89, 8), (91.1, 7), (92.7, 10), (93.6, 6), (93.8, 4), (95.4, 11), (97, 7), (99.4, 15), (101.6, 9), (102.2, 12), (104.6, 13), and (110.1, 10).

  1. Circle the point in the scatter plot that represents Player K’s data.
  2. Which quarterback’s data are represented by the point farthest to the left?
  3. Player R is not included in the table. He has a quarterback rating of 99.4 and his team won 8 games. On the scatter plot, plot a point that represents Player R’s data.
Solution

  1. The point for Player K, (101.6, 9), is circled.
  2. Player M
  3. A point is added at (99.4, 8) for Player R.
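The same reasoning used for the dogs (two different outputs for one input means the relationship is not a function) can be checked on the quarterback data. A minimal Python sketch, assuming the table values above; the variable names are illustrative.

```python
# Quarterback data from the table: player -> (rating, wins).
qb = {
    "A": (93.8, 4), "B": (102.2, 12), "C": (93.6, 6), "D": (89, 8),
    "E": (88.2, 5), "F": (97, 7), "G": (88.7, 6), "H": (91.1, 7),
    "I": (92.7, 10), "J": (88, 10), "K": (101.6, 9), "L": (104.6, 13),
    "M": (84.2, 6), "N": (99.4, 15), "O": (110.1, 10), "P": (95.4, 11),
    "Q": (88.7, 11),
}

# The point farthest to the left has the smallest rating.
leftmost = min(qb, key=lambda p: qb[p][0])
print("farthest left:", leftmost, qb[leftmost])

# Wins would be a function of rating only if no rating had two different win counts.
wins_by_rating = {}
for player, (rating, wins) in qb.items():
    wins_by_rating.setdefault(rating, set()).add(wins)

repeated = {r: sorted(w) for r, w in wins_by_rating.items() if len(w) > 1}
print("ratings with more than one win count:", repeated)  # {88.7: [6, 11]}
```

Players G and Q both have a rating of 88.7 but different win totals, so in this data set the number of wins is not a function of quarterback rating. Adding Player R at (99.4, 8) would create a second such rating, because Player N also has a rating of 99.4.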
Section A Checkpoint
Lesson 4
Fitting a Line to Data

Sometimes, we can use a linear function as a model of the relationship between two variables. For example, here is a scatter plot that shows the heights and weights of 25 dogs, together with the graph of a linear function that models the relationship between a dog’s height and its weight.

Scatter plot: horizontal axis, dog height (inches), from 6 to 30 by 3’s; vertical axis, dog weight (pounds), from 0 to 112 by 16’s. This is the same scatter plot as before, now with a line through (9, 0) and (27, 80).

For some dogs, we can see that the model does a good job of predicting the weight given the height. These correspond to points on or near the line. The model doesn’t do a very good job of predicting the weight given the height for the dogs whose points are far from the line.

For example, there is a dog that is about 20 inches tall and weighs a little more than 16 pounds. The model predicts that the weight would be about 48 pounds. We say that the model overpredicts the weight of this dog. There is also a dog that is 27 inches tall and weighs about 110 pounds. The model predicts that its weight will be a little less than 80 pounds. We say the model underpredicts the weight of this dog. For most of the dogs in this data set, though, the model does a good job of predicting the weight from the height.
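Overprediction and underprediction can be expressed with a residual, the actual value minus the predicted value. A minimal Python sketch, assuming the fitted line $w = 4.27h - 37$ that this unit gives for the dog data in the lesson on slope, and assuming 17 pounds as a stand-in for the dog that weighs "a little more than 16 pounds"; the term "residual" and the helper names are illustrative additions.

```python
def predicted_weight(height_in):
    """Fitted line for the dog data: w = 4.27h - 37 (pounds)."""
    return 4.27 * height_in - 37


def compare(height_in, actual_lb):
    """Return the prediction, the residual (actual - predicted), and a verdict."""
    predicted = predicted_weight(height_in)
    residual = actual_lb - predicted
    if residual < 0:
        verdict = "overpredicts"   # the point lies below the line
    elif residual > 0:
        verdict = "underpredicts"  # the point lies above the line
    else:
        verdict = "exact"
    return predicted, residual, verdict


# Approximate readings from the text; 17 lb is an assumed stand-in for "a little more than 16 pounds".
for height, weight in [(20, 17), (27, 110)]:
    predicted, residual, verdict = compare(height, weight)
    print(f"{height} in: predicted {predicted:.1f} lb, residual {residual:+.1f} lb, model {verdict}")
```

A negative residual means the point is below the line, so the model overpredicts; a positive residual means the point is above the line, so the model underpredicts.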

Sometimes a data point is far away from the other points or doesn’t fit a trend that all the other points fit. We call these points outliers.

A 1-Foot Foot (1 problem)

Here is a scatter plot that shows lengths and widths of 20 left feet, together with the graph of a model of the relationship between foot length and width.

Scatter plot with line: horizontal axis, foot length (centimeters), from 20 to 32 by 2’s; vertical axis, foot width (centimeters), from 7 to 12 by 1’s. The 20 points trend upward and to the right. The line also trends linearly upward and to the right, from about (21.9, 9) to about (31.25, 11.5). 11 points lie above the line, 9 lie below it, and none lie on it.

  1. Draw a box around the point that represents the foot with length closest to 29 cm.
  2. What is the approximate width of this foot?
  3. What width does the model predict for a foot with length 29 cm?
Solution
  1. A box is drawn around the point at approximately (29.1, 10.4).
  2. About 10.4 cm
  3. About 11.1 cm
Lesson 5
Describing Trends in Scatter Plots

When a linear function fits data well, we say there is a “linear association” between the variables. For example, there is a linear association between height and weight for the 25 dogs, modeled by the linear function whose graph is shown in the scatter plot.

We say there is a positive association between dog height and dog weight because knowledge about one variable helps predict the other variable, and when one variable increases, the other tends to increase as well.

Scatter plot: horizontal axis, dog height (inches), from 6 to 30 by 3’s; vertical axis, dog weight (pounds), from 0 to 112 by 16’s. This is the same scatter plot as before, now with a line through (9, 0) and (27, 80).

What do you think the association between the weight of a car and its fuel efficiency is?

We say that there is a negative association between fuel efficiency and weight of a car because knowledge about one variable helps predict the other variable, and when one variable increases, the other tends to decrease.

Scatter plot: horizontal axis, weight (kilograms), from 1000 to 2500 by 250; vertical axis, fuel efficiency (miles per gallon), from 14 to 32 by 2. The points lie close to a line that runs down and to the right from (1100, 28) to (2300, 14).
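The direction of an association can also be read from the numbers. A minimal Python sketch, assuming we use the sign of the sum of products of deviations from the means, $\sum (x - \bar{x})(y - \bar{y})$ (the numerator of both the covariance and the least-squares slope). This is a swapped-in technique, not one the lesson uses, and it reports only the direction, not whether the association is linear or strong. The toy car data from Lesson 1 serve as the example.

```python
def association_direction(points):
    """Classify a trend by the sign of sum((x - mean_x) * (y - mean_y)).
    Positive: y tends to increase as x increases. Negative: y tends to decrease."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none"


# Toy car data from Lesson 1: distance pulled back vs. distance traveled (in).
car = [(1, 8.95), (2, 13.86), (4, 18.48), (6, 23.57), (8, 31.12), (10, 38.66)]
print(association_direction(car))  # positive
```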

This Is One Way to Do It (1 problem)
  1. Elena said, “I think this line is a good fit because half of the points are on one side of the line and half of the points are on the other side.” Do you agree? Explain your reasoning.

    Scatter plot: horizontal axis from 0 to 12 by 2’s; vertical axis from 0 to 80 by 20’s. The data trend downward and to the right. A line is drawn that goes slightly upward and to the right, with 10 data points above it and 10 below it.

  2. Noah said, “I think this line is a good fit because it passes through the leftmost point and the rightmost point.” Do you agree? Explain your reasoning.

    Scatter plot: horizontal axis from 0 to 12 by 2’s; vertical axis from 0 to 80 by 20’s. The data trend downward and to the right. A line is drawn that goes downward and to the right, with 14 data points below it, 4 above it, and 2 on it.

Solution
  1. Disagree. Sample response: The line is not a good fit because the data show a negative association, but the line has a positive slope.
  2. Disagree. Sample response: The line is not a good fit because most of the points are below it, and the data trend downward more steeply than the line does.
Lesson 6
The Slope of a Fitted Line

Here is a scatter plot that we have seen before. As noted earlier, we can see from the scatter plot that taller dogs tend to weigh more than shorter dogs.

Another way to say it is that weight tends to increase as height increases.

When we have a positive association between two variables, an increase in one means there tends to be an increase in the other.

Scatter plot: horizontal axis, dog height (inches), from 6 to 30 by 3’s; vertical axis, dog weight (pounds), from 0 to 112 by 16’s. 24 data points trend upward and to the right.

We can quantify this tendency by fitting a line to the data and finding its slope.

For example, the equation of the fitted line is $w = 4.27h - 37$, where $h$ is the height of the dog and $w$ is the predicted weight of the dog.

The slope is 4.27, which tells us that for every 1-inch increase in dog height, the weight is predicted to increase by 4.27 pounds.

Scatter plot: horizontal axis, dog height (inches), from 6 to 30 by 3’s; vertical axis, dog weight (pounds), from 0 to 112 by 16’s. This is the same scatter plot as before, now with a line through (9, 0) and (27, 80).

In our example of the fuel efficiency and weight of a car, the slope of the fitted line shown is -0.01.

Scatter plot: horizontal axis, weight (kilograms), from 1000 to 2500 by 250; vertical axis, fuel efficiency (miles per gallon), from 14 to 32 by 2. The points lie close to a line that runs down and to the right from (1100, 28) to (2300, 14).

This tells us that for every 1-kilogram increase in the weight of the car, the fuel efficiency is predicted to decrease by 0.01 mile per gallon (or, after multiplying both values by 100, every 100-kilogram increase corresponds to a predicted decrease of 1 mpg). 

When we have a negative association between two variables, an increase in one means there tends to be a decrease in the other.
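For any fitted line, the predicted change in the output is the slope times the change in the input, whatever the starting value. A minimal Python sketch, assuming the two slopes stated in this lesson (4.27 pounds per inch for the dogs and −0.01 miles per gallon per kilogram for the cars); the helper names are illustrative.

```python
def predicted_change(slope, increase):
    """Predicted change in the output of a fitted line when the input increases by `increase`."""
    return slope * increase


def predicted_weight(h):
    """Fitted line for the dog data: w = 4.27h - 37."""
    return 4.27 * h - 37


dog_slope = 4.27    # pounds per inch of height
car_slope = -0.01   # miles per gallon per kilogram of weight

print(round(predicted_change(dog_slope, 1), 2))    # 4.27
print(round(predicted_change(car_slope, 1), 2))    # -0.01
print(round(predicted_change(car_slope, 100), 2))  # -1.0

# The predicted increase per inch is the same at every starting height.
print([round(predicted_weight(h + 1) - predicted_weight(h), 2) for h in (10, 20, 25)])  # [4.27, 4.27, 4.27]
```

A positive slope gives a predicted increase and a negative slope gives a predicted decrease, matching positive and negative associations.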

Trends in the Price of Used Cars (1 problem)

Here is a scatter plot that shows the years when some used cars were made and their prices in 2016 together with the graph of a linear model for the relationship between year and price in dollars.

Scatter plot: horizontal axis, year, from 2006 to 2016 by 2’s; vertical axis, price (dollars), from 6000 to 21000 by 3000. The points begin near 2007 and trend up and to the right. A line goes through (2007, 9000) and (2014, 16,500).

  1. Is the slope positive or negative?
  2. Which of these values is closest to the slope of the linear model shown in the scatter plot?
    • 1,000
    • 3,000
    • -1,000
    • -3,000
  3. Use the value you selected to describe the meaning of the slope in this context.
Solution
  1. The slope is positive, because as the year of the car increases, the price tends to increase.
  2. 1,000
  3. The model predicts that when a car is made 1 year later, the price is 1,000 dollars higher.
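The slope choice can be checked from two points on the model line. A minimal Python sketch, assuming the points (2007, 9000) and (2014, 16,500) given in the description of the line; `slope` is a hypothetical helper computing rise over run.

```python
def slope(p, q):
    """Slope of the line through points p and q: rise over run."""
    (x0, y0), (x1, y1) = p, q
    return (y1 - y0) / (x1 - x0)


# Two points on the model line, read from the scatter plot description.
m = slope((2007, 9000), (2014, 16500))

options = [1000, 3000, -1000, -3000]
closest = min(options, key=lambda v: abs(v - m))
print(round(m, 1), closest)  # 1071.4 1000
```

The price rises $7,500 over 7 years, about $1,071 per year, so 1,000 is the closest of the four options.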
Lesson 7
Observing More Patterns in Scatter Plots

Sometimes a scatter plot shows an association that is not linear:

Scatter plot: horizontal axis from 0 to 12 by 3’s; vertical axis from 0 to 30 by 6’s. The points begin near (1, 24) and trend down and to the right until about (6, 2), then trend up and to the right to about (11, 25).

In this scatter plot, the data first show a negative trend and then a positive trend. Because the variables appear to be associated, but not in a linear way, we call this a non-linear association. In later grades, you will study functions that can be models for non-linear associations.
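A rough way to see that one straight line cannot describe this trend is to compare slopes between a few landmark points. A minimal Python sketch, assuming the approximate points (1, 24), (6, 2), and (11, 25) taken from the figure description; this is only an illustration for this plot, not a general test for non-linear association.

```python
# Approximate landmark points read from the figure description:
# the data start near (1, 24), bottom out near (6, 2), and rise to about (11, 25).
landmarks = [(1, 24), (6, 2), (11, 25)]

# Slope of each segment between consecutive landmarks.
segment_slopes = [
    (y1 - y0) / (x1 - x0)
    for (x0, y0), (x1, y1) in zip(landmarks, landmarks[1:])
]
print(segment_slopes)  # [-4.4, 4.6]

# A trend whose slope changes sign cannot be described well by one straight line.
changes_direction = any(a * b < 0 for a, b in zip(segment_slopes, segment_slopes[1:]))
print(changes_direction)  # True
```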

Sometimes in a scatter plot we can see separate groups of points.

Scatter plot with two groups of points. The first group begins near the origin and trends up and to the right toward (8, 13). The second group begins near (3, 25) and trends up and to the right toward (9, 45).

The same scatter plot, with each group of points circled.

We call these groups “clusters.” Clusters often appear when multiple patterns are present within the data. There may be subgroups within the overall data set that affect the variables.

Make Your Own Scatter Plot (1 problem)
  1. Draw a scatter plot that shows a positive linear association and clustering.

    Blank coordinate grid

  2. Draw a scatter plot that shows a negative non-linear association and no clustering.

    Blank coordinate grid

Solution

Sample responses:

  1. Scatterplot.
  2. Scatterplot.
Lesson 8
Analyzing Bivariate Data

People often collect data on two numerical variables to investigate whether the variables are associated, and then use any association they find to predict other values of the variables.

Data analysis usually follows these steps:

  1. Collect data
  2. Organize and represent the data, then look for an association
  3. Identify any outliers and try to explain why these data points are exceptions to the trend that describes the association
  4. Find a function that fits the data well

Although computational systems can help with data analysis by graphing the data, finding a function that might fit the data, and using that function to make predictions, it is important to understand the process and think about what is happening. A computational system may find a function that does not make sense or use a line when the situation suggests that a different model would be more appropriate.
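The steps above can be sketched as a small program. This is a minimal sketch under several assumptions: it reuses the toy car data from Lesson 1, fits the line by least squares (one common method, which the lesson does not specify), and treats any point more than 3 in from the line as an outlier (an arbitrary tolerance). Because "far from the trend" needs a trend to compare against, the sketch fits the line before checking for outliers, whereas the lesson’s steps identify outliers first, often by eye.

```python
def fit_line(points):
    """Least-squares line y = m*x + b (an assumed choice of fitting method)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def outliers(points, m, b, tolerance):
    """Points whose vertical distance from the line exceeds `tolerance` (a judgment call)."""
    return [(x, y) for x, y in points if abs(y - (m * x + b)) > tolerance]


# Step 1: collect (here, the toy car data from Lesson 1).
data = [(6, 23.57), (4, 18.48), (10, 38.66), (8, 31.12), (2, 13.86), (1, 8.95)]
# Step 2: organize.
data.sort()
# Step 4, done early: fit a line so there is a trend to compare against.
m, b = fit_line(data)
# Step 3: look for outliers, using an assumed tolerance of 3 in.
print(outliers(data, m, b, tolerance=3))  # []
print(round(m, 2), round(b, 2))
# Use the fitted line to predict the distance traveled for a 5 in pull-back.
print(round(m * 5 + b, 1))
```

As the paragraph above warns, the program will happily fit a line even when a line is not appropriate, so looking at the scatter plot first still matters.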

Drawing a Line (1 problem)
  1. Draw a line on the scatter plot that fits the data well.

    Scatter plot: horizontal axis from 0 to 10 by 2’s; vertical axis from 0 to 10 by 2’s. 8 points are clustered in the upper left, spread horizontally between 0 and 2.5 and vertically from 7 to about 8. 11 points are spread horizontally between 6 and 10 and vertically from 3 to 6.

  2. A new point will be added to the scatter plot with $x = 4$. What do you predict for the $y$-value of this point if it follows the association of the data already in the scatter plot?
  3. A new point will be added to the scatter plot with $x = 10$. What is an example of a $y$-value of this point if it is considered an outlier?
Solution

Sample responses:

  1. Scatterplot with line of best fit.

  2. 6
  3. 10
Section B Checkpoint
Unit 6 Assessment
End-of-Unit Assessment