Unit 6 Associations In Data — Unit Plan

Title | Takeaways | Student Summary | Assessment
Lesson 1
Organizing Data

Consider the data collected from pulling back a toy car and then letting it go forward. In the first table, the data may not seem to have an obvious pattern. The second table has the same data and shows that both values are increasing together.

Unorganized table:

distance pulled back (in) distance traveled (in)
6 23.57
4 18.48
10 38.66
8 31.12
2 13.86
1 8.95

Organized table:

distance pulled back (in) distance traveled (in)
1 8.95
2 13.86
4 18.48
6 23.57
8 31.12
10 38.66

A scatter plot of the data makes the pattern clear enough that we can estimate how far the car will travel when it is pulled back 5 in.

Patterns in data can sometimes become more obvious when reorganized in a table or when represented in scatter plots or other diagrams. If a pattern is observed, it can sometimes be used to make predictions. This is a scatter plot for this scenario:

Scatterplot.
A scatterplot. Horizontal, from 0 to 12, by 2’s, labeled distance pulled back, inches. Vertical, from 0 to 40, by 10’s, labeled distance traveled, inches. 6 data points.  Trend linearly upward and right.
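
One way to make the estimate for a 5 in pull-back concrete is to fit a line to the six measurements in the table. The sketch below is only an illustration: the choice of a least-squares line and the helper name `fit_line` are assumptions, not part of the lesson. Reading the estimate off a line drawn by eye on the scatter plot works just as well.

```python
# Toy car data from the tables above (inches).
pulled_back = [1, 2, 4, 6, 8, 10]
traveled = [8.95, 13.86, 18.48, 23.57, 31.12, 38.66]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(pulled_back, traveled)
estimate_at_5 = slope * 5 + intercept
print(f"Estimated distance for a 5 in pull-back: {estimate_at_5:.1f} in")
```

The estimate comes out at roughly 22 in, which sits between the measured distances for 4 in and 6 in, as the pattern in the organized table suggests.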

Beach Cleaning (1 problem)

20 volunteers are cleaning the litter from a beach. The number of minutes each volunteer has worked and the number of meters left to clean on their section are recorded.

Here is a scatter plot that shows the data for each volunteer.

scatter plot of beach cleaning

  1. Label the vertical axis of the scatter plot.
  2. If a volunteer has worked 45 minutes, should they have closer to 60 meters or 120 meters of beach left to clean? Explain your reasoning.
Solution
  1. Sample response: beach left to clean (meters)
  2. 60 meters. Sample reasoning: When the time spent cleaning increases, the amount of beach left to clean tends to decrease. To keep in line with the rest of the data, the length left to clean should be closer to 60 meters than 120 meters.
Lesson 2
Patterns of Growth

Here are two tables representing two different situations.

  • A student runs errands for a neighbor every week. The table shows the pay he may receive, in dollars, in any given week.
    number of errands | pay in dollars | difference from previous week | factor from previous week
    0 | 10 | - | -
    1 | 15 | 5 | 1.5
    2 | 20 | 5 | 1.33
    3 | 25 | 5 | 1.25
    4 | 30 | 5 | 1.2
  • A student at a high school heard a rumor that a celebrity will be speaking at graduation. The table shows how the rumor is spreading over time, in days.
    day | people who have heard the rumor | difference from previous day | factor from previous day
    0 | 1 | - | -
    1 | 5 | 4 | 5
    2 | 25 | 20 | 5
    3 | 125 | 100 | 5
    4 | 625 | 500 | 5

Once we recognize how these patterns change, we can describe them mathematically. This allows us to understand their behavior, extend the patterns, and make predictions.

Notice that in the situation with the student running errands, the difference is constant from week to week, while the factor changes. In the situation about a rumor spreading, the difference changes from day to day, but the factor is constant. This can give us clues to how we might write out the pattern in each situation.
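
The difference and factor columns in both tables can be recomputed directly from the values. This is a minimal sketch. The function names `differences` and `factors` are illustrative choices, and factors are rounded to two decimal places to match the errands table.

```python
def differences(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def factors(values):
    """Ratios between consecutive values, rounded to 2 decimal places."""
    return [round(b / a, 2) for a, b in zip(values, values[1:])]


pay = [10, 15, 20, 25, 30]      # pay in dollars, from the errands table
heard = [1, 5, 25, 125, 625]    # people who have heard the rumor

print(differences(pay), factors(pay))
print(differences(heard), factors(heard))
```

A list of identical differences points to a pattern built by repeated addition. A list of identical factors points to one built by repeated multiplication.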

Meow Island and Purr Island (1 problem)

The tables show the cat population on two islands over several years. Describe mathematically, as precisely as you can, how the cat population on each island is changing.

year 0 1 2 3 4
number of cats on Meow Island 2 6 18 54 162
year 0 1 2 3 4
number of cats on Purr Island 2 6 10 14 18
Solution

Sample responses:

The cat population on Meow Island is:

  • Tripling each year
  • Growing by a common factor each year

The cat population on Purr Island is:

  • Adding 4 cats each year
  • Characterized by a common difference each year
Lesson 3
What a Point in a Scatter Plot Means

Scatter plots show two measurements for each individual from a group. For example, this scatter plot shows the weight and height for each dog from a group of 25 dogs. 

Scatterplot.
A scatterplot. Horizontal, from 6 to 30, by 3’s, labeled dog height, inches. Vertical, from 0 to 112, by 16’s, labeled dog weight, pounds. 24 data points.  Trend upward and to right.

We can see that the tallest dogs are 27 inches, and that one of those tallest dogs weighs about 75 pounds while the other weighs about 110 pounds. This shows us that dog weight is not a function of dog height because there would be two different outputs for the same input. But we can see a general trend: taller dogs tend to weigh more than shorter dogs. There are exceptions. For example, there is a dog that is 18 inches tall and weighs over 50 pounds, and there is another dog that is 21 inches tall but weighs less than 30 pounds.
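
The "two different outputs for the same input" test can be written as a short check. In this sketch the helper name `is_function` is hypothetical, and the two data points are the approximate readings for the tallest dogs given in the paragraph above.

```python
def is_function(pairs):
    """Return True if every input appears with exactly one output."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


# Approximate readings for the two tallest dogs: (height in inches, weight in pounds).
tallest_dogs = [(27, 75), (27, 110)]
print(is_function(tallest_dogs))  # False: height 27 is paired with two different weights
```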

When we collect data by measuring attributes like height, weight, area, or volume, we call the data numerical data (or measurement data), and we say that height, weight, area, or volume is a numerical variable.

Quarterbacks (1 problem)

In football, a quarterback can be rated by a formula that assigns a number to how well they play.
A higher number generally means they played better.

Here are a table and scatter plot that show ratings and wins for quarterbacks who started every game in a season.

player quarterback rating number of wins
A 93.8 4
B 102.2 12
C 93.6 6
D 89 8
E 88.2 5
F 97 7
G 88.7 6
H 91.1 7
I 92.7 10
J 88 10
K 101.6 9
L 104.6 13
M 84.2 6
N 99.4 15
O 110.1 10
P 95.4 11
Q 88.7 11

A scatterplot.
A scatterplot. The horizontal axis is labeled “quarterback rating” and the numbers 80 through 120, in increments of 10, are indicated. The vertical axis is labeled “number of wins” and the numbers 0 through 20, in increments of 5, are indicated. The data are as follows: 84 point 2 comma 6. 88 comma 10. 88 point 2 comma 5. 88 point 7 comma 6. 88 point 7 comma 11. 89 comma 8. 91 point 1 comma 7. 92 point 7 comma 10. 93 point 6 comma 6. 93 point 8 comma 4. 95 point 4 comma 11. 97 comma 7. 99 point 4 comma 15. 101 point 6 comma 9. 102 point 2 comma 12. 104 point 6 comma 13. 110 point 1 comma 10.

  1. Circle the point in the scatter plot that represents Player K’s data.
  2. Which quarterback’s data are represented by the point farthest to the left?
  3. Player R is not included in the table. He has a quarterback rating of 99.4 and his team won 8 games. On the scatter plot, plot a point that represents Player R’s data.
Solution

Scatterplot.

  1. The circled point on the scatter plot
  2. Player M
  3. The added point to the scatter plot, plotted larger for visibility
Section A Check
Section A Checkpoint
Lesson 4
Representing Exponential Decay

Here is a graph showing the luminescence of a glow-in-the-dark paint, measured in lumens, over a period of time, measured in hours. The luminescence of this glow-in-the-dark paint can be modeled by an exponential function.

A graph comparing luminescence (lumens) over time (hours) with 7 data points including $(0,12)$, $(1,6)$, $(2,3)$, and $(3,1.5)$.

Notice that the amounts are decreasing over time. The graph includes the point $(0, 12)$. This means that when the glow-in-the-dark paint started glowing, its glow measured 12 lumens. The point $(1, 6)$ tells us the glow measured 6 lumens 1 hour later. Between 3 and 4 hours after the glow-in-the-dark paint began to glow, the luminescence fell below 1 lumen.

We can use the graph to find out what fraction of luminescence stays each hour. Notice that $\frac{6}{12}=\frac{1}{2}$ and $\frac{3}{6}=\frac{1}{2}$. As each hour passes, the luminescence that stays is multiplied by a factor of $\frac{1}{2}$.

If $y$ is the luminescence, in lumens, and $t$ is time, in hours, then this situation is modeled by the equation:

$$y = 12 \cdot \left(\frac{1}{2}\right)^t$$

We can confirm that the data is changing exponentially because it is multiplied by the same value each time. When the growth factor is between 0 and 1, the quantity being multiplied decreases. This situation is sometimes called “exponential decay,” and the growth factor may be called a “decay factor.”
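
The model can be checked against the points read from the graph. The sketch below evaluates the luminescence equation, with initial value 12 and factor 1/2, at whole hours. The function name `luminescence` and the search over whole hours are illustrative assumptions.

```python
def luminescence(t):
    """Luminescence in lumens after t hours, using y = 12 * (1/2)^t."""
    return 12 * (1 / 2) ** t


for hour in range(5):
    print(hour, luminescence(hour))

# First whole hour at which the model is below 1 lumen.
first_below_one = next(h for h in range(100) if luminescence(h) < 1)
print(first_below_one)
```

The model gives 1.5 lumens at 3 hours and 0.75 lumens at 4 hours, which agrees with the observation that the glow falls below 1 lumen between hours 3 and 4.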

Freezing Soup (1 problem)

A soup is placed in a freezer to save. Here is a graph showing the temperature of the soup at different times after being placed in the freezer.

Graph of points. Horizontal axis time in hours.  Vertical axis, temperature in degrees Celsius.

  1. What is the vertical intercept? What does it mean in this situation?
  2. What fraction of the temperature remained after one hour?
  3. Write an equation that represents the temperature of the soup, $t$, after $h$ hours.
Solution
  1. 60. The soup is $60^\circ \text{C}$ when it is placed in the freezer.
  2. $\frac{6}{10}$ of the temperature remained one hour after the soup was placed in the freezer.
  3. $t = 60 \cdot \left(\frac{6}{10}\right)^h$
Lesson 5
Describing Trends in Scatter Plots

When a linear function fits data well, we say there is a “linear association” between the variables. For example, the relationship between height and weight for 25 dogs can be modeled by the linear function whose graph is shown in the scatter plot.

We say there is a positive association between dog height and dog weight because knowledge about one variable helps predict the other variable, and when one variable increases, the other tends to increase as well.

A scatterplot, horizontal, dog height in inches, 6 to 30 by 3, vertical, 0 to 112 by 16. Same scatterplot as previous, this time with a line through 9 comma 0 and 27 comma 80.

What do you think the association between the weight of a car and its fuel efficiency is?

We say that there is a negative association between fuel efficiency and weight of a car because knowledge about one variable helps predict the other variable, and when one variable increases, the other tends to decrease.

Scatterplot, weight, kilograms, 1000 to 2500 by 250, fuel efficiency, miles per gallon, 14 to 32 by 2. Points are arranged close to the line through 1100 comma 28 down and right through 2300 comma 14.

This Is One Way to Do It (1 problem)
  1. Elena said, “I think this line is a good fit because half of the points are on one side of the line and half of the points are on the other side.” Do you agree? Explain your reasoning.

    Scatterplot.
    A scatterplot. Horizontal, from 0 to 12, by 2’s. Vertical, from 0 to 80, by 20’s. Data trends downward and to right. Line of best fit drawn, goes slightly upward and to right. 10 data points above and below line.

  2. Noah said, “I think this line is a good fit because it passes through the leftmost point and the rightmost point.” Do you agree? Explain your reasoning.

    Scatterplot.
    A scatterplot. Horizontal, from 0 to 12, by 2’s. Vertical, from 0 to 80, by 20’s. Data trends downward and to right. Line of best fit drawn, goes downward and to right. 14 data points below line, 4 points above line, and two points on line.

Solution
  1. Disagree. Sample response: The line is not a good fit because the data show a negative association, but the line has a positive slope.
  2. Disagree. Sample responses: The line is not a good fit because most of the points are below it and the trend of the scatter plot is steeper than the slope of the graph.
Lesson 6
Analyzing Graphs

Graphs are useful for comparing relationships. Here are two graphs representing the amount of caffeine in Person A and Person B, in milligrams, at different times, measured hourly, after an initial measurement.

A

Graph of an exponential function, origin O. Horizontal axis, time (hours), scale 0 to 10, by 1’s. Vertical axis, caffeine (mg), scale 0 to 200, by 100’s. The function is discrete and has these approximate points: (0 comma 200), (1 comma 160), (2 comma 125), (3 comma 105), (4 comma 80), (5 comma 65), (6 comma 50), (7 comma 45), (8 comma 35), (9 comma 25), (10 comma 20), (11 comma 18) and (12 comma 12).

B

Graph of an exponential function, origin O. Horizontal axis, time (hours), scale 0 to 10, by 1’s. Vertical axis, caffeine (mg), scale 0 to 200, by 100’s. The function is discrete and has these approximate points: (0 comma 100), (1 comma 90), (2 comma 80), (3 comma 70 ), (4 comma 65), (5 comma 60), (6 comma 57), (7 comma 50), (8 comma 45), (9 comma 40), (10 comma 38), (11 comma 35) and (12 comma 30).

The graphs reveal interesting information about the caffeine in each person over time:

  • At the initial measurement, Person A has more caffeine (200 milligrams) than Person B (100 milligrams).
  • The caffeine in Person A's body decreases faster. It went from 200 to 160 milligrams in an hour. Because 160 is $\frac{8}{10}$ (or $\frac{4}{5}$) of 200, the growth factor is $\frac{4}{5}$.
  • The caffeine in Person B's body went from 100 to about 90 milligrams, so that growth factor is about $\frac{9}{10}$. This means that after each hour, a larger fraction of caffeine stays in Person B than in Person A.
  • Even though Person A started out with twice as much caffeine, because of the growth factor, Person A had less caffeine than Person B after 6 hours.
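
The comparison in the last bullet can be reproduced from the two growth factors. This is a sketch under the assumption that each factor stays constant from hour to hour. Person B's factor of 9/10 is only approximate in the text, and the function names are illustrative.

```python
def caffeine_a(t):
    """Person A: starts at 200 mg and keeps 4/5 of it each hour."""
    return 200 * (4 / 5) ** t


def caffeine_b(t):
    """Person B: starts at 100 mg and keeps about 9/10 of it each hour."""
    return 100 * (9 / 10) ** t


# First whole hour at which Person A has less caffeine than Person B.
crossover = next(t for t in range(25) if caffeine_a(t) < caffeine_b(t))
print(crossover)
```

Under these assumptions, Person A still has more caffeine at hour 5 (about 65.5 mg compared to about 59.0 mg) and less at hour 6 (about 52.4 mg compared to about 53.1 mg).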
A Phone, a Company, a Camera (1 problem)

Graph of a function on grid, origin O. Horizontal axis, time, years, from 0 to 14 by 0 point 5's. Vertical axis, value, dollars, from 0 to 1,400 by 100's. Approximate plotted coordinates as follows:  0 comma 1,200, 1 comma 720, 2 comma 430, 3 comma 260, 4 comma 155, 5 comma 90, 6 comma 55, 7 comma 33, 8 comma 20, 9 comma 12, 10 comma 7, 11 comma 4, 12 comma 2, 13 comma 1.  

  1. This graph represents one of the following descriptions. Which one?
    A. A phone loses $\frac{4}{5}$ of its value every year after purchase: the relationship between the number of years since purchasing the phone and the value of the phone.
    B. The number of stores that a company has triples approximately every 5 years: the relationship between the number of years and the number of stores.
    C. A camera loses $\frac{2}{5}$ of its value every year after purchase: the relationship between the number of years since purchasing the camera and the value of the camera.
  2. Explain how you know the graph represents the description you chose.
Solution
  1. C
  2. Sample responses:
    • The graph cannot represent Description A because the phone is retaining only $\frac{1}{5}$ of its value, which is less than half, and the vertical coordinate of the second point on the graph is more than half of the vertical intercept.
    • If the camera loses $\frac{2}{5}$ of its value each year, then its value is $\frac{3}{5}$ that of the previous year. The vertical intercept seems to be 1,200, and $\frac{3}{5}$ of 1,200 is 720, which is roughly the vertical coordinate of the second point.
Lesson 7
Using Negative Exponents

Equations are useful not only for representing relationships that change exponentially, but also for answering questions about these situations.

Suppose a bacteria population of 1,000,000 has been increasing by a factor of 2 every hour. What was the size of the population 5 hours ago? How many hours ago was the population less than 1,000?

We could go backward and calculate the population of bacteria 1 hour ago, 2 hours ago, and so on. For example, if the population doubled each hour and was 1,000,000 when first observed, an hour before then it must have been 500,000, and two hours before then it must have been 250,000, and so on.

Another way to reason through these questions is by representing the situation with an equation. If $t$ measures time in hours since the population was 1,000,000, then the bacteria population can be described by the equation:

$$p = 1{,}000{,}000 \cdot 2^t$$

The population is 1,000,000 when $t$ is 0, so 5 hours earlier, $t$ would be -5, and here is a way to calculate the population:

$$\begin{aligned} 1{,}000{,}000 \cdot 2^{-5} &= 1{,}000{,}000 \cdot \frac{1}{2^5} \\ &= 1{,}000{,}000 \cdot \frac{1}{32} \\ &= 31{,}250 \end{aligned}$$

Likewise, substituting -10 for $t$ gives us $1{,}000{,}000 \cdot 2^{-10}$ (or $1{,}000{,}000 \cdot \frac{1}{2^{10}}$), which is a little less than 1,000. This means that 10 hours before the initial measurement the bacteria population was less than 1,000.
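
Both questions about the bacteria can also be answered by evaluating the equation at negative values of the exponent. The sketch below assumes the doubling model holds for times before the initial measurement. The function name `population` and the hour-by-hour backward search are illustrative choices.

```python
def population(t):
    """Bacteria population t hours after the initial measurement (t may be negative)."""
    return 1_000_000 * 2 ** t


print(population(-5))  # 31250.0

# Step backward one hour at a time until the population is below 1,000.
t = 0
while population(t) >= 1_000:
    t -= 1
print(t, population(t))
```

The loop stops at -10, where the model gives about 977 bacteria. One hour later, at -9, the population is still about 1,953.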

Invasive Fish (1 problem)

The equation $p = 5{,}000 \cdot 2^t$ represents the population of an invasive fish species in a large lake, $t$ years since 2005, when the fish population in the lake was first surveyed.

  1. What was the population in 2005?
  2. For this model, what does it mean when $t$ is -2?
  3. For $t = -2$, is the fish population more or less than 1,000? How do you know?
Solution
  1. 5,000, because $5{,}000 \cdot 2^0 = 5{,}000$.
  2. It means 2 years before 2005, which is 2003.
  3. More than 1,000. If $t = -2$, then $5{,}000 \cdot 2^{-2} = 1{,}250$.
Lesson 8
Analyzing Bivariate Data

People often collect data on two numerical variables to investigate possible associations between them, and then use the connections that they find to predict more values of the variables.

Data analysis usually follows these steps:

  1. Collect data
  2. Organize and represent the data, then look for an association
  3. Identify any outliers and try to explain why these data points are exceptions to the trend that describes the association
  4. Find a function that fits the data well

Although computational systems can help with data analysis by graphing the data, finding a function that might fit the data, and using that function to make predictions, it is important to understand the process and think about what is happening. A computational system may find a function that does not make sense or use a line when the situation suggests that a different model would be more appropriate.

Drawing a Line (1 problem)
  1. Draw a line on the scatter plot that fits the data well.

    A scatterplot.
    Scatterplot. Horizontal from 0 to 10, by 2’s. Vertical from 0 to 10, by 2’s. 8 dots clustered in upper left side of graph spread horizontally between 0 and 2 point 5 and vertically from 7 to about 8. 11 dots spread horizontally between 6 and 10 and vertically from 3 to 6.

  2. A new point will be added to the scatter plot with $x = 4$. What do you predict for the $y$-value of this point if it follows the association of the data already in the scatter plot?
  3. A new point will be added to the scatter plot with $x = 10$. What is an example of a $y$-value of this point if it is considered an outlier?
Solution

Sample responses:

  1. Scatterplot with line of best fit.

  2. 6
  3. 10
Section B Check
Section B Checkpoint
Lesson 9
Looking for Associations

When we collect data by counting things in various categories, like red, blue, or yellow, we call the data “categorical data,” and we say that color is a “categorical variable.”

We can use two-way tables to investigate possible connections between two categorical variables.

For example, this two-way table of frequencies shows the results of a study of meditation and state of mind of athletes before a track meet.

meditated did not meditate total
calm 45 8 53
agitated 23 21 44
total 68 29 97

If we are interested in the question of whether there is an association between meditating and being calm, we might present the frequencies in a bar graph, grouping data about those who meditated and those who did not meditate, so we can compare the numbers of calm and agitated athletes in each group.

Double bar graph.
Double bar graph in blue and red. Horizontal labeled meditated and did not meditate. Vertical labeled 0 to 50, by 10's. Blue represents calm. Red represents agitated.

Notice that the number of athletes who did not meditate is small compared to the number who meditated (29 as compared to 68, as shown in the table).

If we want to know the proportions of calm meditators and calm non-meditators, we can make a two-way table of relative frequencies and present the relative frequencies in a segmented bar graph.

meditated did not meditate
calm 66% 28%
agitated 34% 72%
total 100% 100%

Stacked bar graph.
Stacked bar graph in blue and red.  Horizontal labeled meditated and did not meditate. Vertical labeled 0 to 100, by 25's. Blue represents calm. Red represents agitated.
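
The relative frequency table can be derived from the frequency table by dividing each count by the total for its group. A minimal sketch follows. The dictionary layout and the function name `column_percentages` are illustrative assumptions, and percentages are rounded to the nearest whole number as in the table above.

```python
# Frequencies from the two-way table: group -> {state of mind: count}.
counts = {
    "meditated": {"calm": 45, "agitated": 23},
    "did not meditate": {"calm": 8, "agitated": 21},
}


def column_percentages(table):
    """Convert each group's counts to percentages of that group's total."""
    result = {}
    for group, states in table.items():
        total = sum(states.values())
        result[group] = {s: round(100 * n / total) for s, n in states.items()}
    return result


relative = column_percentages(counts)
for group, states in relative.items():
    print(group, states)
```

Dividing by each group's own total is what makes the two groups comparable even though far fewer athletes did not meditate (29) than meditated (68).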

Guitar and Golf (1 problem)

  1. In a class of 25 students, some students play a sport, some play a musical instrument, some do both, and some do neither. Complete the two-way table to show the data from the bar graph.

    | plays an instrument | does not play an instrument | total
    plays a sport | ___ | 16 | ___
    does not play a sport | 5 | ___ | ___
    total | ___ | ___ | 25
  2. Using the entries from the actual frequency table, complete this table so that it shows relative frequencies based on the rows. Round entries to the nearest percentage point.

    | plays an instrument | does not play an instrument | total
    plays a sport | ___ | 89% | 100%
    does not play a sport | 71% | ___ | 100%
Solution

Sample response:

  1.
    | plays an instrument | does not play an instrument | total
    plays a sport | 2 | 16 | 18
    does not play a sport | 5 | 2 | 7
    total | 7 | 18 | 25

  2.
    | plays an instrument | does not play an instrument | total
    plays a sport | 11%, since $2 \div 18 \approx 0.11$ | 89%, since $16 \div 18 \approx 0.89$ | 100%
    does not play a sport | 71%, since $5 \div 7 \approx 0.71$ | 29%, since $2 \div 7 \approx 0.29$ | 100%
Lesson 10
Using Data Displays to Find Associations

In an earlier lesson, we looked at data on meditation and state of mind in athletes.

Double bar graph.
Double bar graph in blue and red. Horizontal labeled meditated and did not meditate. Vertical labeled 0 to 50, by 10's. Blue represents calm. Red represents agitated.

Stacked bar graph.
Stacked bar graph in blue and red.  Horizontal labeled meditated and did not meditate. Vertical labeled 0 to 100, by 25's. Blue represents calm. Red represents agitated.

Is there an association between meditation and state of mind?

The bar graph shows that more athletes were calm than agitated among the group that meditated, and more athletes were agitated than calm among the group that did not.

We can see the proportions of calm meditators and calm non-meditators from the segmented bar graph, which shows that about 66% of athletes who meditated were calm, whereas only about 28% of those who did not meditate were calm.

This does not necessarily mean that meditation causes calmness. It could be the other way around, where calm athletes are more inclined to meditate. However, it does suggest that there is an association between meditating and calmness.

Class Preferences (1 problem)

Here are a two-way table and segmented bar graph for data from students in 2 classes.

Do they show evidence of differences between the 2 classes?

prefers math prefers science prefers recess
class A 6 3 8
class B 8 7 15

Stacked bar graph in three colors.
Stacked bar graph in yellow, red and blue. Horizontal labeled Class A and Class B. Vertical labeled 0 to 100, by 25's. Yellow represents recess. Red represents science. Blue represents math.

Solution

There is no evidence of different preferences associated with each class because the segments in the bars are about the same size.

Section C Check
Section C Checkpoint
Unit 6 Assessment
End-of-Unit Assessment