Consider the data collected from pulling back a toy car and then letting it go forward. In the first table, the data may not seem to have an obvious pattern. The second table has the same data and shows that both values are increasing together.
Unorganized table:
| distance pulled back (in) | distance traveled (in) |
| --- | --- |
| 6 | 23.57 |
| 4 | 18.48 |
| 10 | 38.66 |
| 8 | 31.12 |
| 2 | 13.86 |
| 1 | 8.95 |
Organized table:
| distance pulled back (in) | distance traveled (in) |
| --- | --- |
| 1 | 8.95 |
| 2 | 13.86 |
| 4 | 18.48 |
| 6 | 23.57 |
| 8 | 31.12 |
| 10 | 38.66 |
A scatter plot of the data makes the pattern clear enough that we can estimate how far the car will travel when it is pulled back 5 in.
Patterns in data can sometimes become more obvious when reorganized in a table or when represented in scatter plots or other diagrams. If a pattern is observed, it can sometimes be used to make predictions. This is a scatter plot for this scenario:
A scatterplot. Horizontal, from 0 to 12, by 2’s, labeled distance pulled back, inches. Vertical, from 0 to 40, by 10’s, labeled distance traveled, inches. 6 data points. Trend linearly upward and right.
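One way to turn the pattern into an estimate for a 5 in pull-back is to interpolate between the two nearest measurements, 4 in and 6 in. The sketch below assumes the trend between those two points is roughly a straight line; the helper name `interpolate` is made up for this illustration.

```python
def interpolate(x, x0, y0, x1, y1):
    """Estimate y at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Nearest measurements from the organized table: 4 in -> 18.48 in, 6 in -> 23.57 in.
estimate = interpolate(5, 4, 18.48, 6, 23.57)
print(round(estimate, 1))  # about 21 inches
```

An estimate read off the scatter plot by eye should land close to this value, since 5 in sits halfway between the two measured pull-back distances.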
Beach Cleaning (1 problem)
20 volunteers are cleaning the litter from a beach. The number of minutes each volunteer has worked and the number of meters left to clean on their section are recorded.
Here is a scatter plot that shows the data for each volunteer.
Label the vertical axis of the scatter plot.
If a volunteer has worked 45 minutes, should they have closer to 60 meters or 120 meters of beach left to clean? Explain your reasoning.
Solution
Sample response: beach left to clean (meters)
60 meters. Sample reasoning: When the time spent cleaning increases, the amount of beach left to clean tends to decrease. To keep in line with the rest of the data, the length left to clean should be closer to 60 meters than 120 meters.
Here are two tables representing two different situations.
A student runs errands for a neighbor every week. The table shows the pay he may receive, in dollars, in any given week.
| number of errands | pay in dollars | difference from previous week | factor from previous week |
| --- | --- | --- | --- |
| 0 | 10 | - | - |
| 1 | 15 | 5 | 1.5 |
| 2 | 20 | 5 | 1.33 |
| 3 | 25 | 5 | 1.25 |
| 4 | 30 | 5 | 1.2 |
A student at a high school heard a rumor that a celebrity will be speaking at graduation. The table shows how the rumor is spreading over time, in days.
| day | people who have heard the rumor | difference from previous day | factor from previous day |
| --- | --- | --- | --- |
| 0 | 1 | - | - |
| 1 | 5 | 4 | 5 |
| 2 | 25 | 20 | 5 |
| 3 | 125 | 100 | 5 |
| 4 | 625 | 500 | 5 |
Once we recognize how these patterns change, we can describe them mathematically. This allows us to understand their behavior, extend the patterns, and make predictions.
Notice that in the situation with the student running errands, the difference is constant from week to week, while the factor changes. In the situation about a rumor spreading, the difference changes from day to day, but the factor is constant. This can give us clues to how we might write out the pattern in each situation.
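The difference and factor columns can be recomputed directly from the pay and rumor values. This is a minimal sketch; the function names `differences` and `factors` are chosen for this illustration only.

```python
def differences(values):
    """Each value minus the one before it."""
    return [b - a for a, b in zip(values, values[1:])]

def factors(values):
    """Each value divided by the one before it."""
    return [b / a for a, b in zip(values, values[1:])]

pay = [10, 15, 20, 25, 30]      # dollars for 0 to 4 errands
heard = [1, 5, 25, 125, 625]    # people who have heard the rumor on days 0 to 4

print(differences(pay))                      # [5, 5, 5, 5]
print([round(f, 2) for f in factors(pay)])   # [1.5, 1.33, 1.25, 1.2]
print(differences(heard))                    # [4, 20, 100, 500]
print(factors(heard))                        # [5.0, 5.0, 5.0, 5.0]
```

A constant list of differences points toward a linear description, while a constant list of factors points toward an exponential one.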
Meow Island and Purr Island (1 problem)
The tables show the cat population on two islands over several years. Describe mathematically, as precisely as you can, how the cat population on each island is changing.
Scatter plots show two measurements for each individual from a group. For example, this scatter plot shows the weight and height for each dog from a group of 25 dogs.
A scatterplot. Horizontal, from 6 to 30, by 3’s, labeled dog height, inches. Vertical, from 0 to 112, by 16’s, labeled dog weight, pounds. 24 data points. Trend upward and to right.
We can see that the tallest dogs are 27 inches, and that one of those tallest dogs weighs about 75 pounds while the other weighs about 110 pounds. This shows us that dog weight is not a function of dog height because there would be two different outputs for the same input. But we can see a general trend: taller dogs tend to weigh more than shorter dogs. There are exceptions. For example, there is a dog that is 18 inches tall and weighs over 50 pounds, and there is another dog that is 21 inches tall but weighs less than 30 pounds.
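The "two outputs for the same input" test can be written as a short check. This sketch uses only the two tallest dogs, with the approximate weights quoted above; the helper name `is_function` is an assumption made for this example.

```python
from collections import defaultdict

def is_function(pairs):
    """Return True if no input is paired with more than one distinct output."""
    outputs = defaultdict(set)
    for x, y in pairs:
        outputs[x].add(y)
    return all(len(ys) == 1 for ys in outputs.values())

# Approximate (height in inches, weight in pounds) for the two tallest dogs.
tallest_dogs = [(27, 75), (27, 110)]
print(is_function(tallest_dogs))  # False: 27 inches maps to two different weights
```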
When we collect data by measuring attributes like height, weight, area, or volume, we call the data numerical data (or measurement data), and we say that height, weight, area, or volume is a numerical variable.
Quarterbacks (1 problem)
In football, a quarterback can be rated by a formula that assigns a number to how well they play.
A higher number generally means they played better.
Here are a table and scatter plot that show ratings and wins for quarterbacks who started every game in a season.
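| player | quarterback rating | number of wins |
| --- | --- | --- |
| A | 93.8 | 4 |
| B | 102.2 | 12 |
| C | 93.6 | 6 |
| D | 89 | 8 |
| E | 88.2 | 5 |
| F | 97 | 7 |
| G | 88.7 | 6 |
| H | 91.1 | 7 |
| I | 92.7 | 10 |
| J | 88 | 10 |
| K | 101.6 | 9 |
| L | 104.6 | 13 |
| M | 84.2 | 6 |
| N | 99.4 | 15 |
| O | 110.1 | 10 |
| P | 95.4 | 11 |
| Q | 88.7 | 11 |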
A scatterplot. The horizontal axis is labeled “quarterback rating” and the numbers 80 through 120, in increments of 10, are indicated. The vertical axis is labeled “number of wins” and the numbers 0 through 20, in increments of 5, are indicated. The data are as follows: (84.2, 6), (88, 10), (88.2, 5), (88.7, 6), (88.7, 11), (89, 8), (91.1, 7), (92.7, 10), (93.6, 6), (93.8, 4), (95.4, 11), (97, 7), (99.4, 15), (101.6, 9), (102.2, 12), (104.6, 13), (110.1, 10).
Circle the point in the scatter plot that represents Player K’s data.
Which quarterback’s data are represented by the point farthest to the left?
Player R is not included in the table. He has a quarterback rating of 99.4 and his team won 8 games. On the scatter plot, plot a point that represents Player R’s data.
Solution
The circled point on the scatter plot
Player M
The added point to the scatter plot, plotted larger for visibility
Here is a graph showing the luminescence of a glow-in-the-dark paint, measured in lumens, over a period of time, measured in hours. The luminescence of this glow-in-the-dark paint can be modeled by an exponential function.
Notice that the amounts are decreasing over time. The graph includes the point (0,12). This means that when the glow-in-the-dark paint started glowing, its glow measured 12 lumens. The point (1,6) tells us the glow measured 6 lumens 1 hour later. Between 3 and 4 hours after the glow-in-the-dark paint began to glow, the luminescence fell below 1 lumen.
We can use the graph to find out what fraction of luminescence stays each hour. Notice that $\frac{6}{12} = \frac{1}{2}$ and $\frac{3}{6} = \frac{1}{2}$. As each hour passes, the luminescence that stays is multiplied by a factor of $\frac{1}{2}$.
If y is the luminescence, in lumens, and t is time, in hours, then this situation is modeled by the equation:
$$y = 12 \cdot \left(\frac{1}{2}\right)^t$$
We can confirm that the data is changing exponentially because it is multiplied by the same value each time. When the growth factor is between 0 and 1, the quantity being multiplied decreases. The situation is then sometimes called “exponential decay,” and the growth factor may be called a “decay factor.”
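As a quick check of the model, where y equals 12 times one half raised to the power t, the sketch below evaluates it at whole hours and finds the first whole hour at which the luminescence is below 1 lumen. The function name `luminescence` is chosen for this illustration.

```python
def luminescence(t):
    """Luminescence in lumens after t hours, using y = 12 * (1/2)**t."""
    return 12 * (1 / 2) ** t

for hour in range(5):
    print(hour, luminescence(hour))
# 0 12.0
# 1 6.0
# 2 3.0
# 3 1.5
# 4 0.75

# First whole hour at which the glow is below 1 lumen.
first_below_one = next(h for h in range(100) if luminescence(h) < 1)
print(first_below_one)  # 4
```

The value is still 1.5 lumens at 3 hours and has dropped to 0.75 lumens at 4 hours, which agrees with the graph crossing 1 lumen between hours 3 and 4.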
Freezing Soup (1 problem)
A soup is placed in a freezer to save. Here is a graph showing the temperature of the soup at different times after being placed in the freezer.
What is the vertical intercept? What does it mean in this situation?
What fraction of the temperature remained after one hour?
Write an equation that represents the temperature of the soup, t, after h hours.
Solution
60. The soup is 60°C when it is placed in the freezer.
$\frac{6}{10}$ of the temperature remained one hour after the soup was placed in the freezer.
When a linear function fits data well, we say there is a “linear association” between the variables. For example, there is a linear association between height and weight for the 25 dogs, modeled by the linear function whose graph is shown in the scatter plot.
We say there is a positive association between dog height and dog weight because knowledge about one variable helps predict the other variable, and when one variable increases, the other tends to increase as well.
What do you think the association between the weight of a car and its fuel efficiency is?
We say that there is a negative association between fuel efficiency and weight of a car because knowledge about one variable helps predict the other variable, and when one variable increases, the other tends to decrease.
This Is One Way to Do It (1 problem)
Elena said, “I think this line is a good fit because half of the points are on one side of the line and half of the points are on the other side.” Do you agree? Explain your reasoning.
A scatterplot. Horizontal, from 0 to 12, by 2’s. Vertical, from 0 to 80, by 20’s. Data trends downward and to right. Line of best fit drawn, goes slightly upward and to right. 10 data points above and below line.
Noah said, “I think this line is a good fit because it passes through the leftmost point and the rightmost point.” Do you agree? Explain your reasoning.
A scatterplot. Horizontal, from 0 to 12, by 2’s. Vertical, from 0 to 80, by 20’s. Data trends downward and to right. Line of best fit drawn, goes downward and to right. 14 data points below line, 4 points above line, and two points on line.
Solution
Disagree. Sample response: The line is not a good fit because the data show a negative association, but the line has a positive slope.
Disagree. Sample responses: The line is not a good fit because most of the points are below it and the trend of the scatter plot is steeper than the slope of the graph.
Graphs are useful for comparing relationships. Here are two graphs representing the amount of caffeine in Person A and Person B, in milligrams, at different times, measured hourly, after an initial measurement.
A
Graph of an exponential function, origin O. Horizontal axis, time (hours), scale 0 to 10, by 1’s. Vertical axis, caffeine (mg), scale 0 to 200, by 100’s. The function is discrete and has these approximate points: (0, 200), (1, 160), (2, 125), (3, 105), (4, 80), (5, 65), (6, 50), (7, 45), (8, 35), (9, 25), (10, 20), (11, 18), and (12, 12).
B
Graph of an exponential function, origin O. Horizontal axis, time (hours), scale 0 to 10, by 1’s. Vertical axis, caffeine (mg), scale 0 to 200, by 100’s. The function is discrete and has these approximate points: (0, 100), (1, 90), (2, 80), (3, 70), (4, 65), (5, 60), (6, 57), (7, 50), (8, 45), (9, 40), (10, 38), (11, 35), and (12, 30).
The graphs reveal interesting information about the caffeine in each person over time:
At the initial measurement, Person A has more caffeine (200 milligrams) than Person B (100 milligrams).
The caffeine in Person A's body decreases faster. It went from 200 to 160 milligrams in an hour. Because 160 is $\frac{8}{10}$ (or $\frac{4}{5}$) of 200, the growth factor is $\frac{4}{5}$.
The caffeine in Person B's body went from 100 to about 90 milligrams, so that growth factor is about $\frac{9}{10}$. This means that after each hour, a larger fraction of caffeine stays in Person B than in Person A.
Even though Person A started out with twice as much caffeine, because of the growth factor, Person A had less caffeine than Person B after 6 hours.
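The comparison can be checked with a small model. This sketch assumes each person's caffeine follows an exponential model exactly, with growth factors of 4/5 for Person A and 9/10 for Person B; the graphs only show approximate values, so the numbers here will not match the plotted points exactly.

```python
def caffeine(initial_mg, growth_factor, hours):
    """Caffeine in milligrams after a whole number of hours."""
    return initial_mg * growth_factor ** hours

for hour in range(8):
    a = caffeine(200, 4 / 5, hour)    # Person A: starts at 200 mg, factor 4/5
    b = caffeine(100, 9 / 10, hour)   # Person B: starts at 100 mg, factor 9/10
    print(hour, round(a, 1), round(b, 1))

# First whole hour at which Person A has less caffeine than Person B.
first_hour_a_below_b = next(
    h for h in range(50) if caffeine(200, 4 / 5, h) < caffeine(100, 9 / 10, h)
)
print(first_hour_a_below_b)  # 6
```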
A Phone, a Company, a Camera (1 problem)
Graph of a function on grid, origin O. Horizontal axis, time, years, from 0 to 14 by 0.5's. Vertical axis, value, dollars, from 0 to 1,400 by 100's. Approximate plotted coordinates as follows:
(0, 1200), (1, 720), (2, 430), (3, 260), (4, 155), (5, 90), (6, 55), (7, 33), (8, 20), (9, 12), (10, 7), (11, 4), (12, 2), (13, 1).
This graph represents one of the following descriptions. Which one?
A. A phone loses $\frac{4}{5}$ of its value every year after purchase: the relationship between the number of years since purchasing the phone and the value of the phone.
B. The number of stores that a company has triples approximately every 5 years: the relationship between the number of years and the number of stores.
C. A camera loses $\frac{2}{5}$ of its value every year after purchase: the relationship between the number of years since purchasing the camera and the value of the camera.
Explain how you know the graph represents the description you chose.
Solution
C
Sample responses:
The graph cannot represent Description A because the phone is retaining only $\frac{1}{5}$ of its value, which is less than half, and the vertical coordinate of the second point on the graph is more than half of the vertical intercept.
If the camera loses $\frac{2}{5}$ of its value each year, then its value is $\frac{3}{5}$ that of the previous year. The vertical intercept seems to be 1,200, and $\frac{3}{5}$ of 1,200 is about 700, which is roughly the vertical coordinate of the second point.
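The reasoning in these sample responses can be reproduced numerically. The sketch below assumes the graph's first two points are about (0, 1200) and (1, 720), and compares the value each decreasing description predicts after one year. The description about stores is left out because it describes growth, while the graph decreases.

```python
from fractions import Fraction

initial_value = 1200        # vertical intercept, read from the graph
value_after_1_year = 720    # approximate second point, read from the graph

# Fraction of the value kept each year = 1 - fraction lost.
kept = {
    "A (phone loses 4/5)": 1 - Fraction(4, 5),
    "C (camera loses 2/5)": 1 - Fraction(2, 5),
}

predicted = {name: initial_value * k for name, k in kept.items()}
for name, value in predicted.items():
    print(name, value)
# A (phone loses 4/5) 240
# C (camera loses 2/5) 720

# Description whose one-year prediction is closest to the graph.
best_match = min(predicted, key=lambda name: abs(predicted[name] - value_after_1_year))
print(best_match)  # C (camera loses 2/5)
```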
Equations are useful not only for representing relationships that change exponentially, but also for answering questions about these situations.
Suppose a bacteria population of 1,000,000 has been increasing by a factor of 2 every hour. What was the size of the population 5 hours ago? How many hours ago was the population less than 1,000?
We could go backward and calculate the population of bacteria 1 hour ago, 2 hours ago, and so on. For example, if the population doubled each hour and was 1,000,000 when first observed, an hour before then it must have been 500,000, and two hours before then it must have been 250,000, and so on.
Another way to reason through these questions is by representing the situation with an equation. If t measures time in hours since the population was 1,000,000, then the bacteria population can be described by the equation:
$$p = 1{,}000{,}000 \cdot 2^t$$
The population is 1,000,000 when t is 0, so 5 hours earlier, t would be -5, and here is a way to calculate the population:
$$1{,}000{,}000 \cdot 2^{-5} = 1{,}000{,}000 \cdot \frac{1}{2^5} = 1{,}000{,}000 \cdot \frac{1}{32} = 31{,}250$$
Likewise, substituting -10 for t gives us $1{,}000{,}000 \cdot 2^{-10}$ (or $1{,}000{,}000 \cdot \frac{1}{2^{10}}$), which is a little less than 1,000. This means that 10 hours before the initial measurement the bacteria population was less than 1,000.
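Both questions can also be answered by evaluating the model for negative values of t. This is a minimal sketch; the function name `population` is chosen for this illustration, and the search only looks at whole hours.

```python
def population(t):
    """Bacteria population t hours after it measured 1,000,000 (t may be negative)."""
    return 1_000_000 * 2 ** t

print(population(-5))   # 31250.0
print(population(-10))  # 976.5625

# Smallest whole number of hours ago at which the population was below 1,000.
hours_ago = next(h for h in range(100) if population(-h) < 1000)
print(hours_ago)  # 10
```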
Invasive Fish (1 problem)
The equation $p = 5{,}000 \cdot 2^t$ represents the population of an invasive fish species in a large lake, t years since 2005, when the fish population in the lake was first surveyed.
What was the population in 2005?
For this model, what does it mean when t is -2?
For t=-2, is the fish population more or less than 1,000? How do you know?
People often collect data in two variables to investigate possible associations between two numerical variables and use the connections that they find to predict more values of the variables.
Data analysis usually follows these steps:
Collect data
Organize and represent the data, then look for an association
Identify any outliers and try to explain why these data points are exceptions to the trend that describes the association
Find a function that fits the data well
Although computational systems can help with data analysis by graphing the data, finding a function that might fit the data, and using that function to make predictions, it is important to understand the process and think about what is happening. A computational system may find a function that does not make sense or use a line when the situation suggests that a different model would be more appropriate.
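To make these steps concrete, here is a sketch that applies them to the toy car measurements from earlier. It fits a least-squares line, which is one common choice of fitting method, and flags a point as a possible outlier when its residual is more than twice the root-mean-square residual. That cutoff is an arbitrary assumption for illustration, and the function names are made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys, slope, intercept):
    """Observed y minus the y predicted by the line, for each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Collect and organize: (distance pulled back, distance traveled), sorted by x.
data = sorted(zip([6, 4, 10, 8, 2, 1], [23.57, 18.48, 38.66, 31.12, 13.86, 8.95]))
xs = [x for x, _ in data]
ys = [y for _, y in data]

# Find a function that fits the data.
slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)

# Flag possible outliers (assumed cutoff: twice the root-mean-square residual).
cutoff = 2 * (sum(r * r for r in res) / len(res)) ** 0.5
outliers = [point for point, r in zip(data, res) if abs(r) > cutoff]

print(round(slope, 2), round(intercept, 2))
print(outliers)
```

Whatever tool produces the line, the result still needs a sanity check against the situation: here the positive slope agrees with the observation that pulling the car back farther makes it travel farther.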
Drawing a Line (1 problem)
Draw a line on the scatter plot that fits the data well.
Scatterplot. Horizontal from 0 to 10, by 2’s. Vertical from 0 to 10, by 2’s. 8 dots clustered in upper left side of graph spread horizontally between 0 and 2.5 and vertically from 7 to about 8. 11 dots spread horizontally between 6 and 10 and vertically from 3 to 6.
A new point will be added to the scatter plot with x=4. What do you predict for the y-value of this point if it follows the association of the data already in the scatter plot?
A new point will be added to the scatter plot with x=10. What is an example of a y-value of this point if it is considered an outlier?
When we collect data by counting things in various categories, like red, blue, or yellow, we call the data “categorical data,” and we say that color is a “categorical variable.”
We can use two-way tables to investigate possible connections between two categorical variables.
For example, this two-way table of frequencies shows the results of a study of meditation and state of mind of athletes before a track meet.
| | meditated | did not meditate | total |
| --- | --- | --- | --- |
| calm | 45 | 8 | 53 |
| agitated | 23 | 21 | 44 |
| total | 68 | 29 | 97 |
If we are interested in the question of whether there is an association between meditating and being calm, we might present the frequencies in a bar graph, grouping data about those who meditated and those who did not meditate, so we can compare the numbers of calm and agitated athletes in each group.
Double bar graph in blue and red. Horizontal labeled meditated and did not meditate. Vertical labeled 0 to 50, by 10's. Blue represents calm. Red represents agitated.
Notice that the number of athletes who did not meditate is small compared to the number who meditated (29 as compared to 68, as shown in the table).
If we want to know the proportions of calm meditators and calm non-meditators, we can make a two-way table of relative frequencies and present the relative frequencies in a segmented bar graph.
| | meditated | did not meditate |
| --- | --- | --- |
| calm | 66% | 28% |
| agitated | 34% | 72% |
| total | 100% | 100% |
Stacked bar graph in blue and red. Horizontal labeled meditated and did not meditate. Vertical labeled 0 to 100, by 25's. Blue represents calm. Red represents agitated.
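The relative frequencies come from dividing each count by the total for its column, so that each group (meditated, did not meditate) adds up to 100%. Here is a short sketch of that calculation; the dictionary layout and function name are assumptions made for this example.

```python
# Counts from the two-way frequency table, grouped by column.
counts = {
    "meditated": {"calm": 45, "agitated": 23},
    "did not meditate": {"calm": 8, "agitated": 21},
}

def column_relative_frequencies(column):
    """Convert one column of counts to percentages of that column's total."""
    total = sum(column.values())
    return {state: round(100 * n / total) for state, n in column.items()}

relative = {group: column_relative_frequencies(col) for group, col in counts.items()}
print(relative)
# {'meditated': {'calm': 66, 'agitated': 34}, 'did not meditate': {'calm': 28, 'agitated': 72}}
```

Dividing by the column totals (68 and 29) rather than the overall total (97) is what makes the two groups comparable even though they are different sizes.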
Guitar and Golf (1 problem)
In a class of 25 students, some students play a sport, some play a musical instrument, some do both, and some do neither. Complete the two-way table to show the data from the bar graph.
plays an instrument
does not play an instrument
total
plays a sport
16
does not play a sport
5
total
25
Using the entries from the actual frequency table, complete this table so that it shows relative frequencies based on the rows. Round entries to the nearest percentage point.
In an earlier lesson, we looked at data on meditation and state of mind in athletes.
Double bar graph in blue and red. Horizontal labeled meditated and did not meditate. Vertical labeled 0 to 50, by 10's. Blue represents calm. Red represents agitated.
Stacked bar graph in blue and red. Horizontal labeled meditated and did not meditate. Vertical labeled 0 to 100, by 25's. Blue represents calm. Red represents agitated.
Is there an association between meditation and state of mind?
The bar graph shows that more athletes were calm than agitated among the group that meditated, and more athletes were agitated than calm among the group that did not.
We can see the proportions of calm meditators and calm non-meditators from the segmented bar graph, which shows that about 66% of athletes who meditated were calm, whereas only about 28% of those who did not meditate were calm.
This does not necessarily mean that meditation causes calmness. It could be the other way around, where calm athletes are more inclined to meditate. However, it does suggest that there is an association between meditating and calmness.
Class Preferences (1 problem)
Here are a two-way table and segmented bar graph for data from students in 2 classes.
Do they show evidence of differences between the 2 classes?
| | prefers math | prefers science | prefers recess |
| --- | --- | --- | --- |
| class A | 6 | 3 | 8 |
| class B | 8 | 7 | 15 |
Stacked bar graph in yellow, red and blue. Horizontal labeled Class A and Class B. Vertical labeled 0 to 100, by 25's. Yellow represents recess. Red represents science. Blue represents math.
Solution
There is no evidence of different preferences associated with each class because the segments in the bars are about the same size.