Forest rangers record information about some of the deer in the forest they oversee. Use the two-way table to answer the questions about the deer they observed.
| | younger than 1 year old | 1 year old or older |
| antlers | 0 | 12 |
| no antlers | 23 | 15 |
How many of the observed deer younger than 1 year old have antlers?
How many of the observed deer are 1 year old or older?
How many different deer are included in the table?
Solution
0
27
50
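All three answers come from reading the two-way table: one cell, one column total, and the grand total. A minimal Python sketch, assuming the table is stored as a nested dictionary (the name `deer` is illustrative, not part of the problem):

```python
# Two-way table of observed deer: rows are antler status, columns are age groups.
deer = {
    "antlers": {"younger than 1 year old": 0, "1 year old or older": 12},
    "no antlers": {"younger than 1 year old": 23, "1 year old or older": 15},
}

# Question 1: read a single cell.
young_with_antlers = deer["antlers"]["younger than 1 year old"]

# Question 2: add the entries in one column.
one_or_older = sum(row["1 year old or older"] for row in deer.values())

# Question 3: add every entry; each deer is counted in exactly one cell.
total_deer = sum(sum(row.values()) for row in deer.values())

print(young_with_antlers, one_or_older, total_deer)  # 0 27 50
```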
Lesson 2
Relative Frequency Tables
Writing Choices
Eighty students are asked if they prefer manual or electric pencil sharpeners and if they prefer mechanical or wood pencils.
| | mechanical pencils | wood pencils |
| manual sharpeners | 5 | 10 |
| electric sharpeners | 34 | 31 |
Complete the relative frequency table with the correct proportions so that it could be used to answer the following question: “Among students who like manual pencil sharpeners, what proportion also prefer mechanical pencils?”
| | mechanical pencils | wood pencils |
| manual sharpeners | | |
| electric sharpeners | | |
Use the table to determine the percentage of people who prefer electric sharpeners that also prefer wood pencils.
Solution
| | mechanical pencils | wood pencils |
| manual sharpeners | 0.33 | 0.67 |
| electric sharpeners | 0.52 | 0.48 |
48%
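Because the question asks "among students who like manual pencil sharpeners," each count is divided by its row total, not by 80. A minimal Python sketch of that row relative frequency calculation; the function name `row_relative_frequencies` is a hypothetical helper, not from the lesson:

```python
# Survey counts for 80 students: rows are sharpener preference, columns are pencil preference.
counts = {
    "manual sharpeners": {"mechanical pencils": 5, "wood pencils": 10},
    "electric sharpeners": {"mechanical pencils": 34, "wood pencils": 31},
}

def row_relative_frequencies(table):
    """Divide each entry by its row total, so every row sums to 1."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {col: value / row_total for col, value in row.items()}
    return result

rel = row_relative_frequencies(counts)
for row_label, row in rel.items():
    print(row_label, {col: round(p, 2) for col, p in row.items()})
# manual sharpeners {'mechanical pencils': 0.33, 'wood pencils': 0.67}
# electric sharpeners {'mechanical pencils': 0.52, 'wood pencils': 0.48}
```

Dividing every cell by 80 instead would answer a different question: for example, 31/80 ≈ 0.39 of all students prefer both electric sharpeners and wood pencils.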
Lesson 3
Associations in Categorical Data
Graduate Debt
The table summarizes data about the median debt for a sample of students graduating from universities in California and New York.
| | median debt less than $9,000 | median debt at least $9,000 | total |
| California universities | 130 | 445 | 575 |
| New York universities | 72 | 271 | 343 |
| total | 202 | 716 | 918 |
Is there an association between the state and the amount of median debt for graduates? Explain your reasoning.
Solution
Sample response: There is not enough evidence to support a claim of an association between the state and median debt. Of California universities, 77% (445/575 ≈ 0.77) have students who graduate with a median debt of at least $9,000, which is very similar to the 79% (271/343 ≈ 0.79) of New York universities whose graduates also have a median debt of at least $9,000.
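The comparison in the sample response reduces to two row proportions. A minimal Python sketch, assuming the table is stored as a dictionary keyed by state (the names `debt` and `high_debt_share` are illustrative):

```python
# Counts of universities by state and median graduate debt.
debt = {
    "California": {"less than $9,000": 130, "at least $9,000": 445},
    "New York": {"less than $9,000": 72, "at least $9,000": 271},
}

# Proportion of each state's universities with a median debt of at least $9,000.
high_debt_share = {
    state: row["at least $9,000"] / sum(row.values())
    for state, row in debt.items()
}
for state, share in high_debt_share.items():
    print(f"{state}: {share:.2f}")
# California: 0.77
# New York: 0.79

# A small gap between the row proportions suggests little or no association.
difference = abs(high_debt_share["California"] - high_debt_share["New York"])
print(f"difference: {difference:.2f}")  # difference: 0.02
```

How small a difference counts as "very similar" is a judgment call; the code only computes the proportions being compared.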
Section A Checkpoint
Problem 1
A company is testing two versions of a product with a group of people to find out whether they would buy the product. The results are summarized in the table.
| | product version 1 | product version 2 | total |
| would buy it | 23 | 32 | 55 |
| neutral or would not buy it | 12 | 17 | 29 |
| total | 35 | 49 | 84 |
If each person’s response appears in the table only once, how many people reviewed product version 2?
Use a relative frequency table to determine if there is an association between the product versions and whether people would buy each one. Explain your reasoning.
Solution
49
Sample response:
| | product version 1 | product version 2 |
| would buy it | 66% | 65% |
| neutral or would not buy it | 34% | 35% |
| total | 100% | 100% |
There is no association between the variables. The relative frequencies for each column are very similar and do not indicate that one version is more likely to be purchased than the other.
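The sample relative frequency table divides each count by its column (product version) total. A minimal Python sketch of that calculation, with illustrative names `responses` and `column_percents`:

```python
# Responses by product version (outer keys) and purchase intent (inner keys).
responses = {
    "product version 1": {"would buy it": 23, "neutral or would not buy it": 12},
    "product version 2": {"would buy it": 32, "neutral or would not buy it": 17},
}

# Column relative frequencies, rounded to whole percents.
column_percents = {}
for version, column in responses.items():
    column_total = sum(column.values())
    column_percents[version] = {
        response: round(100 * count / column_total)
        for response, count in column.items()
    }
    print(version, column_total, column_percents[version])
# product version 1 35 {'would buy it': 66, 'neutral or would not buy it': 34}
# product version 2 49 {'would buy it': 65, 'neutral or would not buy it': 35}
```

The nearly equal "would buy it" percentages (66% and 65%) are what the sample response uses to conclude there is no association.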
Lesson 4
Linear Models
Roar of the Crowd
The scatter plot shows the maximum noise level when different numbers of people are in a stadium. The linear model is given by the equation y=1.5x+22.7, where y represents maximum noise level and x represents the number of people, in thousands, in the stadium.
A scatterplot with the line y=1.5x+22.7. Horizontal, from 60 to 80, by 5’s, labeled number of people, thousands. Vertical, from 105 to 140, by 5’s, labeled maximum noise level, decibels. 12 dots, with the line trending upward and to the right.
The slope of the linear model is 1.5. What does this mean in terms of the maximum noise level and the number of people?
A sports announcer states that there are 65,000 fans in the stadium. Estimate the maximum noise level. Is this estimate reasonable? Explain your reasoning.
What is the y-intercept of the linear model given? What does it mean in the context of the problem? Is this reasonable? Explain your reasoning.
Solution
Sample response: For every additional thousand people in the stadium, the maximum noise level increases by about 1.5 decibels.
120.2 decibels, since 1.5(65)+22.7=120.2. Sample reasoning: It is a reasonable value since the data seem to fit a linear model well and 65 thousand people is within the range of the data.
The y-intercept is (0,22.7), which means that, according to the model, a stadium with no people in it would have a maximum noise level of 22.7 decibels. Sample reasonings:
This is actually reasonable since a whisper is about 20 decibels.
This is not reasonable since it should be silent with no people in the stadium.
This is not reasonable because the point is so far from the data that it is unlikely that the linear model will be accurate.
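Both numerical answers come from substituting into the model, and the reasonableness checks hinge on whether the input lies inside the range of the data. A minimal Python sketch; the helper names `max_noise_level` and `is_interpolation` are hypothetical, and the 60 to 80 thousand range is taken from the axis labels in the figure description rather than from the exact data values:

```python
def max_noise_level(thousands_of_people):
    """Linear model from the problem: y = 1.5x + 22.7, in decibels."""
    return 1.5 * thousands_of_people + 22.7

estimate = max_noise_level(65)  # 65,000 fans -> x = 65
intercept = max_noise_level(0)  # empty stadium -> x = 0
print(round(estimate, 1), round(intercept, 1))  # 120.2 22.7

# Assumed data range, read from the horizontal axis (60 to 80 thousand people).
DATA_RANGE = (60, 80)

def is_interpolation(x, data_range=DATA_RANGE):
    """True when x lies inside the range the model was fit on."""
    low, high = data_range
    return low <= x <= high

print(is_interpolation(65), is_interpolation(0))  # True False
```

The estimate at 65 thousand is an interpolation, while the y-intercept is a far extrapolation, which supports the third sample reasoning.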
Lesson 5
Fitting Lines
Fresh Air
Which of these scatter plots shows data that would best be modeled with a linear function? Explain your reasoning.
A
A scatterplot. Horizontal, from 0 to 10, by 1’s, labeled height, millimeters. Vertical, from 0 to 22, by 1’s, labeled weight, milligrams. 26 dots, approximate locations as follows:
(0.8, 12), (0.5, 9), (3.5, 17.5), (2, 8.2), (3.6, 3.6), (2.6, 10), (3.6, 10.4), (3.7, 11.7), (1.8, 1), (4.5, 14.9), (4.6, 17.1), (4.2, 14.4), (5, 3.6), (2.5, 17.55), (5.2, 16.28), (5.6, 11.13), (6.3, 6.9), (6, 3.1), (8.8, 11.45), (6.3, 7.68), (7.5, 10.92), (8.1, 6.7), (7, 15.77), (9.5, 12.75), (9, 15.66), (7.2, 9.375).
B
A scatterplot. Horizontal, from 0 to 10, by 1’s, labeled temperature, degrees Celsius. Vertical, from 0 to 22, by 1’s, labeled number of phytoplankton, tens of thousands. 26 dots, approximate locations as follows:
(0, 2.5), (0.5, 2.1), (1.5, 3.1), (2, 2.8), (2.6, 3.7), (2.6, 3.2), (3.4, 3.3), (3.4, 3.4), (4, 3.2), (4.5, 3.5), (4.6, 3.6), (4.7, 3.3), (5, 3.5), (5.5, 3.8), (5.2, 3.5), (5.6, 4), (6.3, 6.3), (6, 4.5), (6.8, 5.7), (6.9, 5), (7.5, 12.4), (8, 13.1), (7, 8.5), (8.5, 15.5), (9, 19.3), (9.6, 21.1).
C
A scatterplot. Horizontal, from 0 to 10, by 1’s, labeled precipitation, centimeters. Vertical, from 0 to 22, by 1’s, labeled amount of water used for irrigation, thousands of gallons. 25 dots, approximate locations as follows:
(0.6, 10.6), (0, 10.1), (0.5, 9.75), (1.5, 8.7), (2.1, 8.5), (2.2, 8.25), (2.2, 6.25), (2.6, 7.5), (3, 7.5), (3, 8.1), (3.1, 6.1), (3.5, 6.9), (4, 6.75), (5, 5.6), (5.5, 6.25), (5.6, 5.8), (5.7, 6.15), (6.4, 4.7), (6.4, 4.3), (6.6, 3.8), (6.6, 4.15), (7, 3.4).
Which of the lines is most likely the line of best fit for the data provided?
| number of trees in a forest | tons of oxygen produced by the forest |
| 148 | 16.43 |
| 175 | 25.64 |
| 190 | 23.28 |
| 200 | 29.2 |
| 202 | 21.41 |
| 425 | 60.56 |
| 505 | 50.75 |
| 528 | 74.45 |
| 562 | 62.66 |
| 585 | 84.24 |
y=-0.51x+225.12
y=0.34x−34.05
y=0.13x−0.19
y=0.98x−21.13
Solution
C. Sample reasoning: The points in scatter plot C follow the shape of a line most closely. The points in scatter plot A are very spread out, and the points in scatter plot B are probably better fit by a curve.
y=0.13x−0.19
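One way to check the choice numerically is to compare the sum of squared residuals for each candidate line; the smallest sum marks the best of the four. The sketch below does that in Python and, for comparison, also computes the least-squares line directly from the table. The function name `sum_squared_residuals` is a hypothetical helper, and the lesson itself expects a visual judgment rather than this calculation:

```python
# Data from the table: number of trees (x) and tons of oxygen produced (y).
trees = [148, 175, 190, 200, 202, 425, 505, 528, 562, 585]
oxygen = [16.43, 25.64, 23.28, 29.2, 21.41, 60.56, 50.75, 74.45, 62.66, 84.24]

# Candidate lines written as (slope, intercept).
candidates = {
    "y=-0.51x+225.12": (-0.51, 225.12),
    "y=0.34x-34.05": (0.34, -34.05),
    "y=0.13x-0.19": (0.13, -0.19),
    "y=0.98x-21.13": (0.98, -21.13),
}

def sum_squared_residuals(slope, intercept, xs, ys):
    """Sum of (observed - predicted)^2 over all points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

scores = {name: sum_squared_residuals(m, b, trees, oxygen)
          for name, (m, b) in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # y=0.13x-0.19

# Least-squares line computed from the data, for comparison.
n = len(trees)
mean_x = sum(trees) / n
mean_y = sum(oxygen) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(trees, oxygen))
sxx = sum((x - mean_x) ** 2 for x in trees)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
print(round(slope, 2), round(intercept, 2))  # 0.13 -0.19
```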
Lesson 6
Residuals
Deciding from Residuals
Each of these graphs of residuals is from the same set of data using different lines to fit the data. Which graph is most likely to represent the residuals from the best-fit line? Explain your reasoning.
A
Graph of residuals, origin O. Horizontal axis scale -1 to 4, by 1’s. Vertical axis scale -3 to 3, by 1’s. The points are discrete, at approximately (1, 0), (1, 0.5), (1.5, 0), (2, -0.25), (2, -0.5), (3, 0.2), and (3.5, 0.25).
B
Graph of residuals, origin O. Horizontal axis scale -1 to 4, by 1’s. Vertical axis scale -3 to 3, by 1’s. The points are discrete, at approximately (1, 0), (1, -0.5), (1.5, -0.5), (2, -1), (2, -1.5), (3, -0.5), and (3.5, -0.5).
C
Graph of residuals, origin O. Horizontal axis scale -1 to 4, by 1’s. Vertical axis scale -3 to 3, by 1’s. The points are discrete, at approximately (1, 0), (1, -0.5), (1.5, 0), (2, -0.25), (3, 1.5), and (3.5, 2).
D
Graph of residuals, origin O. Horizontal axis scale -1 to 4, by 1’s. Vertical axis scale -3 to 3, by 1’s. The points are discrete, at approximately (1, 1), (1, 0.5), (1.5, 0), (2, -1), (2, -1.3), (3, -1.6), and (3.5, -2.3).
Solution
Graph A is most likely to represent the residuals from the best-fit line since the residuals seem well spaced on both sides of the x-axis without an obvious pattern and all of the residuals are close to zero.
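The visual criteria in this answer (residuals close to zero, on both sides of the axis) can be partly checked with numbers: a mean residual near 0 and a small sum of squared residuals. A minimal Python sketch using the approximate residual values read from the four graph descriptions; it does not detect patterns such as the steady downward trend in D, which still requires looking at the graph:

```python
# Approximate residual values (vertical coordinates) read from each graph.
residuals = {
    "A": [0, 0.5, 0, -0.25, -0.5, 0.2, 0.25],
    "B": [0, -0.5, -0.5, -1, -1.5, -0.5, -0.5],
    "C": [0, -0.5, 0, -0.25, 1.5, 2],
    "D": [1, 0.5, 0, -1, -1.3, -1.6, -2.3],
}

def sum_of_squares(values):
    """Sum of squared residuals; smaller means points lie closer to the line."""
    return sum(r ** 2 for r in values)

for graph, values in residuals.items():
    mean = sum(values) / len(values)
    print(f"{graph}: mean {mean:+.2f}, sum of squares {sum_of_squares(values):.2f}")

smallest = min(residuals, key=lambda g: sum_of_squares(residuals[g]))
print(smallest)  # A
```

Graph B's residuals are all zero or negative, so its mean is well below 0, which signals a line that sits consistently above the data.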
Section B Checkpoint
Problem 1
A safety inspector records the speed of a car 30 times and how far the car travels until it comes to a complete stop. The results are summarized in the scatter plot.
Using the linear model y=3.1x−1, what does the slope mean in this situation? Does the vertical intercept make sense in this situation? Explain your thinking.
Use the residuals to justify that this linear model is a good fit for the data at slow speeds.
Solution
Sample response:
The slope means that for every additional mile per hour of speed, the car takes about 3.1 more feet to stop. The intercept does not make sense. According to the model, a car going 0 miles per hour would take -1 feet to stop, which is not possible.
For speeds up to around 12 miles per hour, the residuals are relatively close to 0 and are scattered on both sides of the horizontal axis.
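The slope and intercept interpretations can be confirmed by evaluating the model directly. A minimal Python sketch; the names `stopping_distance` and `residual` are hypothetical helpers, and no observed data are included because the scatter plot's values are not given in the text:

```python
def stopping_distance(speed_mph):
    """Linear model from the problem: y = 3.1x - 1, in feet."""
    return 3.1 * speed_mph - 1

def residual(speed_mph, observed_feet):
    """Observed minus predicted; values near 0 mean the model fits that point well."""
    return observed_feet - stopping_distance(speed_mph)

# Slope: one more mile per hour adds about 3.1 feet to the predicted distance.
increase = stopping_distance(11) - stopping_distance(10)
print(round(increase, 1))  # 3.1

# Vertical intercept: the model predicts a negative distance at 0 mph.
print(stopping_distance(0))  # -1.0
```

To justify the fit at slow speeds, `residual` would be applied to each measured point with a speed up to about 12 miles per hour and the results checked for being small and mixed in sign.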
Lesson 7
The Correlation Coefficient
What Is a Correlation Coefficient?
What information does a correlation coefficient tell us about the data in a scatter plot?
Which value best estimates the correlation coefficient for the scatter plot: -1, -0.8, -0.2, 0.2, 0.8, or 1? Explain your reasoning.
Graph of a scatter plot, xy-plane, origin O. Horizontal axis scale 0 to 14, by 2’s. Vertical axis scale 0 to 32, by 4’s. The data are slightly scattered around a best-fit line and trend downward, with a negative slope.
Solution
Sample response:
The sign of the correlation coefficient matches the sign of the slope of the best-fit line. The closer the correlation coefficient value is to 0, the worse the fit of the best-fit line. The closer the correlation coefficient is to 1 or -1, the better the best-fit line fits the data.
-0.8, since the data appear to be decreasing and a line fits the data fairly well, but not perfectly.
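The scatter plot in this problem is not given as data, so the sketch below reuses the tree and oxygen table from the Fresh Air problem to show how the correlation coefficient r is computed and how its sign matches the direction of the best-fit line. It is a minimal Python implementation of the Pearson correlation coefficient; the function name `correlation_coefficient` is illustrative:

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Trees and tons of oxygen from the Fresh Air problem: increasing, fairly linear data.
trees = [148, 175, 190, 200, 202, 425, 505, 528, 562, 585]
oxygen = [16.43, 25.64, 23.28, 29.2, 21.41, 60.56, 50.75, 74.45, 62.66, 84.24]
r = correlation_coefficient(trees, oxygen)
print(round(r, 2))  # 0.95

# Points exactly on a decreasing line give r = -1.
print(correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```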
Lesson 8
Using the Correlation Coefficient
How Bad Is It, Doc?
Doctors suspect a strain of bacteria found in the hospital is becoming resistant to antibiotics. They put various amounts of an antibiotic in petri dishes and add some of the bacteria to allow them to grow. After some time, the doctors return to the petri dishes and measure the number of bacteria for the different amounts of the antibiotic.
The data are plotted with a best-fit line. The correlation coefficient is r=-0.83.
What does the sign of the correlation coefficient mean in this situation?
What does the numerical value of the correlation coefficient mean in this situation?
In a follow-up study, a group of scientists collect data that are fit by a linear model with a correlation coefficient of r=-0.94. Which study suggests a stronger relationship: the doctors’ study or the scientists’ study? Explain your reasoning.
Solution
There is a negative relationship between the number of bacteria colonies and the concentration of the antibiotic in the dish. When the concentration of the antibiotic is higher, there are fewer bacteria colonies.
The relationship between the number of bacteria colonies and the concentration of the antibiotic in the dish is strong since this value is fairly close to -1.
The scientists’ study suggests a stronger relationship. Sample reasoning: Both correlation coefficients are negative, and the correlation coefficient for the scientists’ line is closer to -1.
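The comparison separates the two pieces of information in r: the sign gives the direction, and the absolute value gives the strength. A minimal Python sketch with the two values from the problem (the dictionary name `studies` is illustrative):

```python
studies = {"doctors": -0.83, "scientists": -0.94}

# Sign -> direction of the relationship; absolute value -> strength.
for name, r in studies.items():
    direction = "negative" if r < 0 else "positive"
    print(f"{name}: {direction}, |r| = {abs(r)}")

# The study whose r is closer to -1 or 1 suggests the stronger relationship.
stronger = max(studies, key=lambda name: abs(studies[name]))
print(stronger)  # scientists
```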
Lesson 9
Causal Relationships
Just Cause
For each pair of variables, decide whether you think there is:
A very weak or no relationship.
A strong relationship that is not a causal relationship.
A causal relationship.
Explain your reasoning.
number of snow plows owned by a city and mitten sales in the city
number of text messages sent per day by a person and number of shirts owned by the person
price of a set of crayons and size of the box holding the crayons
amount of gas used on a trip and number of miles driven on the trip
Solution
Sample responses:
A strong relationship that is not a causal relationship. The variables are related since cities with more snow plows will probably also have high sales of mittens compared to places with fewer snow plows. The climate, number of people living in the city, and amount of snow all affect both of these variables. A city having extra snow plows would not cause people to buy more mittens, nor would the reverse happen.
A very weak or no relationship. These variables seem unrelated, and there is not another variable, like age or wealth, that seems to be related to both of these variables consistently.
A strong relationship that is not a causal relationship. The variables are related because a more expensive set of crayons will generally have more crayons for more color options, which requires a larger box, but it is the number of crayons that is the cause for the increase in both variables.
A causal relationship. Longer trips will cause greater gas consumption, and shorter trips will require less gas.
Section C Checkpoint
Problem 1
During the autumn, a Canadian city’s number of flu cases and sales of sandals are recorded. A linear model is fit with a correlation coefficient of r=-0.24.
Classify the relationship as strong or weak and as positive or negative.
Do you think the relationship is causal? Explain your reasoning.
Solution
There is a weak, negative correlation.
Sample response: The relationship is likely not causal. As the temperature gets cooler, people usually buy fewer sandals and spend more time indoors with one another, spreading flu germs. The temperature is most likely the cause of both of these variables changing rather than one directly causing the other to change.