Unit 3 Two Variable Statistics — Unit Plan

Lesson 1
Two-Way Tables
Oh, Deer

Forest rangers record information about some of the deer in the forest they oversee. Use the two-way table to answer the questions about the deer they observed. 

| | younger than 1 year old | 1 year old or older |
|---|---|---|
| antlers | 0 | 12 |
| no antlers | 23 | 15 |

  1. How many of the observed deer younger than 1 year old have antlers?
  2. How many of the observed deer are 1 year old or older?
  3. How many different deer are included in the table?
Show Solution
  1. 0
  2. 27
  3. 50
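
The three answers above are a single cell, a column total, and a grand total. A minimal Python sketch of that arithmetic follows; the dictionary layout and variable names are illustrative assumptions, not part of the lesson materials.

```python
# Counts copied from the two-way table of observed deer.
counts = {
    "antlers": {"younger than 1": 0, "1 or older": 12},
    "no antlers": {"younger than 1": 23, "1 or older": 15},
}

# Question 1: a single cell of the table.
young_with_antlers = counts["antlers"]["younger than 1"]

# Question 2: a column total (all observed deer 1 year old or older).
one_or_older = sum(row["1 or older"] for row in counts.values())

# Question 3: the grand total of all four cells.
total_deer = sum(sum(row.values()) for row in counts.values())

print(young_with_antlers, one_or_older, total_deer)  # prints: 0 27 50
```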
Lesson 2
Relative Frequency Tables
Writing Choices

Eighty students are asked if they prefer manual or electric pencil sharpeners and if they prefer mechanical or wood pencils.

| | mechanical pencils | wood pencils |
|---|---|---|
| manual sharpeners | 5 | 10 |
| electric sharpeners | 34 | 31 |

  1. Complete the relative frequency table with the correct proportions so that it could be used to answer the following question: “Among students who like manual pencil sharpeners, what proportion also prefer mechanical pencils?” 
     

     | | mechanical pencils | wood pencils |
     |---|---|---|
     | manual sharpeners | | |
     | electric sharpeners | | |

  2. Use the table to determine the percentage of students who prefer electric sharpeners that also prefer wood pencils.
Show Solution
  1. Completed table:

     | | mechanical pencils | wood pencils |
     |---|---|---|
     | manual sharpeners | 0.33 | 0.67 |
     | electric sharpeners | 0.52 | 0.48 |

  2. 48%
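
Because the question conditions on the sharpener preference, each cell is divided by its row total. The sketch below shows one way to do that in Python; the function name `row_relative_frequencies` and the data layout are assumptions made for illustration.

```python
# Counts from the survey of 80 students.
counts = {
    "manual sharpeners": {"mechanical pencils": 5, "wood pencils": 10},
    "electric sharpeners": {"mechanical pencils": 34, "wood pencils": 31},
}

def row_relative_frequencies(table):
    """Divide each cell by the total of the row it belongs to."""
    result = {}
    for row_name, row in table.items():
        row_total = sum(row.values())
        result[row_name] = {col: n / row_total for col, n in row.items()}
    return result

rel = row_relative_frequencies(counts)
for row_name, row in rel.items():
    print(row_name, {col: round(p, 2) for col, p in row.items()})
```

Each row of the result sums to 1, which is a quick check that the table was divided by row totals rather than by column totals or the grand total.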

Lesson 3
Associations in Categorical Data
Graduate Debt

The table summarizes data about the median debt for a sample of students graduating from universities in California and New York. 

| | median debt less than $9,000 | median debt at least $9,000 | total |
|---|---|---|---|
| California universities | 130 | 445 | 575 |
| New York universities | 72 | 271 | 343 |
| total | 202 | 716 | 918 |

Is there an association between the state and the amount of median debt for graduates? Explain your reasoning.

Show Solution

Sample response: There is not enough evidence to support a claim of association between state and median debt. Of California universities, 77% (\(\frac{445}{575} \approx 0.77\)) have students who graduate with a median debt of at least $9,000, which is very similar to the 79% (\(\frac{271}{343} \approx 0.79\)) of New York universities whose graduates also have a median debt of at least $9,000.

Section A Check
Section A Checkpoint
Problem 1

A company is testing two versions of a product with a group of people to find out whether they would buy the product. The results are summarized in the table.

| | product version 1 | product version 2 | total |
|---|---|---|---|
| would buy it | 23 | 32 | 55 |
| neutral or would not buy it | 12 | 17 | 29 |
| total | 35 | 49 | 84 |

  1. If each person’s response is only in the table once, how many people reviewed product version 2?
  2. Use a relative frequency table to determine if there is an association between the product versions and whether people would buy each one. Explain your reasoning.
Show Solution
  1. 49
  2. Sample response:
    | | product version 1 | product version 2 |
    |---|---|---|
    | would buy it | 66% | 65% |
    | neutral or would not buy it | 34% | 35% |
    | total | 100% | 100% |

    There is no association between the variables. The relative frequencies for each column are very similar and do not indicate that one version is more likely to be purchased than the other.
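
The percentages in the sample response come from dividing each count by its column total. A short Python sketch of that calculation, with names chosen only for illustration:

```python
# Counts from the product test, without the total row and column.
counts = {
    "would buy it": {"version 1": 23, "version 2": 32},
    "neutral or would not buy it": {"version 1": 12, "version 2": 17},
}
versions = ["version 1", "version 2"]

# Column totals: how many people reviewed each version.
column_totals = {v: sum(row[v] for row in counts.values()) for v in versions}

# Column relative frequencies, as whole-number percentages.
column_percent = {
    response: {v: round(100 * row[v] / column_totals[v]) for v in versions}
    for response, row in counts.items()
}

print(column_totals)
for response, row in column_percent.items():
    print(response, row)
```

Comparing the two entries in each row of `column_percent` is the association check: nearly equal percentages across the columns suggest no association.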

Lesson 4
Linear Models
Roar of the Crowd

The scatter plot shows the maximum noise level when different numbers of people are in a stadium. The linear model is given by the equation \(y = 1.5x + 22.7\), where \(y\) represents maximum noise level, in decibels, and \(x\) represents the number of people, in thousands, in the stadium.

Scatter plot with the line \(y = 1.5x + 22.7\). Horizontal axis from 60 to 80, by 5’s, labeled number of people, thousands. Vertical axis from 105 to 140, by 5’s, labeled maximum noise level, decibels. 12 dots, with a straight line trending upward and to the right.

  1. The slope of the linear model is 1.5. What does this mean in terms of the maximum noise level and the number of people?
  2. A sports announcer states that there are 65,000 fans in the stadium. Estimate the maximum noise level. Is this estimate reasonable? Explain your reasoning.
  3. What is the \(y\)-intercept of the linear model given? What does it mean in the context of the problem? Is this reasonable? Explain your reasoning.
Show Solution
  1. Sample response: For every additional thousand people in the stadium, the noise level increases by about 1.5 decibels.
  2. 120.2 decibels. Sample reasoning: It is a reasonable value since the data seem to fit a linear model well.
  3. The \(y\)-intercept is \((0, 22.7)\), which means a stadium with no people in it will have a maximum noise level of 22.7 decibels. Sample reasonings:
    • This is actually reasonable since a whisper is about 20 decibels.
    • This is not reasonable since it should be silent with no people in the stadium.
    • This is not reasonable because the point is so far from the data that it is unlikely that the linear model will be accurate.
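
The estimate, slope, and intercept in these answers can all be read off by evaluating the model. A minimal Python sketch, assuming a helper named `predicted_noise_db` (a name introduced here for illustration):

```python
def predicted_noise_db(people_thousands):
    """Maximum noise level (decibels) predicted by the model y = 1.5x + 22.7."""
    return 1.5 * people_thousands + 22.7

# 65,000 fans corresponds to x = 65 because x is measured in thousands.
estimate = predicted_noise_db(65)

# The slope is the change in the prediction for one more thousand people.
change_per_thousand = predicted_noise_db(66) - predicted_noise_db(65)

# The y-intercept is the prediction for an empty stadium (x = 0).
intercept = predicted_noise_db(0)

print(round(estimate, 1), round(change_per_thousand, 1), round(intercept, 1))
# prints: 120.2 1.5 22.7
```

The value x = 65 lies inside the range of the plotted data (60 to 80), while x = 0 lies far outside it, which is the basis for the third sample reasoning.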
Lesson 5
Fitting Lines
Fresh Air
  1. Which of these scatter plots shows data that would best be modeled with a linear function? Explain your reasoning.
    A. Scatter plot. Horizontal axis from 0 to 10, by 1’s, labeled height, millimeters. Vertical axis from 0 to 22, by 1’s, labeled weight, milligrams. 26 dots, approximate locations as follows: (0.8, 12), (0.5, 9), (3.5, 17.5), (2, 8.2), (3.6, 3.6), (2.6, 10), (3.6, 10.4), (3.7, 11.7), (1.8, 1), (4.5, 14.9), (4.6, 17.1), (4.2, 14.4), (5, 3.6), (2.5, 17.55), (5.2, 16.28), (5.6, 11.13), (6.3, 6.9), (6, 3.1), (8.8, 11.45), (6.3, 7.68), (7.5, 10.92), (8.1, 6.7), (7, 15.77), (9.5, 12.75), (9, 15.66), (7.2, 9.375).

    B. Scatter plot. Horizontal axis from 0 to 10, by 1’s, labeled temperature, degrees Celsius. Vertical axis from 0 to 22, by 1’s, labeled number of phytoplankton, tens of thousands. 26 dots, approximate locations as follows: (0, 2.5), (0.5, 2.1), (1.5, 3.1), (2, 2.8), (2.6, 3.7), (2.6, 3.2), (3.4, 3.3), (3.4, 3.4), (4, 3.2), (4.5, 3.5), (4.6, 3.6), (4.7, 3.3), (5, 3.5), (5.5, 3.8), (5.2, 3.5), (5.6, 4), (6.3, 6.3), (6, 4.5), (6.8, 5.7), (6.9, 5), (7.5, 12.4), (8, 13.1), (7, 8.5), (8.5, 15.5), (9, 19.3), (9.6, 21.1).

    C. Scatter plot. Horizontal axis from 0 to 10, by 1’s, labeled precipitation, centimeters. Vertical axis from 0 to 22, by 1’s, labeled amount of water used for irrigation, thousands of gallons. 25 dots, approximate locations as follows: (0.6, 10.6), (0, 10.1), (0.5, 9.75), (1.5, 8.7), (2.1, 8.5), (2.2, 8.25), (2.2, 6.25), (2.6, 7.5), (3, 7.5), (3, 8.1), (3.1, 6.1), (3.5, 6.9), (4, 6.75), (5, 5.6), (5.5, 6.25), (5.6, 5.8), (5.7, 6.15), (6.4, 4.7), (6.4, 4.3), (6.6, 3.8), (6.6, 4.15), (7, 3.4).


  2. Which of the lines is most likely the line of best fit for the data provided?
    | number of trees in a forest | tons of oxygen produced by the forest |
    |---|---|
    | 148 | 16.43 |
    | 175 | 25.64 |
    | 190 | 23.28 |
    | 200 | 29.2 |
    | 202 | 21.41 |
    | 425 | 60.56 |
    | 505 | 50.75 |
    | 528 | 74.45 |
    | 562 | 62.66 |
    | 585 | 84.24 |

    1. \(y = -0.51x + 225.12\)
    2. \(y = 0.34x - 34.05\)
    3. \(y = 0.13x - 0.19\)
    4. \(y = 0.98x - 21.13\)
Show Solution
  1. C. Sample reasoning: The points on scatter plot C follow the shape of a line most closely. The scatter plot in A is very spread out, and the scatter plot in B is probably better fit by a curve.
  2. \(y = 0.13x - 0.19\)
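
The second answer can be checked by computing the least-squares line directly from the table. The sketch below uses the standard least-squares formulas in plain Python; the function name `least_squares_line` is an assumption for illustration, and students would more likely use graphing technology.

```python
# Data from the table: number of trees and tons of oxygen produced.
trees = [148, 175, 190, 200, 202, 425, 505, 528, 562, 585]
oxygen = [16.43, 25.64, 23.28, 29.2, 21.41, 60.56, 50.75, 74.45, 62.66, 84.24]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(trees, oxygen)
print(round(slope, 2), round(intercept, 2))
```

Rounded to two decimal places, the slope and intercept agree with the third option. Without any computation, the negative slope in the first option can be ruled out because oxygen production rises with the number of trees.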
Lesson 6
Residuals
Deciding from Residuals

Each of these graphs of residuals is from the same set of data using different lines to fit the data. Which graph is most likely to represent the residuals from the best-fit line? Explain your reasoning.

A

Graph of residuals, origin O. Horizontal axis scale -1 to 4, by 1’s. Vertical axis scale -3 to 3, by 1’s. The points are discrete, at approximately: (1, 0), (1, 0.5), (1.5, 0), (2, -0.25), (2, -0.5), (3, 0.2), (3.5, 0.25).

B

Graph of residuals, origin O. Horizontal axis scale -1 to 4, by 1’s. Vertical axis scale -3 to 3, by 1’s. The points are discrete, at approximately: (1, 0), (1, -0.5), (1.5, -0.5), (2, -1), (2, -1.5), (3, -0.5), (3.5, -0.5).

C

Graph of residuals, origin O. Horizontal axis scale -1 to 4, by 1’s. Vertical axis scale -3 to 3, by 1’s. The points are discrete, at approximately: (1, 0), (1, -0.5), (1.5, 0), (2, -0.25), (3, 1.5), (3.5, 2).

D

Graph of residuals, origin O. Horizontal axis scale -1 to 4, by 1’s. Vertical axis scale -3 to 3, by 1’s. The points are discrete, at approximately: (1, 1), (1, 0.5), (1.5, 0), (2, -1), (2, -1.3), (3, -1.6), (3.5, -2.3).

Show Solution

Graph A is most likely to represent the residuals from the best-fit line since the residuals seem well spaced on both sides of the \(x\)-axis without an obvious pattern and all of the residuals are close to zero.
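
The visual judgment above can be backed up numerically, since a least-squares best-fit line makes the sum of squared residuals as small as possible. The sketch below uses the approximate residual values listed in the graph descriptions, so its numbers are rough readings rather than exact data; the dictionary names are illustrative assumptions.

```python
# Approximate residuals (vertical coordinates) read from graphs A-D.
residuals = {
    "A": [0, 0.5, 0, -0.25, -0.5, 0.2, 0.25],
    "B": [0, -0.5, -0.5, -1, -1.5, -0.5, -0.5],
    "C": [0, -0.5, 0, -0.25, 1.5, 2],
    "D": [1, 0.5, 0, -1, -1.3, -1.6, -2.3],
}

summary = {}
for graph, values in residuals.items():
    summary[graph] = {
        "sum_of_squares": sum(r ** 2 for r in values),
        "largest_size": max(abs(r) for r in values),
        "above_axis": sum(1 for r in values if r > 0),
        "below_axis": sum(1 for r in values if r < 0),
    }

# The graph with the smallest sum of squared residuals.
best = min(summary, key=lambda g: summary[g]["sum_of_squares"])
print(best)  # prints: A
for graph, stats in summary.items():
    print(graph, stats)
```

Graph A has the smallest sum of squares and residuals on both sides of the axis, while graph B has no residuals above the axis at all, which suggests its line sits too high.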

Section B Check
Section B Checkpoint
Problem 1

A safety inspector records the speed of a car 30 times and how far the car travels until it comes to a complete stop. The results are summarized in the scatter plot.

Scatter plot showing stopping distance versus speed with a linear model graphed.

  1. Using the linear model \(y = 3.1x - 1\), what does the slope mean in this situation? Does the vertical intercept make sense in this situation? Explain your thinking.
  2. Use the residuals to justify that this linear model is a good fit for the data at slow speeds.

Scatter plot showing residuals

Show Solution
Sample response: 
  1. The slope means that for every extra mile per hour of speed, the car takes about 3.1 feet longer to stop. The intercept does not make sense. If a car is going 0 miles per hour, according to the model it would stop in -1 feet, which is not possible.
  2. For speeds up to around 12 miles per hour, the residuals are relatively close to the axis and scattered on both sides.
Lesson 7
The Correlation Coefficient
What Is a Correlation Coefficient?
  1. What information does a correlation coefficient tell us about the data in a scatter plot?
  2. Which value best estimates the value for the correlation coefficient of the scatter plot:
    -1, -0.8, -0.2, 0.2, 0.8, or 1? Explain your reasoning.

Graph of a scatter plot, xy-plane, origin O. Horizontal axis scale 0 to 14, by 2’s. Vertical axis scale 0 to 32, by 4’s. Best fit line from approximately (4, 30) to near (0.5, 14). The data are slightly scattered and trend downward with a negative slope.

Show Solution

Sample response:

  1. The sign of the correlation coefficient matches the sign of the slope of the best-fit line. The closer the correlation coefficient value is to 0, the worse the fit of the best-fit line. The closer the correlation coefficient is to 1 or -1, the better the best-fit line fits the data.
  2. -0.8, since the data appear to be decreasing and a line is an okay fit for the data, but not a perfect one.
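
To make the sign and size of a correlation coefficient concrete, the sketch below computes one with the standard formula. No data values are given for the scatter plot in this lesson, so the example reuses the trees and oxygen table from Lesson 5; that choice, and the function name `correlation_coefficient`, are assumptions for illustration only.

```python
from math import sqrt

# Data reused from the Lesson 5 table (trees in a forest, tons of oxygen).
trees = [148, 175, 190, 200, 202, 425, 505, 528, 562, 585]
oxygen = [16.43, 25.64, 23.28, 29.2, 21.41, 60.56, 50.75, 74.45, 62.66, 84.24]

def correlation_coefficient(xs, ys):
    """Correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = correlation_coefficient(trees, oxygen)

# Flipping the sign of every y-value reverses the trend, so r changes sign.
r_flipped = correlation_coefficient(trees, [-y for y in oxygen])

print(round(r, 2), round(r_flipped, 2))
```

For this table, r is positive and close to 1, matching an increasing trend that a line fits well. The flipped data give the same size with a negative sign, matching a decreasing trend.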
Lesson 8
Using the Correlation Coefficient
How Bad Is It, Doc?

Doctors suspect a strain of bacteria found in the hospital is becoming resistant to antibiotics. They put various amounts of an antibiotic in petri dishes and add some of the bacteria to allow it to grow. After some time, the doctors return to the petri dishes and measure the number of bacteria for the different amounts of the antibiotic.

The data are plotted with a best-fit line. The correlation coefficient is \(r = -0.83\).

  1. What does the sign of the correlation coefficient mean in this situation?
  2. What does the numerical value of the correlation coefficient mean in this situation?
  3. In a follow-up study, a group of scientists collect data that are fit by a linear model with a correlation coefficient of \(r = -0.94\). Which study suggests a stronger relationship: the doctors’ study or the scientists’ study? Explain your reasoning.
Show Solution
  1. There is a negative relationship between the number of bacteria colonies and the concentration of the antibiotic in the dish. When the concentration of the antibiotic is higher, there are fewer bacteria colonies.
  2. The relationship between the number of bacteria colonies and the concentration of the antibiotic in the dish is strong since this value is fairly close to -1.
  3. The scientists’ study suggests a stronger relationship. Sample reasoning: Both correlation coefficients are negative, and the correlation coefficient for the scientists’ line is closer to -1.
Lesson 9
Causal Relationships
Just Cause

For each pair of variables, decide whether you think there is:

  • A very weak or no relationship.
  • A strong relationship that is not a causal relationship.
  • A causal relationship.

Explain your reasoning.

  1. number of snow plows owned by a city and mitten sales in the city
  2. number of text messages sent per day by a person and number of shirts owned by the person
  3. price of a set of crayons and size of the box holding the crayons
  4. amount of gas used on a trip and number of miles driven on the trip
Show Solution

Sample responses:

  1. A strong relationship that is not a causal relationship. The variables are related since cities with more snow plows will probably also have high sales of mittens compared to places with fewer snow plows. The climate, number of people living in the city, and amount of snow all affect both of these variables. A city having extra snow plows would not cause people to buy more mittens, nor would the reverse happen.
  2. A very weak or no relationship. These variables seem unrelated, and there is not another variable, like age or wealth, that seems to be related to both of these variables consistently.
  3. A strong relationship that is not a causal relationship. The variables are related because a more expensive set of crayons will generally have more crayons for more color options, which requires a larger box, but it is the number of crayons that is the cause for the increase in both variables.
  4. Causal relationship. Longer trips will cause greater gas consumption, and shorter trips will require less gas.
Section C Check
Section C Checkpoint
Problem 1
During the autumn, a Canadian city’s number of flu cases and sales of sandals are recorded. A linear model is fit with a correlation coefficient of \(r = -0.24\).
  1. Classify the relationship as strong or weak and as positive or negative.
  2. Do you think the relationship is causal? Explain your reasoning.
Show Solution
  1. There is a weak, negative correlation.
  2. Sample response: The relationship is likely not causal. As the temperature gets cooler, people usually buy fewer sandals and spend more time indoors with one another, spreading flu germs. The temperature is most likely the cause of both of these variables changing rather than one directly causing the other to change.
Lesson 10
Fossils and Flags
No cool-down