STAT 200 Week 7 Homework Problems
10.1.2
Table #10.1.6 contains the value of the house and the amount of rental income in a year that the house brings in ("Capital and rental," 2013). Create a scatter plot and find a regression equation between house value and rental income. Then use the regression equation to find the rental income for a house worth $230,000 and for a house worth $400,000. Which rental income that you calculated do you think is closer to the true rental income? Why?
Table #10.1.6: Data of House Value versus Rental
Value     Rental    Value     Rental    Value     Rental    Value     Rental
81000     6656      77000     4576      75000     7280      67500     6864
95000     7904      94000     8736      90000     6240      85000     7072
121000    12064     115000    7904      110000    7072      104000    7904
135000    8320      130000    9776      126000    6240      125000    7904
145000    8320      140000    9568      140000    9152      135000    7488
165000    13312     165000    8528      155000    7488      148000    8320
178000    11856     174000    10400     170000    9568      170000    12688
200000    12272     200000    10608     194000    11232     190000    8320
214000    8528      208000    10400     200000    10400     200000    8320
240000    10192     240000    12064     240000    11648     225000    12480
289000    11648     270000    12896     262000    10192     244500    11232
325000    12480     310000    12480     303000    12272     300000    12480
[Figure: Scatter plot of house value versus rental income, with regression line]
Hence the line of best fit is:
Y = 0.0244x + 5363.9
For a house worth x = $230,000:
Predicted rental = 0.0244*230000 + 5363.9 = $10,975.90
For a house worth x = $400,000:
Predicted rental = 0.0244*400000 + 5363.9 = $15,123.90
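The rental income calculated for the $230,000 house is likely much closer to the true rental income. The reason is that $230,000 lies within the range of house values in the data ($67,500 to $325,000), whereas $400,000 lies outside it. Predicting at $400,000 is extrapolation, and the linear pattern may not continue beyond the observed data.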
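As an independent check on the hand calculation (not the Excel workflow used in the solution), here is a minimal standard-library Python sketch that fits the least-squares line to the 48 pairs in Table #10.1.6 and flags whether each prediction falls inside the observed range of house values. The helper name `least_squares` is illustrative only.

```python
# Sketch: ordinary least-squares fit of rental income (y) on house value (x)
# for the 48 (value, rental) pairs in Table #10.1.6.
HOUSES = [
    (81000, 6656), (77000, 4576), (75000, 7280), (67500, 6864),
    (95000, 7904), (94000, 8736), (90000, 6240), (85000, 7072),
    (121000, 12064), (115000, 7904), (110000, 7072), (104000, 7904),
    (135000, 8320), (130000, 9776), (126000, 6240), (125000, 7904),
    (145000, 8320), (140000, 9568), (140000, 9152), (135000, 7488),
    (165000, 13312), (165000, 8528), (155000, 7488), (148000, 8320),
    (178000, 11856), (174000, 10400), (170000, 9568), (170000, 12688),
    (200000, 12272), (200000, 10608), (194000, 11232), (190000, 8320),
    (214000, 8528), (208000, 10400), (200000, 10400), (200000, 8320),
    (240000, 10192), (240000, 12064), (240000, 11648), (225000, 12480),
    (289000, 11648), (270000, 12896), (262000, 10192), (244500, 11232),
    (325000, 12480), (310000, 12480), (303000, 12272), (300000, 12480),
]


def least_squares(pairs):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(HOUSES)
x_min = min(x for x, _ in HOUSES)
x_max = max(x for x, _ in HOUSES)
print(f"y = {slope:.6f}x + {intercept:.3f}")

predictions = {}
for value in (230_000, 400_000):
    predictions[value] = slope * value + intercept
    # A prediction outside [x_min, x_max] is an extrapolation.
    where = "inside" if x_min <= value <= x_max else "outside"
    print(f"${value:,}: predicted rental about ${predictions[value]:,.2f} ({where} the data range)")
```

With unrounded coefficients the predictions come out at about $10,966 and $15,107, a little below the hand values, because the hand calculation rounds the slope to 0.0244.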
10.1.4
The World Bank collected data on the percentage of GDP that a country spends on health expenditures ("Health expenditure," 2013) and also the percentage of women receiving prenatal care ("Pregnant woman receiving," 2013). The data for the countries where this information is available for the year 2011 are in table #10.1.8. Create a scatter plot of the data and find a regression equation between percentage spent on health expenditure and the percentage of women receiving prenatal care. Then use the regression equation to find the percent of women receiving prenatal care for a country that spends 5.0% of GDP on health expenditure and for a country that spends 12.0% of GDP. Which prenatal care percentage that you calculated do you think is closer to the true percentage? Why?
Table #10.1.8: Data of Health Expenditure versus Prenatal Care
Health Expenditure (% of GDP)    Prenatal Care (%)
9.6                              47.9
3.7                              54.6
5.2                              93.7
5.2                              84.7
10.0                             100.0
4.7                              42.5
4.8                              96.4
6.0                              77.1
5.4                              58.3
4.8                              95.4
4.1                              78.0
6.0                              93.3
9.5                              93.3
6.8                              93.7
6.1                              89.8
[Figure: Scatter plot of health expenditure versus prenatal care, with line of best fit]
The line of best fit is:
Y = 1.6606x + 69.739
Now if we take health expenditure x = 5%,
Predicted Prenatal care = 1.6606 * 5 + 69.739 = 78.042%
Again, if health expenditure x = 12%,
Predicted Prenatal care = 1.6606 * 12 + 69.739 = 89.67%
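The prenatal care percentage predicted for 5% health expenditure is likely closer to the true value, because 5% lies within the range of health expenditures in the data (3.7% to 10.0%), whereas 12% lies outside it, so predicting at 12% is extrapolation. Even the 5% prediction should be treated with caution, because the linear relationship between the two variables is weak.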
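The same least-squares check can be run on Table #10.1.8. This is a standard-library Python sketch, not the Excel tool used in the solution, and the helper name `least_squares` is illustrative.

```python
# Sketch: least-squares fit of prenatal care (%) on health expenditure (% of GDP)
# for the 15 countries in Table #10.1.8.
HEALTH = [
    (9.6, 47.9), (3.7, 54.6), (5.2, 93.7), (5.2, 84.7), (10.0, 100.0),
    (4.7, 42.5), (4.8, 96.4), (6.0, 77.1), (5.4, 58.3), (4.8, 95.4),
    (4.1, 78.0), (6.0, 93.3), (9.5, 93.3), (6.8, 93.7), (6.1, 89.8),
]


def least_squares(pairs):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(HEALTH)
x_min = min(x for x, _ in HEALTH)
x_max = max(x for x, _ in HEALTH)
print(f"y = {slope:.4f}x + {intercept:.3f}")

predictions = {}
for spend in (5.0, 12.0):
    predictions[spend] = slope * spend + intercept
    # Spending levels outside [x_min, x_max] require extrapolation.
    where = "inside" if x_min <= spend <= x_max else "outside"
    print(f"{spend}% of GDP: predicted prenatal care {predictions[spend]:.2f}% ({where} the data range)")
```

Here the rounded and unrounded coefficients give essentially the same predictions (about 78.04% and 89.67%).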
10.2.2
Table #10.1.6 contains the value of the house and the amount of rental income in a year that the house brings in ("Capital and rental," 2013). Find the correlation coefficient and coefficient of determination and then interpret both.
Table #10.1.6: Data of House Value versus Rental
Value     Rental    Value     Rental    Value     Rental    Value     Rental
81000     6656      77000     4576      75000     7280      67500     6864
95000     7904      94000     8736      90000     6240      85000     7072
121000    12064     115000    7904      110000    7072      104000    7904
135000    8320      130000    9776      126000    6240      125000    7904
145000    8320      140000    9568      140000    9152      135000    7488
165000    13312     165000    8528      155000    7488      148000    8320
178000    11856     174000    10400     170000    9568      170000    12688
200000    12272     200000    10608     194000    11232     190000    8320
214000    8528      208000    10400     200000    10400     200000    8320
240000    10192     240000    12064     240000    11648     225000    12480
289000    11648     270000    12896     262000    10192     244500    11232
325000    12480     310000    12480     303000    12272     300000    12480
I used Excel's Data Analysis tool to calculate the correlation coefficient between house value and rental income.
According to the Excel output, the Pearson correlation coefficient is r = 0.764716.
This indicates a strong positive linear relationship between house value and rental income.
Coefficient of determination: r^{2} = 0.7647^{2} ≈ 0.585
So about 58.5% of the variation in rental income can be explained by the linear relationship with house value. This is high enough that the regression equation is reasonably useful for predicting rental income from house value.
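For readers without Excel, r can be reproduced from its definition r = Sxy / sqrt(Sxx * Syy). This is a standard-library Python sketch with an illustrative helper name `pearson_r`; it is not part of the original solution.

```python
import math

# (house value, annual rental) pairs from Table #10.1.6
HOUSES = [
    (81000, 6656), (77000, 4576), (75000, 7280), (67500, 6864),
    (95000, 7904), (94000, 8736), (90000, 6240), (85000, 7072),
    (121000, 12064), (115000, 7904), (110000, 7072), (104000, 7904),
    (135000, 8320), (130000, 9776), (126000, 6240), (125000, 7904),
    (145000, 8320), (140000, 9568), (140000, 9152), (135000, 7488),
    (165000, 13312), (165000, 8528), (155000, 7488), (148000, 8320),
    (178000, 11856), (174000, 10400), (170000, 9568), (170000, 12688),
    (200000, 12272), (200000, 10608), (194000, 11232), (190000, 8320),
    (214000, 8528), (208000, 10400), (200000, 10400), (200000, 8320),
    (240000, 10192), (240000, 12064), (240000, 11648), (225000, 12480),
    (289000, 11648), (270000, 12896), (262000, 10192), (244500, 11232),
    (325000, 12480), (310000, 12480), (303000, 12272), (300000, 12480),
]


def pearson_r(pairs):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy), using deviations from the means."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(HOUSES)
r_squared = r ** 2
print(f"r = {r:.6f}, r^2 = {r_squared:.4f} ({r_squared:.1%} of the variation in rental explained)")
```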
10.2.4
The World Bank collected data on the percentage of GDP that a country spends on health expenditures ("Health expenditure," 2013) and also the percentage of women receiving prenatal care ("Pregnant woman receiving," 2013). The data for the countries where this information is available for the year 2011 are in table #10.1.8. Find the correlation coefficient and coefficient of determination and then interpret both.
Table #10.1.8: Data of Health Expenditure versus Prenatal Care
Health Expenditure (% of GDP)    Prenatal Care (%)
9.6                              47.9
3.7                              54.6
5.2                              93.7
5.2                              84.7
10.0                             100.0
4.7                              42.5
4.8                              96.4
6.0                              77.1
5.4                              58.3
4.8                              95.4
4.1                              78.0
6.0                              93.3
9.5                              93.3
6.8                              93.7
6.1                              89.8
I used Excel's Data Analysis tool to calculate the correlation coefficient between health expenditure and prenatal care.
According to the Excel output, the Pearson correlation coefficient is r = 0.1715.
This indicates a weak positive linear relationship between health expenditure and prenatal care.
Coefficient of determination: r^{2} = 0.1715^{2} ≈ 0.0294
So only about 2.94% of the variation in prenatal care can be explained by the linear relationship with health expenditure. Because this proportion is so low, the regression equation is not useful for predicting prenatal care from health expenditure.
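The same coefficient can also be computed from raw sums with the textbook computational formula r = (nΣxy - ΣxΣy) / sqrt((nΣx² - (Σx)²)(nΣy² - (Σy)²)). This standard-library Python sketch uses an illustrative helper name, `pearson_r_sums`, and is only a cross-check on the Excel value.

```python
import math

# (health expenditure % of GDP, prenatal care %) pairs from Table #10.1.8
HEALTH = [
    (9.6, 47.9), (3.7, 54.6), (5.2, 93.7), (5.2, 84.7), (10.0, 100.0),
    (4.7, 42.5), (4.8, 96.4), (6.0, 77.1), (5.4, 58.3), (4.8, 95.4),
    (4.1, 78.0), (6.0, 93.3), (9.5, 93.3), (6.8, 93.7), (6.1, 89.8),
]


def pearson_r_sums(pairs):
    """Pearson r from raw sums (the computational formula)."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


r = pearson_r_sums(HEALTH)
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")
```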
10.3.2
Table #10.1.6 contains the value of the house and the amount of rental income in a year that the house brings in ("Capital and rental," 2013).
Test at the 5% level for a positive correlation between house value and rental amount.
Table #10.1.6: Data of House Value versus Rental
Value     Rental    Value     Rental    Value     Rental    Value     Rental
81000     6656      77000     4576      75000     7280      67500     6864
95000     7904      94000     8736      90000     6240      85000     7072
121000    12064     115000    7904      110000    7072      104000    7904
135000    8320      130000    9776      126000    6240      125000    7904
145000    8320      140000    9568      140000    9152      135000    7488
165000    13312     165000    8528      155000    7488      148000    8320
178000    11856     174000    10400     170000    9568      170000    12688
200000    12272     200000    10608     194000    11232     190000    8320
214000    8528      208000    10400     200000    10400     200000    8320
240000    10192     240000    12064     240000    11648     225000    12480
289000    11648     270000    12896     262000    10192     244500    11232
325000    12480     310000    12480     303000    12272     300000    12480
Hypotheses (test for a positive correlation):
Null Hypothesis: H_{0}: ρ = 0
Alternate Hypothesis: H_{1}: ρ > 0 (claim)
Level of significance: α = 0.05
Excel output for regression analysis:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.764716
R Square             0.58479
Adjusted R Square    0.575764
Standard Error       1441.625
Observations         48

ANOVA
              df    SS          MS          F           Significance F
Regression     1    1.35E+08    1.35E+08    64.78736    2.5E-10
Residual      46    95600982    2078282
Total         47    2.3E+08

              Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept     5363.865       567.2408         9.456062   2.34E-12   4222.068    6505.661    4222.068      6505.661
Value         0.024358       0.003026         8.04906    2.5E-10    0.018267    0.03045     0.018267      0.03045
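Test statistic: t = 8.04906 (df = 48 - 2 = 46)
P-value: Excel reports the two-tailed p-value 2.5E-10; for this right-tailed test the p-value is half of that, about 1.25E-10, so P ≈ 0.
As P < 0.05 (level of significance), we reject the null hypothesis. So there is sufficient evidence to support the claim that there is a positive correlation between house value and rental amount.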
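The t statistic for testing H_{0}: ρ = 0 can be reproduced from r and n alone with t = r * sqrt((n - 2) / (1 - r²)). This Python sketch takes r and n from the Excel output; the critical value 1.679 (right tail, α = 0.05, df = 46) is an assumption taken from a standard t table, not from the original solution.

```python
import math


def correlation_t(r, n):
    """Test statistic t = r * sqrt((n - 2) / (1 - r^2)) for H0: rho = 0, with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


R, N = 0.764716, 48   # from the Excel output above
T_CRIT = 1.679        # assumed t-table value: right tail, alpha = 0.05, df = 46

t = correlation_t(R, N)
print(f"t = {t:.4f}, df = {N - 2}, reject H0: {t > T_CRIT}")
```

This matches the "t Stat" for the slope in the Excel output, because testing ρ = 0 is equivalent to testing that the regression slope is 0.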
10.3.4
The World Bank collected data on the percentage of GDP that a country spends on health expenditures ("Health expenditure," 2013) and also the percentage of women receiving prenatal care ("Pregnant woman receiving," 2013). The data for the countries where this information is available for the year 2011 are in table #10.1.8.
Test at the 5% level for a correlation between percentage spent on health expenditure and the percentage of women receiving prenatal care.
Table #10.1.8: Data of Health Expenditure versus Prenatal Care
Health Expenditure (% of GDP)    Prenatal Care (%)
9.6                              47.9
3.7                              54.6
5.2                              93.7
5.2                              84.7
10.0                             100.0
4.7                              42.5
4.8                              96.4
6.0                              77.1
5.4                              58.3
4.8                              95.4
4.1                              78.0
6.0                              93.3
9.5                              93.3
6.8                              93.7
6.1                              89.8
Null Hypothesis: H_{0}: ρ = 0
Alternate Hypothesis: H_{1}: ρ ≠ 0 (claim)
Level of significance: α = 0.05
Excel output for regression analysis:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.171505
R Square             0.029414
Adjusted R Square    -0.04525
Standard Error       19.92675
Observations         15

ANOVA
              df    SS          MS          F           Significance F
Regression     1    156.4362    156.4362    0.393971    0.541089
Residual      13    5161.981    397.0755
Total         14    5318.417

                                Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept                       69.7394        17.00601         4.100869   0.001251   33.00015    106.4786    33.00015      106.4786
Health Expenditure (% of GDP)   1.660599       2.645652         0.627671   0.541089   -4.05498    7.376182    -4.05498      7.376182
According to the output above, the test statistic is t = 0.6277 (df = 15 - 2 = 13).
P = 0.5411 (two-tailed, which matches the two-sided alternative)
As P > 0.05 (level of significance), we fail to reject the null hypothesis. So there is insufficient evidence to support the claim that there is a correlation between the percentage spent on health expenditure and the percentage of women receiving prenatal care.
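The t statistic can be reproduced by hand from r and n with the usual formula for testing H_{0}: ρ = 0. The two-tailed critical value 2.160 for df = 13 is taken from a standard t table and is not part of the original output.

```latex
\begin{aligned}
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
   = \frac{0.171505\,\sqrt{13}}{\sqrt{1-0.029414}}
   \approx \frac{0.61837}{0.98518} \approx 0.6277, \qquad df = n-2 = 13 \\
|t| &= 0.6277 < t_{0.025,\,13} = 2.160 \quad\Rightarrow\quad \text{fail to reject } H_{0}
\end{aligned}
```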
11.1.2
Researchers watched groups of dolphins off the coast of Ireland in 1998 to determine what activities the dolphins partake in at certain times of the day ("Activities of dolphin," 2013). The numbers in table #11.1.6 represent the number of groups of dolphins that were partaking in an activity at certain times of the day. Is there enough evidence to show that the activity and the time period are independent for dolphins? Test at the 1% level.
Table #11.1.6: Dolphin Activity
                 Period
Activity         Morning   Noon   Afternoon   Evening   Row Total
Travel           6         6      14          13        39
Feed             28        4      0           56        88
Social           38        5      9           10        62
Column Total     72        15     23          79        189
Null Hypothesis: the activity and the time period are independent of each other for dolphins.
Alternate Hypothesis: the activity and the time period are dependent on each other for dolphins.
Level of significance: α = 0.01
Expected value table
                 Period
Activity         Morning   Noon    Afternoon   Evening   Row Total
Travel           14.857    3.095   4.746       16.302    39
Feed             33.524    6.984   10.709      36.783    88
Social           23.619    4.921   7.545       25.915    62
Column Total     72        15      23          79        189
Chi-square test:
Statistic    DF   Value       P-value
Chi-square   6    68.464567   < 0.0001

Chi-square = 68.465
P ≈ 0
As P < 0.01 (level of significance), we reject the null hypothesis.
Hence, there is sufficient evidence to support the claim that the activity and the time period are dependent on each other for dolphins.
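As a worked check, each expected count is (row total × column total) / grand total, and the statistic sums (O - E)²/E over all 12 cells. The symbols R_i, C_j and N are notation introduced here for the row totals, column totals and grand total of Table #11.1.6; the critical value 16.812 comes from a standard chi-square table.

```latex
\begin{aligned}
E_{ij} &= \frac{R_i \, C_j}{N}, \qquad
E_{\text{Travel},\,\text{Morning}} = \frac{39 \times 72}{189} \approx 14.857 \\
df &= (r-1)(c-1) = (3-1)(4-1) = 6 \\
\chi^{2} &= \sum_{i}\sum_{j} \frac{(O_{ij}-E_{ij})^{2}}{E_{ij}} \approx 68.46
  > \chi^{2}_{0.01,\,6} = 16.812
\end{aligned}
```

Three expected counts (Travel/Noon 3.095, Travel/Afternoon 4.746 and Social/Noon 4.921) fall below 5, the usual rule-of-thumb minimum for the chi-square approximation, so the p-value is best read as approximate.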
11.1.4
A person's educational attainment and age group were collected by the U.S. Census Bureau in 1984 to see if age group and educational attainment are related. The counts in thousands are in table #11.1.8 ("Education by age," 2013). Do the data show that educational attainment and age are independent? Test at the 5% level.
Table #11.1.8: Educational Attainment and Age Group
                            Age Group
Education                   25-34    35-44    45-54    55-64    >64      Row Total
Did not complete HS         5416     5030     5777     7606     13746    37575
Completed HS                16431    1855     9435     8795     7558     44074
College 1-3 years           8555     5576     3124     2524     2503     22282
College 4 or more years     9771     7596     3904     3109     2483     26863
Column Total                40173    20057    22240    22034    26290    130794
Null Hypothesis: educational attainment and age are independent of each other
Alternate Hypothesis: educational attainment and age are dependent on each other
Level of significance: α = 0.05
Expected values
                            Age Group
Education                   25-34      35-44     45-54     55-64     >64       Row Total
Did not complete HS         11541.05   5762.05   6389.19   6330.01   7552.69   37575
Completed HS                13537.20   6758.66   7494.27   7424.86   8859.01   44074
College 1-3 years           6843.85    3416.90   3788.80   3753.70   4478.75   22282
College 4 or more years     8250.89    4119.39   4567.74   4525.43   5399.55   26863
Column Total                40173      20057     22240     22034     26290     130794
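Chi-square test:
Chi-square = Σ(O - E)^2/E ≈ 22,374 (computed from the observed counts and the expected values above)
DF = (4 - 1)(5 - 1) = 12
P ≈ 0
As P < 0.05 (level of significance), we reject the null hypothesis. So there is sufficient evidence to support the claim that educational attainment and age are dependent on each other; the data do not show that they are independent.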
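A minimal standard-library Python sketch that recomputes the chi-square test of independence directly from the observed counts in Table #11.1.8. The helper names `chi2_sf` and `independence_test` are illustrative; `chi2_sf` is a hand-written upper-tail chi-square probability for integer degrees of freedom (an exact series, not a library call), standing in for a statistics package.

```python
import math

# Observed counts (thousands) from Table #11.1.8; columns are the age groups
# 25-34, 35-44, 45-54, 55-64, >64.
OBSERVED = [
    [5416, 5030, 5777, 7606, 13746],   # Did not complete HS
    [16431, 1855, 9435, 8795, 7558],   # Completed HS
    [8555, 5576, 3124, 2524, 2503],    # College 1-3 years
    [9771, 7596, 3904, 3109, 2483],    # College 4 or more years
]


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from df = 2 (exp) or df = 1 (erfc), then step up 2 df at a time.
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        # Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def independence_test(table):
    """Chi-square test of independence: returns (statistic, df, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E = row total * column total / N
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


stat, df, p_value = independence_test(OBSERVED)
print(f"chi-square = {stat:,.1f}, df = {df}, p-value = {p_value:.3g}")
```

With a statistic in the tens of thousands on 12 degrees of freedom, the computed p-value underflows to 0.0 in floating point, far below 0.05. The largest contributions come from cells such as Completed HS aged 35-44 (observed 1855 against an expected 6758.66).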
11.2.4
In Africa in 2011, the numbers of deaths of females from cardiovascular disease for different age groups are in table #11.2.6 ("Global health observatory," 2013). The proportions of deaths of females from all causes for the same age groups are also in table #11.2.6. Do the data show that deaths from cardiovascular disease are in the same proportions as all deaths for the different age groups? Test at the 5% level.
Table #11.2.6: Deaths of Females for Different Age Groups
Age                          5-14    15-29    30-49    50-69    Total
Cardiovascular Frequency     8       16       56       433      513
All Cause Proportion         0.10    0.12     0.26     0.52
Null Hypothesis: Deaths from cardiovascular disease are in the same proportions as all deaths across the age groups.
Alternate Hypothesis: Deaths from cardiovascular disease are not in the same proportions as all deaths across the age groups.
Level of significance: α = 0.05
Age      Observed Frequency (O)   Proportion   Expected Frequency (E)   (O - E)^2/E
5-14     8                        0.10         51.3                     36.55
15-29    16                       0.12         61.56                    33.72
30-49    56                       0.26         133.38                   44.89
50-69    433                      0.52         266.76                   103.60
Total    513                                                            218.76
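Chi-square = 218.76
DF = 4 - 1 = 3
P ≈ 0
As P < 0.05 (level of significance), we reject the null hypothesis. So there is sufficient evidence to support the claim that deaths from cardiovascular disease are not in the same proportions as all deaths across the age groups.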
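A standard-library Python sketch of the goodness-of-fit calculation: expected counts are n times each all-cause proportion, and the p-value comes from a hand-written chi-square upper-tail function. The helper names `chi2_sf` and `goodness_of_fit` are illustrative and not from the original solution.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from df = 2 (exp) or df = 1 (erfc), then step up 2 df at a time.
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        # Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def goodness_of_fit(observed, proportions):
    """Chi-square goodness-of-fit test against hypothesised proportions."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return expected, stat, df, chi2_sf(stat, df)


# Table #11.2.6: cardiovascular deaths for ages 5-14, 15-29, 30-49, 50-69
# and the all-cause proportions for the same age groups.
expected, stat, df, p_value = goodness_of_fit([8, 16, 56, 433], [0.10, 0.12, 0.26, 0.52])
print("expected:", [round(e, 2) for e in expected])
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.3g}")
```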
11.2.6
A project conducted by the Australian Federal Office of Road Safety asked people many questions about their cars. One question was the reason that a person chooses a given car, and that data is in table #11.2.8 ("Car preferences," 2013).
Table #11.2.8: Reason for Choosing a Car
Safety   Reliability   Cost   Performance   Comfort   Looks
84       62            46     34            47        27
Do the data show that the frequencies observed substantiate the claim that the reasons for choosing a car are equally likely? Test at the 5% level.
Null Hypothesis: The reasons for choosing a car are equally likely.
Alternate Hypothesis: The reasons for choosing a car are not all equally likely.
Level of significance: α = 0.05
Reason        Observed Frequency (O)   Expected Frequency (E)   (O - E)^2/E
Safety        84                       50                       23.12
Reliability   62                       50                       2.88
Cost          46                       50                       0.32
Performance   34                       50                       5.12
Comfort       47                       50                       0.18
Looks         27                       50                       10.58
Total         300                                               42.2
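Chi-square = Σ(O - E)^2/E = 42.2
DF = 6 - 1 = 5
P ≈ 0
As P < 0.05 (level of significance), we reject the null hypothesis. So the data do not substantiate the claim that the reasons for choosing a car are equally likely.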
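Because every reason is equally likely under the null hypothesis, each expected count is 300/6 = 50, and the statistic reduces to a short hand calculation. The critical value 11.070 for α = 0.05 and df = 5 is taken from a standard chi-square table and is not part of the original solution.

```latex
\begin{aligned}
E &= \frac{300}{6} = 50 \quad \text{for every reason} \\
\chi^{2} &= \frac{34^{2} + 12^{2} + (-4)^{2} + (-16)^{2} + (-3)^{2} + (-23)^{2}}{50}
  = \frac{1156 + 144 + 16 + 256 + 9 + 529}{50} = \frac{2110}{50} = 42.2 \\
df &= 6 - 1 = 5, \qquad \chi^{2} = 42.2 > \chi^{2}_{0.05,\,5} = 11.070
\end{aligned}
```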