STAT 200 Week 7 Homework Problems
10.1.2
Table #10.1.6 contains the value of the house and the amount of rental income in a year that the house brings in ("Capital and rental," 2013). Create a scatter plot and find a regression equation between house value and rental income. Then use the regression equation to find the rental income for a house worth $230,000 and for a house worth $400,000. Which rental income that you calculated do you think is closer to the true rental income? Why?
Table #10.1.6: Data of House Value versus Rental
Value | Rental | Value | Rental | Value | Rental | Value | Rental
81000 | 6656 | 77000 | 4576 | 75000 | 7280 | 67500 | 6864
95000 | 7904 | 94000 | 8736 | 90000 | 6240 | 85000 | 7072
121000 | 12064 | 115000 | 7904 | 110000 | 7072 | 104000 | 7904
135000 | 8320 | 130000 | 9776 | 126000 | 6240 | 125000 | 7904
145000 | 8320 | 140000 | 9568 | 140000 | 9152 | 135000 | 7488
165000 | 13312 | 165000 | 8528 | 155000 | 7488 | 148000 | 8320
178000 | 11856 | 174000 | 10400 | 170000 | 9568 | 170000 | 12688
200000 | 12272 | 200000 | 10608 | 194000 | 11232 | 190000 | 8320
214000 | 8528 | 208000 | 10400 | 200000 | 10400 | 200000 | 8320
240000 | 10192 | 240000 | 12064 | 240000 | 11648 | 225000 | 12480
289000 | 11648 | 270000 | 12896 | 262000 | 10192 | 244500 | 11232
325000 | 12480 | 310000 | 12480 | 303000 | 12272 | 300000 | 12480
Scatterplot with regression equation:
Hence the line of best fit, with x = house value and Y = predicted rental income, is:
Y = 0.0244x + 5363.9
For a house worth x = $230,000:
Predicted Rental = 0.0244*230000 + 5363.9 = $10,975.90
For a house worth x = $400,000:
Predicted Rental = 0.0244*400000 + 5363.9 = $15,123.90
The rental income calculated for the $230,000 house is likely to be closer to the true rental income. The house values in the data range from $67,500 to $325,000, so $230,000 lies within the range of the original data, whereas $400,000 lies outside it, where the fitted line has not been checked against any observations.
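The two predictions and the range check can be reproduced with a few lines of Python. This is only a minimal sketch: it takes the slope and intercept from the fitted equation above, reads the smallest and largest house values off Table #10.1.6, and the function names are my own.

```python
# Fitted line from the solution above: rental = 0.0244 * value + 5363.9
SLOPE = 0.0244
INTERCEPT = 5363.9

# Smallest and largest house values that appear in Table #10.1.6
VALUE_MIN = 67_500
VALUE_MAX = 325_000


def predict_rental(value):
    """Predicted yearly rental income for a given house value."""
    return SLOPE * value + INTERCEPT


def is_extrapolation(value):
    """True when the value lies outside the observed range of house values."""
    return value < VALUE_MIN or value > VALUE_MAX


for value in (230_000, 400_000):
    label = "extrapolation" if is_extrapolation(value) else "within data range"
    print(f"${value:,}: predicted rental ${predict_rental(value):,.2f} ({label})")
```

The check flags $400,000 as an extrapolation, which is the reason given for trusting that prediction less.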
10.1.4
The World Bank collected data on the percentage of GDP that a country spends on health expenditures ("Health expenditure," 2013) and also the percentage of women receiving prenatal care ("Pregnant woman receiving," 2013). The data for the countries where this information is available for the year 2011 are in table #10.1.8. Create a scatter plot of the data and find a regression equation between percentage spent on health expenditure and the percentage of women receiving prenatal care. Then use the regression equation to find the percent of women receiving prenatal care for a country that spends 5.0% of GDP on health expenditure and for a country that spends 12.0% of GDP. Which prenatal care percentage that you calculated do you think is closer to the true percentage? Why?
Table #10.1.8: Data of Health Expenditure versus Prenatal Care
Health Expenditure (% of GDP) | Prenatal Care (%)
9.6 | 47.9
3.7 | 54.6
5.2 | 93.7
5.2 | 84.7
10.0 | 100.0
4.7 | 42.5
4.8 | 96.4
6.0 | 77.1
5.4 | 58.3
4.8 | 95.4
4.1 | 78.0
6.0 | 93.3
9.5 | 93.3
6.8 | 93.7
6.1 | 89.8
Scatterplot along with line of best fit:
The line of best fit, with x = health expenditure (% of GDP) and Y = predicted prenatal care (%), is:
Y = 1.6606x + 69.739
For health expenditure x = 5%:
Predicted Prenatal care = 1.6606 * 5 + 69.739 = 78.04%
For health expenditure x = 12%:
Predicted Prenatal care = 1.6606 * 12 + 69.739 = 89.67%
The prenatal care percentage calculated for 5% health expenditure is likely to be closer to the true value. The health expenditures in the data range from 3.7% to 10.0%, so 5% lies within the range of the original data, whereas 12% lies outside it.
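The regression equation itself can be recomputed from Table #10.1.8 without Excel. The following is a minimal sketch of an ordinary least squares fit in plain Python; the function and variable names are my own, and it should give a slope and intercept close to the ones used above.

```python
# (health expenditure % of GDP, prenatal care %) pairs from Table #10.1.8
data = [
    (9.6, 47.9), (3.7, 54.6), (5.2, 93.7), (5.2, 84.7), (10.0, 100.0),
    (4.7, 42.5), (4.8, 96.4), (6.0, 77.1), (5.4, 58.3), (4.8, 95.4),
    (4.1, 78.0), (6.0, 93.3), (9.5, 93.3), (6.8, 93.7), (6.1, 89.8),
]


def fit_line(points):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(data)
x_min = min(x for x, _ in data)
x_max = max(x for x, _ in data)

for x in (5.0, 12.0):
    inside = x_min <= x <= x_max
    print(f"x = {x}%: predicted {slope * x + intercept:.2f}% (inside data range: {inside})")
```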
10.2.2
Table #10.1.6 contains the value of the house and the amount of rental income in a year that the house brings in ("Capital and rental," 2013). Find the correlation coefficient and coefficient of determination and then interpret both.
Table #10.1.6: Data of House Value versus Rental
Value | Rental | Value | Rental | Value | Rental | Value | Rental
81000 | 6656 | 77000 | 4576 | 75000 | 7280 | 67500 | 6864
95000 | 7904 | 94000 | 8736 | 90000 | 6240 | 85000 | 7072
121000 | 12064 | 115000 | 7904 | 110000 | 7072 | 104000 | 7904
135000 | 8320 | 130000 | 9776 | 126000 | 6240 | 125000 | 7904
145000 | 8320 | 140000 | 9568 | 140000 | 9152 | 135000 | 7488
165000 | 13312 | 165000 | 8528 | 155000 | 7488 | 148000 | 8320
178000 | 11856 | 174000 | 10400 | 170000 | 9568 | 170000 | 12688
200000 | 12272 | 200000 | 10608 | 194000 | 11232 | 190000 | 8320
214000 | 8528 | 208000 | 10400 | 200000 | 10400 | 200000 | 8320
240000 | 10192 | 240000 | 12064 | 240000 | 11648 | 225000 | 12480
289000 | 11648 | 270000 | 12896 | 262000 | 10192 | 244500 | 11232
325000 | 12480 | 310000 | 12480 | 303000 | 12272 | 300000 | 12480
I used the Excel Data Analysis tool to calculate the correlation coefficient between house value and rental income.
According to the Excel output, the Pearson correlation coefficient is r = 0.764716.
This indicates a fairly strong positive linear relationship between house value and rental income.
Coefficient of determination r^2 = 0.7647^2 = 0.585
So, about 58.5% of the variation in rental income can be explained by the variation in house value. This proportion is high enough for the regression equation to be useful for predicting rental income from house value, although about 41.5% of the variation remains unexplained.
10.2.4
The World Bank collected data on the percentage of GDP that a country spends on health expenditures ("Health expenditure," 2013) and also the percentage of women receiving prenatal care ("Pregnant woman receiving," 2013). The data for the countries where this information is available for the year 2011 are in table #10.1.8. Find the correlation coefficient and coefficient of determination and then interpret both.
Table #10.1.8: Data of Health Expenditure versus Prenatal Care
Health Expenditure (% of GDP) | Prenatal Care (%)
9.6 | 47.9
3.7 | 54.6
5.2 | 93.7
5.2 | 84.7
10.0 | 100.0
4.7 | 42.5
4.8 | 96.4
6.0 | 77.1
5.4 | 58.3
4.8 | 95.4
4.1 | 78.0
6.0 | 93.3
9.5 | 93.3
6.8 | 93.7
6.1 | 89.8
I used the Excel Data Analysis tool to calculate the correlation coefficient between health expenditure and prenatal care.
According to the Excel output, the Pearson correlation coefficient is r = 0.1715.
This indicates a weak positive linear relationship between health expenditure and prenatal care.
Coefficient of determination r^2 = 0.1715^2 = 0.0294
So, only 2.94% of the variation in prenatal care can be explained by health expenditure. Because this proportion is very low, the regression equation should not be used to predict prenatal care from health expenditure.
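As a cross-check on the Excel result, the correlation coefficient and the coefficient of determination can be computed directly from Table #10.1.8. This is a minimal sketch in plain Python; the function name `pearson_r` and the list names are my own.

```python
import math

# Columns of Table #10.1.8
health = [9.6, 3.7, 5.2, 5.2, 10.0, 4.7, 4.8, 6.0, 5.4, 4.8, 4.1, 6.0, 9.5, 6.8, 6.1]
prenatal = [47.9, 54.6, 93.7, 84.7, 100.0, 42.5, 96.4, 77.1, 58.3, 95.4, 78.0,
            93.3, 93.3, 93.7, 89.8]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(health, prenatal)
r_squared = r ** 2  # share of the variation in prenatal care explained by the line
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```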
10.3.2
Table #10.1.6 contains the value of the house and the amount of rental income in a year that the house brings in ("Capital and rental," 2013).
Test at the 5% level for a positive correlation between house value and rental amount.
Table #10.1.6: Data of House Value versus Rental
Value | Rental | Value | Rental | Value | Rental | Value | Rental
81000 | 6656 | 77000 | 4576 | 75000 | 7280 | 67500 | 6864
95000 | 7904 | 94000 | 8736 | 90000 | 6240 | 85000 | 7072
121000 | 12064 | 115000 | 7904 | 110000 | 7072 | 104000 | 7904
135000 | 8320 | 130000 | 9776 | 126000 | 6240 | 125000 | 7904
145000 | 8320 | 140000 | 9568 | 140000 | 9152 | 135000 | 7488
165000 | 13312 | 165000 | 8528 | 155000 | 7488 | 148000 | 8320
178000 | 11856 | 174000 | 10400 | 170000 | 9568 | 170000 | 12688
200000 | 12272 | 200000 | 10608 | 194000 | 11232 | 190000 | 8320
214000 | 8528 | 208000 | 10400 | 200000 | 10400 | 200000 | 8320
240000 | 10192 | 240000 | 12064 | 240000 | 11648 | 225000 | 12480
289000 | 11648 | 270000 | 12896 | 262000 | 10192 | 244500 | 11232
325000 | 12480 | 310000 | 12480 | 303000 | 12272 | 300000 | 12480
Hypotheses for testing a positive correlation:
Null Hypothesis: H0: ρ = 0
Alternate Hypothesis: H1: ρ > 0 (claim)
Level of significance α = 0.05
Excel output for regression analysis:
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.764716
R Square | 0.58479
Adjusted R Square | 0.575764
Standard Error | 1441.625
Observations | 48
ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 1.35E+08 | 1.35E+08 | 64.78736 | 2.5E-10
Residual | 46 | 95600982 | 2078282 | |
Total | 47 | 2.3E+08 | | |
 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 5363.865 | 567.2408 | 9.456062 | 2.34E-12 | 4222.068 | 6505.661 | 4222.068 | 6505.661
Value | 0.024358 | 0.003026 | 8.04906 | 2.5E-10 | 0.018267 | 0.03045 | 0.018267 | 0.03045
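Test statistic t = 8.04906
P ≈ 0 (Excel reports a two-tailed P-value of 2.5E-10 for the slope; the one-tailed P-value is half of that)
As P < 0.05 (level of significance), we reject the null hypothesis. There is sufficient evidence to support the claim that there is a positive correlation between house value and rental amount.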
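The t statistic in the Excel output can also be obtained from r and n alone, using t = r*sqrt(n-2)/sqrt(1-r^2) with n - 2 degrees of freedom. The sketch below is a minimal check under two assumptions of mine: the function name is my own, and the critical value 1.679 is an approximate one-tailed t-table value for α = 0.05 with 46 degrees of freedom, not something taken from the Excel output.

```python
import math


def correlation_t_stat(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.764716        # Multiple R from the Excel output
n = 48              # Observations from the Excel output
t_critical = 1.679  # assumed one-tailed table value for alpha = 0.05, df = 46

t_stat = correlation_t_stat(r, n)
print(f"t = {t_stat:.3f}, reject H0: {t_stat > t_critical}")
```

The statistic agrees with the "t Stat" value for the slope because, in simple linear regression, testing the slope and testing the correlation are the same test.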
10.3.4
The World Bank collected data on the percentage of GDP that a country spends on health expenditures ("Health expenditure," 2013) and also the percentage of women receiving prenatal care ("Pregnant woman receiving," 2013). The data for the countries where this information is available for the year 2011 are in table #10.1.8.
Test at the 5% level for a correlation between percentage spent on health expenditure and the percentage of women receiving prenatal care.
Table #10.1.8: Data of Health Expenditure versus Prenatal Care
Health Expenditure (% of GDP) | Prenatal Care (%)
9.6 | 47.9
3.7 | 54.6
5.2 | 93.7
5.2 | 84.7
10.0 | 100.0
4.7 | 42.5
4.8 | 96.4
6.0 | 77.1
5.4 | 58.3
4.8 | 95.4
4.1 | 78.0
6.0 | 93.3
9.5 | 93.3
6.8 | 93.7
6.1 | 89.8
Null Hypothesis: H0: ρ = 0
Alternate Hypothesis: H1: ρ ≠ 0 (claim)
Level of significance α = 0.05
Excel output for regression analysis:
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.171505
R Square | 0.029414
Adjusted R Square | -0.04525
Standard Error | 19.92675
Observations | 15
ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 156.4362 | 156.4362 | 0.393971 | 0.541089
Residual | 13 | 5161.981 | 397.0755 | |
Total | 14 | 5318.417 | | |
 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 69.7394 | 17.00601 | 4.100869 | 0.001251 | 33.00015 | 106.4786 | 33.00015 | 106.4786
Health Expenditure (% of GDP) | 1.660599 | 2.645652 | 0.627671 | 0.541089 | -4.05498 | 7.376182 | -4.05498 | 7.376182
According to the Excel output, the test statistic is t = 0.6277
P = 0.5411
As P > 0.05 (level of significance), we fail to reject the null hypothesis. There is insufficient evidence to support the claim that there is a significant correlation between percentage spent on health expenditure and the percentage of women receiving prenatal care.
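The same conclusion can be read from the confidence interval for the slope: if the 95% interval contains 0, the two-tailed test at the 5% level does not reject H0. The sketch below rebuilds the t statistic and the interval from the coefficient and its standard error. The critical value 2.160 is an assumed two-tailed t-table value for α = 0.05 with 13 degrees of freedom, so the limits match the Excel output only to about two decimal places.

```python
slope = 1.660599      # coefficient for Health Expenditure in the Excel output
std_error = 2.645652  # its standard error in the Excel output
t_critical = 2.160    # assumed two-tailed table value for alpha = 0.05, df = 13

t_stat = slope / std_error
lower = slope - t_critical * std_error
upper = slope + t_critical * std_error

print(f"t = {t_stat:.4f}")
print(f"95% CI for slope: ({lower:.3f}, {upper:.3f})")
print("interval contains 0:", lower <= 0 <= upper)
```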
11.1.2
Researchers watched groups of dolphins off the coast of Ireland in 1998 to determine what activities the dolphins partake in at certain times of the day ("Activities of dolphin," 2013). The numbers in table #11.1.6 represent the number of groups of dolphins that were partaking in an activity at certain times of days. Is there enough evidence to show that the activity and the time period are independent for dolphins? Test at the 1% level.
Table #11.1.6: Dolphin Activity
Activity | Period: Morning | Period: Noon | Period: Afternoon | Period: Evening | Row Total
Travel | 6 | 6 | 14 | 13 | 39
Feed | 28 | 4 | 0 | 56 | 88
Social | 38 | 5 | 9 | 10 | 62
Column Total | 72 | 15 | 23 | 79 | 189
Null Hypothesis: the activity and the time period are independent of each other for dolphins.
Alternate Hypothesis: the activity and the time period are dependent on each other for dolphins.
Level of significance α = 0.01
Expected value table
Activity | Period: Morning | Period: Noon | Period: Afternoon | Period: Evening | Row Total
Travel | 14.857 | 3.095 | 4.746 | 16.302 | 39
Feed | 33.524 | 6.984 | 10.709 | 36.783 | 88
Social | 23.619 | 4.921 | 7.545 | 25.915 | 62
Column Total | 72 | 15 | 23 | 79 | 189
Chi-Square test:
Statistic | DF | Value | P-value
Chi-square | 6 | 68.464567 | ≈ 0
Chi square = 68.465
P ≈ 0
As P < 0.01 (level of significance), we reject the null hypothesis.
Hence, there is sufficient evidence to support the claim that the activity and the time period are dependent on each other for dolphins.
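The expected counts and the chi-square statistic can be recomputed from the observed table with a short script. This is a minimal sketch in plain Python, with function names of my own. For the P-value it uses a closed-form expression for the upper tail of the chi-square distribution that is valid only when the degrees of freedom are even, which happens to be the case here (df = 6).

```python
import math

# Observed counts from Table #11.1.6 (rows: Travel, Feed, Social;
# columns: Morning, Noon, Afternoon, Evening)
observed = [
    [6, 6, 14, 13],
    [28, 4, 0, 56],
    [38, 5, 9, 10],
]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail probability; this closed form holds only for even df."""
    half = statistic / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


chi2, df, expected = chi_square_independence(observed)
p_value = chi_square_p_value_even_df(chi2, df)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.2e}")
```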
11.1.4
A person’s educational attainment and age group was collected by the U.S. Census Bureau in 1984 to see if age group and educational attainment are related. The counts in thousands are in table #11.1.8 ("Education by age," 2013). Do the data show that educational attainment and age are independent? Test at the 5% level.
Table #11.1.8: Educational Attainment and Age Group
Education | Age 25-34 | Age 35-44 | Age 45-54 | Age 55-64 | Age >64 | Row Total
Did not complete HS | 5416 | 5030 | 5777 | 7606 | 13746 | 37575
Completed HS | 16431 | 1855 | 9435 | 8795 | 7558 | 44074
College 1-3 years | 8555 | 5576 | 3124 | 2524 | 2503 | 22282
College 4 or more years | 9771 | 7596 | 3904 | 3109 | 2483 | 26863
Column Total | 40173 | 20057 | 22240 | 22034 | 26290 | 130794
Null Hypothesis: educational attainment and age are independent of each other
Alternate Hypothesis: educational attainment and age are dependent on each other
Level of significance α = 0.05
Expected value
Education | Age 25-34 | Age 35-44 | Age 45-54 | Age 55-64 | Age >64 | Row Total
Did not complete HS | 11541.05 | 5762.05 | 6389.19 | 6330.01 | 7552.69 | 37575
Completed HS | 13537.20 | 6758.66 | 7494.27 | 7424.86 | 8859.01 | 44074
College 1-3 years | 6843.85 | 3416.90 | 3788.80 | 3753.70 | 4478.75 | 22282
College 4 or more years | 8250.89 | 4119.39 | 4567.74 | 4525.43 | 5399.55 | 26863
Column Total | 40173 | 20057 | 22240 | 22034 | 26290 | 130794
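Chi-Square test:
Statistic | DF | Value | P-value
Chi-square | 12 | ≈ 22,374 | ≈ 0
The statistic is the sum of (O-E)^2/E over all 20 cells of the observed and expected tables. The observed and expected counts differ widely: for example, the cell for ages 25-34 who did not complete high school has O = 5416 and E = 11541.05, which alone contributes about 3250.7.
Chi square ≈ 22,374
P ≈ 0
As P < 0.05 (level of significance), we reject the null hypothesis. There is sufficient evidence to support the claim that educational attainment and age are dependent on each other.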
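Because the statistic depends on all 20 cells, it is worth recomputing it directly from the observed counts in Table #11.1.8 rather than relying on a single tool output. The sketch below is a minimal plain-Python version with variable names of my own; it also reports the largest single-cell contribution. The critical value 21.026 is an assumed chi-square table value for α = 0.05 with 12 degrees of freedom.

```python
# Observed counts (thousands) from Table #11.1.8
# rows: Did not complete HS, Completed HS, College 1-3 years, College 4 or more years
# columns: 25-34, 35-44, 45-54, 55-64, >64
observed = [
    [5416, 5030, 5777, 7606, 13746],
    [16431, 1855, 9435, 8795, 7558],
    [8555, 5576, 3124, 2524, 2503],
    [9771, 7596, 3904, 3109, 2483],
]
critical_value = 21.026  # assumed table value for alpha = 0.05, df = 12

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Contribution of each cell: (O - E)^2 / E, with E = row total * column total / grand total
contributions = [
    [(o - r * c / grand_total) ** 2 / (r * c / grand_total)
     for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]

chi2 = sum(sum(row) for row in contributions)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
largest = max(max(row) for row in contributions)

print(f"chi-square = {chi2:.1f}, df = {df}, largest cell contribution = {largest:.1f}")
print("reject H0:", chi2 > critical_value)
```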
11.2.4
In Africa in 2011, the number of deaths of a female from cardiovascular disease for different age groups are in table #11.2.6 ("Global health observatory," 2013). In addition, the proportion of deaths of females from all causes for the same age groups are also in table #11.2.6. Do the data show that the death from cardiovascular disease are in the same proportion as all deaths for the different age groups? Test at the 5% level.
Table #11.2.6: Deaths of Females for Different Age Groups
Age | 5-14 | 15-29 | 30-49 | 50-69 | Total
Cardiovascular Frequency | 8 | 16 | 56 | 433 | 513
All Cause Proportion | 0.10 | 0.12 | 0.26 | 0.52 |
Null Hypothesis: Deaths from cardiovascular disease are in the same proportions as all deaths for the different age groups
Alternate Hypothesis: Deaths from cardiovascular disease are not in the same proportions as all deaths for the different age groups
Level of significance α = 0.05
Age | Observed Frequency (O) | Proportion | Expected Frequency (E) | (O-E)^2/E
5 - 14 | 8 | 0.1 | 51.3 | 36.55
15 - 29 | 16 | 0.12 | 61.56 | 33.72
30 - 49 | 56 | 0.26 | 133.38 | 44.89
50 - 69 | 433 | 0.52 | 266.76 | 103.60
Total | 513 | | Total | 218.76
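Chi square = 218.76
DF = 4 - 1 = 3
P ≈ 0
As P < level of significance (0.05), we reject the null hypothesis. There is sufficient evidence that deaths from cardiovascular disease are not in the same proportions as all deaths for the different age groups.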
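The goodness-of-fit calculation in the table can be sketched in a few lines of Python. Each expected frequency is the total number of cardiovascular deaths times the all-cause proportion for that age group. The critical value 7.815 is an assumed chi-square table value for α = 0.05 with 3 degrees of freedom, and the variable names are my own.

```python
observed = [8, 16, 56, 433]             # cardiovascular deaths by age group
proportions = [0.10, 0.12, 0.26, 0.52]  # all-cause proportions from Table #11.2.6
critical_value = 7.815                  # assumed table value for alpha = 0.05, df = 3

total = sum(observed)
expected = [total * p for p in proportions]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {chi2 > critical_value}")
```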
11.2.6
A project conducted by the Australian Federal Office of Road Safety asked people many questions about their cars. One question was the reason that a person chooses a given car, and that data is in table #11.2.8 ("Car preferences," 2013).
Table #11.2.8: Reason for Choosing a Car
Safety | Reliability | Cost | Performance | Comfort | Looks
84 | 62 | 46 | 34 | 47 | 27
Do the data show that the frequencies observed substantiate the claim that the reasons for choosing a car are equally likely? Test at the 5% level.
Null Hypothesis: All reasons for choosing a car are equally likely
Alternate Hypothesis: The reasons for choosing a car are not all equally likely
Level of significance α = 0.05
Reason | Observed Frequency (O) | Expected Frequency (E) | (O-E)^2/E
Safety | 84 | 50 | 23.12
Reliability | 62 | 50 | 2.88
Cost | 46 | 50 | 0.32
Performance | 34 | 50 | 5.12
Comfort | 47 | 50 | 0.18
Looks | 27 | 50 | 10.58
Total | 300 | Sum | 42.2
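Chi square = sum of (O-E)^2/E = 42.2
DF = 6 - 1 = 5
P ≈ 0
As P < level of significance (0.05), we reject the null hypothesis. There is sufficient evidence that the reasons for choosing a car are not equally likely.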
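When every category is assumed equally likely, each expected frequency is simply the total divided by the number of categories. The sketch below repeats the calculation in Python and also shows which reason contributes most to the statistic. The critical value 11.070 is an assumed chi-square table value for α = 0.05 with 5 degrees of freedom, and the names are my own.

```python
counts = {
    "Safety": 84, "Reliability": 62, "Cost": 46,
    "Performance": 34, "Comfort": 47, "Looks": 27,
}
critical_value = 11.070  # assumed table value for alpha = 0.05, df = 5

expected = sum(counts.values()) / len(counts)  # equal share for each reason
contributions = {k: (o - expected) ** 2 / expected for k, o in counts.items()}
chi2 = sum(contributions.values())
biggest = max(contributions, key=contributions.get)

print(f"chi-square = {chi2:.2f}, reject H0: {chi2 > critical_value}, largest term: {biggest}")
```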