STAT 200 Week 6 Homework Problems
9.1.2
Many high school students take the AP tests in different subject areas. In 2007, of the 144,796 students who took the biology exam, 84,199 of them were female. In that same year, of the 211,693 students who took the Calculus AB exam, 102,598 of them were female ("AP exam scores," 2013). Estimate the difference in the proportion of female students taking the biology exam and female students taking the Calculus AB exam using a 90% confidence level.
Let P_{1} = proportion of female students who took the biology exam
P_{2} = proportion of female students who took the Calculus AB exam.
The sample proportions are:
p_{1} = 84199/144796 = 0.5815
p_{2} = 102598/211693 = 0.4847
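Because this is a confidence interval rather than a test of H_{0}: P_{1} = P_{2}, the standard error uses the two sample proportions separately (no pooling):
$SE = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} = \sqrt{\frac{0.5815(1 - 0.5815)}{144796} + \frac{0.4847(1 - 0.4847)}{211693}} = 0.00169$
Critical z for 90% confidence level = 1.645
Margin of error $E = z^{*} \times SE = 1.645 \times 0.00169 = 0.0028$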
Hence, the 90% confidence interval is
= ((0.5815 – 0.4847) – 0.0028, (0.5815 – 0.4847) + 0.0028)
= (0.0940, 0.0996)
So, we can be 90% confident that the true difference between the proportion of female students taking the biology exam and the proportion of female students taking the Calculus AB exam lies between 0.0940 and 0.0996.
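The interval can be checked with a short Python sketch (standard library only). `two_prop_ci` is a hypothetical helper written for this illustration; it uses the unpooled standard error, and `statistics.NormalDist` supplies the critical z instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, conf=0.90):
    """Unpooled confidence interval for the difference of two proportions p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.645 for 90%
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # separate (not pooled) variances
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Biology (x1 of n1) versus Calculus AB (x2 of n2), counts from the problem statement
lo, hi = two_prop_ci(84199, 144796, 102598, 211693, conf=0.90)
print(f"90% CI for P1 - P2: ({lo:.4f}, {hi:.4f})")
```

Carrying full precision gives about (0.0941, 0.0996); the small difference from the hand result comes only from rounding the sample proportions to four decimals.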
9.1.5
Are there more children diagnosed with Autism Spectrum Disorder (ASD) in states that have larger urban areas over states that are mostly rural? In the state of Pennsylvania, a fairly urban state, there are 245 eight-year-olds diagnosed with ASD out of 18,440 eight-year-olds evaluated. In the state of Utah, a fairly rural state, there are 45 eight-year-olds diagnosed with ASD out of 2,123 eight-year-olds evaluated ("Autism and developmental," 2008). Is there enough evidence to show that the proportion of children diagnosed with ASD in Pennsylvania is more than the proportion in Utah? Test at the 1% level.
Let P_{1} be the proportion of eight-year-olds diagnosed with ASD in Pennsylvania and P_{2} the proportion diagnosed in Utah.
Null hypothesis H_{0}: P_{1} = P_{2}
Alternate Hypothesis: H_{1}: P_{1} > P_{2} (claim)
Level of significance α = 0.01
Critical z for the right-tailed test is 2.33.
Hence, the rejection region is z > 2.33.
Pooled proportion = (245 + 45)/ (18440 + 2123) = 0.0141
Standard error $SE = \sqrt{0.0141(1 - 0.0141)\left(\frac{1}{18440} + \frac{1}{2123}\right)} = 0.0027$
Test statistic:
$z = \frac{245/18440 - 45/2123}{0.0027} = \frac{0.0133 - 0.0212}{0.0027} = -2.93$
Since z = -2.93 is less than 2.33, it is not in the rejection region, so we fail to reject the null hypothesis. Hence, there is insufficient evidence that the proportion of children diagnosed with ASD in Pennsylvania is more than the proportion in Utah.
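A minimal Python sketch of the same pooled two-proportion z-test, assuming only the standard library. `two_prop_z` is a hypothetical helper; the critical value and p-value come from `statistics.NormalDist` rather than the z table.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


alpha = 0.01
z = two_prop_z(245, 18440, 45, 2123)       # Pennsylvania (1) versus Utah (2)
z_crit = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value, about 2.33
p_value = 1 - NormalDist().cdf(z)          # P(Z > z) for H1: p1 > p2
print(f"z = {z:.2f}, critical z = {z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if z > z_crit else "fail to reject H0")
```

The right-tail p-value is close to 1 because the Utah sample proportion is actually the larger of the two, which is consistent with failing to reject H_{0}.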
9.2.3
All Fresh Seafood is a wholesale fish company based on the east coast of the U.S. Catalina Offshore Products is a wholesale fish company based on the west coast of the U.S. Table #9.2.5 contains prices from both companies for specific fish types ("Seafood online," 2013) ("Buy sushi grade," 2013). Do the data provide enough evidence to show that a west coast fish wholesaler is more expensive than an east coast wholesaler? Test at the 5% level.
Table #9.2.5: Wholesale Prices of Fish in Dollars
Fish              All Fresh Seafood Prices   Catalina Offshore Products Prices
Cod               19.99                      17.99
Tilapia            6.00                      13.99
Farmed Salmon     19.99                      22.99
Organic Salmon    24.99                      24.99
Grouper Fillet    29.99                      19.99
Tuna              28.99                      31.99
Swordfish         23.99                      23.99
Sea Bass          32.99                      23.99
Striped Bass      29.99                      14.99
Let µ_{d} = mean of the paired differences (east coast price − west coast price)
Null Hypothesis: H_{0}: µ_{d} = 0
Alternate Hypothesis: H_{1}: µ_{d} < 0 (claim)
Level of significance α = 0.05
Degrees of freedom df = 9 − 1 = 8
Critical t for the left-tailed test is -1.86
Rejection region is t < -1.86.
I used the Data Analysis tool in Excel to perform a paired t-test:
t-Test: Paired Two Sample for Means

                               All Fresh Seafood Prices   Catalina Offshore Products Prices
Mean                           24.10222                   21.65667
Variance                       66.81584                   31.25
Observations                   9                          9
Pearson Correlation            0.473953
Hypothesized Mean Difference   0
df                             8
t Stat                         0.991517
P(T<=t) one-tail               0.175236
t Critical one-tail            1.859548
P(T<=t) two-tail               0.350472
t Critical two-tail            2.306004
Test statistic t = 0.992
Since t = 0.992 > -1.86, the test statistic is not in the rejection region, so we fail to reject the null hypothesis. Hence, there is insufficient evidence to support the claim that a west coast fish wholesaler is more expensive than an east coast wholesaler.
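The paired t statistic can be reproduced from Table #9.2.5 with a short Python sketch (standard library only). The critical value -1.86 is taken from the text rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Prices from Table #9.2.5, in the same fish order
east = [19.99, 6.00, 19.99, 24.99, 29.99, 28.99, 23.99, 32.99, 29.99]   # All Fresh Seafood
west = [17.99, 13.99, 22.99, 24.99, 19.99, 31.99, 23.99, 23.99, 14.99]  # Catalina Offshore

d = [e - w for e, w in zip(east, west)]  # paired differences: east coast - west coast
n = len(d)
t = mean(d) / (stdev(d) / sqrt(n))      # paired t statistic with df = n - 1
t_crit = -1.86                          # left-tailed critical value for df = 8 (from the text)

print(f"d-bar = {mean(d):.4f}, s_d = {stdev(d):.4f}, t = {t:.3f}, df = {n - 1}")
print("reject H0" if t < t_crit else "fail to reject H0")
```

One point worth noting: the observed t is positive, so Excel's one-tail value 0.175236 is the right-tail area. For the left-tailed alternative H_{1}: µ_{d} < 0 the corresponding p-value would be about 1 − 0.175 = 0.825, which leads to the same decision.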
9.2.6
The British Department of Transportation conducted a study to see if people avoid driving on Friday the 13th. They did a traffic count on a Friday and then again on a Friday the 13th at the same two locations ("Friday the 13th," 2013). The data for each location on the two different dates is in table #9.2.6. Estimate the mean difference in traffic count between the 6th and the 13th using a 90% confidence level.
Table #9.2.6: Traffic Count
Dates             6th      13th
1990, July        139246   138548
1990, July        134012   132908
1991, September   137055   136018
1991, September   133732   131843
1991, December    123552   121641
1991, December    121139   118723
1992, March       128293   125532
1992, March       124631   120249
1992, November    124609   122770
1992, November    117584   117263
Paired differences (d = 6th count − 13th count):
Dates             6th      13th     d
1990, July        139246   138548   698
1990, July        134012   132908   1104
1991, September   137055   136018   1037
1991, September   133732   131843   1889
1991, December    123552   121641   1911
1991, December    121139   118723   2416
1992, March       128293   125532   2761
1992, March       124631   120249   4382
1992, November    124609   122770   1839
1992, November    117584   117263   321
Using Excel:
Mean of the differences d̄ = 1835.8
Standard deviation of the differences s_{d} = 1176.014
Critical t for df = 9 and 90% confidence level = 1.833
So, the margin of error is E = 1.833 × 1176.014/√10 = 681.67
90% confidence interval = (1835.8 – 681.67, 1835.8 + 681.67) = (1154.13, 2517.47)
We can be 90% confident that the true mean difference in traffic count between the 6th and the 13th lies between 1154.13 and 2517.47.
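A Python sketch (standard library only) that recomputes the differences, the margin of error and the interval. The critical value 1.833 is copied from the text, not computed.

```python
from math import sqrt
from statistics import mean, stdev

# Traffic counts from Table #9.2.6 (same row order)
sixth = [139246, 134012, 137055, 133732, 123552, 121139, 128293, 124631, 124609, 117584]
thirteenth = [138548, 132908, 136018, 131843, 121641, 118723, 125532, 120249, 122770, 117263]

d = [a - b for a, b in zip(sixth, thirteenth)]  # d = 6th - 13th
n = len(d)
d_bar, s_d = mean(d), stdev(d)
t_star = 1.833                  # t for df = 9 at 90% confidence, as used in the text
E = t_star * s_d / sqrt(n)      # margin of error
lo, hi = d_bar - E, d_bar + E
print(f"d-bar = {d_bar}, s_d = {s_d:.3f}, E = {E:.2f}, 90% CI = ({lo:.2f}, {hi:.2f})")
```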
9.3.1
The income of males in each state of the United States, including the District of Columbia and Puerto Rico, is given in table #9.3.3, and the income of females is given in table #9.3.4 ("Median income of," 2013). Is there enough evidence to show that the mean income of males is more than that of females? Test at the 1% level.
Table #9.3.3: Data of Income for Males
$42,951   $52,379   $42,544   $37,488
$49,281   $50,987   $60,705   $50,411
$66,760   $40,951   $43,902   $45,494
$41,528   $50,746   $45,183   $43,624
$43,993   $41,612   $46,313   $43,944
$56,708   $60,264   $50,053   $50,580
$40,202   $43,146   $41,635   $42,182
$41,803   $53,033   $60,568   $41,037
$50,388   $41,950   $44,660   $46,176
$41,420   $45,976   $47,956   $22,529
$48,842   $41,464   $40,285   $41,309
$43,160   $47,573   $44,057   $52,805
$53,046   $42,125   $46,214   $51,630
Table #9.3.4: Data of Income for Females
$31,862   $40,550   $36,048   $30,752
$41,817   $40,236   $47,476   $40,500
$60,332   $33,823   $35,438   $37,242
$31,238   $39,150   $34,023   $33,745
$33,269   $32,684   $31,844   $34,599
$48,748   $46,185   $36,931   $40,416
$29,548   $33,865   $31,067   $33,424
$35,484   $41,021   $47,155   $32,316
$42,113   $33,459   $32,462   $35,746
$31,274   $36,027   $37,089   $22,117
$41,412   $31,330   $31,329   $33,184
$35,301   $32,843   $38,177   $40,969
$40,993   $29,688   $35,890   $34,381
Let µ_{1} be the mean income for males and µ_{2} the mean income for females.
Null Hypothesis: H_{0}: µ_{1} = µ_{2}
Alternate Hypothesis: H_{1}: µ_{1} > µ_{2} (claim)
Level of significance α = 0.01
I used Excel to perform an independent-samples t-test.
t-Test: Two-Sample Assuming Unequal Variances

                               Males        Females
Mean                           46446.38     36511
Variance                       49473354     37676539
Observations                   52           52
Hypothesized Mean Difference   0
df                             100
t Stat                         7.67455
P(T<=t) one-tail               5.65E-12
t Critical one-tail            2.364217
P(T<=t) two-tail               1.13E-11
t Critical two-tail            2.625891
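Test statistic t = 7.675
P = 5.65E-12 (essentially 0)
As P < level of significance 0.01, we reject the null hypothesis. Hence, there is sufficient evidence to support the claim that the mean income of males is more than that of females.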
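The Welch (unequal-variance) t statistic and its degrees of freedom can be recomputed from the summary values in the Excel output. This is a sketch using only the standard library; `welch_t` is a hypothetical helper, and the critical value is copied from the Excel output.

```python
from math import sqrt


def welch_t(m1, v1, n1, m2, v2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    a, b = v1 / n1, v2 / n2
    t = (m1 - m2) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Mean, variance and count for males and females, copied from the Excel output
t, df = welch_t(46446.38, 49473354, 52, 36511, 37676539, 52)
t_crit = 2.364217  # Excel's one-tail critical value at alpha = 0.01
print(f"t = {t:.3f}, df = {df:.2f}")
print("reject H0" if t > t_crit else "fail to reject H0")
```

The unrounded Welch df is slightly above 100; Excel reports it as the whole number 100.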
9.3.3
A study was conducted that measured the total brain volume (TBV) (in mm³) of patients that had schizophrenia and patients that are considered normal. Table #9.3.5 contains the TBV of the normal patients and table #9.3.6 contains the TBV of schizophrenia patients ("SOCR data oct2009," 2013). Is there enough evidence to show that the patients with schizophrenia have less TBV on average than a patient that is considered normal? Test at the 10% level.
Table #9.3.5: Total Brain Volume (in mm³) of Normal Patients
1663407   1583940   1299470   1535137
1431890   1578698   1453510   1650348
1288971   1366346   1326402   1503005
1474790   1317156   1441045   1463498
1650207   1523045   1441636   1432033
1420416   1480171   1360810   1410213
1574808   1502702   1203344   1319737
1688990   1292641   1512571   1635918
Table #9.3.6: Total Brain Volume (in mm³) of Schizophrenia Patients
1331777   1487886   1066075   1297327
1499983   1861991   1368378   1476891
1443775   1337827   1658258   1588132
1690182   1569413   1177002   1387893
1483763   1688950   1563593   1317885
1420249   1363859   1238979   1286638
1325525   1588573   1476254   1648209
1354054   1354649   1636119
Let µ_{1} be the mean volume for normal patients and µ_{2} the mean volume for schizophrenia patients.
Null Hypothesis: H_{0}: µ_{1} = µ_{2}
Alternate Hypothesis: H_{1}: µ_{1} > µ_{2} (claim)
Level of significance α = 0.10
I used Excel to perform an independent-samples t-test.
t-Test: Two-Sample Assuming Unequal Variances

                               Normal       Schizophrenia
Mean                           1463339      1451293
Variance                       1.57E+10     2.96E+10
Observations                   32           31
Hypothesized Mean Difference   0
df                             55
t Stat                         0.316843
P(T<=t) one-tail               0.376281
t Critical one-tail            1.297134
P(T<=t) two-tail               0.752562
t Critical two-tail            1.673034
Test statistic t = 0.317
P = 0.3763 (one-tail)
As P > level of significance 0.10, we fail to reject the null hypothesis.
Hence, there is insufficient evidence to support the claim that patients with schizophrenia have less TBV on average than patients who are considered normal.
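A sketch that recomputes the Welch test directly from Tables #9.3.5 and #9.3.6 (standard library only). Besides t, it prints the difference in means, the standard error and the Welch-Satterthwaite df, which should agree with the Sample Diff., Std. Err. and DF values in the calculator output for 9.3.4.

```python
from math import sqrt
from statistics import mean, variance

normal = [1663407, 1583940, 1299470, 1535137, 1431890, 1578698, 1453510, 1650348,
          1288971, 1366346, 1326402, 1503005, 1474790, 1317156, 1441045, 1463498,
          1650207, 1523045, 1441636, 1432033, 1420416, 1480171, 1360810, 1410213,
          1574808, 1502702, 1203344, 1319737, 1688990, 1292641, 1512571, 1635918]
schizo = [1331777, 1487886, 1066075, 1297327, 1499983, 1861991, 1368378, 1476891,
          1443775, 1337827, 1658258, 1588132, 1690182, 1569413, 1177002, 1387893,
          1483763, 1688950, 1563593, 1317885, 1420249, 1363859, 1238979, 1286638,
          1325525, 1588573, 1476254, 1648209, 1354054, 1354649, 1636119]

n1, n2 = len(normal), len(schizo)
a, b = variance(normal) / n1, variance(schizo) / n2   # squared standard errors of each mean
diff = mean(normal) - mean(schizo)
se = sqrt(a + b)
df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))  # Welch-Satterthwaite
t = diff / se
t_crit = 1.297134  # Excel's one-tail critical value at alpha = 0.10
print(f"diff = {diff:.3f}, SE = {se:.3f}, df = {df:.4f}, t = {t:.4f}")
print("reject H0" if t > t_crit else "fail to reject H0")
```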
9.3.4
A study was conducted that measured the total brain volume (TBV) (in mm³) of patients that had schizophrenia and patients that are considered normal. Table #9.3.5 contains the TBV of the normal patients and table #9.3.6 contains the TBV of schizophrenia patients ("SOCR data oct2009," 2013). Compute a 90% confidence interval for the difference in TBV of normal patients and patients with schizophrenia.
Let µ1 be the mean volume for normal patients and µ2 the mean volume for schizophrenia patients.
Output from a two-sample t confidence interval calculator:
Two sample T confidence interval:
µ1 : Mean of Normal
µ2 : Mean of Schizophrenia
µ1 - µ2 : Difference between two means
(without pooled variances)

90% confidence interval results:
Difference   Sample Diff.   Std. Err.   DF          L. Limit     U. Limit
µ1 - µ2      12046.025      38018.926   54.816618   -51564.575   75656.626
90% confidence interval = (-51564.6, 75656.6)
Hence, we can be 90% confident that the true mean difference in TBV between normal patients and patients with schizophrenia lies between -51564.6 and 75656.6.
As 0 is within the confidence interval, there is no significant difference between the TBV of normal patients and that of patients with schizophrenia.
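The calculator output does not show its critical t value, which must be taken at the non-integer Welch df. As a sketch, assuming only the standard library, the quantile can be found by integrating the Student t density with Simpson's rule and bisecting. `t_cdf` and `t_quantile` are hypothetical helpers written for this illustration, not the calculator's own method.

```python
from math import gamma, pi, sqrt


def t_cdf(x, nu, steps=2000):
    """Student t CDF for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))

    def pdf(u):
        return c * (1 + u * u / nu) ** (-(nu + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, nu):
    """Upper quantile (p > 0.5) found by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Sample difference, standard error and Welch df as reported by the calculator
diff, se, df = 12046.025, 38018.926, 54.816618
t_star = t_quantile(0.95, df)  # a two-sided 90% interval uses the 0.95 quantile
E = t_star * se
print(f"t* = {t_star:.4f}, 90% CI: ({diff - E:.1f}, {diff + E:.1f})")
```

For df = 55 the same helper should return about 1.673, matching the "t Critical two-tail" value in the 9.3.3 Excel output.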
9.3.8
The number of cell phones per 100 residents in countries in Europe is given in table #9.3.9 for the year 2010. The number of cell phones per 100 residents in countries of the Americas is given in table #9.3.10 also for the year 2010 ("Population reference bureau," 2013). Find the 98% confidence interval for the difference in the mean number of cell phones per 100 residents in Europe and the Americas.
Table #9.3.9: Number of Cell Phones per 100 Residents in Europe
100   76    100   130   75    84    112   84
138   133   118   134   126   188   129   93
64    128   124   122   109   121   127   152
96    63    99    95    151   147   123   95
67    67    118   125   110   115   140   115
141   77    98    102   102   112   118   118
54    23    121   126   47
Table #9.3.10: Number of Cell Phones per 100 Residents in the Americas
158   117   106   159   53    50    78    66
88    92    42    3     150   72    86    113
50    58    70    109   37    32    85    101
75    69    55    115   95    73    86    157
100   119   81    113   87    105   96
Output from a two-sample t confidence interval calculator:
Two sample T confidence interval:
µ1 : Mean of Europe
µ2 : Mean of America
µ1 - µ2 : Difference between two means
(without pooled variances)

98% confidence interval results:
Difference   Sample Diff.   Std. Err.   DF          L. Limit    U. Limit
µ1 - µ2      20.945815      6.9736187   74.029011   4.3640739   37.527556
98% confidence interval = (4.364, 37.528)
Hence, we can be 98% confident that the true difference in the mean number of cell phones per 100 residents between Europe and the Americas lies between 4.364 and 37.528.
As the confidence interval does not include 0, there is a significant difference between the mean number of cell phones per 100 residents in Europe and in the Americas.
The mean number of cell phones per 100 residents in Europe is significantly greater than in the Americas.
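A Python sketch (standard library only) that recomputes the unpooled (Welch) interval from Tables #9.3.9 and #9.3.10. The critical value 2.378 is an assumption read from a t table for df ≈ 74 at 98% confidence; it is not part of the calculator output above.

```python
from math import sqrt
from statistics import mean, variance

europe = [100, 76, 100, 130, 75, 84, 112, 84, 138, 133, 118, 134, 126, 188, 129, 93,
          64, 128, 124, 122, 109, 121, 127, 152, 96, 63, 99, 95, 151, 147, 123, 95,
          67, 67, 118, 125, 110, 115, 140, 115, 141, 77, 98, 102, 102, 112, 118, 118,
          54, 23, 121, 126, 47]
americas = [158, 117, 106, 159, 53, 50, 78, 66, 88, 92, 42, 3, 150, 72, 86, 113,
            50, 58, 70, 109, 37, 32, 85, 101, 75, 69, 55, 115, 95, 73, 86, 157,
            100, 119, 81, 113, 87, 105, 96]

n1, n2 = len(europe), len(americas)
a, b = variance(europe) / n1, variance(americas) / n2  # squared standard errors
diff = mean(europe) - mean(americas)
se = sqrt(a + b)
df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))  # Welch-Satterthwaite
t_star = 2.378  # assumed table value for 98% confidence, df about 74
lo, hi = diff - t_star * se, diff + t_star * se
print(f"diff = {diff:.4f}, SE = {se:.4f}, df = {df:.2f}, 98% CI = ({lo:.3f}, {hi:.3f})")
```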
11.3.2
Levi-Strauss Co manufactures clothing. The quality control department measures weekly values of different suppliers for the percentage difference of waste between the layout on the computer and the actual waste when the clothing is made (called run-up). The data is in table #11.3.3, and there are some negative values because sometimes the supplier is able to lay out the pattern better than the computer ("Waste run up," 2013). Do the data show that there is a difference between some of the suppliers? Test at the 1% level.
Table #11.3.3: Run-ups for Different Plants Making Levi Strauss Clothing
Plant 1   Plant 2   Plant 3   Plant 4   Plant 5
1.2       16.4      12.1      11.5      24
10.1      6         9.7       10.2      3.7
2         11.6      7.4       3.8       8.2
1.5       1.3       2.1       8.3       9.2
3         4         10.1      6.6       9.3
0.7       17        4.7       10.2      8
3.2       3.8       4.6       8.8       15.8
2.7       4.3       3.9       2.7       22.3
3.2       10.4      3.6       5.1       3.1
1.7       4.2       9.6       11.2      16.8
2.4       8.5       9.8       5.9       11.3
0.3       6.3       6.5       13        12.3
3.5       9         5.7       6.8       16.9
0.8       7.1       5.1       14.5
19.4      4.3       3.4       5.2
2.8       19.7      0.8       7.3
13        3         3.9       7.1
42.7      7.6       0.9       3.4
1.4       70.2      1.5       0.7
3         8.5
2.4       6
1.3       2.9
Null Hypothesis: The mean run-ups for all 5 plants are equal.
Alternate Hypothesis: The mean run-up for at least one plant is different from the others.
Level of significance α = 0.01
Excel output for one-way ANOVA:
Anova: Single Factor

SUMMARY
Groups    Count   Sum     Average    Variance
Plant 1   22      99.5    4.522727   100.6418
Plant 2   22      194.3   8.831818   235.7289
Plant 3   19      91.8    4.831579   19.38784
Plant 4   19      142.3   7.489474   13.37433
Plant 5   13      134.9   10.37692   91.29859

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        450.9207   4    112.7302   1.159631   0.334012   3.534992
Within Groups         8749.088   90   97.21209
Total                 9200.009   94
P = 0.334
As P > level of significance 0.01, we fail to reject the null hypothesis.
Hence, there is insufficient evidence to support the claim that there is a difference between the run-ups for some of the suppliers.
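The F statistic can be rebuilt from the SUMMARY rows of the Excel output (count, average, variance) without re-entering the 95 raw values. A sketch using only the standard library; `one_way_anova` is a hypothetical helper, and the critical F is copied from the Excel output.

```python
def one_way_anova(groups):
    """One-way ANOVA from (count, mean, variance) summaries; returns SS, df and F."""
    N = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * m for n, m, _ in groups) / N
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * v for n, _, v in groups)  # pooled within-group variation
    df_between, df_within = k - 1, N - k
    F = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, F


# (Count, Average, Variance) for Plants 1-5, copied from the Excel SUMMARY table
plants = [
    (22, 4.522727, 100.6418),
    (22, 8.831818, 235.7289),
    (19, 4.831579, 19.38784),
    (19, 7.489474, 13.37433),
    (13, 10.37692, 91.29859),
]
ss_b, ss_w, df_b, df_w, F = one_way_anova(plants)
F_crit = 3.534992  # Excel's critical F at alpha = 0.01 with df = (4, 90)
print(f"SSB = {ss_b:.4f}, SSW = {ss_w:.3f}, F({df_b}, {df_w}) = {F:.4f}")
print("reject H0" if F > F_crit else "fail to reject H0")
```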
11.3.4
A study was undertaken to see how accurate the calorie labeling is on food that is considered reduced-calorie. The group measured the number of calories for each item of food and then found the percent difference between measured and labeled food. The group also looked at food that was nationally advertised, regionally distributed, or locally prepared. The data is in table #11.3.5 ("Calories datafile," 2013). Do the data indicate that at least two of the mean percent differences between the three groups are different? Test at the 10% level.
Table #11.3.5: Percent Differences Between Measured and Labeled Food
National Advertised   Regionally Distributed   Locally Prepared
2                     41                       15
28                    46                       60
6                     2                        250
8                     25                       145
6                     39                       6
1                     16.5                     80
10                    17                       95
13                    28                       3
15                    3
4                     14
4                     34
18                    42
10
5
3
7
3
0.5
10
6
Null Hypothesis: Mean Percent Differences Between Measured and Labeled Food for all the 3 groups are equal to each other.
Alternate Hypothesis: Mean Percent Differences Between Measured and Labeled Food for at least one group is different from others.
Level of significance = 0.10
Excel Data Analysis output for one-way ANOVA:
Anova: Single Factor

SUMMARY
Groups                   Count   Sum     Average   Variance
National Advertised      20      2.5     0.125     110.6809
Regionally Distributed   12      301.5   25.125    258.3693
Locally Prepared         8       654     81.75     7050.786

ANOVA
Source of Variation   SS        df   MS         F          P-value    F crit
Between Groups        38095.9   2    19047.95   12.97915   5.36E-05   2.452014
Within Groups         54300.5   37   1467.581
Total                 92396.4   39
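P = 5.36E-05 (approximately 0)
As P < level of significance 0.10, we reject the null hypothesis. Hence, there is sufficient evidence that at least two of the mean percent differences between the three groups are different.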
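A sketch (standard library only) that checks the ANOVA table from the group counts, totals and variances in the Excel SUMMARY. It uses the group-totals form of the between-groups sum of squares, SSB = Σ T_i²/n_i − T²/N, which is algebraically equal to Σ n_i(x̄_i − x̄)². The critical F is copied from the Excel output.

```python
# (group name, count, sum, sample variance) copied from the Excel SUMMARY table
groups = [
    ("National Advertised", 20, 2.5, 110.6809),
    ("Regionally Distributed", 12, 301.5, 258.3693),
    ("Locally Prepared", 8, 654.0, 7050.786),
]
N = sum(n for _, n, _, _ in groups)
k = len(groups)
grand_total = sum(s for _, _, s, _ in groups)

# Between-groups SS from group totals: sum(T_i^2 / n_i) - T^2 / N
ss_between = sum(s * s / n for _, n, s, _ in groups) - grand_total ** 2 / N
# Within-groups SS: each group contributes (n_i - 1) * s_i^2
ss_within = sum((n - 1) * v for _, n, _, v in groups)

df_between, df_within = k - 1, N - k
F = (ss_between / df_between) / (ss_within / df_within)
F_crit = 2.452014  # Excel's critical F at alpha = 0.10 with df = (2, 37)
print(f"SSB = {ss_between:.1f}, SSW = {ss_within:.1f}, F({df_between}, {df_within}) = {F:.4f}")
print("reject H0" if F > F_crit else "fail to reject H0")
```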