STAT200 Introduction to Statistics
Part A: Inferential Statistics Data Analysis Plan and Computation
Introduction:
Variables Selected:
Table 1: Variables Selected for Analysis
Variable Name in the Data Set | Variable Type | Description | Qualitative or Quantitative |
Variable 1: Marital status | Socioeconomic | Marital Status of Head of Household in SE | Qualitative |
Variable 2: Housing | Expenditure | Total Amount of Annual Expenditure on Housing | Quantitative |
Variable 3: Electricity | Expenditure | Total Amount of Annual Expenditure on Electricity | Quantitative |
Data Analysis:
1. Confidence Interval Analysis: I am using the Excel Data Analysis tool to find the confidence interval at the 95% level of confidence. Because the population standard deviation is unknown, the t-distribution is appropriate in this case.
Excel output:
USD-Housing | |
Mean | 21684.87 |
Standard Error | 632.3268 |
Median | 20607 |
Mode | #N/A |
Standard Deviation | 3463.397 |
Sample Variance | 11995116 |
Kurtosis | -1.61638 |
Skewness | 0.37244 |
Range | 9015 |
Minimum | 18149 |
Maximum | 27164 |
Sum | 650546 |
Count | 30 |
Confidence Level(95.0%) | 1293.254 |
Table 2: Confidence Interval Information and Results
Name of Variable: Housing |
State the Random Variable and Parameter in Words: The random variable is the total amount of annual expenditure on housing for a randomly selected household. The parameter is the mean total amount of annual expenditure on housing for all households. |
Confidence interval method including confidence level and rationale for using it: I am using a 95% level of confidence. The confidence interval is calculated with the t-distribution, because the population standard deviation is unknown and has to be estimated from the sample. Because the sample is small in comparison to the population, it may be biased and may not represent the population well, so the interval should be read with that limitation in mind. |
State and check the assumptions for confidence interval: The sample size for the given data is 30, so by the central limit theorem the sampling distribution of the sample mean can be treated as approximately normal even if the population is not normal. The sample is assumed to be a random sample of households. The population standard deviation is unknown, so the t-distribution is used to calculate the confidence interval. |
Method Used to Analyze Data: A one-sample t-interval (t-distribution with n - 1 = 29 degrees of freedom), computed with the Excel Data Analysis tool. |
Find the sample statistic and the confidence interval: Sample mean = 21684.87; standard deviation s = 3463.397; sample size n = 30; standard error of the mean = 632.3268; margin of error at 95% confidence = 1293.254. Hence the 95% confidence interval = (21684.87 - 1293.254, 21684.87 + 1293.254) = (20391.616, 22978.124). |
Statistical Interpretation: We can be 95% confident that the true population mean annual expenditure on housing for US households lies within the interval (20391.62, 22978.12).
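The interval in Table 2 can be reproduced from the summary statistics alone. The sketch below is a minimal check, not the Excel procedure itself: it assumes the critical value 2.04523 (the two-sided 95% t value for 29 degrees of freedom from a standard t table) is hard-coded, and the variable names are illustrative only.

```python
import math

# Summary statistics copied from the Excel output for USD-Housing.
sample_mean = 21684.87
sample_sd = 3463.397
n = 30

# Assumption: two-sided 95% critical value for t with n - 1 = 29 degrees
# of freedom, read from a standard t table rather than computed here.
t_crit = 2.04523

standard_error = sample_sd / math.sqrt(n)
margin = t_crit * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"Standard error:  {standard_error:.4f}")
print(f"Margin of error: {margin:.3f}")
print(f"95% CI:          ({lower:.2f}, {upper:.2f})")
```

The standard error and margin of error printed here should agree with the "Standard Error" and "Confidence Level(95.0%)" rows of the Excel output up to rounding.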
2. Hypothesis Testing: Using the second expenditure variable (with socioeconomic variable as the grouping variable for making two groups), select and run the appropriate method for making decisions about two parameters relative to observed statistics (i.e., two sample hypothesis testing method) and complete the following table (Note: Format follows Kozak outline):
Table 3: Two Sample Hypothesis Test Analysis
Research Question: Is the mean annual expenditure on electricity higher for married people than for unmarried people? |
Two Sample Hypothesis Test that Will Be Used and Rationale for Using It: Because there is no natural pairing between married and unmarried people, the two groups are independent, and an independent-samples t test is appropriate for the analysis. |
State the Random Variable and Parameters in Words: The random variable is the annual expenditure on electricity by a person in the USA. The parameters are the mean annual expenditures on electricity for married and for unmarried people, respectively. |
State Null and Alternative Hypotheses and Level of Significance: Let µ1 = mean expenditure on electricity for married people and µ2 = mean expenditure on electricity for unmarried people. H0: µ1 = µ2. H1: µ1 > µ2 (claim). Level of significance: α = 0.05. |
Method Used to Analyze Data: A right-tailed hypothesis test is carried out with the independent-samples t test tool in the Excel Data Analysis add-in. Level of significance used = 0.05. |
Find the sample statistic, test statistic, and p-value:
|
Conclusion Regarding Whether or Not to Reject the Null Hypothesis: Because the p-value is greater than the level of significance, we fail to reject the null hypothesis. Hence there is insufficient evidence to support the claim that expenditure on electricity is higher for married people than for unmarried people. |
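The decision rule in Table 3 can be sketched in code. This is a hedged illustration, not the Excel analysis: the report does not say whether the equal-variance or unequal-variance tool was used, so the sketch assumes the unequal-variance (Welch) form. The two lists are made-up placeholder values, not the survey data, and the helper names `t_sf` and `welch_right_tailed` are hypothetical.

```python
import math
from statistics import mean, variance


def t_sf(t, df, steps=2000):
    """Right-tail probability P(T > t) for Student's t, via Simpson's rule."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = pdf(0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area


def welch_right_tailed(group1, group2):
    """Return (t statistic, degrees of freedom, p-value) for H1: mu1 > mu2."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1) / n1, variance(group2) / n2
    t_stat = (mean(group1) - mean(group2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df, t_sf(t_stat, df)


# Placeholder values for illustration only (NOT the survey data).
married = [1500, 1450, 1600, 1480, 1530]
unmarried = [1420, 1490, 1380, 1510, 1440]

alpha = 0.05
t_stat, df, p_value = welch_right_tailed(married, unmarried)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, df = {df:.1f}, p = {p_value:.4f} -> {decision}")
```

With the real married and unmarried electricity values substituted for the placeholders, the printed p-value can be compared against the one-tail p-value reported by Excel.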
Part B: Results Write Up
Variable Name in data set | Description | Type of Variable (Qualitative or Quantitative) |
SE-Income | Annual Household Income in USD | Quantitative |
SE-Family Size | Total Number of People in Family - Both Adults and Children | Qualitative |
USD-Annual Expenditures | Total Amount of Annual Expenditures | Quantitative |
USD-Housing | Total Amount of Annual Expenditure on Housing | Quantitative |
USD-Electricity | Total Amount of Annual Expenditure on Electricity | Quantitative |
My scenario is a 38-year-old married woman with a family size of 6 and an annual income of $107,235. I chose this scenario because it is the closest match to my own household. I chose to explore the socioeconomic variables SE-Income, SE-Marital status and SE-Family size. The two expenditure variables are USD-Housing and USD-Electricity. I chose these because I am curious about these costs for a family similar to my own, where housing and energy are the highest expenditures and the greatest concern in the overall annual family budget.
Annual Housing Expenditures
Numerical Summary.
Table 6. Descriptive Analysis for Variable 5
Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
Variable: Housing Expenditures | 30 | Median = 20,607 | SD = 3463.397 |
I chose the median as the measure of central tendency because it is the most appropriate when the data are skewed. The variable is continuous and quantitative, measured on a ratio scale. I chose the standard deviation as the measure of dispersion because it is the most common measure of dispersion for data from a large data set.
Data from the survey were analyzed; the median is USD 20,607, with a standard deviation of USD 3,463.40. The mean of the data is USD 21,684.87.
Electricity Expenditures
Numerical Summary
Table 7. Descriptive Analysis for Variable 6
Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
Variable: Electricity Expenditures | | Median = 1,466.5 | SD = 101.3 |
I chose the median as the measure of central tendency because it is the most appropriate when the data are skewed. The variable is continuous and quantitative, measured on a ratio scale. I chose the standard deviation as the measure of dispersion because it is the most common measure of dispersion for data from a large data set.
Data from the survey were analyzed; the median is USD 1,466.5, with a standard deviation of USD 101.3. The mean of the data is USD 1,463.4.
Confidence Interval Analysis:
One can be 95% confident that the true population mean annual housing expenditure for US households lies within the confidence interval (20391.62, 22978.12). In other words, if we drew many similar samples and calculated a confidence interval from each, then in about 95% of cases the true mean would lie within the interval.
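The repeated-sampling reading of the 95% level can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with a hypothetical mean of 21,000 and standard deviation of 3,500 (illustrative values, not estimates from the survey), the critical value 2.04523 is the table value for 29 degrees of freedom, and the function names are made up for this example.

```python
import math
import random
from statistics import mean, stdev

T_CRIT = 2.04523  # table value: two-sided 95%, 29 degrees of freedom
N = 30            # sample size used in the report


def interval_covers(population_mean, population_sd, rng):
    """Draw one sample of size N and check whether its CI contains the true mean."""
    sample = [rng.gauss(population_mean, population_sd) for _ in range(N)]
    margin = T_CRIT * stdev(sample) / math.sqrt(N)
    return abs(mean(sample) - population_mean) <= margin


def coverage(repetitions=2000, seed=1):
    """Share of simulated intervals that contain the (hypothetical) true mean."""
    rng = random.Random(seed)
    # Hypothetical population parameters, chosen only for illustration.
    hits = sum(interval_covers(21000.0, 3500.0, rng) for _ in range(repetitions))
    return hits / repetitions


print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should be close to 0.95, which is what the 95% confidence level describes: a property of the procedure over many samples, not of any single interval.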
Two Sample Hypothesis Test Analysis:
There is no statistically significant evidence that electricity expenditure in US households is higher for married people than for unmarried people, as the null hypothesis was not rejected.
Discussion:
I conclude that annual income varies greatly, with a mean value of $100,611. The family size of the respondents ranges from a single-person household to a family of 6. Most families, however, have 3-4 members. According to the data set, the average annual income falls between $55,120 and $64,747.
Among the three expenditure variables, housing has the highest expenditure value, with a mean of $21,685, while water has the lowest, with a mean of $627.
The best expenditure to reduce in order to save money is housing. If the amount spent were reduced from the mean to the median value, it would result in annual savings of about $1,000. Housing is the most reasonable expenditure to save on because it has the highest standard deviation among the three expenditure variables.
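The savings figure follows from the gap between the mean and the median housing expenditure reported in the Excel output. A minimal check, with illustrative variable names:

```python
# Values copied from the Excel output for USD-Housing.
mean_housing = 21684.87
median_housing = 20607

# Savings if spending moved from the mean down to the median.
annual_savings = mean_housing - median_housing
print(f"Annual savings: ${annual_savings:,.2f}")  # prints "Annual savings: $1,077.87"
```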
So, mean annual housing expenditure for US families is around $20,000 to $23,000. I suspect that the lower values belong to unmarried people, as they have fewer responsibilities. The hypothesis test result shows no significant difference in electricity expenditure between married and unmarried people, so the data give no evidence that the two groups spend differently on electricity.