# Part A: Inferential Statistics Data Analysis Plan and Computation

Introduction:

Variables Selected:

Table 1: Variables Selected for Analysis

| Variable Name in the Data Set | Variable Type | Description | Qualitative or Quantitative |
|---|---|---|---|
| Variable 1: Marital status | Socioeconomic | Marital Status of Head of Household in SE | Qualitative |
| Variable 2: Housing | Expenditure | Total Amount of Annual Expenditure on Housing | Quantitative |
| Variable 3: Electricity | Expenditure | Total Amount of Annual Expenditure on Electricity | Quantitative |

Data Analysis:

## 1. Confidence Interval Analysis

I am using the Excel Data Analysis tool to find the confidence interval at the 95% level of confidence. Because the population standard deviation is unknown, the t-distribution is appropriate in this case.

Excel output:

| Statistic | USD-Housing |
|---|---|
| Mean | 21684.87 |
| Standard Error | 632.3268 |
| Median | 20607 |
| Mode | #N/A |
| Standard Deviation | 3463.397 |
| Sample Variance | 11995116 |
| Kurtosis | -1.61638 |
| Skewness | 0.37244 |
| Range | 9015 |
| Minimum | 18149 |
| Maximum | 27164 |
| Sum | 650546 |
| Count | 30 |
| Confidence Level (95.0%) | 1293.254 |

### Table 2: Confidence Interval Information and Results

Name of Variable: Housing

State the Random Variable and Parameter in Words:

The random variable is the total amount of annual expenditure on housing for a randomly selected household. The parameter is the mean total annual expenditure on housing for all households.

Confidence interval method including confidence level and rationale for using it:

I am using a 95% level of confidence. The confidence interval will be calculated with the help of the t-distribution table. Because the sample is small in comparison to the population, it may be biased and may not represent the population well, so an interval estimate is reported rather than a single value.

State and check the assumptions for confidence interval:

The sample size for the given data is 30, so for most population shapes the sampling distribution of the sample mean can be treated as approximately normal. The population standard deviation is unknown, so we use the t-distribution to calculate the confidence interval.

Method Used to Analyze Data:

The t-distribution is used to analyze the data.

Find the sample statistic and the confidence interval:

Sample mean = 21684.87

Standard deviation s = 3463.397

Sample size n = 30

Standard error of mean = 632.3268

For 95% confidence, margin of error = 1293.254

Hence 95% confidence interval = (21684.87 - 1293.254, 21684.87 + 1293.254) = (20391.616, 22978.124)

Statistical Interpretation:

We can be 95% confident that the true population mean expenditure on housing for US households lies within the interval (20391.62, 22978.12).
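The margin of error and interval above can be checked directly from the summary statistics. The following is a minimal sketch, not the Excel procedure itself; the critical value `T_CRIT_95_DF29` is an assumption taken from a standard t table for 29 degrees of freedom, and the variable names are chosen for illustration.

```python
import math

# Summary statistics copied from the Excel output for USD-Housing.
sample_mean = 21684.87
sample_sd = 3463.397
n = 30

# Assumption: two-sided 95% critical value of t with n - 1 = 29 degrees of
# freedom, read from a standard t table rather than computed here.
T_CRIT_95_DF29 = 2.04523

# Standard error of the mean, then margin of error and interval bounds.
standard_error = sample_sd / math.sqrt(n)
margin_of_error = T_CRIT_95_DF29 * standard_error
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error

print(f"SE = {standard_error:.4f}, margin of error = {margin_of_error:.3f}")
print(f"95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

The computed standard error and margin of error should agree with the Excel values (632.3268 and 1293.254) up to rounding.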

## 2. Hypothesis Testing

Using the second expenditure variable (with the socioeconomic variable as the grouping variable for making two groups), select and run the appropriate method for making decisions about two parameters relative to observed statistics (i.e., two sample hypothesis testing method) and complete the following table (Note: Format follows Kozak outline):

#### Table 3: Two Sample Hypothesis Test Analysis

Research Question: Is the mean expenditure on electricity higher for married people than for unmarried people?

Two Sample Hypothesis Test that Will Be Used and Rationale for Using It:

As no natural pairing exists between married and unmarried people, we can use an independent-samples t-test for the analysis (Excel's two-sample t-test assuming unequal variances).

State the Random Variable and Parameters in Words:

The random variable is the annual expenditure on electricity by people in the USA. The parameters are the mean expenditures on electricity for married and unmarried people, respectively.

State Null and Alternative Hypotheses and Level of Significance:

Let µ1 = mean expenditure on electricity for married

µ2 = mean expenditure on electricity for un-married

H0: µ1 = µ2

H1: µ1 > µ2 (claim)

Method Used to Analyze Data:

A right-tailed hypothesis test will be done with the independent-samples t-test tool in Excel Data Analysis. Level of significance used = 0.05.

Find the sample statistic, test statistic, and p-value:

t-Test: Two-Sample Assuming Unequal Variances

| | Married | Not Married |
|---|---|---|
| Mean | 1461.2 | 1465.6 |
| Variance | 21070.89 | 172.4 |
| Observations | 15 | 15 |
| Hypothesized Mean Difference | 0 | |
| df | 14 | |
| t Stat | -0.11692 | |
| P(T<=t) one-tail | 0.454292 | |
| t Critical one-tail | 1.76131 | |
| P(T<=t) two-tail | 0.908584 | |
| t Critical two-tail | 2.144787 | |

Conclusion Regarding Whether or Not to Reject the Null Hypothesis:

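Excel's one-tail value (0.454292) is the area in the tail beyond the observed t statistic. Because t is negative here and the test is right-tailed, the p-value for H1: µ1 > µ2 is 1 - 0.454292 ≈ 0.5457. As this p-value is greater than the level of significance (0.05), and the test statistic (-0.11692) is below the one-tail critical value (1.76131), we fail to reject the null hypothesis. Hence there is insufficient evidence to support the claim that expenditure on electricity is higher for married people than for unmarried people.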
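The t statistic and degrees of freedom reported by Excel can be reproduced from the group means, variances, and sizes. The sketch below is an illustration under stated assumptions, not the Excel implementation: the helper names (`welch_t`, `t_pdf`, `t_upper_tail`) are hypothetical, the tail area is obtained by numerically integrating the t density with Simpson's rule, and the degrees of freedom are rounded to the nearest integer to match the value of 14 in the output.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, Welch-Satterthwaite df) from summary statistics."""
    a, b = var1 / n1, var2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_stat, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t_stat, df, steps=2000):
    """P(T > t_stat), using Simpson's rule on the density over [0, |t_stat|]."""
    x = abs(t_stat)
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail_beyond_abs = 0.5 - total * h / 3
    return tail_beyond_abs if t_stat >= 0 else 1 - tail_beyond_abs

# Married (group 1) versus not married (group 2), from the Excel output.
t_stat, df = welch_t(1461.2, 21070.89, 15, 1465.6, 172.4, 15)
df_rounded = round(df)  # assumption: rounded to match the df shown by Excel

tail_beyond_t = t_upper_tail(abs(t_stat), df_rounded)  # Excel's one-tail value
p_right_tailed = t_upper_tail(t_stat, df_rounded)      # p-value for H1: mu1 > mu2

print(f"t = {t_stat:.5f}, df = {df:.2f} (rounded to {df_rounded})")
print(f"tail beyond |t| = {tail_beyond_t:.4f}, right-tailed p = {p_right_tailed:.4f}")
```

Because the observed t statistic is negative, the right-tailed p-value is the complement of the tail area beyond |t|, so it exceeds 0.5 and the decision at the 0.05 level does not change.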

# Part B: Results Write Up

| Variable Name in data set | Description | Type of Variable (Qualitative or Quantitative) |
|---|---|---|
| SE-Income | Annual Household Income in USD | Quantitative |
| SE-Family Size | Total Number of People in Family - Both Adults and Children | Qualitative |
| USD-Annual Expenditures | Total Amount of Annual Expenditures | Quantitative |
| USD-Housing | Total Amount of Annual Expenditure on Housing | Quantitative |
| USD-Electricity | Total Amount of Annual Expenditure on Electricity | Quantitative |

My scenario is a 38-year-old woman, married, with a family size of 6 and an annual income of \$107,235. I chose this scenario because it is the closest match to my own household. I have chosen to explore the socioeconomic variables SE-Income, SE-Marital status, and SE-Family size. The two expenditure variables are USD-Housing and USD-Electricity. I chose these because I am curious about these costs for a family similar to my own, where housing and energy costs are the highest and of most concern among the overall annual family expenditures.

Annual Housing Expenditures

Numerical Summary.

Table 6. Descriptive Analysis for Variable 5

| Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
|---|---|---|---|
| Variable: Housing Expenditures | 30 | Median = 20,607 | SD = 3463.397 |

I chose the median as the measure of central tendency because it is the most appropriate measure when the data are skewed. The variable is continuous and quantitative, and it is measured on a ratio scale. I chose the standard deviation as the measure of dispersion because it is the most commonly used measure of dispersion for quantitative data.

Data from the survey were analyzed, and the median was determined to be USD 20,607, with a standard deviation of USD 3,463.4. The mean of the data is USD 21,684.87.

Electricity Expenditures

Numerical Summary

Table 7. Descriptive Analysis for Variable 6

| Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
|---|---|---|---|
| Variable: Electricity Expenditures | 30 | Median = 1,466.5 | SD = 101.3 |

I chose the median as the measure of central tendency because it is the most appropriate measure when the data are skewed. The variable is continuous and quantitative, and it is measured on a ratio scale. I chose the standard deviation as the measure of dispersion because it is the most commonly used measure of dispersion for quantitative data.

Data from the survey were analyzed, and the median was determined to be USD 1,466.5, with a standard deviation of USD 101.3. The mean of the data is USD 1,463.4.

Confidence Interval Analysis:

One can be 95% confident that the true population mean for household expenditure on housing in the USA lies within the confidence interval (20391.62, 22978.12). In other words, if we drew many similar samples and calculated a confidence interval from each, about 95% of those intervals would contain the true mean.
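The repeated-sampling reading of the 95% confidence level can be illustrated with a small simulation. This is a sketch under loud assumptions: the population is hypothetical (normal, with mean and standard deviation set equal to the sample values, since the real population is unknown), the critical value comes from a standard t table for 29 degrees of freedom, and the function names are invented for illustration.

```python
import math
import random
import statistics

# Hypothetical population, for illustration only: normal with the sample's
# mean and standard deviation. The real population distribution is unknown.
TRUE_MEAN = 21684.87
TRUE_SD = 3463.397
N = 30
T_CRIT_95_DF29 = 2.04523  # two-sided 95% t critical value for 29 df (table value)

def interval_covers_mean(rng):
    """Draw one sample of size N and report whether its 95% CI contains TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    margin = T_CRIT_95_DF29 * statistics.stdev(sample) / math.sqrt(N)
    return mean - margin <= TRUE_MEAN <= mean + margin

def coverage_rate(trials=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = sum(interval_covers_mean(rng) for _ in range(trials))
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The share should come out close to 0.95; it is the procedure, not any single interval, that has the 95% success rate.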

Two Sample Hypothesis Test Analysis:

Because the null hypothesis is not rejected, there is no significant evidence that US households headed by married people spend more on electricity than those headed by people who are not married.

Discussion:

I conclude that annual income varies greatly, with a mean value of \$100,611. The family size of the respondents varies from a single-person household to a family of 6. Most families, however, have 3-4 members. According to the data set, the average annual income lies between \$55,120 and \$64,747.

Among the three expenditure variables, housing has the highest expenditure value, with a mean of \$21,685, while water has the lowest, with a mean of \$627.

The best expenditure to reduce in order to save money is housing. If the amount spent were reduced from the mean to the median value, it would result in annual savings of about \$1,000. It is the most reasonable expenditure to save on because it has the highest standard deviation among the three expenditure variables.

So, mean expenditure on housing for US families is around 20,000 to 23,000 dollars. I suspect that the lower values belong to unmarried households, as they have fewer responsibilities. The hypothesis test shows no evidence of a difference in electricity expenditure between married and not-married households, so both groups appear to consume similar amounts of electricity.
