STAT200 Introduction to Statistics

Part A: Inferential Statistics Data Analysis Plan and Computation

 

Introduction:

 

Variables Selected:

 

Table 1: Variables Selected for Analysis

Variable Name in the Data Set | Variable Type | Description | Qualitative or Quantitative
Variable 1: Marital status | Socioeconomic | Marital Status of Head of Household in SE | Qualitative
Variable 2: Housing | Expenditure | Total Amount of Annual Expenditure on Housing | Quantitative
Variable 3: Electricity | Expenditure | Total Amount of Annual Expenditure on Electricity | Quantitative

 

 

Data Analysis:

 

1.  Confidence Interval Analysis: I am using the Excel Data Analysis tool (Descriptive Statistics) to find the confidence interval at a 95% level of confidence. Because the population standard deviation is unknown, the t-distribution is appropriate in this case.

 

Excel output (Descriptive Statistics):

Statistic | USD-Housing
Mean | 21684.87
Standard Error | 632.3268
Median | 20607
Mode | #N/A
Standard Deviation | 3463.397
Sample Variance | 11995116
Kurtosis | -1.61638
Skewness | 0.37244
Range | 9015
Minimum | 18149
Maximum | 27164
Sum | 650546
Count | 30
Confidence Level(95.0%) | 1293.254

 

 

Table 2: Confidence Interval Information and Results

Name of Variable: Housing

State the Random Variable and Parameter in Words: The random variable is the total amount of annual expenditure on housing for a randomly selected household. The parameter is the mean total amount of annual expenditure on housing for all households.

 

Confidence interval method including confidence level and rationale for using it:

I am using a 95% level of confidence.

The confidence interval is calculated with the t-distribution because the population standard deviation is unknown and must be estimated by the sample standard deviation. One caveat: the sample is small relative to the population, so it may not be a fully representative picture of all households.

State and check the assumptions for confidence interval:

The sample size is 30, so by the Central Limit Theorem the sampling distribution of the sample mean is approximately normal regardless of the shape of the population. The sample is assumed to be a random sample of households. The population standard deviation is unknown, so the t-distribution is used to calculate the confidence interval.

Method Used to Analyze Data: One-sample t confidence interval for the population mean (t-distribution with n - 1 = 29 degrees of freedom).

Find the sample statistic and the confidence interval:

Sample mean = 21684.87

Standard deviation s = 3463.397

Sample size n = 30

Standard error of mean = 632.3268

For 95% confidence margin of error = 1293.254

Hence 95% confidence interval = (21684.87 – 1293.254, 21684.87 + 1293.254)

= (20391.616, 22978.124)

Statistical Interpretation:

We can be 95% confident that the true population mean annual expenditure on housing for US households lies within the interval (20391.62, 22978.12).
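As a check on the Excel output, the interval can be reproduced from the summary statistics alone: the standard error is s divided by the square root of n, and the margin of error is the t critical value times the standard error. The short Python sketch below does this. The critical value 2.04523 (29 degrees of freedom, 2.5% in the upper tail) is taken from a t-table rather than computed, and the function name is illustrative only.

```python
import math

def t_confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper, se, margin) for a one-sample t interval."""
    se = s / math.sqrt(n)      # standard error of the mean
    margin = t_crit * se       # margin of error = critical value * SE
    return mean - margin, mean + margin, se, margin

# Summary statistics from the Excel output for USD-Housing.
# Assumption: t_crit = 2.04523 is the tabled t value for df = 29, 95% two-sided.
lower, upper, se, margin = t_confidence_interval(21684.87, 3463.397, 30, 2.04523)
print(f"SE = {se:.4f}, margin of error = {margin:.3f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

The computed standard error and margin of error should agree with the "Standard Error" and "Confidence Level(95.0%)" rows of the Excel output up to rounding.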

 

 

2. Hypothesis Testing: Using the second expenditure variable (with socioeconomic variable as the grouping variable for making two groups), select and run the appropriate method for making decisions about two parameters relative to observed statistics (i.e., two sample hypothesis testing method) and complete the following table (Note: Format follows Kozak outline):

 

Table 3: Two Sample Hypothesis Test Analysis

Research Question: Do married people spend more on electricity than unmarried people?

 

Two Sample Hypothesis Test that Will Be Used and Rationale for Using It:

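Because there is no natural pairing between married and unmarried people, the two samples are independent, so an independent two-sample t-test is used. The sample variances differ greatly (21,070.89 vs. 172.4), so the version of the test that does not assume equal variances is used.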

State the Random Variable and Parameters in Words:

The random variable is the annual expenditure on electricity of a US household. The parameters are the mean annual expenditures on electricity for married and unmarried households, respectively.

 

State Null and Alternative Hypotheses and Level of Significance:

Let µ1 = mean annual expenditure on electricity for married households

µ2 = mean annual expenditure on electricity for unmarried households

H0: µ1 = µ2

H1: µ1 > µ2 (claim)

Method Used to Analyze Data:

A right-tailed test is performed with the Excel Data Analysis tool "t-Test: Two-Sample Assuming Unequal Variances". Level of significance: α = 0.05.

Find the sample statistic, test statistic, and p-value:

t-Test: Two-Sample Assuming Unequal Variances

Statistic | Married | Not Married
Mean | 1461.2 | 1465.6
Variance | 21070.89 | 172.4
Observations | 15 | 15
Hypothesized Mean Difference | 0
df | 14
t Stat | -0.11692
P(T<=t) one-tail | 0.454292
t Critical one-tail | 1.76131
P(T<=t) two-tail | 0.908584
t Critical two-tail | 2.144787

 

 

 

 

 

Conclusion Regarding Whether or Not to Reject the Null Hypothesis:

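The t statistic is negative because the sample mean for married households (1461.2) is slightly lower than for unmarried households (1465.6). Excel's one-tail value (0.454292) is the probability in the tail on the side of the observed t; for this right-tailed test the p-value is 1 - 0.454292 ≈ 0.546. Either way, the p-value is greater than the level of significance (0.05), so we fail to reject the null hypothesis. There is insufficient evidence to support the claim that married people spend more on electricity than unmarried people.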
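The test statistic and p-value can be recomputed from the group summaries in the Excel output. The Python sketch below is a minimal, standard-library-only version: it computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom, rounds the degrees of freedom to match the df = 14 that Excel lists, and evaluates the Student t CDF by Simpson's rule on the density (an illustrative numerical method, not how Excel computes it). The right-tailed p-value for H1: µ1 > µ2 is P(T >= t); because t is negative here, it equals 1 minus Excel's "P(T<=t) one-tail" value. All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule (steps even)."""
    h = abs(x) / steps
    area = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = area * h / 3                 # P(0 <= T <= |x|) by symmetry
    return 0.5 + half if x >= 0 else 0.5 - half

def welch_t(m1, v1, n1, m2, v2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from group summaries."""
    a, b = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Group summaries from the Excel output (married vs. not married)
t_stat, df_exact = welch_t(1461.2, 21070.89, 15, 1465.6, 172.4, 15)
df = round(df_exact)                          # Excel lists df = 14
excel_one_tail = t_cdf(-abs(t_stat), df)      # tail on the side of the observed t
p_right = 1 - t_cdf(t_stat, df)               # right-tailed p-value, H1: mu1 > mu2
print(f"t = {t_stat:.5f}, df = {df_exact:.2f} (rounded {df})")
print(f"Excel-style one-tail = {excel_one_tail:.6f}, right-tailed p = {p_right:.4f}")
```

Since the right-tailed p-value is far above α = 0.05, the decision to fail to reject H0 does not change.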

 

 

 

 

Part B: Results Write Up

 

Variable Name in Data Set | Description | Type of Variable (Qualitative or Quantitative)
SE-Income | Annual Household Income in USD | Quantitative
SE-Family Size | Total Number of People in Family (Both Adults and Children) | Quantitative
USD-Annual Expenditures | Total Amount of Annual Expenditures | Quantitative
USD-Housing | Total Amount of Annual Expenditure on Housing | Quantitative
USD-Electricity | Total Amount of Annual Expenditure on Electricity | Quantitative

 

My scenario is a 38-year-old married woman with a family size of 6 and an annual income of $107,235. I chose this scenario because it is the closest match to my own household. I have chosen to explore the socioeconomic variables of income, SE-Marital status, and SE-Family size. The two expenditure variables are USD-Housing and USD-Electricity. I chose these because I am curious about these costs for a family similar to my own, where housing and energy are the largest expenditures and of most concern in the overall annual family budget.

Annual Housing Expenditures

 

Numerical Summary

Table 6. Descriptive Analysis for Variable 5

Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion
Housing Expenditures | 30 | Median = 20,607 | SD = 3463.397

 

I chose the median as the measure of central tendency because it is the most appropriate measure when the data are skewed; the variable is continuous and quantitative and is measured on a ratio scale. I chose the standard deviation as the measure of dispersion because it is the most common measure of dispersion for data from a large data set.

Analysis of the survey data gives a median of USD 20,607 and a standard deviation of USD 3,463.4. The mean of the data is USD 21,684.87.

Electricity Expenditures

 

Numerical Summary

 

Table 7. Descriptive Analysis for Variable 6

Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion
Electricity Expenditures | 30 | Median = 1,466.5 | SD = 101.3

 

I chose the median as the measure of central tendency because it is the most appropriate measure when the data are skewed; the variable is continuous and quantitative and is measured on a ratio scale. I chose the standard deviation as the measure of dispersion because it is the most common measure of dispersion for data from a large data set.

Analysis of the survey data gives a median of USD 1,466.5 and a standard deviation of USD 101.3. The mean of the data is USD 1,463.4.
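The overall electricity mean and standard deviation can be cross-checked against the group summaries in the two-sample t-test output (15 married and 15 not-married households). The Python sketch below assumes the two groups together make up the whole sample of 30 and that the reported group variances are sample variances (n - 1 denominator). The overall mean is the size-weighted group mean, and the total sum of squares is the within-group part plus the between-group part. The function name is hypothetical, and the median cannot be recovered this way.

```python
import math

def combine_groups(groups):
    """Overall n, mean, and sample SD from (n, mean, sample variance) of disjoint groups."""
    n = sum(g[0] for g in groups)
    mean = sum(g[0] * g[1] for g in groups) / n
    # Total sum of squares = within-group SS + between-group SS
    ss = sum((g[0] - 1) * g[2] + g[0] * (g[1] - mean) ** 2 for g in groups)
    return n, mean, math.sqrt(ss / (n - 1))

# (n, mean, variance) for married and not-married groups, from the t-test output
n, mean, sd = combine_groups([(15, 1461.2, 21070.89), (15, 1465.6, 172.4)])
print(f"n = {n}, mean = {mean:.1f}, SD = {sd:.1f}")
```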

 

 

Confidence Interval Analysis:

One can be 95% confident that the true population mean annual housing expenditure for US households lies within the confidence interval (20391.62, 22978.12). In other words, if we drew many similar samples and calculated a confidence interval from each one, about 95% of those intervals would contain the true mean.
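The repeated-sampling interpretation can be illustrated with a small simulation. The Python sketch below assumes a hypothetical normal population whose mean and standard deviation are borrowed from the housing sample (21,685 and 3,463); it draws many samples of size 30, builds a 95% t interval from each using the tabled critical value 2.04523 for 29 degrees of freedom, and reports the share of intervals that contain the true mean. The function name and the fixed random seed are illustrative choices.

```python
import math
import random
import statistics

def coverage_simulation(mu, sigma, n, t_crit, trials, seed=1):
    """Fraction of t intervals, built from simulated normal samples, that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)   # sample SD / sqrt(n)
        if xbar - t_crit * se <= mu <= xbar + t_crit * se:
            hits += 1
    return hits / trials

# Hypothetical population: normal, mean 21685, SD 3463; samples of size 30
coverage = coverage_simulation(21685, 3463, 30, 2.04523, trials=2000)
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should be close to 0.95; it will not be exactly 0.95 because each run uses a finite number of simulated samples.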

 

Two Sample Hypothesis Test Analysis:

No statistically significant difference was found in electricity expenditure between married and unmarried US households: the null hypothesis was not rejected at the 0.05 level.

 

Discussion:

Annual income varies greatly, with a mean of $100,611. Family size among the respondents ranges from 1 (an individual) to 6. Most families, however, have 3 to 4 members. According to the data set, the average annual income is between $55,120 and $64,747.

Among the three expenditure variables, housing has the highest expenditure, with a mean of $21,685, while water has the lowest, with a mean of $627.

The best expenditure to reduce in order to save money is housing. If housing spending were reduced from the mean to the median, the annual savings would be about $1,000 ($21,684.87 - $20,607 = $1,077.87). Housing is also the most reasonable expenditure to save on because it has the highest standard deviation among the three expenditure variables.

 

So, the mean annual housing expenditure for US families is roughly $20,000 to $23,000. I suspect that the values near the lower bound come from unmarried households, since their responsibilities tend to be smaller, although this was not tested here. The hypothesis test found no significant difference in electricity expenditure between married and unmarried households, so the two groups appear to spend similar amounts on electricity.

 
