Rejection Region
After reviewing data from a sample, an inference can be made about the population. For example: find a data set on the internet. Some suggested search terms: Free Data Sets, Medical Data Sets, Education Data Sets.
Here I have used a sample dataset of mortgage interest rates and median home prices obtained from
https://journalistsresource.org/wp-content/uploads/2014/11/Sample-data-sets-for-linear-regression1.xlsx
Now I will analyse it using RStudio. Let's start.
# Linear Regression Analysis:
# Step 1 : Importing the data
setwd("C:/Users/raqui/Desktop")
getwd()                                     # confirm the working directory
data = read.csv("data.csv", header = TRUE)  # read.csv already returns a data frame
head(data)                                  # quick look at the data
# Step 2 : Exploratory data analysis
# interest_rate is the explanatory variable and median_home_price is the dependent variable
summary(data$interest_rate)
summary(data$Median_home_price)
boxplot(data$interest_rate)
boxplot(data$Median_home_price)
# Step 3 : Examining the trend : whether it is linear or quadratic or cubic or something else
x = data$interest_rate
y = data$Median_home_price
# Standardize both variables, since x and y are measured in different units
y_std = (y - mean(y))/sd(y)
x_std = (x - mean(x))/sd(x)
plot(x_std,y_std)
# Inference: home prices trend downward as the interest rate rises
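As an aside, base R's scale() performs the same standardization in one call. A quick check on a hypothetical vector (the values here are made up for illustration):

```r
x = c(7.5, 8.0, 8.5, 9.0, 9.5)     # hypothetical interest rates
x_manual = (x - mean(x)) / sd(x)   # manual standardization, as above
x_scaled = as.numeric(scale(x))    # scale() centers by mean() and divides by sd()
all.equal(x_manual, x_scaled)      # TRUE
```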
# Step 4: Actual regression fitting
l1 = lm(y_std~x_std)
summary(l1)
Output:
> summary(l1)

Call:
lm(formula = y_std ~ x_std)

Residuals:
    Min      1Q  Median      3Q     Max
-0.9648 -0.7410 -0.1867  0.5735  1.4147

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.163e-17  2.030e-01   0.000   1.0000
x_std       -6.202e-01  2.097e-01  -2.958   0.0104 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.812 on 14 degrees of freedom
Multiple R-squared:  0.3846,	Adjusted R-squared:  0.3406
F-statistic: 8.749 on 1 and 14 DF,  p-value: 0.01038
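The reported p-value can be recovered directly from the t statistic and the residual degrees of freedom, which is a useful sanity check (values taken from the output above):

```r
# Two-sided p-value for the slope: t = -2.958 on 14 residual degrees of freedom
2 * pt(-abs(-2.958), df = 14)   # ≈ 0.0104, matching Pr(>|t|) in the output
```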
plot(l1)
# Inference: the diagnostic plots suggest the linear model fits poorly, so we try a higher-degree polynomial
# Step 5: Regression for a higher degree
l2 = lm(y_std~poly(x_std,2))
summary(l2)
Output:
> summary(l2)

Call:
lm(formula = y_std ~ poly(x_std, 2))

Residuals:
     Min       1Q   Median       3Q      Max
-0.92704 -0.26805 -0.04894  0.13192  1.07002

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -1.716e-16  1.392e-01   0.000  1.00000
poly(x_std, 2)1 -2.402e+00  5.567e-01  -4.315  0.00084 ***
poly(x_std, 2)2  2.281e+00  5.567e-01   4.097  0.00126 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5567 on 13 degrees of freedom
Multiple R-squared:  0.7314,	Adjusted R-squared:  0.6901
F-statistic: 17.7 on 2 and 13 DF,  p-value: 0.0001945
plot(l2)
plot(x_std, y_std)
# Sort by x so the fitted curve is drawn left to right rather than zig-zagging
ord = order(x_std)
lines(x_std[ord], predict(l2)[ord], col = "red")
# The quadratic model is a significant improvement over the linear one;
# now we will move to the hypothesis test.
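A standard way to confirm the improvement is a partial F test comparing the two nested models with anova(). A sketch with synthetic stand-in data (the real analysis would use the x_std and y_std computed above):

```r
set.seed(1)
x_std = scale(rnorm(16))[, 1]                           # synthetic stand-in predictor
y_std = scale(-x_std + x_std^2 + rnorm(16, 0, 0.3))[, 1] # response with a quadratic component

l1 = lm(y_std ~ x_std)            # linear model
l2 = lm(y_std ~ poly(x_std, 2))   # quadratic model
anova(l1, l2)                     # small Pr(>F) => the quadratic term adds significant explanatory power
```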
For the regression equation, the hypothesis generally constructed for each coefficient is the null hypothesis that the beta is zero (not significant), against the alternative that it is nonzero (significant). This is a two-sided test, and the p-value reported in the model output tells us whether each null hypothesis is rejected.

In our case the intercept is not significant: its p-value (1.00) is far above the significance level of 0.05, so we can safely eliminate the intercept from the model. For the explanatory variable, both the first-degree and second-degree coefficients are significant, since their p-values (0.00084 and 0.00126) are much smaller than the 0.05 significance level. Hence both terms are retained in the regression equation.
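As a sketch of acting on that conclusion, the insignificant intercept can be dropped by adding "- 1" to the model formula (synthetic stand-in data here; the real analysis would use the x_std and y_std computed above):

```r
set.seed(1)
x_std = scale(rnorm(16))[, 1]                           # synthetic stand-in predictor
y_std = scale(-x_std + x_std^2 + rnorm(16, 0, 0.3))[, 1] # response with a quadratic component

# "- 1" removes the intercept, which was insignificant (p-value 1.00 > 0.05)
l3 = lm(y_std ~ poly(x_std, 2) - 1)
summary(l3)
```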
Thus we can explore a data set, fit an appropriate regression model, and set up and test hypotheses about its coefficients.
Hope this answer has helped you.
Thanks !!