Question


Linear regression is a statistical tool commonly used to find the relationship between a response variable and one explanatory variable. What are the factors that affect a linear regression model? How can you perform linear regression in R? Please provide an example to illustrate your assertions.

Solutions

Expert Solution

First, you need to check the assumptions of the linear regression model, because how well the model performs depends on whether they hold.

There are four assumptions associated with a linear regression model:

  1. Linearity: The relationship between X and the mean of Y is linear.
  2. Homoscedasticity: The variance of the residuals is the same for any value of X.
  3. Independence: Observations are independent of each other.
  4. Normality: For any fixed value of X, Y is normally distributed.

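These assumptions can be checked with the four-in-one diagnostic plot that R produces for a fitted model.

It is also important to check for outliers, since a few extreme points can pull the fitted line away from the bulk of the data.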
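
As a rough sketch of how an outlier check might look in base R, the block below flags points using standardized residuals and Cook's distance. It uses R's built-in `cars` data only as a stand-in, and the cutoffs (an absolute standardized residual above 2, Cook's distance above 4/n) are common rules of thumb assumed here, not thresholds given in the answer above.

```r
# Sketch: flag potential outliers and influential points in a simple regression.
# `cars` (speed vs. stopping distance) ships with R and is used only as stand-in data.
fit_cars <- lm(dist ~ speed, data = cars)

# Standardized residuals: |value| > 2 is a rule-of-thumb flag for possible outliers.
std_res <- rstandard(fit_cars)
outlier_rows <- which(abs(std_res) > 2)

# Cook's distance: values above 4/n are often treated as influential
# (a convention assumed for this sketch, not a strict rule).
cooks <- cooks.distance(fit_cars)
influential_rows <- which(cooks > 4 / nrow(cars))

# Inspect the flagged observations before deciding what to do with them.
print(cars[outlier_rows, ])
print(cars[influential_rows, ])
```

Flagged points are candidates for a closer look, not automatic deletions: it is usually worth checking whether they are data-entry errors or genuine observations.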

Example using R:

Step 1. Import the data into R.

Step 2. Fit the regression model using the lm() function.

Step 3. Plot the fitted model to get the four-in-one diagnostic plot.
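
The three steps could be sketched end to end as follows. This is a minimal illustration under assumptions: the temporary CSV file and the built-in `cars` data stand in for whatever file you would actually import, and the plots are written to a PDF so the script also runs non-interactively.

```r
# Step 1: import data.
# A temporary CSV built from the built-in `cars` data stands in for a real file.
csv_path <- tempfile(fileext = ".csv")
write.csv(cars, csv_path, row.names = FALSE)
dat <- read.csv(csv_path)

# Step 2: fit the regression model (response ~ predictor).
fit <- lm(dist ~ speed, data = dat)

# Step 3: draw the four diagnostic plots on one page.
# Writing to a PDF device is a choice made for this sketch; in an
# interactive session you can skip pdf()/dev.off() and plot directly.
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path, width = 8, height = 8)
par(mfrow = c(2, 2))   # 2 x 2 grid of panels
plot(fit)
dev.off()
```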

Example:

Consider an example of predicting a person's weight using their height as the predictor variable.

> height <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
> weight <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
> fit <- lm(weight ~ height)
> summary(fit)

Call:
lm(formula = weight ~ height)

Residuals:
    Min      1Q  Median      3Q     Max
-6.3002 -1.6629  0.0412  1.8944  3.9775

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -38.45509    8.04901  -4.778  0.00139 **
height        0.67461    0.05191  12.997 1.16e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.253 on 8 degrees of freedom
Multiple R-squared: 0.9548, Adjusted R-squared: 0.9491
F-statistic: 168.9 on 1 and 8 DF, p-value: 1.164e-06
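
To connect this output to a usable result, the sketch below refits the same model and pulls out the pieces most people read first: the coefficients, their confidence intervals, R-squared, and a prediction. The new height of 170 is a hypothetical value chosen for illustration, and the source does not state the measurement units.

```r
# Same data as in the example above.
height <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
weight <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
fit <- lm(weight ~ height)

# Intercept and slope: fitted line is weight = intercept + slope * height.
print(coef(fit))

# 95% confidence intervals for the intercept and slope.
print(confint(fit))

# Share of the variance in weight explained by height.
print(summary(fit)$r.squared)

# Predict weight for a new person (height = 170 is a hypothetical input),
# with a 95% prediction interval around the point estimate.
new_person <- data.frame(height = 170)
pred <- predict(fit, newdata = new_person, interval = "prediction")
print(pred)
```

The slope of about 0.67 means each additional unit of height is associated with roughly 0.67 more units of weight, and the point prediction at height 170 comes out at about 76.2.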

> par(mfrow = c(2, 2))  # divide the graphics window into a 2 x 2 grid of four panels
> plot(fit)
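
By default, plot() on an lm object draws four panels: Residuals vs Fitted (linearity), Normal Q-Q (normality), Scale-Location (homoscedasticity), and Residuals vs Leverage (influential points). As an optional numeric complement to reading those panels by eye, a sketch like the following could be used. The Shapiro-Wilk test is a swapped-in addition rather than part of the original answer, and with only 10 observations it has little power, so treat the result as a rough guide.

```r
# Same data and model as in the example above.
height <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
weight <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
fit <- lm(weight ~ height)

# Residuals are the observed weights minus the fitted weights.
res <- residuals(fit)
print(round(res, 4))

# With an intercept in the model, least-squares residuals sum to (numerically) zero.
print(sum(res))

# Shapiro-Wilk normality test on the residuals:
# a large p-value means no evidence against normality.
sw <- shapiro.test(res)
print(sw)
```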

