In: Statistics and Probability
a. Use the multiple regression spreadsheet provided in the Excel spreadsheet to do the following: create a new data set that is a subset of this larger provided data set. The new data set should include the dependent variable (VALUE) and the following independent variables (AGE, LOTSIZE, RMS, MOD KITCH, MOD BATH, AIRCON and FIREPL).
b. For this new data set, use the Correlation option of Data Analysis in Excel and produce the correlation matrix associated with the dependent variable and the independent variables.
c. Rank the independent variables based upon strength of relationship to the dependent variable. Order them from most strongly related to least strongly related.
d. Identify the pair of independent variables that are the most strongly related to each other.
e. Identify the pair of independent variables that are the least strongly related to each other.
f. For the answer to part d, test to see if the population correlation coefficient (rho) for those two variables is different from zero. Use alpha = 0.05.
g. For the answer to part e, test to see if the population correlation coefficient (rho) for those two variables is different from zero. Use alpha = 0.05.
For the data set, I have it sitting at https://www.dropbox.com/s/2bn2rcj7l6sm9lx/Data%20Set.xlsx?dl=0
I apologize, it was too large to paste here.
a) To make a subset of the data you can create a new sheet and copy paste the required data.
b) Correlation matrix
VALUE | AGE | LOTSIZE | RMS | MOD KITCH | MOD BATH | AIRCON | FIREPL | |
VALUE | 1 | |||||||
AGE | -0.1628 | 1 | ||||||
LOTSIZE | -0.1193 | -0.2099 | 1 | |||||
RMS | 0.4331 | -0.2209 | 0.1236 | 1 | ||||
MOD KITCH | 0.2208 | -0.0366 | -0.2093 | 0.0814 | 1 | |||
MOD BATH | 0.2003 | 0.0178 | -0.2333 | 0.0670 | 0.8406 | 1 | ||
AIRCON | 0.2908 | -0.1323 | 0.0914 | 0.1257 | 0.0558 | -0.0314 | 1 | |
FIREPL | 0.0868 | 0.0483 | 0.0515 | 0.0990 | 0.0235 | 0.0591 | 0.0479 | 1 |
c. Rank the independent variables based upon strength of relationship to the dependent variable. Order them from most strongly related to least strongly related.
positive or negative indicates only the direction and the absolute value represents the strength of relationship.
Correlation with VALUE | Ranking | |
RMS | 0.4331 | 1 |
AIRCON | 0.2908 | 2 |
MOD KITCH | 0.2208 | 3 |
MOD BATH | 0.2003 | 4 |
AGE | -0.1628 | 5 |
LOTSIZE | -0.1193 | 6 |
FIREPL | 0.0868 | 7 |
d. The pair of independent variables that are the most strongly related to each other are MOD KITCH and MOD BATH whose correlation coefficient is 0.8406
e) The pair of independent variables that are the least strongly related to each other are MOD BATH and AGE whose correlation coefficient is 0.0178
f) MOD KITCH and MOD BATH
H0 : = 0
H1 : 0
alpha = 0.05
test statistic t =
r = 0.8406, n = 362
t = 29.44
critical value talpha/2 , n-2 = t0.025,360 = 1.967
since test stat is greater than critical value we reject null hypothesis and conclude that the population correlation coefficient (rho) for those two variables is different from zero.
g) MOD BATH and AGE
H0 : = 0
H1 : 0
alpha = 0.05
test statistic t =
r = 0.0178, n = 362
t = 0.337
critical value talpha/2 , n-2 = t0.025,360 = 1.967
since test stat is less than critical value we fail reject null hypothesis and conclude that the population correlation coefficient (rho) for those two variables is not different from zero.