Clark postulates that household size, respondent's age, and vulnerability index are covariates of monthly income. What are the various null hypotheses?
As per Clark's postulate, monthly income depends on household size, respondent's age, and vulnerability index.
So we need to fit a multiple regression, i.e.
Y = a + b1X1 + b2X2 + b3X3
where Y is the dependent variable (monthly income) and X1, X2, X3 are the independent variables:
X1 = household size
X2 = respondent's age
X3 = vulnerability index
a = Y-intercept (constant term)
b1, b2, b3 = estimated partial slopes of the regression of Y on X1, X2, X3
The main null hypothesis of a multiple regression is that there is no relationship between the X variables (X1, X2, X3) and the Y variable, i.e. H0: b1 = b2 = b3 = 0; in other words, the Y values you predict from your multiple regression equation are no closer to the actual Y values than you would expect by chance.
As you are doing a multiple regression, you will also test a null hypothesis for each X variable separately, H0: bi = 0, which says that adding that X variable to the multiple regression does not improve the fit of the equation any more than expected by chance.
While you will get P values for these null hypotheses, you should use them as a guide to building a multiple regression equation; you should not use the P values as a test of substantive null hypotheses about whether a particular X variable causes variation in Y.
How it works
The basic idea is that you find an equation that gives a linear relationship between the X variables and the Y variable, like this:
Ŷ=a+b1X1+b2X2+b3X3
The Ŷ is the expected value of Y for a given set of X values. b1 is the estimated slope of a regression of Y on X1, if all of the other X variables could be kept constant, and so on for b2, b3, etc.; a is the intercept. I'm not going to attempt to explain the math involved, but multiple regression finds values of b1, etc. (the "partial regression coefficients") and the intercept (a) that minimize the squared deviations between the expected and observed values of Y.
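This minimization can be sketched in plain Python by solving the normal equations. Everything below is illustrative: the data are made-up values for household size, age, and vulnerability index against monthly income, and fit_ols is an assumed helper name, not a standard function.

```python
# Hypothetical toy data: rows are (household size, age, vulnerability index),
# Y is monthly income. Invented for illustration only.
X = [
    [4, 35, 0.2],
    [6, 42, 0.5],
    [3, 29, 0.1],
    [5, 51, 0.7],
    [2, 24, 0.3],
    [7, 46, 0.6],
]
Y = [1200, 950, 1400, 900, 1500, 850]

def fit_ols(X, Y):
    """Least-squares fit of Y = a + b1*X1 + b2*X2 + b3*X3 via the normal equations."""
    n, k = len(X), len(X[0])
    # Design matrix with a leading column of 1s for the intercept a.
    D = [[1.0] + [float(v) for v in row] for row in X]
    p = k + 1
    # Normal equations: (D'D) beta = D'Y
    A = [[sum(D[r][i] * D[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    b = [sum(D[r][i] * Y[r] for r in range(n)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta  # [a, b1, b2, b3]
```

A useful sanity check on any OLS fit is that the residuals sum to zero and are uncorrelated with each predictor; that is exactly what minimizing the squared deviations guarantees.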
How well the equation fits the data is expressed by R2, the "coefficient of multiple determination." This can range from 0 (for no relationship between Y and the X variables) to 1 (for a perfect fit, no difference between the observed and expected Y values). The P value is a function of the R2, the number of observations, and the number of X variables.
When the purpose of multiple regression is prediction, the important result is an equation containing partial regression coefficients. If you had the partial regression coefficients and measured the X variables, you could plug them into the equation and predict the corresponding value of Y. The magnitude of the partial regression coefficient depends on the unit used for each variable, so it does not tell you anything about the relative importance of each variable.
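Prediction is then just plugging X values into the equation. The coefficients below are hypothetical, as is the helper name predict_income; in practice they would come from the fitted model.

```python
# Hypothetical partial regression coefficients (a, b1, b2, b3) from a fitted model.
a, b1, b2, b3 = 2000.0, -120.0, 5.0, -800.0

def predict_income(household_size, age, vulnerability):
    """Plug the X values into the fitted equation to get the expected Y."""
    return a + b1 * household_size + b2 * age + b3 * vulnerability

print(predict_income(4, 35, 0.2))  # 2000 - 480 + 175 - 160 = 1535.0
```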
When the purpose of multiple regression is understanding functional relationships, the important result is an equation containing standard partial regression coefficients, like this:
Ŷ' = a + b'1X'1 + b'2X'2 + b'3X'3 ...
where b'1 is the standard partial regression coefficient of Y on X1. It is the number of standard deviations that Y would change for every one-standard-deviation change in X1, if all the other X variables could be kept constant. The magnitude of the standard partial regression coefficients tells you something about the relative importance of different variables; X variables with bigger standard partial regression coefficients have a stronger relationship with the Y variable.
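Converting an unstandardized coefficient to a standard one only needs the standard deviations of the variables: b'i = bi * sd(Xi) / sd(Y). The coefficients and data below are the same hypothetical illustration values as before, not real estimates.

```python
import statistics

# Hypothetical unstandardized coefficients and toy data (invented for illustration).
b = {"household_size": -120.0, "age": 5.0, "vulnerability": -800.0}
X = {
    "household_size": [4, 6, 3, 5, 2, 7],
    "age":            [35, 42, 29, 51, 24, 46],
    "vulnerability":  [0.2, 0.5, 0.1, 0.7, 0.3, 0.6],
}
Y = [1200, 950, 1400, 900, 1500, 850]

sd_Y = statistics.stdev(Y)
# b'_i = b_i * sd(X_i) / sd(Y): SDs of change in Y per one-SD change in X_i.
b_std = {name: coef * statistics.stdev(X[name]) / sd_Y for name, coef in b.items()}
```

Because every b'i is in the same units (standard deviations of Y per standard deviation of Xi), their magnitudes can be compared directly, which the raw bi cannot.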