Suppose you were interested in crop yields and you had collected data on the amount of rainfall, the amount of fertilizer, the average temperature, and the number of sunny days.
How could you formalize this as a regression problem?
Multiple linear regression to forecast the crop yield
Multiple linear regression is a variant of linear regression
analysis. It models the relationship between one dependent variable
and two or more independent variables [19]. Here the crop yield is
the dependent variable, and the rainfall, the amount of fertilizer,
the average temperature, and the number of sunny days are the
$k = 4$ independent variables. For a dataset in which
$x_1, \dots, x_k$ are the independent variables and $Y$ is the
dependent variable, multiple linear regression fits the model:
$$ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i $$
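where $\beta_0$ is the y-intercept, the parameters
$\beta_1, \beta_2, \dots, \beta_k$ are called the partial regression
coefficients, and $\varepsilon_i$ is the error term for observation
$i$. In matrix form: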
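$$ Y = XB + E $$
$$
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix}
1 & x_{11} & x_{21} & \cdots & x_{k1} \\
1 & x_{12} & x_{22} & \cdots & x_{k2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{1n} & x_{2n} & \cdots & x_{kn}
\end{bmatrix}, \quad
B = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
E = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$
Row i of X holds a leading 1 (for the intercept) followed by the k
predictor values of observation i, so that row i of XB + E
reproduces the model equation for the i-th observation.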
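As a rough illustration of the matrix form for the crop-yield question, the sketch below builds X from four predictors and estimates B by least squares with NumPy. Everything numeric here is an assumption made up for the example: the data are synthetic, and the variable names, units, value ranges, and "true" coefficients are hypothetical rather than taken from any real dataset.

```python
import numpy as np

# Synthetic, illustrative data only (hypothetical units and ranges).
rng = np.random.default_rng(0)
n = 200
rainfall = rng.uniform(300, 900, n)     # mm per season
fertilizer = rng.uniform(50, 200, n)    # kg per hectare
temperature = rng.uniform(15, 30, n)    # degrees Celsius
sunny_days = rng.uniform(60, 150, n)    # days per season

# Assumed coefficients, used only to generate the toy yields.
true_beta = np.array([1.0, 0.004, 0.01, 0.05, 0.008])

# Design matrix X: a column of ones (intercept) followed by the k = 4 predictors.
X = np.column_stack([np.ones(n), rainfall, fertilizer, temperature, sunny_days])

# Y = XB + E with Gaussian noise standing in for the error term E.
y = X @ true_beta + rng.normal(0.0, 0.2, n)

# Least-squares estimate of B.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

names = ["intercept", "rainfall", "fertilizer", "temperature", "sunny_days"]
for name, b in zip(names, beta_hat):
    print(f"{name:12s} {b: .4f}")

# Forecast for one new, hypothetical field (leading 1 is the intercept term).
x_new = np.array([1.0, 650.0, 120.0, 22.0, 100.0])
print("forecast yield:", x_new @ beta_hat)
```

With real data, the four predictor arrays and `y` would come from the collected measurements instead of the random generator; the fitting and forecasting lines would stay the same.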
Before applying multiple linear regression to forecast the crop
yield, it is necessary to identify the significant attributes in
the database. Not every attribute will be significant: for some,
changing the value has no measurable effect on the dependent
variable, and such attributes can be dropped. A p-value test is
performed to find the significant attributes, and multiple linear
regression is then applied only to those attributes to forecast
the crop yield.
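The selection step described above could look roughly like the following sketch. It assumes the p-value test is the usual two-sided t-test on each regression coefficient and uses a 0.05 threshold; the source specifies neither, so both are assumptions. The helper `ols_p_values` is a hypothetical name, and the data are synthetic, with the `sunny_days` coefficient deliberately set to zero.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Fit y = X @ beta + error by least squares and return (beta_hat, p_values).

    X must already include the intercept column. The p-values come from
    two-sided t-tests of the hypothesis that each coefficient is zero.
    """
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_hat
    dof = n - p                                   # residual degrees of freedom
    sigma2 = residuals @ residuals / dof          # estimated error variance
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t_stats = beta_hat / std_err
    p_values = 2.0 * stats.t.sf(np.abs(t_stats), dof)
    return beta_hat, p_values


# Synthetic data: sunny_days is given no true effect (coefficient 0).
rng = np.random.default_rng(1)
n = 200
names = ["intercept", "rainfall", "fertilizer", "temperature", "sunny_days"]
X = np.column_stack([
    np.ones(n),
    rng.uniform(300, 900, n),
    rng.uniform(50, 200, n),
    rng.uniform(15, 30, n),
    rng.uniform(60, 150, n),
])
true_beta = np.array([1.0, 0.004, 0.01, 0.05, 0.0])
y = X @ true_beta + rng.normal(0.0, 0.2, n)

beta_hat, p_values = ols_p_values(X, y)
for name, p in zip(names, p_values):
    print(f"{name:12s} p = {p:.3g}")

# Keep the intercept plus every predictor whose p-value is below alpha.
alpha = 0.05  # assumed significance threshold
keep = [0] + [j for j in range(1, X.shape[1]) if p_values[j] < alpha]

# Refit the regression on the retained columns only.
beta_sel, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
print("kept:", [names[j] for j in keep])
```

Because a predictor with no real effect still falls below the 0.05 threshold about one time in twenty by chance, `sunny_days` will usually, but not always, be dropped in runs like this one.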