Question

In: Statistics and Probability

You work for a construction firm that needs to accurately predict the compressive strength of concrete from variables including the concrete composition and its age. You are aware that the relationship between these components and the concrete strength is complex; however, you have been asked to investigate how well a simple linear regression model works for prediction. Using the provided data (Concrete_Data.xls), develop models to predict:

  • Concrete strength from the single best indicator variable;
  • Concrete strength from all variables.

With the second model, determine if any variables are not contributing significantly to the model, and what impact removing these has on prediction performance. Comment on the final model and its accuracy.

You should draw on the unit content concerning correlation and regression to answer this question. Note that you are not expected to use training/validation/testing data splits, although you are welcome to do so. (Provide MATLAB code and visualisations to justify your response.)

 

Solutions

Expert Solution

Solution:-

Save the dataset in your MATLAB working folder, or pass its full path to xlsread; otherwise MATLAB reports a path error when reading the .xls file.

% Load the data: columns 1-8 are the predictors, column 9 is the strength
concrete_data = xlsread('D:\D DRIVE DATA\MATLAB\All data\Concrete_Data.xls');
X_input = concrete_data(:,1:8);          % all 8 predictor columns
Predictor_output = concrete_data(:,9);   % keep the 9th column as the label (response)

% Correlation of each of the 8 predictors with the response variable
N = size(X_input,2);
CC = zeros(1,N);
for k0 = 1:N   % loop over all predictor columns
    tmp = corrcoef(X_input(:,k0), Predictor_output);
    CC(k0) = tmp(1,2);
end
% CC shows which predictor is best. In this case the first predictor has a
% correlation of 0.49 with the response variable, so we take it as the
% single best predictor variable.
[~, best] = max(abs(CC));

% Model 1: concrete strength from the single best predictor
single_model = fitlm(X_input(:,best), Predictor_output);
plot(single_model)

% Model 2: concrete strength from all variables
con_model = fitlm(X_input, Predictor_output);
disp(con_model)   % the coefficient table lists a p-value for each predictor

% Predictions on the same data (no train/test split is used here)
pred = predict(con_model, X_input);
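
The question also asks which variables do not contribute significantly to the all-variables model, and what removing them does to prediction performance. The following is a minimal sketch of one way to check this, not a definitive method. It assumes Concrete_Data.xls is in the current folder with eight predictor columns followed by strength in column 9, and it assumes a significance level of 0.05. The names alpha, keep, full_model, reduced_model, rmse_full and rmse_reduced are illustrative and do not come from the original answer.

```matlab
% Assumption: Concrete_Data.xls is in the current folder, with predictors in
% columns 1-8 and compressive strength in column 9.
concrete_data = xlsread('Concrete_Data.xls');
X = concrete_data(:,1:8);
y = concrete_data(:,9);

alpha = 0.05;   % assumed significance level, not specified in the question

% Full model with all eight predictors
full_model = fitlm(X, y);
pvals = full_model.Coefficients.pValue(2:end);   % skip the intercept row

% Keep only the predictors whose t-test p-value is below alpha
keep = pvals < alpha;
reduced_model = fitlm(X(:,keep), y);

% In-sample root-mean-square error of each model
rmse_full    = sqrt(mean(full_model.Residuals.Raw.^2));
rmse_reduced = sqrt(mean(reduced_model.Residuals.Raw.^2));

fprintf('Removed predictor columns: %s\n', mat2str(find(~keep)'));
fprintf('Full model:    RMSE = %.2f, adjusted R^2 = %.3f\n', ...
    rmse_full, full_model.Rsquared.Adjusted);
fprintf('Reduced model: RMSE = %.2f, adjusted R^2 = %.3f\n', ...
    rmse_reduced, reduced_model.Rsquared.Adjusted);

% Predicted versus actual strength for the reduced model
figure
scatter(y, predict(reduced_model, X(:,keep)), 10, 'filled')
hold on
plot([min(y) max(y)], [min(y) max(y)], 'k--')   % perfect-prediction line
hold off
xlabel('Actual strength')
ylabel('Predicted strength')
title('Reduced linear model: predicted vs actual')
```

If the two RMSE values and adjusted R^2 values are close, the removed variables were adding little to the prediction. Points scattered widely around the dashed line indicate how far a purely linear model is from capturing the relationship. Because no train/test split is used, both error figures are in-sample and will tend to be optimistic.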

