In: Computer Science
(Machine Learning, Neural Networks) Consider a regression Multilayer Perceptron (MLP) that uses identity activation functions for all neurons. A data scientist trains the neural network to minimize the MSE (Mean Squared Error) and observes a much smaller training-set MSE for the MLP than for OLS (Ordinary Least Squares). Is this possible? Justify your answer.
A multilayer perceptron is built by connecting neurons in layers, and an activation function is applied to the output of each layer before it is passed on to the next. It is the non-linear activation functions (e.g. ReLU, tanh, sigmoid) that give a neural network the ability to model complex non-linear relationships between the features and the target; with them, an MLP can fit data that no linear model can. Ordinary Least Squares (OLS) is the standard estimation method for linear regression: for a given set of features it finds, in closed form, the coefficients that globally minimize the training-set MSE over all affine functions of the inputs (and, when the OLS assumptions hold, those estimates are also unbiased with minimum variance).

The key point here is that this MLP uses the identity activation for every neuron. Each layer then computes only an affine transformation of its input, and a composition of affine maps is again an affine map: W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2). However many layers it has, this MLP therefore represents exactly the same hypothesis class as linear regression. Since OLS already attains the global minimum of the training MSE over that class, the identity-activation MLP can at best match the OLS training MSE; it cannot go below it. So no, a much smaller training MSE for this MLP is not possible. If it is observed, something else must differ between the two fits, for example the models were given different features or the OLS model was fit without an intercept. Only with non-linear activations could the MLP outperform OLS on non-linearly distributed data.
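This can be checked numerically. Below is a minimal sketch, assuming NumPy and scikit-learn are available and using made-up synthetic data and hyperparameters (none of these come from the original question): it fits OLS and an identity-activation MLP on the same nonlinear training data, compares their training MSEs, and verifies that the MLP's affine layers collapse into a single affine map.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic data with a nonlinear target, so no linear model can fit it exactly.
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# OLS: the exact global minimizer of training MSE over affine models.
ols = LinearRegression().fit(X, y)
mse_ols = mean_squared_error(y, ols.predict(X))

# MLP with one hidden layer and identity activations (no regularization),
# i.e. two stacked affine layers trained to minimize squared error.
mlp = MLPRegressor(hidden_layer_sizes=(10,), activation="identity",
                   solver="lbfgs", alpha=0.0, max_iter=10000,
                   random_state=0).fit(X, y)
mse_mlp = mean_squared_error(y, mlp.predict(X))

print(f"OLS training MSE: {mse_ols:.6f}")
print(f"MLP training MSE: {mse_mlp:.6f}")  # >= OLS MSE (equal at best)

# The two affine layers compose into one affine map W_eff x + b_eff that
# reproduces the MLP's predictions exactly.
W_eff = mlp.coefs_[0] @ mlp.coefs_[1]
b_eff = mlp.intercepts_[0] @ mlp.coefs_[1] + mlp.intercepts_[1]
assert np.allclose((X @ W_eff).ravel() + b_eff, mlp.predict(X), atol=1e-6)
```

On data like this the printed MLP training MSE ends up at or slightly above the OLS value; swapping activation="identity" for "relu" or "tanh" is what would let the MLP drop below the linear baseline.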