Question

In: Computer Science

Assume a linear model and then add 0-mean Gaussian noise to generate a sample. Divide your sample into two as training and validation sets.

Fit a linear regression on the training half and compute the error on the validation set. Do the same for polynomials of degrees 2 and 3 as well.
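
One way to write the setup down, as a sketch only: the symbols below (w, sigma, V for the validation set, g_k for the fitted degree-k polynomial) are notation assumed here, not given in the question, and the error is taken to be root-mean-squared error, which is what the solution below computes.

```latex
% Data: a linear model plus zero-mean Gaussian noise
y_i = w_1 x_i + w_0 + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

% Fitted polynomial of degree k, estimated on the training half only
g_k(x) = \sum_{j=0}^{k} \hat{w}_j x^j, \qquad k \in \{1, 2, 3\}

% Validation error of the degree-k fit, over the validation half V
E_k = \sqrt{\frac{1}{|V|} \sum_{i \in V} \bigl( y_i - g_k(x_i) \bigr)^2}
```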

Solutions

Expert Solution

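import numpy as np
import matplotlib.pyplot as plt
# In a Jupyter notebook, "%matplotlib inline" can be run first to show plots inline.

# Underlying linear model: y = 2x, sampled at x = 0, 1, ..., 9
x = np.arange(10)
y = 2 * x

# Add zero-mean, unit-variance Gaussian noise (1-D, same shape as x)
noise = np.random.randn(10)
y = y + noise
plt.scatter(x, y)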
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# scikit-learn expects 2-D inputs, so use column vectors
X = x.reshape(-1, 1)
Y = y.reshape(-1, 1)

# Split the sample into training and validation halves (test_size=0.5).
# Two different seeds give two different random splits for the linear model.
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.5, random_state=100)
X_train2, X_test2, y_train2, y_test2 = train_test_split(X, Y, test_size=0.5, random_state=1)

# Degree 1: linear regression fitted on the training half only
model1 = LinearRegression()
model1.fit(X_train, y_train)
predictions1 = model1.predict(X_test)

model2 = LinearRegression()
model2.fit(X_train2, y_train2)
predictions2 = model2.predict(X_test2)

# Validation error, reported as root-mean-squared error (RMSE)
model1_error = mean_squared_error(y_test, predictions1) ** 0.5
model2_error = mean_squared_error(y_test2, predictions2) ** 0.5

print("model1 error (RMSE)=", model1_error)
print("model2 error (RMSE)=", model2_error)

# Polynomial regression: expand x into [1, x, x^2] (degree 2) or
# [1, x, x^2, x^3] (degree 3), then fit a linear model on those features.
# random_state=100 reproduces the split used for model1, so models 1, 3
# and 4 are validated on the same points.
poly_2 = PolynomialFeatures(degree=2)
X1 = poly_2.fit_transform(X)
X1_train, X1_test, y1_train, y1_test = train_test_split(X1, Y, test_size=0.5, random_state=100)
model3 = LinearRegression()
model3.fit(X1_train, y1_train)
predictions3 = model3.predict(X1_test)

poly_3 = PolynomialFeatures(degree=3)
X2 = poly_3.fit_transform(X)
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, Y, test_size=0.5, random_state=100)
model4 = LinearRegression()
model4.fit(X2_train, y2_train)
predictions4 = model4.predict(X2_test)

model3_error = mean_squared_error(y1_test, predictions3) ** 0.5
model4_error = mean_squared_error(y2_test, predictions4) ** 0.5

print("model3 error (RMSE)=", model3_error)
print("model4 error (RMSE)=", model4_error)
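
The same experiment can also be sketched with NumPy alone, which makes the "fit on one half, score on the other" step explicit. This is a minimal illustration, not part of the original answer: the function name `validation_rmse_by_degree`, the sample size of 100, the slope of 2, the unit noise level, and the fixed seed are all assumptions chosen for the example.

```python
import numpy as np


def validation_rmse_by_degree(n=100, slope=2.0, noise_std=1.0,
                              degrees=(1, 2, 3), seed=0):
    """Return {degree: validation RMSE} for polynomial fits to noisy linear data."""
    rng = np.random.default_rng(seed)

    # Linear model plus zero-mean Gaussian noise (assumed: y = slope * x)
    x = np.linspace(0.0, 9.0, n)
    y = slope * x + rng.normal(0.0, noise_std, size=n)

    # Shuffle the indices and split the sample into two equal halves
    idx = rng.permutation(n)
    train, val = idx[: n // 2], idx[n // 2:]

    errors = {}
    for d in degrees:
        # Least-squares polynomial fit on the training half only
        coeffs = np.polyfit(x[train], y[train], deg=d)
        # Predict on the held-out validation half
        pred = np.polyval(coeffs, x[val])
        errors[d] = float(np.sqrt(np.mean((y[val] - pred) ** 2)))
    return errors


errors = validation_rmse_by_degree()
for degree, rmse in errors.items():
    print(f"degree {degree}: validation RMSE = {rmse:.3f}")
```

Because the data come from a truly linear model, the degree-2 and degree-3 fits have no real structure left to capture. Their validation error should typically sit near the noise level, and it can come out slightly worse than the linear fit when the extra terms fit noise in the training half.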

