Question

Make a set of boosted predictions on the response variable medv for the Boston dataset in the MASS package: first fit a KNN model, then correct the residuals using a single-layer neural network with shrinkage parameter λ = 0.1, then correct the residuals using a random forest with shrinkage λ = 0.1. In each step, use 5-fold cross-validation to tune the parameters: k (k = 5, 10, 20) in KNN, the number of neurons in the hidden layer (hidden = 3, 5, 7) in the single-layer neural network, and mtry (m = 3, 6, 9, 12) in the random forest. Finally, compare the three sets of residuals using plots.
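The procedure in the question is boosting by stagewise residual fitting: start from a KNN fit f, then for each later learner fit g to the current residuals r_i and update f(x) ← f(x) + λ·g(x) and r_i ← r_i - λ·g(x_i), with λ = 0.1. The question targets R, but the page's code is Java, so the following is only a minimal Java sketch of that update under several assumptions: a single predictor, synthetic toy data instead of the Boston data, folds assigned by index (observation i goes to fold i % 5), and a KNN learner tuned over k = 5, 10, 20 standing in for the neural-network and random-forest stages, which are too large to write out here. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Shrunken residual boosting on toy 1-D data (a sketch, not the Boston analysis).
// ASSUMPTION: KNN is the learner at every stage; the question calls for a
// neural network at stage 1 and a random forest at stage 2.
public class ResidualBoostingSketch {

    static final int FOLDS = 5; // observation i belongs to fold i % FOLDS

    // KNN regression: mean response of the k points nearest to x0, skipping
    // points in fold `skip` (pass -1 to use every point).
    static double knnPredict(double[] x, double[] y, double x0, int k, int skip) {
        Integer[] idx = new Integer[x.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(Math.abs(x[a] - x0), Math.abs(x[b] - x0)));
        double sum = 0;
        int used = 0;
        for (int i = 0; i < idx.length && used < k; i++) {
            if (idx[i] % FOLDS == skip) continue;
            sum += y[idx[i]];
            used++;
        }
        return sum / used;
    }

    // 5-fold CV error: each point is predicted from the other four folds.
    static double cvMse(double[] x, double[] y, int k) {
        double sse = 0;
        for (int i = 0; i < x.length; i++) {
            double e = y[i] - knnPredict(x, y, x[i], k, i % FOLDS);
            sse += e * e;
        }
        return sse / x.length;
    }

    // Returns the k in `grid` with the smallest CV error.
    static int tuneK(double[] x, double[] y, int[] grid) {
        int best = grid[0];
        double bestErr = Double.MAX_VALUE;
        for (int k : grid) {
            double err = cvMse(x, y, k);
            if (err < bestErr) { bestErr = err; best = k; }
        }
        return best;
    }

    // In-sample KNN fitted values using all points.
    static double[] knnFit(double[] x, double[] y, int k) {
        double[] fit = new double[x.length];
        for (int i = 0; i < x.length; i++) fit[i] = knnPredict(x, y, x[i], k, -1);
        return fit;
    }

    // One shrunken correction: pred += lambda * g and resid -= lambda * g,
    // so pred + resid stays equal to y.
    static void boostStage(double[] pred, double[] resid, double[] g, double lambda) {
        for (int i = 0; i < pred.length; i++) {
            pred[i] += lambda * g[i];
            resid[i] -= lambda * g[i];
        }
    }

    static double mse(double[] r) {
        double s = 0;
        for (double v : r) s += v * v;
        return s / r.length;
    }

    public static void main(String[] args) {
        Random rng = new Random(1); // toy data: y = sin(x) + noise
        int n = 200;
        double[] x = new double[n], y = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = 10 * rng.nextDouble();
            y[i] = Math.sin(x[i]) + 0.3 * rng.nextGaussian();
        }
        int[] grid = {5, 10, 20};
        double lambda = 0.1;

        int k0 = tuneK(x, y, grid); // stage 0: KNN on y, no shrinkage
        double[] pred = knnFit(x, y, k0);
        double[] resid = new double[n];
        for (int i = 0; i < n; i++) resid[i] = y[i] - pred[i];
        System.out.printf("stage 0: k=%d, residual MSE=%.4f%n", k0, mse(resid));

        for (int stage = 1; stage <= 2; stage++) { // stages 1 and 2: fit the residuals
            int k = tuneK(x, resid, grid);
            boostStage(pred, resid, knnFit(x, resid, k), lambda);
            System.out.printf("stage %d: k=%d, residual MSE=%.4f%n", stage, k, mse(resid));
        }
    }
}
```

Running `main` prints the tuned k and the in-sample residual MSE after each stage. Copying `resid` after stages 0, 1 and 2 gives the three residual sets that the question's final step compares with plots; because each update adds λ·g to the prediction and subtracts it from the residual, prediction plus residual always equals the observed response.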

Solutions

Expert Solution

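Cluster.java

import java.util.ArrayList;
import java.util.List;

// A cluster of points with an id and a centroid.
public class Cluster {

   public List<Point> points;
   public Point centroid;
   public int id;

   // Creates a new, empty cluster with the given id
   public Cluster(int id) {
      this.id = id;
      this.points = new ArrayList<>();
      this.centroid = null;
   }

   public List<Point> getPoints() {
      return points;
   }

   public void addPoint(Point point) {
      points.add(point);
   }

   public void setPoints(List<Point> points) {
      this.points = points;
   }

   public Point getCentroid() {
      return centroid;
   }

   public void setCentroid(Point centroid) {
      this.centroid = centroid;
   }

   public int getId() {
      return id;
   }

   // Removes all points from the cluster (the centroid is kept)
   public void clear() {
      points.clear();
   }

   // Prints the cluster id, its centroid and its member points
   public void plotCluster() {
      System.out.println("[Cluster: " + id + "]");
      System.out.println("[Centroid: " + centroid + "]");
      System.out.println("[Points:");
      for (Point p : points) {
         System.out.println(p);
      }
      System.out.println("]");
   }

}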
Point.java

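import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// A 2-D point that remembers which cluster it is assigned to.
public class Point {

    private static final Random RANDOM = new Random();

    private double x = 0;
    private double y = 0;
    private int cluster_number = 0;

    public Point(double x, double y) {
        this.setX(x);
        this.setY(y);
    }

    public void setX(double x) {
        this.x = x;
    }

    public double getX() {
        return this.x;
    }

    public void setY(double y) {
        this.y = y;
    }

    public double getY() {
        return this.y;
    }

    public void setCluster(int n) {
        this.cluster_number = n;
    }

    public int getCluster() {
        return this.cluster_number;
    }

    // Calculates the Euclidean distance between two points.
    protected static double distance(Point p, Point centroid) {
        return Math.sqrt(Math.pow((centroid.getY() - p.getY()), 2) + Math.pow((centroid.getX() - p.getX()), 2));
    }

    // Creates a random point with both coordinates drawn uniformly from [min, max).
    protected static Point createRandomPoint(int min, int max) {
        double x = min + (max - min) * RANDOM.nextDouble();
        double y = min + (max - min) * RANDOM.nextDouble();
        return new Point(x, y);
    }

    // Creates a list of `number` random points.
    protected static List<Point> createRandomPoints(int min, int max, int number) {
        List<Point> points = new ArrayList<>(number);
        for (int i = 0; i < number; i++) {
            points.add(createRandomPoint(min, max));
        }
        return points;
    }

    @Override
    public String toString() {
        return "(" + x + "," + y + ")";
    }
}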
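The two classes above only hold data; the page never shows the code that drives them. Assuming they are meant for k-means clustering (suggested by the centroid field and the `distance` helper), one assignment-and-update iteration could look like the following self-contained sketch. It stores points as `double[]{x, y}` instead of using `Point` so that it compiles on its own; the class name `KMeansStep` and its methods are hypothetical.

```java
import java.util.Arrays;

// Sketch of one k-means iteration on 2-D points stored as double[]{x, y}.
public class KMeansStep {

    // Euclidean distance, the same formula as Point.distance.
    static double distance(double[] p, double[] c) {
        return Math.sqrt(Math.pow(c[1] - p[1], 2) + Math.pow(c[0] - p[0], 2));
    }

    // Assignment step: index of the nearest centroid for every point.
    static int[] assign(double[][] points, double[][] centroids) {
        int[] labels = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            double best = Double.MAX_VALUE;
            for (int j = 0; j < centroids.length; j++) {
                double d = distance(points[i], centroids[j]);
                if (d < best) {
                    best = d;
                    labels[i] = j;
                }
            }
        }
        return labels;
    }

    // Update step: each centroid moves to the mean of its assigned points;
    // a cluster with no points keeps its previous centroid.
    static double[][] update(double[][] points, int[] labels, double[][] centroids) {
        int k = centroids.length;
        double[][] sums = new double[k][2];
        int[] counts = new int[k];
        for (int i = 0; i < points.length; i++) {
            sums[labels[i]][0] += points[i][0];
            sums[labels[i]][1] += points[i][1];
            counts[labels[i]]++;
        }
        double[][] next = new double[k][];
        for (int j = 0; j < k; j++) {
            next[j] = counts[j] == 0
                    ? centroids[j].clone()
                    : new double[] {sums[j][0] / counts[j], sums[j][1] / counts[j]};
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.5, 2}, {8, 8}, {9, 8.5}};
        double[][] centroids = {{0, 0}, {10, 10}};
        int[] labels = assign(points, centroids);
        centroids = update(points, labels, centroids);
        System.out.println(Arrays.toString(labels));       // [0, 0, 1, 1]
        System.out.println(Arrays.deepToString(centroids)); // [[1.25, 1.5], [8.5, 8.25]]
    }
}
```

A full k-means run would repeat `assign` and `update` until the labels stop changing; with the classes above, the same loop would call `Point.distance`, `setCluster`, and `Cluster.setCentroid` instead of working on arrays.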
