Question



Dataset

The scikit-learn sklearn.datasets module includes some small datasets for experimentation. In this project we will use the Boston house prices dataset to try to predict the median value of a home given several features of its neighborhood.

See the scikit-learn section of Sergiy Kolesnikov’s blog article “Datasets in Python” for how to load this dataset and examine it using pandas DataFrames.

Reminder: while you will use scikit-learn to obtain the dataset, your linear regression implementation must use NumPy directly.
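Because the fit itself must be written in NumPy, here is a minimal sketch of the normal-equations approach behind experiments (4) and (8): prepend a column of ones for the intercept, then compute w = (XᵀX)⁻¹Xᵀt with np.linalg.inv(), or solve (XᵀX)w = Xᵀt with np.linalg.solve(). The data are randomly generated stand-ins with a made-up intercept and slope, not Boston values (sklearn.datasets.load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so the real dataset needs an older release or another source). The helper names add_bias, fit_inv, fit_solve, predict, and mse are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration; not part of the assignment text.

def add_bias(X):
    """Return X with a leading column of ones so that w[0] is the intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                      # a single feature, e.g. LSTAT
        X = X[:, np.newaxis]
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_inv(X, t):
    """Normal equations with an explicit inverse: w = (A^T A)^-1 A^T t."""
    A = add_bias(X)
    return np.linalg.inv(A.T @ A) @ A.T @ t

def fit_solve(X, t):
    """Same equations, solving (A^T A) w = A^T t without forming the inverse."""
    A = add_bias(X)
    return np.linalg.solve(A.T @ A, A.T @ t)

def predict(X, w):
    """Linear response y = w[0] + X @ w[1:]."""
    return add_bias(X) @ w

def mse(t, y):
    """Mean squared error between targets t and predictions y."""
    return float(np.mean((np.asarray(t) - np.asarray(y)) ** 2))

# Synthetic stand-in data (NOT the Boston dataset): t = 5 + 2x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
t = 5.0 + 2.0 * x + rng.normal(0.0, 1.0, size=200)

w = fit_inv(x, t)
print(f"t = {w[0]:.2f} + {w[1]:.2f} * x")
print("training MSE:", mse(t, predict(x, w)))
```

Calling fit_solve(x, t) should return the same w up to rounding; solve is usually preferred because it avoids forming the inverse explicitly, which is less accurate and more expensive.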

Experiments

Run the following experiments in a Jupyter notebook, performing each action in a code cell and answering each question in a Markdown cell.

  1. Load and examine the Boston dataset’s features, target values, and description.

  2. Use sklearn.model_selection.train_test_split() to split the features and values into separate training and test sets. Use 80% of the original data as the training set and 20% as the test set.

  3. Create a scatterplot of the training set showing the relationship between the feature LSTAT and the target value MEDV. Does the relationship appear to be linear?

  4. With LSTAT as X and MEDV as t, use np.linalg.inv() to compute w for the training set. What is the equation for MEDV as a linear function of LSTAT?

  5. Use w to add a line to your scatter plot from experiment (3). How well does the model appear to fit the training set?

  6. Use w to predict the response for each value of the LSTAT attribute in the test set, then compute the test MSE for the model.

  7. Now add an x² column to LSTAT’s x column in the training set (and likewise in the test set, so the test MSE can be computed), then repeat experiments (4), (5), and (6) for MEDV as a quadratic function of LSTAT. Does the quadratic polynomial do a better job of predicting the values in the test set?

  8. Repeat experiment (4) with all 13 input features as X, using np.linalg.solve() instead of np.linalg.inv(). (See the Appendix to “Linear regression in vector and matrix format” for details.) Does adding features improve performance on the test set compared with using only LSTAT?

  9. Now add x² columns for all 13 features, and repeat experiment (8). Does adding quadratic features improve the performance on the test set compared to using only linear features?

  10. Compute the training MSE for experiments (8) and (9) and compare each with the corresponding test MSE. What explains the difference?

  11. Repeat experiments (9) and (10), adding x³ columns in addition to the existing x and x² columns for each feature. Does the cubic polynomial do a better job of predicting the values in the training set? Does it do a better job of predicting the values in the test set?
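Experiments (7), (9), and (11) share one pattern: expand each feature into x, x², …, xᵈ columns, fit on the training set, and compare training and test MSE as experiment (10) asks. The sketch below shows that pattern on synthetic, deliberately curved stand-in data with made-up coefficients (not Boston values). The helpers poly_features, design, fit, mse, and split_80_20 are hypothetical; the shuffle-and-slice split only imitates what train_test_split(X, t, test_size=0.2) does in the assignment, and standardizing the features with training statistics is a design choice added here, not something the assignment requires.

```python
import numpy as np

# Hypothetical helpers for illustration; the assignment itself uses
# sklearn.model_selection.train_test_split for the 80/20 split.

def poly_features(X, degree):
    """Stack x, x^2, ..., x^degree for every column of X (no cross terms)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                      # a single feature, e.g. LSTAT
        X = X[:, np.newaxis]
    return np.hstack([X ** p for p in range(1, degree + 1)])

def design(X, degree):
    """Polynomial columns with a leading column of ones for the intercept."""
    P = poly_features(X, degree)
    return np.hstack([np.ones((P.shape[0], 1)), P])

def fit(X, t, degree):
    """Least-squares weights via the normal equations and np.linalg.solve."""
    A = design(X, degree)
    return np.linalg.solve(A.T @ A, A.T @ t)

def mse(t, y):
    return float(np.mean((t - y) ** 2))

def split_80_20(X, t, seed=0):
    """Shuffle the rows, then keep 80% for training and 20% for testing."""
    idx = np.random.default_rng(seed).permutation(len(t))
    n_train = int(0.8 * len(t))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], X[te], t[tr], t[te]

# Synthetic, curved stand-in data (NOT the Boston dataset)
rng = np.random.default_rng(1)
x = rng.uniform(2, 35, size=250)
t = 40 - 2 * x + 0.03 * x ** 2 + rng.normal(0, 1.5, size=250)

x_tr, x_te, t_tr, t_te = split_80_20(x, t)

# Standardize with training statistics so the x^2 and x^3 columns stay on
# comparable scales and A^T A stays well conditioned.
mu, sd = x_tr.mean(axis=0), x_tr.std(axis=0)
z_tr, z_te = (x_tr - mu) / sd, (x_te - mu) / sd

results = {}
for degree in (1, 2, 3):
    w = fit(z_tr, t_tr, degree)
    results[degree] = (mse(t_tr, design(z_tr, degree) @ w),
                       mse(t_te, design(z_te, degree) @ w))
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")
```

Training MSE can only stay the same or fall as columns are added, because each larger model contains the smaller one; the test MSE is what shows whether the extra columns generalize. With all 13 Boston features, note that a two-valued feature (CHAS is a 0/1 indicator) has squared and cubed columns that are linear combinations of its x column and the intercept, so XᵀX becomes singular and np.linalg.solve() may raise LinAlgError or return unstable weights; dropping those redundant columns or using np.linalg.lstsq() avoids this.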

