In this project, you will collect real-world data to construct a multiple regression model. The resulting model will be used for prediction. For example, suppose you are interested in the “sales price of houses”. In a multiple regression model, this is called the “response variable”. Many important factors affect house prices.
Those factors include size (square feet), number of bedrooms, number of baths, age of the house, and distance to a major grocery store. The factors (or variables) used in a multiple regression model are called “explanatory variables” (or “independent variables”). A good choice of explanatory variables is one of the most important steps in constructing a good multiple regression model. www.zillow.com, one of the most recognized real-estate websites in the United States, provides predicted prices (“Zestimate”) of houses. The goal of the project is to construct your own prediction model of house prices. The first step of the project is to decide which explanatory variables you will use. In this project, please find at least four explanatory variables.
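For reference, a multiple regression model with k explanatory variables is conventionally written as follows. The symbols here (y for the response, x for the explanatory variables, β for the coefficients, ε for the error term) are standard notation introduced for illustration and are not taken from the assignment; the requirement above corresponds to k ≥ 4.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,
\qquad i = 1, \dots, n
```

In the house example, y_i would be the sales price of house i and x_{i1}, ..., x_{ik} its size, number of bedrooms, and so on.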
The next step is data collection. You are required to collect at least 100 observations (samples); otherwise, you will not get full credit. Each observation must include the sales price and the values of all the explanatory variables of your choice. For example, if your explanatory variables are size, number of beds, number of baths, and age of houses, then the data set must be of the following form.
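The dependent variable is the sales price of the house (Sales).
The independent variables are MSSubClass, LotArea, LotFrontage, LandSlope, and Condition1. The data are listed below, one variable per line.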
Sales 169277.0525 187758.394 183583.6836 179317.4775 150730.08 177150.9892 172070.6592 175110.9565 162011.6988 160726.2478 157933.2795 145291.245 159672.0176 164167.5183 150891.6382 179460.9652 185034.6289 182352.1926 183053.4582 187823.3393 186544.1143 158230.7752 190552.8293 147183.6749 185855.3009 174350.4707 201740.6207 162986.3789 162330.1991 165845.9386 180929.6229 163481.5015 187798.0767 198822.1989 194868.4099 152605.2986 147797.7028 150521.969 146991.6302 150306.3078 151164.3725 151133.707 156214.0425 171992.7607 173214.9125 192429.1873 190878.6951 194542.5441 191849.4391 176363.7739 176954.1854 176521.217 179436.7048 220079.7568 175502.9181 188321.0738 163276.3245 185911.3663 171392.831 174418.207 179682.7096 179423.7516 171756.9181 166849.6382 181122.1687 170934.4627 159738.2926 174445.7596 174706.3637 164507.6725 163602.5122 154126.2702 171104.8535 167735.3927 183003.6133 172580.3812 165407.8891 176363.7739 175182.9509 190757.1778 167186.9958 167839.3768 173912.4212 154034.9174 156002.9558 168173.9433 168882.4371 168173.9433 157580.1776 181922.1526 155134.2278 188885.5733 183963.193 161298.7623 188613.6676 175080.1118 174744.4003 168175.9113 182333.4726 158307.2067 |
MSSubClass 20 20 60 60 120 60 20 60 20 20 120 160 160 160 120 60 20 20 20 20 60 120 20 120 80 60 60 20 20 20 60 30 20 60 60 120 160 160 160 160 160 160 20 60 20 20 60 50 60 20 20 20 80 90 50 50 85 90 20 20 20 20 50 20 20 190 30 50 20 20 50 30 20 45 50 50 30 70 70 190 70 50 75 30 50 50 50 50 50 50 30 50 70 70 20 50 190 50 70 80 |
LotArea 11622 14267 13830 9978 5005 10000 7980 8402 10176 8400 5858 1680 1680 2280 2280 12858 12883 11520 14122 14300 13650 7132 18494 3203 13300 8577 17433 8987 9215 10440 11920 9800 15410 13143 11134 4835 3515 3215 2544 2544 2980 2403 12853 7379 8000 10456 10791 18837 9600 9600 9900 9680 10600 13260 9724 17360 11380 8267 8197 8050 10725 10032 8382 10950 10895 13587 7898 8064 7635 9760 4800 4485 5805 6900 11851 8239 9656 9600 9000 9045 10560 5830 7793 5000 6000 6000 6360 6000 6240 6240 6120 8094 12900 3068 15263 10632 9900 6001 6449 7556 |
LotFrontage 80 81 74 78 43 75 NA 63 85 70 26 21 21 24 24 102 94 90 79 110 105 41 100 43 67 63 60 73 92 84 70 70 39 85 88 25 39 30 24 24 NA NA 57 68 80 NA 80 NA 80 80 90 88 NA 98 68 120 75 70 70 NA 87 80 60 60 119 70 65 60 81 80 60 56 69 50 69 NA 68 60 50 100 60 53 NA 50 50 50 53 50 52 52 51 57 60 52 100 72 60 65 NA 66 |
LandSlope Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Mod Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Mod Gtl Gtl Mod Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Gtl Mod Gtl mod |
Condition1 Feedr Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm PosN Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Feedr RRNe Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Norm Artery Norm Feedr Norm Norm Norm Norm Norm Norm Norm Norm Norm Artery Norm Norm Norm Artery Norm Norm Artery Artery Norm Norm Norm Norm Norm Feedr Norm Feedr Norm Norm Feedr Norm Norm Norm Artery Norm Norm Norm Feedr Norm Norm Norm Norm norm |
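As a rough illustration of how data in this one-variable-per-line layout could be turned into a fitted model, here is a minimal sketch in plain Python. It rests on several assumptions that are not part of the assignment: only the first 10 observations are copied in, only the two numeric predictors LotArea and LotFrontage are used, observations with an NA are simply dropped, and the helper names (`to_rows`, `solve`, `fit_ols`, `predict`) are hypothetical. The categorical variables (LandSlope, Condition1) would need to be encoded as indicator variables before they could enter the model, and a statistics package would normally replace the hand-written solver.

```python
# Sketch only: least-squares fit on the first 10 observations listed above.
wide = {
    "Sales": "169277.0525 187758.394 183583.6836 179317.4775 150730.08 "
             "177150.9892 172070.6592 175110.9565 162011.6988 160726.2478",
    "LotArea": "11622 14267 13830 9978 5005 10000 7980 8402 10176 8400",
    "LotFrontage": "80 81 74 78 43 75 NA 63 85 70",
}


def to_rows(wide):
    """Convert one-line-per-variable text into one dict per observation.

    Observations containing an NA are dropped (an assumption of this sketch).
    """
    names = list(wide)
    columns = [wide[name].split() for name in names]
    rows = []
    for values in zip(*columns):
        if "NA" in values:
            continue
        rows.append({name: float(v) for name, v in zip(names, values)})
    return rows


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular system: predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(x_rows, y):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in x_rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, x):
    """Predicted response for one observation's predictor values."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


rows = to_rows(wide)
predictors = ["LotArea", "LotFrontage"]
x_rows = [[r[p] for p in predictors] for r in rows]
y = [r["Sales"] for r in rows]
beta = fit_ols(x_rows, y)
residuals = [yi - predict(beta, xi) for xi, yi in zip(x_rows, y)]

print("observations used:", len(rows))
print("coefficients:", dict(zip(["intercept"] + predictors, beta)))
print("prediction for LotArea=9000, LotFrontage=70:", predict(beta, [9000, 70]))
```

With so few observations the coefficients are not meaningful on their own; the same steps would be applied to all 100 observations and to at least four explanatory variables to meet the project requirements.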