Question

In: Computer Science


We obtained a large set of data on daily weather, including date, wind gust speed, sunshine duration, rain or not, temperature, and pressure. With this data, we wish to understand which factors affect whether it will rain or not on the next day.

A.

This scenario describes a classification problem

B.

This scenario describes a regression problem

Suppose that we have a data set with 20 potential predictors. We want to run a subset selection procedure to find a single best model. Considering computational complexity, which of the two algorithms is preferable?

A.

forward stepwise selection

B.

best subset selection

Suppose that we want to compare two models, M1 and M2. The AIC (Akaike information criterion) value of M1 is -1005.3, and that of M2 is -1012.6. If we make a selection purely based on AIC, which model is better?

A.

M2

B.

M1

We have a data set with 90 observations. If we use this data set to perform a 10-fold cross validation, how many observations are used for training at each iteration?

A.

81

B.

90

C.

9

D.

10

Solutions


Ans - a) This scenario describes a classification problem. The response is whether or not it will rain on the next day, which takes only two values (rain / no rain), so we are predicting a category rather than a numeric quantity. (A) This scenario describes a classification problem
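To see why this is classification, note that the response is built by pairing each day's measurements with the next day's rain indicator, which can only be "Yes" or "No". The Python sketch below assumes hypothetical records holding just a date and a rain flag; the field names and values are illustrative and are not taken from the original data set.

```python
# Hypothetical daily records (date, rained today?). Real data would also carry
# wind gust speed, sunshine duration, temperature and pressure as features.
days = [
    ("2024-01-01", "No"),
    ("2024-01-02", "Yes"),
    ("2024-01-03", "Yes"),
    ("2024-01-04", "No"),
]


def add_rain_tomorrow(records):
    """Attach the next day's rain flag to each day as the response variable."""
    rows = []
    # zip(records, records[1:]) pairs day t with day t + 1; the last day has no pair.
    for (date, rain_today), (_, rain_next) in zip(records, records[1:]):
        rows.append({"date": date, "rain_today": rain_today, "rain_tomorrow": rain_next})
    return rows


rows = add_rain_tomorrow(days)
print(len(rows))                                   # 3
print(sorted({r["rain_tomorrow"] for r in rows}))  # ['No', 'Yes']: two classes
```

The last day has no following day, so it gets no label and is dropped. A classifier such as logistic regression would then be trained on the remaining rows.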

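Ans - b) Forward stepwise selection is preferable. It starts with the null model (no predictors) and, at each step, adds the one variable that gives the greatest additional improvement to the model. With p = 20 predictors it therefore fits only 1 + p(p+1)/2 = 1 + 20·21/2 = 211 models, whereas best subset selection must fit all 2^20 = 1,048,576 possible models. (A) Forward stepwise selection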
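The difference in cost can be made concrete with a generic Python sketch of forward stepwise selection that counts how many models it fits. The scoring function is a hypothetical stand-in for fitting a regression and computing its RSS; the predictor names x1 ... x20 and their "usefulness" values are invented for illustration.

```python
def forward_stepwise(predictors, score):
    """Greedy forward stepwise selection.

    `score(model)` returns a value to minimise (e.g. RSS) for a tuple of
    predictor names. Returns the best model of each size and the number of
    models that had to be fitted.
    """
    selected, remaining = [], list(predictors)
    path = [((), score(()))]  # M0: the null (intercept-only) model
    fits = 1
    while remaining:
        # Try adding each remaining predictor to the current model.
        candidates = []
        for var in remaining:
            candidates.append((score(tuple(selected + [var])), var))
            fits += 1
        best_score, best_var = min(candidates)
        selected.append(best_var)
        remaining.remove(best_var)
        path.append((tuple(selected), best_score))
    return path, fits


# Hypothetical toy criterion: x1 is the most useful predictor, x20 the least.
usefulness = {f"x{i}": 1.0 / i for i in range(1, 21)}


def toy_rss(model):
    return 10.0 - sum(usefulness[v] for v in model)


path, fits = forward_stepwise(list(usefulness), toy_rss)
print(fits)     # models fitted by forward stepwise: 211
print(2 ** 20)  # models fitted by best subset selection: 1048576
```

Forward stepwise only returns the best model of each size (21 candidates here); the single best model is then usually chosen among them with cross-validation, AIC, BIC or adjusted R².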

Ans - c) If we select purely on the basis of AIC, M2 is the better model: a lower AIC indicates a better trade-off between goodness of fit and model complexity, and -1012.6 is lower than -1005.3. (A) M2
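AIC is defined as AIC = 2k - 2 ln(L̂), where k is the number of estimated parameters and L̂ is the maximised likelihood, so smaller values are preferred. With negative values it is easy to pick the wrong one; this short Python check simply selects the minimum of the two values given in the question.

```python
# AIC values from the question; the model with the lowest AIC is preferred.
aic = {"M1": -1005.3, "M2": -1012.6}

best = min(aic, key=aic.get)
# Difference from the best model (rounded to avoid float noise).
delta = {m: round(v - aic[best], 1) for m, v in aic.items()}
print(best)   # M2
print(delta)  # {'M1': 7.3, 'M2': 0.0}
```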

Ans - d) With 10-fold cross-validation, the 90 observations are split into 10 folds of 90/10 = 9 observations each. At each iteration we train on 9 folds and test on the remaining fold, so 9 × 9 = 81 observations are used for training. (A) 81
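The fold arithmetic can be checked with a small Python sketch that splits the 90 observation indices into 10 folds. It uses contiguous folds without shuffling purely for illustration; in practice the observations are usually assigned to folds at random before splitting.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


n, k = 90, 10
folds = k_fold_indices(n, k)
train_sizes = []
for i, test_idx in enumerate(folds, start=1):
    held_out = set(test_idx)
    # Every observation not in the held-out fold is used for training.
    train_idx = [j for j in range(n) if j not in held_out]
    train_sizes.append(len(train_idx))
    print(f"iteration {i}: train={len(train_idx)}, test={len(test_idx)}")
```

Each of the 10 iterations reports train=81, test=9, i.e. n(k-1)/k = 90 × 9/10 = 81.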
