Question

In: Statistics and Probability

The below provides information about the dataset. Input variables (based on physicochemical tests) Fixed acidity Numeric...

The below provides information about the dataset.

Input variables (based on physicochemical tests)
Fixed acidity Numeric
Volatile acidity Numeric
Citric acid Numeric
Residual sugar Numeric
Chlorides Numeric
Free sulfur dioxide Numeric
Total sulfur dioxide Numeric
Density Numeric
pH Numeric
Sulphates Numeric
Alcohol Numeric (%)
Wine Type red or white
Output variable (based on sensory data)
Quality Score between 0 and 10 in ordinal

Data set

fixed acidity volatile acidity citric acid residual sugar chlorides free sulfur dioxide total sulfur dioxide density pH sulphates alcohol winetype quality
7.4 0.7 0 1.9 0.076 11 34 0.9978 3.51 0.56 9.4 red 5
7.8 0.88 0 2.6 0.098 25 67 0.9968 3.2 0.68 9.8 red 5
7.8 0.76 0.04 2.3 0.092 15 54 0.997 3.26 0.65 9.8 red 5
11.2 0.28 0.56 1.9 0.075 17 60 0.998 3.16 0.58 9.8 red 6
7.4 0.7 0 1.9 0.076 11 34 0.9978 3.51 0.56 9.4 red 5
7.4 0.66 0 1.8 0.075 13 40 0.9978 3.51 0.56 9.4 red 5
7.9 0.6 0.06 1.6 0.069 15 59 0.9964 3.3 0.46 9.4 red 5
7.3 0.65 0 1.2 0.065 15 21 0.9946 3.39 0.47 10 red 7
7.8 0.58 0.02 2 0.073 9 18 0.9968 3.36 0.57 9.5 red 7
7.5 0.5 0.36 6.1 0.071 17 102 0.9978 3.35 0.8 10.5 red

5

Using R, build a linear regression model, logistic regression and classification model for wine quality prediction and no data partition is needed.

Which approach is best, regression or classification models? Why?

Solutions

Expert Solution

After entering the data in Excel we can do the analysis in R

R code along with output is attached herewith.

Here from the summary statistics both the model looks to be fine but the response variable is a ordinal varaiable so it is more appropriate to fit a classification model i.e., Logistic regression model.

Because in linear regression model many response values may be greater than 7 or less than 5

and we cannot interpret them on the other hand if they are categorical then it will be classified to one of the classes.


Related Solutions

Consider the dataset regarding the drop in light percent output (output) and two input variables –...
Consider the dataset regarding the drop in light percent output (output) and two input variables – bulb surface (qualitative) and length of the operation hours (quantitative). The data is available in a sheet named ‘Problem 3’. Answer the following. (a) Write down the overall model form if one wishes to build a second order model for each value of the qualitative variable (c) Build a regression model showing the 90% confidence ranges of the regression parameters. Write down the mean...
Consider a relational dataset and specify your input and output variables, then: Train the model using...
Consider a relational dataset and specify your input and output variables, then: Train the model using 80% of this dataset and suggest an appropriate GLM model output to input variables. Specify the significant variables on the output variable at the level of ?=0.05 and explore the related hypotheses test. Estimate the parameters of your model. Predict the output of the test dataset using the trained model. Provide the functional form of the optimal predictive model. Provide the confusion matrix and...
1. The table below provides information about the cost of inputs and value of output for...
1. The table below provides information about the cost of inputs and value of output for the production of a road bike. Note there are four different stages of production. Raw materials Manufacturing Construction Sale by the retailer Rubber for one tire ($20) Tire maker sells tires for $30 each Bike mechanic puts everything together and sells the bike for $345 Retailer sells the bike for $500 Aluminum for the frame ($80) Other component materials ($70) Frame maker sells bike...
The table below provides information about price and output produced in country X for the years...
The table below provides information about price and output produced in country X for the years 2010 and 2015. The base year is 2010.                                                             p2010              y2010              p2015             y2015 Clothing                                              20                    100                  20                    150 Food                                                   10                    50                    12                    60 Government pension payments        75                    500                  100                  750 Military equipment                            250                  750                  250                  800 Using the GDP deflator, determine whether inflation or deflation occurred during the period. Enter the value of the GDP deflator and...
The table below provides some information about the possible outcomes of a particular study. From the...
The table below provides some information about the possible outcomes of a particular study. From the information provided determine the effect size and power of each scenario. Sketch the distributions involved and show the areas representing alpha, beta, and power for each scenario. You can assume that all the populations are normally distributed. Hint 1: For scenario 1, the power is 80.23%. For scenario 2 the power is 99.63. So be sure to show your work for the rest of...
Information about an association between two interval-ratio variables is presented below. The association is between “the...
Information about an association between two interval-ratio variables is presented below. The association is between “the hours of screen time per day” (Y) and “years of schooling” (X). A measure of the overall association is given as well as the specific components of the OLS model. The OLS model estimates the effect of education (X) on the hours of screen time per day (Y). Association Between x and y Estimate r    -0.229 Rsqrd OLS Model components Estimate Constant (a)...
The table below provides information about three firms (A, B, C): CO2 Pollution produced by each...
The table below provides information about three firms (A, B, C): CO2 Pollution produced by each firm and costs to clean up pollution (by unit). Firm Pollution produced by firm Costs to clean up pollution (by unit) A 70 units $20 B 80 units $25 C 50 units $10 The goal is to reduce pollution to 120 units (by three firms). That is, each firm is allowed to produce 40 units of pollution. Each firm is given 40 units tradable...
The following table provides incomplete information on the costs of producing diagnostic imaging tests. It uses...
The following table provides incomplete information on the costs of producing diagnostic imaging tests. It uses two inputs to produce its outputs: capital and labour. The fixed capital costs reflect the monthly leasing cost for the diagnostic imaging machine and the variable labour costs reflects the wages and hours worked by staff. Assume that the firm is operating in a perfectly competitive market. Also assume that its fixed costs are sunk costs (i.e. it cannot recuperate these costs even if...
A CVP analysis focuses on sales profit, variables and fixed costs to make assumptions based on...
A CVP analysis focuses on sales profit, variables and fixed costs to make assumptions based on the break-even point and support the managers in a short-run decision. In my previous job, I worked in the purchasing department, we had to consider the sales volume, to negotiate volume and variables costs such as exchange rates to evaluate what our saving would be against the raw materials and producing internally. Did you use CVP analysis at your company? If so, how well...
The table below provides information about the product Microsoft Game. Microsoft Game Sale price per piece...
The table below provides information about the product Microsoft Game. Microsoft Game Sale price per piece (kr / pc) 150.00 Variable cost per piece: Direct material (kr / pc) 27.00 Direct wages (kr / pcs) 7.50 Indirect cost (ISK / pc) 7.50 Sales and management costs (kr / pc) 3.00 Fixed costs per year: Indirect cost 600,000 Sales and management costs 135,000 1. Calculate contribution margin per unit 2. Set up a statement of income in the form of a...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT