In: Statistics and Probability

The below provides information about the dataset.

Input variables (based on physicochemical tests) | |

Fixed acidity | Numeric |

Volatile acidity | Numeric |

Citric acid | Numeric |

Residual sugar | Numeric |

Chlorides | Numeric |

Free sulfur dioxide | Numeric |

Total sulfur dioxide | Numeric |

Density | Numeric |

pH | Numeric |

Sulphates | Numeric |

Alcohol | Numeric (%) |

Wine Type | red or white |

Output variable (based on sensory data) | |

Quality | Score between 0 and 10 in ordinal |

*Data set*

fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | winetype | quality |

7.4 | 0.7 | 0 | 1.9 | 0.076 | 11 | 34 | 0.9978 | 3.51 | 0.56 | 9.4 | red | 5 |

7.8 | 0.88 | 0 | 2.6 | 0.098 | 25 | 67 | 0.9968 | 3.2 | 0.68 | 9.8 | red | 5 |

7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15 | 54 | 0.997 | 3.26 | 0.65 | 9.8 | red | 5 |

11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17 | 60 | 0.998 | 3.16 | 0.58 | 9.8 | red | 6 |

7.4 | 0.7 | 0 | 1.9 | 0.076 | 11 | 34 | 0.9978 | 3.51 | 0.56 | 9.4 | red | 5 |

7.4 | 0.66 | 0 | 1.8 | 0.075 | 13 | 40 | 0.9978 | 3.51 | 0.56 | 9.4 | red | 5 |

7.9 | 0.6 | 0.06 | 1.6 | 0.069 | 15 | 59 | 0.9964 | 3.3 | 0.46 | 9.4 | red | 5 |

7.3 | 0.65 | 0 | 1.2 | 0.065 | 15 | 21 | 0.9946 | 3.39 | 0.47 | 10 | red | 7 |

7.8 | 0.58 | 0.02 | 2 | 0.073 | 9 | 18 | 0.9968 | 3.36 | 0.57 | 9.5 | red | 7 |

7.5 | 0.5 | 0.36 | 6.1 | 0.071 | 17 | 102 | 0.9978 | 3.35 | 0.8 | 10.5 | red |
5 |

Using R, build a linear regression model, logistic regression and classification model for wine quality prediction and no data partition is needed.

Which approach is best, regression or classification models? Why?

After entering the data in Excel we can do the analysis in R

R code along with output is attached herewith.

Here from the summary statistics both the model looks to be fine but the response variable is a ordinal varaiable so it is more appropriate to fit a classification model i.e., Logistic regression model.

Because in linear regression model many response values may be greater than 7 or less than 5

and we cannot interpret them on the other hand if they are categorical then it will be classified to one of the classes.

Consider the dataset regarding the drop in light percent output
(output) and two input variables – bulb surface (qualitative) and
length of the operation hours (quantitative). The data is available
in a sheet named ‘Problem 3’. Answer the following.
(a) Write down the overall model form if one wishes to build a
second order model for each value of the qualitative variable
(c) Build a regression model showing the 90% confidence ranges
of the regression parameters. Write down the mean...

Consider a relational dataset and
specify your input and output variables, then:
Train the model using 80% of this dataset and suggest an
appropriate GLM model output to
input variables.
Specify the significant variables on the
output variable at the level of ?=0.05 and explore
the related hypotheses test. Estimate the parameters of your
model.
Predict the output of the test dataset using the trained model.
Provide the functional form of the optimal predictive model.
Provide the confusion matrix and...

The table below provides information about price and output
produced in country X for the years 2010 and 2015. The base
year is 2010.
p2010
y2010
p2015
y2015
Clothing
20
100
20
150
Food
10
50
12
60
Government pension
payments
75
500
100
750
Military
equipment
250
750
250
800
Using the GDP deflator, determine whether inflation or
deflation occurred during the period. Enter the value of the GDP
deflator and...

The table below provides some information about the possible
outcomes of a particular study. From the information provided
determine the effect size and power of each scenario. Sketch the
distributions involved and show the areas representing alpha, beta,
and power for each scenario. You can assume that all the
populations are normally distributed.
Hint 1: For scenario 1, the power is 80.23%. For scenario 2 the
power is 99.63. So be sure to show your work for the rest of...

Information about an association between two interval-ratio
variables is presented below. The association is between “the hours
of screen time per day” (Y) and “years of schooling” (X). A measure
of the overall association is given as well as the specific
components of the OLS model. The OLS model estimates the effect of
education (X) on the hours of screen time per day (Y).
Association Between x and y Estimate
r -0.229
Rsqrd
OLS Model components Estimate
Constant (a)...

The table below provides information about three firms (A, B,
C): CO2 Pollution produced by each firm and costs to clean up
pollution (by unit).
Firm
Pollution produced by firm
Costs to clean up pollution (by unit)
A
70 units
$20
B
80 units
$25
C
50 units
$10
The goal is to reduce pollution to 120 units (by three firms).
That is, each firm is allowed to produce 40 units of pollution.
Each firm is given 40 units tradable...

The following table provides incomplete information on the costs
of producing diagnostic imaging tests. It uses two inputs to
produce its outputs: capital and labour. The fixed capital costs
reflect the monthly leasing cost for the diagnostic imaging machine
and the variable labour costs reflects the wages and hours worked
by staff.
Assume that the firm is operating in a perfectly competitive
market. Also assume that its fixed costs are sunk costs (i.e. it
cannot recuperate these costs even if...

A CVP analysis focuses on sales profit, variables and fixed
costs to make assumptions based on the break-even point and support
the managers in a short-run decision. In my previous job, I worked
in the purchasing department, we had to consider the sales volume,
to negotiate volume and variables costs such as exchange rates to
evaluate what our saving would be against the raw materials and
producing internally.
Did you use CVP analysis at your company? If so, how well...

The table below provides information about the product Microsoft
Game.
Microsoft Game
Sale price per piece (kr / pc) 150.00
Variable cost per piece:
Direct material (kr / pc) 27.00
Direct wages (kr / pcs) 7.50
Indirect cost (ISK / pc) 7.50
Sales and management costs (kr / pc) 3.00
Fixed costs per year:
Indirect cost 600,000
Sales and management costs 135,000
1. Calculate contribution margin per unit
2. Set up a statement of income in the form of a...

the table below provides information for a probability
distribution. use the table below to answer the following
questions.
X p(X)
0 .10
1 .60
2 .30
a. calculate the variance.
b. calculate the standard deviation

ADVERTISEMENT

ADVERTISEMENT

Latest Questions

- Gelbart Company manufactures gas grills. Fixed costs amount to $25,428,000 per year. Variable costs per gas...
- A personal trainer is interested in the typical physical activity of her clients. She asked a...
- [The following information applies to the questions displayed below.] Falcon Crest Aces (FCA), Inc., is considering...
- A study released by the governor’s office stated that 60% of the state residents support a...
- Give an example of a competing priority when the good of society is favored over the...
- do you feel that we have a good system with all different kinds of currency around...
- Linear algebra Thank you. Extra Credit (No partial credit - 1%): It is often useful to...

ADVERTISEMENT