1. What are the dependent and independent variables?
2. What conclusion about the relationship can be made from the scatter diagram?
3. What is the meaning of the slope and y-intercept?
4. What is the meaning of the coefficient of determination? When is it adjusted?
5. What is the goal of the t test?
6. What is the goal of residual analysis?
7. How are the three variations calculated? What are their names and formulas?
8. What is the difference between a simple and multiple regression?
9. How is an interaction added to the model, and why?
10. How can the significance of a term added to the model be tested?
1. What are the dependent and independent variables?
The dependent variable is the outcome (the effect, or criterion) that you are trying to measure or predict: it is what changes because of the independent variable. In regression it is the response variable, Y.
An independent variable is defined as the variable that is changed or controlled in a scientific experiment. It represents the cause or reason for an outcome. In regression it is the predictor (explanatory) variable, X, used to predict the dependent variable.
2. What conclusion about the relationship can be made from the scatter diagram?
A scatter plot is a two-dimensional data visualization that uses dots to represent the values obtained for two different variables - one plotted along the x-axis and the other plotted along the y-axis.
Scatter plots are used when you want to show the relationship between two variables. Scatter plots are sometimes called correlation plots because they show how two variables are correlated.
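Conclusion:
From the scatter diagram you can judge the direction of the relationship (positive if the points rise from left to right, negative if they fall), its form (linear or curved), its strength (how tightly the points cluster around a line), and whether there are outliers. Points with no visible pattern suggest little or no relationship between the two variables.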
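To put a number on what a scatter diagram shows, a minimal Python sketch can compute the sample correlation coefficient r: its sign gives the direction and its size the strength of a linear relationship. The data and the helper names `pearson_r` and `describe` are hypothetical, and the 0.7 and 0.3 cut-offs are rough rules of thumb assumed for illustration, not fixed standards.

```python
# Illustrative sketch: helper names, data and cut-offs are hypothetical.
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Rough verbal summary of r; the 0.7 / 0.3 cut-offs are assumed rules of thumb."""
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    if abs(r) >= 0.7:
        strength = "strong"
    elif abs(r) >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} linear relationship"


# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}: {describe(r)}")
```

Because r only measures linear association, a clearly curved scatter diagram can still give r close to 0, so the plot itself should be inspected as well.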
3. What is the meaning of the slope and y-intercept?
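Every non-vertical straight line can be represented by a linear equation $y = mx + b$, where m is the slope of the line and b is the y-intercept. The slope is the change in y for each one-unit increase in x: positive if the line rises from left to right, negative if it falls. The y-intercept is the value of y at the point where the line crosses the y-axis, that is, the value of y when x = 0. In a regression equation $\hat{Y} = b_0 + b_1 X$, the slope $b_1$ is the estimated change in the mean of Y for each one-unit increase in X, and the intercept $b_0$ is the estimated mean of Y when X = 0 (meaningful only if X = 0 lies within the range of the observed data).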
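A minimal sketch of how the slope and intercept are estimated by least squares, using $b_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$. The data and the helper name `fit_line` are hypothetical.

```python
# Illustrative sketch: helper name and data are hypothetical.


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for the line y_hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ssxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ssx = sum((x - mean_x) ** 2 for x in xs)
    b1 = ssxy / ssx            # slope: change in predicted y per one-unit increase in x
    b0 = mean_y - b1 * mean_x  # intercept: predicted y when x = 0
    return b0, b1


# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
b0, b1 = fit_line(hours, scores)
print(f"y_hat = {b0:.1f} + {b1:.1f}x")  # prints y_hat = 46.9 + 4.5x
```

Here the slope 4.5 means each additional hour is associated with a 4.5-point increase in the predicted score. The intercept 46.9 is the predicted score at 0 hours, which lies outside the observed range of 1 to 5 hours, so it should not be over-interpreted.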
4. What is the meaning of the coefficient of determination? When is it adjusted?
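The coefficient of determination, R-squared ($r^2$), measures the proportion of the total variation in the dependent variable that is explained by the independent variable(s) in the regression model. It ranges from 0 to 1; for example, $r^2 = 0.80$ means that 80% of the variation in Y is explained by the model.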
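The adjusted coefficient of determination (adjusted R-squared) is a modification of R-squared that takes into account the number of independent variables in the model and the sample size. Ordinary R-squared never decreases when another independent variable is added, even one that contributes nothing, so the adjusted version penalizes extra variables that do not improve the fit enough: $r^2_{adj} = 1 - (1 - r^2)\frac{n - 1}{n - k - 1}$, where n is the number of observations and k is the number of independent variables. It is used in multiple regression, especially when comparing models with different numbers of independent variables.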
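A minimal sketch computing $r^2 = 1 - SSE/SST$ and the adjusted value from the formula above, for a simple regression (k = 1) on hypothetical data. The helper names are illustrative, not a library API.

```python
# Illustrative sketch: helper names and data are hypothetical.


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 (simple linear regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - b1 * mean_x, b1


def r_squared(ys, fitted):
    """Coefficient of determination: 1 - SSE/SST (equal to SSR/SST for least squares)."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted r^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
b0, b1 = fit_line(hours, scores)
fitted = [b0 + b1 * x for x in hours]
r2 = r_squared(scores, fitted)
adj = adjusted_r_squared(r2, n=len(hours), k=1)
print(f"r^2 = {r2:.4f}, adjusted r^2 = {adj:.4f}")
```

The adjusted value is slightly lower than $r^2$; the gap grows as more independent variables are added relative to the number of observations.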
5. What is the goal of the t test?
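A t-test is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups. It uses the t-statistic, the t-distribution and the degrees of freedom to find the p-value: the probability of observing a difference at least this large if there were really no difference between the groups.
In regression, the t test is used to test whether the slope is significantly different from zero ($H_0: \beta_1 = 0$), that is, whether there is a significant linear relationship between X and Y. In simple linear regression the test statistic is $t = b_1 / S_{b_1}$ with $n - 2$ degrees of freedom.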
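For the regression t test on the slope, a common form is $t = b_1 / S_{b_1}$ with $S_{b_1} = S_{YX} / \sqrt{SSX}$, $S_{YX} = \sqrt{SSE/(n-2)}$ and $SSX = \sum (X_i - \bar{X})^2$, on $n - 2$ degrees of freedom. The sketch below computes it with the standard library only. The data and helper name are hypothetical, and the critical value 3.182 (two-tailed, α = 0.05, 3 degrees of freedom) is taken from a standard t table rather than computed.

```python
# Illustrative sketch: helper name and data are hypothetical.
import math


def slope_t_test(xs, ys):
    """t statistic and degrees of freedom for H0: beta1 = 0 in simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ssx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ssx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
    s_b1 = s_yx / math.sqrt(ssx)      # standard error of the slope
    return b1 / s_b1, n - 2


hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
t, df = slope_t_test(hours, scores)

# Assumed critical value from a standard t table: two-tailed, alpha = 0.05, df = 3 only.
T_CRITICAL = 3.182
print(f"t = {t:.2f} with {df} df; reject H0: {abs(t) > T_CRITICAL}")
```

For these hypothetical data t is about 15, far beyond 3.182, so the null hypothesis of zero slope would be rejected. For other sample sizes the critical value must be looked up for the matching degrees of freedom.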
6. What is the goal of residual analysis?
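Residual analysis uses the residuals (observed value minus predicted value, $e_i = Y_i - \hat{Y}_i$) to evaluate whether the regression model fits the data and whether its assumptions hold: linearity, independence of errors, normality of errors and equal variance of errors. A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data. A curved pattern suggests that the relationship is not linear and a non-linear model may be more appropriate, while a fan shape (spread that grows with X) suggests unequal variance.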
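To show what a non-random residual pattern looks like, the sketch below deliberately fits a straight line to hypothetical curved data ($y = x^2$) and prints each residual with a crude text bar in place of a residual plot. The helper names are illustrative.

```python
# Illustrative sketch: helper names and data are hypothetical.


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 (simple linear regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - b1 * mean_x, b1


def residuals(xs, ys):
    """Residuals e_i = y_i - y_hat_i from a straight-line fit."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical curved data (y = x^2) deliberately fitted with a straight line
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)
for x, e in zip(xs, res):
    # Crude text "residual plot": '+' bars for positive residuals, '-' bars for negative
    bar = "+" * round(e) if e > 0 else "-" * round(-e)
    print(f"x = {x}: e = {e:+.1f} {bar}")
```

The residuals come out as 2, -1, -2, -1, 2: positive at both ends and negative in the middle. That U-shaped, systematic pattern is the signal that a straight line is not appropriate, even though the residuals still sum to zero.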
7. How are the three variations calculated? What are their names and formulas?
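In regression, the total variation in the dependent variable is divided into three measures of variation, each a sum of squares:
Total sum of squares: $SST = \sum (Y_i - \bar{Y})^2$, the variation of the observed Y values around their mean.
Regression sum of squares: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$, the variation explained by the regression model.
Error sum of squares: $SSE = \sum (Y_i - \hat{Y}_i)^2$, the variation not explained by the model.
These satisfy $SST = SSR + SSE$, and the coefficient of determination is $r^2 = SSR / SST$.
For a single continuous variable, variation is described by the standard deviation, the variance and the coefficient of variation, all computed around the mean:
Mean formula: $\bar{X} = \frac{\sum X_i}{n}$
SD formula: $S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$ (the variance is $S^2$)
Coefficient of variation formula: $CV = \frac{S}{\bar{X}} \times 100\%$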
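A minimal sketch computing the regression sums of squares $SST = \sum (Y_i - \bar{Y})^2$, $SSR = \sum (\hat{Y}_i - \bar{Y})^2$ and $SSE = \sum (Y_i - \hat{Y}_i)^2$ for a least-squares line, and checking that $SST = SSR + SSE$. The data and helper names are hypothetical.

```python
# Illustrative sketch: helper names and data are hypothetical.


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 (simple linear regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - b1 * mean_x, b1


def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for a least-squares straight-line fit."""
    b0, b1 = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - mean_y) ** 2 for y in ys)             # total variation
    ssr = sum((f - mean_y) ** 2 for f in fitted)         # explained by the regression
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained (error)
    return sst, ssr, sse


hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
sst, ssr, sse = sums_of_squares(hours, scores)
print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
print(f"SSR + SSE = {ssr + sse:.1f}, r^2 = SSR/SST = {ssr / sst:.4f}")
```

The decomposition holds for a least-squares fit that includes an intercept; it need not hold for a line forced through the origin.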
8. What is the difference between a simple and multiple regression?
Simple linear regression uses one independent variable to predict the value of a dependent variable, for example $Y = A + BX$, where there is only one predictor (X). Multiple linear regression uses two or more independent variables, for example $Y = A + B_1 X_1 + B_2 X_2$. The difference between the two is the number of independent variables.
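A minimal sketch contrasting the two: the same least-squares routine is given one predictor column (simple regression) or two (multiple regression). It solves the normal equations $X^\top X b = X^\top y$ by Gaussian elimination; the helper names `solve` and `fit_ols` and the noise-free data (generated from $y = 2 + 3x_1 - x_2$) are hypothetical. Real analyses would normally use a statistics library, since solving the normal equations directly can be numerically less stable.

```python
# Illustrative sketch: helper names and data are hypothetical.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations.

    rows holds one list of predictor values per observation (no constant column).
    """
    X = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * y for xi, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical noise-free data generated from y = 2 + 3*x1 - 1*x2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 + 3 * a - b for a, b in zip(x1, x2)]

simple = fit_ols([[a] for a in x1], y)    # one predictor: simple regression
multiple = fit_ols(list(zip(x1, x2)), y)  # two predictors: multiple regression
print("simple:  ", [round(c, 3) for c in simple])
print("multiple:", [round(c, 3) for c in multiple])
```

With both predictors the fit recovers the generating coefficients 2, 3 and -1. The simple regression on $x_1$ alone gives a different slope (about 2.17) because the omitted $x_2$ is related to $x_1$.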
9. How is an interaction added to the model, and why?
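An interaction is added by multiplying two independent variables together and including the product as an extra term in the model. It is added because the effect of one independent variable on the dependent variable may depend on the value of another; an interaction term captures this, which expands understanding of the relationships among the variables in the model and allows more hypotheses to be tested.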
Suppose we have the following linear regression:
$y = a_0 + a_1 x_1 + a_2 x_2$
where y is the target, $a_i$ are the regression coefficients and $x_i$ are the independent variables. A linear regression with an interaction is then:
$y = a_0 + a_1 x_1 + a_2 x_2 + a_{12} x_1 x_2$
The difference between the first and second models is the presence of the interaction term $x_1 x_2$. In the second model the slope for $x_1$ is $a_1 + a_{12} x_2$, so the effect of $x_1$ changes with the value of $x_2$.
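A minimal sketch of adding an interaction: the product $x_1 x_2$ is computed as an extra column and fitted by least squares along with $x_1$ and $x_2$. The data are hypothetical and noise-free (generated from $y = 1 + 2x_1 + 3x_2 + 4x_1 x_2$) so the recovered coefficients can be checked, and `solve`/`fit_ols` are an illustrative normal-equations solver written for this example, not a library API.

```python
# Illustrative sketch: helper names and data are hypothetical.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * y for xi, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2 + 4*x1*x2
x1 = [0, 1, 2, 0, 1, 2]
x2 = [0, 0, 0, 1, 1, 1]
y = [1 + 2 * a + 3 * b + 4 * a * b for a, b in zip(x1, x2)]

# The interaction is added as a new predictor column: the product x1 * x2
rows = [[a, b, a * b] for a, b in zip(x1, x2)]
a0, a1, a2, a12 = fit_ols(rows, y)
for level in (0, 1):
    # Slope of y with respect to x1 when x2 is held at `level`
    print(f"x2 = {level}: slope for x1 = {a1 + a12 * level:.2f}")
```

The output shows a slope for $x_1$ of 2.00 when $x_2 = 0$ and 6.00 when $x_2 = 1$: with the interaction term, the effect of $x_1$ depends on $x_2$, which a model without $x_1 x_2$ could not express.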
10. How can the significance of a term added to the model be tested?
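Statistical significance is usually judged with a p-value (short for “probability value”): the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. A result is called statistically significant when its p-value is below a chosen significance level α, commonly 0.05.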
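The p-value for each term comes from the t test of the null hypothesis that its coefficient is equal to zero (no effect). A p-value below α (for example, < 0.05) indicates that you can reject the null hypothesis. In other words, a predictor that has a low p-value is likely to be a meaningful addition to your model, because changes in the predictor's value are related to changes in the response variable. Equivalently, a partial F test can compare the model with and without the added term; for a single added term, the partial F statistic equals the square of that term's t statistic.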
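A minimal sketch of the partial F test, $F = \frac{(SSE_R - SSE_F)/q}{SSE_F/(n - k - 1)}$, where $SSE_R$ and $SSE_F$ are the error sums of squares of the reduced and full models, q is the number of added terms and k the number of predictors in the full model. Here the hypothetical added term is the slope on hours, so the reduced model is the intercept-only model; helper names and data are illustrative.

```python
# Illustrative sketch: helper names and data are hypothetical.


def sse_mean(ys):
    """SSE of the intercept-only (reduced) model, which predicts every y by the mean."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def sse_line(xs, ys):
    """SSE of the least-squares line (full model: intercept plus slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b0 = mean_y - b1 * mean_x
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def partial_f(sse_reduced, sse_full, q, n, k_full):
    """Partial F for q added terms; k_full = number of predictors in the full model."""
    return ((sse_reduced - sse_full) / q) / (sse_full / (n - k_full - 1))


hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
f_stat = partial_f(sse_mean(scores), sse_line(hours, scores), q=1, n=len(hours), k_full=1)
print(f"partial F = {f_stat:.1f} on (1, {len(hours) - 2}) degrees of freedom")
```

For these hypothetical data F is 225, which is the square of the slope's t statistic (15). The statistic would be compared against an F critical value with (q, n - k - 1) degrees of freedom, or converted to a p-value with a statistics library.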