Question

In SAS: Create a scatterplot of MaxSalary versus Score from the dataset found at the link below.

'salarygov' found here: https://drive.google.com/file/d/1JKwW0byCWK54OPpNdSiOmYkACMbdyClo/view?usp=sharing

Question 1 options (Select all that apply):

The scatterplot shows that underlying group effects exist.
The scatterplot appears curved.
The scatterplot appears to have differing levels of variation depending on the value of score.
Simple linear regression is an appropriate choice for this data based on the scatterplot.

Could you post the code that you used and/or the steps used to generate this through tasks and utilities?

Solutions

Expert Solution

1. Scatter Plot of Y and X1

The scatter plot of Sales versus Calls suggests a linear trend between the two. The trendline indicates that a higher number of calls tends to go with higher sales.

2. Best fit line

Using the Regression tool in Excel's Data Analysis menu, we obtain the following output:

From this output, the best-fit line equation is Sales = Intercept + Coefficient of Calls * Calls

i.e., Sales = 22.52 + 0.1237 * Calls
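Excel's Regression tool fits this line by ordinary least squares. The sketch below shows that computation in Python (3.8+, standard library only), assuming the data are available as two equal-length lists. The numbers in it are hypothetical toy values, not the calls/sales data behind the output above, which is not reproduced here.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1).

    These are the intercept and slope that Excel's Regression tool
    (or =INTERCEPT / =SLOPE) reports for the same data.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope: covariation of x and y over variation of x
    b0 = y_bar - b1 * x_bar   # the fitted line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical toy data (NOT the original calls/sales data).
calls = [1, 2, 3, 4, 5]
sales = [2.0, 4.1, 5.9, 8.2, 9.8]
b0, b1 = fit_line(calls, sales)
print(f"Sales = {b0:.3f} + {b1:.3f} * Calls")
```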

3. Coefficient of Correlation

The correlation coefficient measures the strength of the linear association between two variables; its sign gives the direction of the association.

In Excel, we calculate the correlation coefficient as =CORREL(X1 array, Y array).

We get the value as 0.318

This means that calls and sales are weakly positively associated: as one increases, the other also tends to increase. Please note that this does not imply causation, i.e., we CANNOT say that the rise or fall in one causes the change in the other.
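=CORREL computes the Pearson correlation coefficient. A minimal standard-library sketch of the same formula, r = Sxy / sqrt(Sxx * Syy), is shown below; the input lists are hypothetical toy data, not the original sample.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient, the quantity Excel's =CORREL returns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    # The sign of r comes from sxy; its magnitude is at most 1.
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data (NOT the original calls/sales data).
calls = [1, 2, 3, 4, 5]
sales = [2.0, 4.1, 5.9, 8.2, 9.8]
print(round(pearson_r(calls, sales), 3))
```

On Python 3.10+, `statistics.correlation(x, y)` returns the same coefficient.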

4. Coefficient of Determination

It is more commonly known as the R squared value. It measures how closely the data points cluster around the best-fit line; in other words, it is the proportion of variability in the dependent variable that is explained by the independent variable. A higher R squared means the line explains more of that variability.

From the Excel regression output, the R squared value (Coefficient of Determination) is 0.101.

That is, about 10% of the variability in sales is explained by calls.
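R squared can be computed as 1 - SSE/SST, where SSE is the residual sum of squares and SST the total sum of squares of the dependent variable. With a single predictor it also equals the square of the correlation coefficient, which gives a quick consistency check on the reported values: 0.318² ≈ 0.101. The toy data below are hypothetical.

```python
from statistics import fmean


def r_squared(x, y):
    """Coefficient of determination for simple linear regression: 1 - SSE / SST."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                         # total
    if sst == 0:
        raise ValueError("y has no variation; R squared is undefined")
    return 1 - sse / sst


# Hypothetical toy data (NOT the original calls/sales data).
calls = [1, 2, 3, 4, 5]
sales = [2.0, 4.1, 5.9, 8.2, 9.8]
print(round(r_squared(calls, sales), 4))

# With one predictor, R^2 = r^2: the reported r = 0.318 gives R^2 of about 0.101.
print(round(0.318 ** 2, 3))  # 0.101
```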

5. Utility of Regression model

An F test can be used to test the utility of the model.

Null Hypothesis: β1 (the coefficient of Calls) = 0, i.e., Calls is NOT linearly associated with Sales

Alternative Hypothesis: β1 ≠ 0, i.e., Calls is linearly associated with Sales

Let us choose a significance level of α = 0.05.

From the regression ANOVA output, the p-value (Significance F) of the F test is 0.0012, which is less than 0.05, for the given degrees of freedom.

Since the p-value < α, we reject the null hypothesis and conclude that, with the given data, Calls is linearly associated with Sales.
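For simple linear regression the F statistic in the ANOVA table is MSR / MSE with (1, n - 2) degrees of freedom, and it can also be written in terms of R squared as F = R²(n - 2) / (1 - R²). The sketch below computes both forms on hypothetical toy data. The Python standard library has no F distribution, so the p-value itself is not computed here; with SciPy installed, `scipy.stats.f.sf(F, 1, n - 2)` gives the upper-tail probability.

```python
from statistics import fmean


def f_statistic(x, y):
    """Overall F statistic for simple linear regression: MSR / MSE, df = (1, n - 2)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points for n - 2 > 0")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)                # regression SS, df = 1
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))       # residual SS, df = n - 2
    return (ssr / 1) / (sse / (n - 2))


def f_from_r2(r2, n):
    """Same statistic expressed through R squared: F = R^2 (n - 2) / (1 - R^2)."""
    return r2 * (n - 2) / (1 - r2)


# Hypothetical toy data (NOT the original calls/sales data).
calls = [1, 2, 3, 4, 5]
sales = [2.0, 4.1, 5.9, 8.2, 9.8]
F = f_statistic(calls, sales)
print(round(F, 2))
```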

6. Based on the above findings, calls is a statistically significant predictor of sales volume, and the data support a positive linear association between calls and sales. From the best-fit line (Sales = 22.52 + 0.1237 * Calls), each additional call is associated with an average increase of 0.1237 units in sales (the interpretation of the coefficient of Calls). Since calls explains only about 10% of the variability in sales, other factors also influence sales.

7. 95% Confidence Interval

The 95% confidence interval for the coefficient of Calls (β1) is [0.0498, 0.1976]

Interpretation: a 95% confidence interval means that if this regression analysis were repeated on many other samples from the population, about 95% of the resulting intervals would contain the true value of β1. In simpler terms, we are 95% confident that the true value of β1 lies in our interval.
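The interval for the slope is b1 ± t × SE(b1), where SE(b1) = s / sqrt(Sxx), s² = SSE / (n - 2), and t is the two-sided critical value t(0.975, n - 2). A minimal sketch follows; the critical value is passed in as an argument because the standard library has no t quantile function (SciPy's `scipy.stats.t.ppf(0.975, n - 2)` would supply it). The data and the value 3.182 (t for 3 degrees of freedom) are for the hypothetical toy example only.

```python
from math import sqrt
from statistics import fmean


def slope_ci(x, y, t_crit):
    """Confidence interval for the slope: b1 +/- t_crit * SE(b1).

    t_crit is assumed to be t_{1 - alpha/2, n - 2}, supplied by the caller.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(sse / (n - 2)) / sqrt(sxx)   # standard error of the slope
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1


# Hypothetical toy data (NOT the original calls/sales data).
calls = [1, 2, 3, 4, 5]
sales = [2.0, 4.1, 5.9, 8.2, 9.8]
print(slope_ci(calls, sales, t_crit=3.182))

# The reported interval [0.0498, 0.1976] is centred on the slope estimate.
print(round((0.0498 + 0.1976) / 2, 4))  # 0.1237
```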

8. Sales = 22.52 + 0.1237 * Calls

Let us say calls = 100.

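95% confidence interval for β1 = [0.0498, 0.1976]

Thus, lower limit of Sales value, Ylow = 22.52 + 0.0498 * 100 = 27.5

Upper limit of Sales value, Yhigh = 22.52 + 0.1976 * 100 = 42.28

Thus, for calls = 100, Sales can be expected to be roughly in the range [27.5, 42.28], around the point estimate 22.52 + 0.1237 * 100 = 34.89. This range varies only the slope and holds the intercept fixed, so it is a rough guide rather than a formal confidence interval for mean sales at 100 calls; that interval is built from the standard error of the fitted value.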
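A standard confidence interval for the mean response at x0 is ŷ0 ± t × s × sqrt(1/n + (x0 - x̄)² / Sxx), with s² = SSE / (n - 2). The sketch below implements that formula and also reproduces the plug-in arithmetic above. The toy data and the critical value are hypothetical; the original data would be needed to compute the actual interval at 100 calls.

```python
from math import sqrt
from statistics import fmean


def mean_response_ci(x, y, x0, t_crit):
    """CI for E(Y | X = x0): yhat0 +/- t_crit * s * sqrt(1/n + (x0 - x_bar)^2 / Sxx).

    t_crit is assumed to be t_{1 - alpha/2, n - 2}, supplied by the caller.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    s = sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y0 = b0 + b1 * x0
    half = t_crit * s * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)  # widens away from x_bar
    return y0 - half, y0 + half


# Hypothetical toy data (NOT the original calls/sales data).
calls = [1, 2, 3, 4, 5]
sales = [2.0, 4.1, 5.9, 8.2, 9.8]
print(mean_response_ci(calls, sales, x0=4, t_crit=3.182))

# Plug-in arithmetic from step 8 (slope varied, intercept fixed at 22.52):
for slope in (0.0498, 0.1237, 0.1976):
    print(round(22.52 + slope * 100, 2))  # 27.5, then 34.89, then 42.28
```

For a prediction interval for an individual sales value rather than the mean, add 1 inside the square root.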

