Question


Question 3 Using R for calculations

In a competitive company, the income levels of its employees are not standardised and are awarded on a case-by-case basis after negotiations between individual employees and the company directors. An industry body wants an analysis of employee income with respect to their relative rank in the company. A random sample of 184 individuals in the company were recruited and their relative Rank and Income values were anonymously recorded.

The data are listed below, and the variables are defined as follows.

Income - The yearly income of the employee.

Rank - The relative rank of the employee’s position at the company (1 being highest rank, 9 being lowest)

a. Fit a simple linear regression to the data with Income as the response variable and worker Rank as the predictor.

b. Using your analysis in a. or otherwise, explain why simple linear regression is inadequate to explain the structure in this dataset.

c. Fit a polynomial regression model to the data and select the best order polynomial to explain the data using the significance testing techniques discussed in lectures.

d. Predict the Income for a person planning to apply for a position at Rank 5 at one of these competitive companies in the near future

Rank Income
7 106790
6 70916
9 70495
3 191968
6 59373
6 106390
8 31339
3 235000
3 209008
5 115081
1 510684
1 557015
8 115096
2 311281
8 83348
6 118896
1 523692
3 230699
5 127867
6 103211
5 97534
8 68099
4 72454
6 129781
2 360613
6 73465
9 93146
6 104356
7 42327
5 145520
9 55853
7 77324
3 216965
1 532028
5 120256
6 37870
7 89948
1 511271
4 193372
2 281334
6 83604
8 53887
9 64738
9 72541
4 164709
9 56205
4 181247
5 92034
4 177882
1 483163
6 97319
1 484151
1 492368
6 120574
8 52470
7 46166
5 155870
6 76479
3 218382
8 91030
3 200678
2 364445
6 78075
7 77990
1 530666
6 136092
4 132705
7 120456
6 115115
2 296011
6 64033
1 512753
3 167713
7 60436
7 61206
3 266501
4 227492
1 514100
2 384562
2 271253
1 505753
4 148516
2 338896
9 70202
2 288968
7 116571
9 92788
4 166387
8 84762
6 92757
3 243974
8 44752
2 311745
4 165152
3 216874
4 224083
6 125820
4 196454
9 21565
2 340717
9 48784
5 105917
9 25375
8 103300
6 107669
7 93197
4 154516
8 59497
8 68733
1 540871
1 590015
3 134095
8 87005
7 45888
6 73332
4 217111
9 86037
1 463367
4 202798
4 213355
4 216602
9 35764
8 65762
2 352920
2 279612
2 349812
5 166996
6 107851
5 128139
6 166045
8 47305
9 60798
8 37471
8 10184
1 528574
4 164696
9 25789
5 140320
1 499333
2 336158
6 89999
8 104567
6 143554
5 163795
1 513261
4 165280
4 161781
7 81081
9 41830
9 22884
2 338717
6 89851
6 77929
9 29934
3 205850
5 84776
5 125247
6 80336
1 591938
9 74762
9 53977
5 107757
7 60626
5 111661
4 149466
2 346352
1 534712
6 147205
2 288935
7 96857
4 164486
6 65347
9 36389
9 102282
8 53647
3 263337
6 56293
6 78559
1 550526
9 79542
8 35019
8 133983
5 161509
5 127704
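The solution below assumes that a data frame named `Employee_Analysis` already exists. One way to create it is base R's `read.table()`. The sketch assumes the table above, including its `Rank Income` header line, has been saved as a plain-text file. The helper name `read_employee_table` and the file path are hypothetical, and the usage example writes only the first four rows to a temporary file.

```r
# Hypothetical helper: read the whitespace-separated "Rank Income" table
# (header line included) into a data frame and check its columns.
read_employee_table <- function(path) {
  df <- read.table(path, header = TRUE)
  stopifnot(all(c("Rank", "Income") %in% names(df)))
  df$Rank <- as.integer(df$Rank)  # ranks are whole numbers, 1 (highest) to 9 (lowest)
  df
}

# Usage with the first four rows of the table, written to a temporary file.
# With the full table saved instead, dim() should report 184 rows and 2 columns.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Rank Income", "7 106790", "6 70916", "9 70495", "3 191968"), tmp)
Employee_Analysis <- read_employee_table(tmp)
print(dim(Employee_Analysis))
```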

Solution

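a. Fit a simple linear regression to the data with Income as the response variable and worker Rank as the predictor.

Use the R code below: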

dim(Employee_Analysis)

184 2

184 rows with 2 columns


names(Employee_Analysis)

"Rank" "Income"
reg.mod <- lm(Income ~ Rank, data = Employee_Analysis)
summary(reg.mod)
coefficients(reg.mod)

(Intercept) Rank
438835.04 -50353.65

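The fitted regression equation is:

Income = 438835.04 - 50353.65*Rank

Each one-step increase in the Rank number (one level further down the hierarchy) is associated with an average income about 50,354 lower.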
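Tabulating the fitted line at every rank from 1 to 9 shows what the equation implies in practice. This is a small base-R sketch that uses only the two coefficients reported above.

```r
# Coefficients of the simple linear regression, copied from the output above
b0 <- 438835.04    # intercept
b1 <- -50353.65    # change in predicted Income per one-step increase in Rank

ranks <- 1:9
linear_pred <- b0 + b1 * ranks
print(data.frame(Rank = ranks, Predicted = round(linear_pred, 2)))
# The Rank 9 prediction is negative (about -14348)
```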

b. Using your analysis in a. or otherwise, explain why simple linear regression is inadequate to explain the structure in this dataset.

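R² = 0.7629, so the linear model explains 76.29% of the variation in Income.

F(1, 182) = 585.6, p < 2.2e-16

Since p < 0.05, Income has a statistically significant linear association with Rank. Significance does not make a straight line an adequate description, however. The line forces predicted income to fall by the same amount (about 50,354) for every one-step increase in Rank. The data instead fall steeply between the top ranks and much more slowly between the lower ones. At the bottom of the scale the line even predicts a negative income: 438835.04 - 50353.65*9 ≈ -14348 at Rank 9.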
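The F statistic in the output can be recovered from R² and the sample size. This is a handy consistency check when numbers are copied out of R by hand. The sketch below implements the standard identity F = (R²/p) / ((1 - R²)/(n - p - 1)). The helper name `f_from_r2` is hypothetical, and the R² values are the ones reported in this solution.

```r
# Hypothetical helper: overall F statistic implied by R^2 for a regression with
# p slope terms, on p and n - p - 1 degrees of freedom:
# F = (R^2 / p) / ((1 - R^2) / (n - p - 1))
f_from_r2 <- function(r2, n, p) {
  (r2 / p) / ((1 - r2) / (n - p - 1))
}

n <- 184
f_linear <- f_from_r2(0.7629, n, p = 1)   # linear model: 1 and 182 DF
f_quad   <- f_from_r2(0.9312, n, p = 2)   # quadratic model: 2 and 181 DF
p_linear <- pf(f_linear, df1 = 1, df2 = n - 2, lower.tail = FALSE)

print(round(c(linear = f_linear, quadratic = f_quad), 1))
print(p_linear)
```

The small gap between the quadratic value computed here and the 1225 printed by R comes from using R² rounded to four decimals.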

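The residual plot shows the problem directly. Use this R code to draw it, with a reference line at zero:

plot(fitted(reg.mod), residuals(reg.mod))
abline(h = 0, lty = 2)

The residuals are not randomly scattered around zero. They form a curved (U-shaped) pattern: positive at both ends of the fitted range (Ranks 1 and 9) and negative in the middle. This systematic curvature means a straight line has the wrong shape for these data, so a polynomial (curved) model in Rank is needed.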
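The curvature can also be checked numerically by averaging the residuals within each rank. For an adequate model, these averages should all be near zero. The sketch uses a hypothetical helper, `mean_residual_by_rank`, and synthetic data with a built-in quadratic trend (not the employee data). It therefore illustrates the pattern to look for rather than reproducing the real residuals. On the real data, the call would be `mean_residual_by_rank(reg.mod, Employee_Analysis$Rank)`.

```r
# Hypothetical helper: average residual of a fitted model within each Rank
mean_residual_by_rank <- function(model, rank) {
  tapply(residuals(model), rank, mean)
}

# Synthetic data (NOT the employee data): quadratic trend in Rank plus
# scatter that averages to zero within each rank
rank <- rep(1:9, each = 3)
income <- 600000 - 150000 * rank + 10000 * rank^2 + rep(c(-5000, 0, 5000), 9)
toy <- data.frame(Rank = rank, Income = income)

fit_linear <- lm(Income ~ Rank, data = toy)
fit_quad <- lm(Income ~ poly(Rank, 2, raw = TRUE), data = toy)

# Straight line: positive at both ends, negative in the middle (U shape)
print(round(mean_residual_by_rank(fit_linear, toy$Rank)))
# Quadratic: averages are essentially zero at every rank
print(round(mean_residual_by_rank(fit_quad, toy$Rank)))
```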

c. Fit a polynomial regression model to the data and select the best order polynomial to explain the data using the significance testing techniques discussed in lectures.

An Excel quadratic (order-2 polynomial) trend line gives:

Income = 10031*Rank^2 - 150945*Rank + 624270
R² = 0.9312

The same quadratic polynomial can be fitted in R:

poly.mod <- lm(Income ~ poly(Rank, 2, raw = TRUE), data = Employee_Analysis)
summary(poly.mod)

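Call:
lm(formula = Income ~ poly(Rank, 2, raw = TRUE), data = Employee_Analysis)

Residuals:
    Min      1Q  Median      3Q     Max
-127622  -22179    -132   27474  108581

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)                624270.1    10936.5   57.08   <2e-16 ***
poly(Rank, 2, raw = TRUE)1 -150944.8     4910.0  -30.74   <2e-16 ***
poly(Rank, 2, raw = TRUE)2   10031.2      476.6   21.05   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 39260 on 181 degrees of freedom
Multiple R-squared: 0.9312, Adjusted R-squared: 0.9305
F-statistic: 1225 on 2 and 181 DF, p-value: < 2.2e-16

Significance testing for the order: the Rank^2 term has t = 21.05 with p < 2e-16, so adding it significantly improves on the straight line. R² also rises from 0.7629 (linear) to 0.9312 (quadratic). The same test is then applied one order higher: fit poly(Rank, 3, raw = TRUE) and check whether the Rank^3 coefficient is significant. If it is not significant at the 5% level, the quadratic is the best-order polynomial, giving

Income = 624270.1 - 150944.8*Rank + 10031.2*Rank^2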
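One common way to automate this order selection is a sequence of partial F tests with `anova()`. This is offered as an assumption about what the lectures intend. Each polynomial is compared with the next higher order, and the process stops when the added term is no longer significant. For nested models that differ by one term, this F test is equivalent to the t test on the added coefficient (F = t²). The helper `select_poly_order` and its `max_order` and `alpha` arguments are hypothetical. The demonstration uses synthetic data, so it says nothing about which order the employee data would select.

```r
# Hypothetical helper: pick a polynomial order for Income ~ Rank by sequential
# F tests. Start from a straight line and add one power of Rank at a time,
# keeping it only while the added term is significant at level alpha.
select_poly_order <- function(data, max_order = 4, alpha = 0.05) {
  best <- 1
  current <- lm(Income ~ poly(Rank, 1, raw = TRUE), data = data)
  for (k in seq_len(max_order)[-1]) {
    candidate <- lm(Income ~ poly(Rank, k, raw = TRUE), data = data)
    # anova() on two nested lm fits gives the partial F test for the extra term
    p_value <- anova(current, candidate)[["Pr(>F)"]][2]
    if (is.na(p_value) || p_value >= alpha) break
    best <- k
    current <- candidate
  }
  best
}

# Synthetic illustration (NOT the employee data): quadratic trend in Rank plus
# scatter that averages to zero within each rank
rank <- rep(1:9, each = 3)
income <- 600000 - 150000 * rank + 10000 * rank^2 + rep(c(-5000, 0, 5000), 9)
toy <- data.frame(Rank = rank, Income = income)

select_poly_order(toy, max_order = 4)   # 2 for this synthetic quadratic trend
```

On the real data, `select_poly_order(Employee_Analysis)` would run the same comparisons. Printing `anova(poly.mod, lm(Income ~ poly(Rank, 3, raw = TRUE), data = Employee_Analysis))` shows the cubic step explicitly.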

d. Predict the Income for a person planning to apply for a position at Rank 5 at one of these competitive companies in the near future

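x = 5 (Rank 5)

Using the polynomial (quadratic) equation:

Income = 10031*Rank^2 - 150945*Rank + 624270

Income = 10031*5^2 - 150945*5 + 624270 = 250775 - 754725 + 624270

Income = 120320

The predicted income using the polynomial equation is about 120,320.

Using the linear regression equation instead:

Income = 438835.04 - 50353.65*Rank

Income = 438835.04 - 50353.65*5 = 187066.8

Income ≈ 187067

The predicted income using the linear equation is about 187,067. The polynomial prediction is the one to report, for two reasons. First, part b showed that the straight line misses the curvature in the data. Second, the linear value is higher than the income of every one of the 18 Rank 5 employees in the sample (the highest is 166,996).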
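Because part d concerns one prospective employee, a prediction interval is more informative than the point estimate alone. It adds person-to-person scatter to the uncertainty in the fitted curve. The sketch below first re-evaluates the quadratic at Rank 5 with the unrounded coefficients from the summary output. It then wraps R's `predict()` with `interval = "prediction"`. The helper name `predict_income` is hypothetical. The interval is shown on synthetic data because no interval for the employee data is given here; with the real model, the call would be `predict_income(poly.mod, 5)`.

```r
# Point prediction at Rank 5 from the quadratic coefficients reported above
coef_quad <- c(624270.1, -150944.8, 10031.2)   # intercept, Rank, Rank^2
point_pred <- sum(coef_quad * 5^(0:2))
print(point_pred)   # about 120326; the hand calculation's 120320 uses rounded coefficients

# Hypothetical helper: point prediction plus a prediction interval for a single
# new employee at the given rank (the model must be fitted with data = ...)
predict_income <- function(model, rank, level = 0.95) {
  predict(model, newdata = data.frame(Rank = rank),
          interval = "prediction", level = level)
}

# Demonstration on synthetic data (NOT the employee data)
rank <- rep(1:9, each = 3)
income <- 600000 - 150000 * rank + 10000 * rank^2 + rep(c(-5000, 0, 5000), 9)
toy <- data.frame(Rank = rank, Income = income)
fit_quad <- lm(Income ~ poly(Rank, 2, raw = TRUE), data = toy)
print(predict_income(fit_quad, 5))   # columns fit, lwr, upr; fit is 100000 here
```

Fitting with `data =` matters here. A model written as `lm(Employee_Analysis$Income ~ Employee_Analysis$Rank)` cannot match a `Rank` column in `newdata`, so `predict()` warns and returns the 184 fitted values instead.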

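Entire R code:

# Employee_Analysis: data frame with columns Rank and Income (184 rows)
dim(Employee_Analysis)
names(Employee_Analysis)
# a. Simple linear regression of Income on Rank
reg.mod <- lm(Income ~ Rank, data = Employee_Analysis)
summary(reg.mod)
coefficients(reg.mod)
# b. Residuals vs fitted values: a curved pattern indicates a poor linear fit
plot(fitted(reg.mod), residuals(reg.mod))
abline(h = 0, lty = 2)
# c. Quadratic (order-2) polynomial regression, compared with the linear model
poly.mod <- lm(Income ~ poly(Rank, 2, raw = TRUE), data = Employee_Analysis)
summary(poly.mod)
anova(reg.mod, poly.mod)
# d. Predicted income at Rank 5 from the quadratic model
predict(poly.mod, newdata = data.frame(Rank = 5))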

