Forty architecture students were each asked to judge 5 different building structures. The response variable of interest is the judge's overall satisfaction (SAT), where a higher score is better. We wish to compare the mean satisfaction rating across the five buildings, so the factor of interest is BLDG.
Use R or SAS to solve the problems. Please include the code you used to get each answer.
The data should be usable as-is: paste it into R or SAS to get the answers. SUBJ is the judge (one of the forty raters). BLDG is the building model they rated. SAT is their rating of that building.
(a) Why does it make sense to use the judge (denoted SUBJ in the data set) as a blocking variable? Why should we treat this block as a random effect?
(b) Analyze the data as a RBD, where SAT is the response, BLDG is the treatment factor, and SUBJ is the block. Based on the appropriate F-test, is there a significant difference in mean satisfaction rating across the five buildings? NOTE: Use a 0.10 significance level.
(c) Based on the appropriate F-test, is there significant variation among the judges? NOTE: Use a 0.10 significance level.
(d) Of particular interest to the investigators is whether the mean satisfaction for building 1 differs significantly from the mean satisfaction for the other four buildings. Use an ESTIMATE statement to test the appropriate contrast here. NOTE: Use a 0.10 significance level.
data buildings;
  input SUBJ BLDG SAT @@;
  cards;
1 1 2  1 2 5  1 3 6  1 4 5  1 5 7
2 1 5  2 2 6  2 3 6  2 4 7  2 5 4
3 1 4  3 2 7  3 3 3  3 4 6  3 5 7
4 1 6  4 2 4  4 3 7  4 4 5  4 5 7
5 1 2  5 2 6  5 3 4  5 4 7  5 5 5
6 1 4  6 2 6  6 3 7  6 4 5  6 5 3
7 1 7  7 2 5  7 3 5  7 4 7  7 5 4
8 1 3  8 2 7  8 3 6  8 4 7  8 5 6
9 1 6  9 2 7  9 3 8  9 4 6  9 5 3
10 1 5  10 2 3  10 3 3  10 4 5  10 5 6
11 1 3  11 2 6  11 3 4  11 4 4  11 5 3
12 1 3  12 2 6  12 3 7  12 4 5  12 5 3
13 1 4  13 2 1  13 3 7  13 4 1  13 5 6
14 1 4  14 2 6  14 3 8  14 4 5  14 5 1
15 1 4  15 2 4  15 3 4  15 4 5  15 5 5
16 1 8  16 2 5  16 3 9  16 4 9  16 5 5
17 1 5  17 2 5  17 3 6  17 4 7  17 5 5
18 1 5  18 2 4  18 3 6  18 4 6  18 5 6
19 1 2  19 2 5  19 3 6  19 4 2  19 5 8
20 1 2  20 2 8  20 3 7  20 4 8  20 5 2
21 1 8  21 2 8  21 3 8  21 4 8  21 5 3
22 1 5  22 2 4  22 3 4  22 4 3  22 5 5
23 1 6  23 2 6  23 3 6  23 4 6  23 5 4
24 1 3  24 2 5  24 3 8  24 4 5  24 5 6
25 1 6  25 2 2  25 3 5  25 4 7  25 5 6
26 1 2  26 2 7  26 3 4  26 4 7  26 5 2
27 1 7  27 2 7  27 3 7  27 4 7  27 5 7
28 1 8  28 2 5  28 3 5  28 4 6  28 5 3
29 1 2  29 2 6  29 3 7  29 4 4  29 5 5
30 1 1  30 2 5  30 3 5  30 4 6  30 5 6
31 1 9  31 2 7  31 3 8  31 4 2  31 5 8
32 1 6  32 2 9  32 3 1  32 4 8  32 5 4
33 1 2  33 2 6  33 3 8  33 4 9  33 5 8
34 1 8  34 2 4  34 3 3  34 4 3  34 5 9
35 1 2  35 2 7  35 3 2  35 4 9  35 5 2
36 1 2  36 2 9  36 3 1  36 4 8  36 5 3
37 1 7  37 2 2  37 3 3  37 4 3  37 5 6
38 1 3  38 2 7  38 3 3  38 4 2  38 5 2
39 1 3  39 2 3  39 3 5  39 4 3  39 5 3
40 1 9  40 2 5  40 3 8  40 4 7  40 5 8
;
run;
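For anyone working in R rather than SAS, the same data can be entered roughly as below. This is only a sketch: it assumes the ratings are listed in the same order as the data step (judges 1 to 40, buildings 1 to 5 within each judge), and the data frame name `buildings` is chosen to match what an R analysis of this problem would expect.

```r
# SAT ratings in data-step order: five ratings per judge (buildings 1..5),
# two judges per line.
sat <- c(
  2, 5, 6, 5, 7,   5, 6, 6, 7, 4,   # judges 1-2
  4, 7, 3, 6, 7,   6, 4, 7, 5, 7,   # judges 3-4
  2, 6, 4, 7, 5,   4, 6, 7, 5, 3,   # judges 5-6
  7, 5, 5, 7, 4,   3, 7, 6, 7, 6,   # judges 7-8
  6, 7, 8, 6, 3,   5, 3, 3, 5, 6,   # judges 9-10
  3, 6, 4, 4, 3,   3, 6, 7, 5, 3,   # judges 11-12
  4, 1, 7, 1, 6,   4, 6, 8, 5, 1,   # judges 13-14
  4, 4, 4, 5, 5,   8, 5, 9, 9, 5,   # judges 15-16
  5, 5, 6, 7, 5,   5, 4, 6, 6, 6,   # judges 17-18
  2, 5, 6, 2, 8,   2, 8, 7, 8, 2,   # judges 19-20
  8, 8, 8, 8, 3,   5, 4, 4, 3, 5,   # judges 21-22
  6, 6, 6, 6, 4,   3, 5, 8, 5, 6,   # judges 23-24
  6, 2, 5, 7, 6,   2, 7, 4, 7, 2,   # judges 25-26
  7, 7, 7, 7, 7,   8, 5, 5, 6, 3,   # judges 27-28
  2, 6, 7, 4, 5,   1, 5, 5, 6, 6,   # judges 29-30
  9, 7, 8, 2, 8,   6, 9, 1, 8, 4,   # judges 31-32
  2, 6, 8, 9, 8,   8, 4, 3, 3, 9,   # judges 33-34
  2, 7, 2, 9, 2,   2, 9, 1, 8, 3,   # judges 35-36
  7, 2, 3, 3, 6,   3, 7, 3, 2, 2,   # judges 37-38
  3, 3, 5, 3, 3,   9, 5, 8, 7, 8    # judges 39-40
)

# Rebuild the SUBJ and BLDG columns from the known layout
buildings <- data.frame(
  SUBJ = rep(1:40, each = 5),
  BLDG = rep(1:5, times = 40),
  SAT  = sat
)

# Quick sanity checks: 200 rows, and the mean rating per building
stopifnot(nrow(buildings) == 200)
bldg_means <- tapply(buildings$SAT, buildings$BLDG, mean)
print(round(bldg_means, 3))
```

The per-building means printed at the end are a useful check that the data were entered correctly before fitting any model.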
I have included all the answers in the R script below:
# After the data is loaded into R
head(buildings)

# Extract the response, and code subject and building as factors
SAT <- buildings[, "SAT"]
SUBJ <- factor(buildings[, "SUBJ"])
BLDG <- factor(buildings[, "BLDG"])

# (a) Why does it make sense to use the judge (denoted SUBJ in the
# data set) as a blocking variable? Why should we treat this block
# as a random effect?

# Let us ignore the blocks and just do a one-way CRD ANOVA
av.CRD <- aov(SAT ~ BLDG)
summary(av.CRD)
             Df Sum Sq Mean Sq F value Pr(>F)
BLDG          4   33.6   8.392   1.956  0.103
Residuals   195  836.7   4.291

# The p-value of the F-statistic for BLDG is 0.103. From this we might be
# tempted to conclude that there is no significant difference in the SAT
# ratings of the 5 buildings at alpha = 0.10. BUT THIS ANALYSIS IS WRONG.
# Why is it wrong?
# Because it ignores the judge-to-judge variation in the ratings. Each of
# the 40 judges rated all 5 buildings, so the 5 ratings from one judge share
# that judge's personal tendency to score high or low. Treating SUBJ as a
# blocking variable removes this nuisance variation from the error term, so
# the buildings are compared within judges.
# The block should be treated as a random effect because these 40 judges are
# not of interest in themselves: they represent a larger population of
# possible judges, and we want conclusions about the buildings that hold
# beyond these particular 40 students.
# Blocking is used to remove the effects of a few of the most important
# nuisance variables. Randomization is then used to reduce the contaminating
# effects of the remaining nuisance variables. For an important nuisance
# variable, blocking gives a more sensitive test of the factor of interest
# than randomization alone.
# (b) Analyze the data as a RBD, where SAT is the response, BLDG is
# the treatment factor, and SUBJ is the block. Based on the appropriate
# F-test, is there a significant difference in mean satisfaction rating
# across the five buildings?

# Now perform the analysis of variance
# SAT = response
# BLDG = treatment factor
# SUBJ = blocking factor
av <- aov(SAT ~ BLDG + SUBJ)
summary(av)
             Df Sum Sq Mean Sq F value Pr(>F)
BLDG          4   33.6   8.393   2.032 0.0926 .
SUBJ         39  192.3   4.931   1.194 0.2236
Residuals   156  644.4   4.131
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Ans (b)
# The p-value of the F-statistic for BLDG is 0.0926. Since it is less
# than the significance level alpha = 0.10, we reject the null hypothesis
# of equal means and conclude that there is a significant difference
# in mean satisfaction rating across the five buildings.

# Ans (c)
# The p-value of the F-statistic for SUBJ is 0.2236. Since it is greater
# than the significance level alpha = 0.10, we fail to reject the null
# hypothesis and conclude that there is NO significant variation
# among the judges.
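
The p-values used in parts (b) and (c) can be cross-checked directly from the F statistics and degrees of freedom in the RBD table. The snippet below is a small sketch that hard-codes the rounded values from that table (so the results agree only up to rounding); the variable names are illustrative.

```r
# F statistics and degrees of freedom copied from the RBD ANOVA table
f_bldg   <- 2.032; df_bldg <- 4
f_subj   <- 1.194; df_subj <- 39
df_resid <- 156
alpha    <- 0.10

# Upper-tail F probabilities reproduce the Pr(>F) column
p_bldg <- pf(f_bldg, df_bldg, df_resid, lower.tail = FALSE)
p_subj <- pf(f_subj, df_subj, df_resid, lower.tail = FALSE)

# Equivalent decision rule: compare F with the alpha-level critical value
crit_bldg <- qf(1 - alpha, df_bldg, df_resid)
crit_subj <- qf(1 - alpha, df_subj, df_resid)

cat("BLDG: p =", round(p_bldg, 4), " reject H0:", f_bldg > crit_bldg, "\n")
cat("SUBJ: p =", round(p_subj, 4), " reject H0:", f_subj > crit_subj, "\n")
```

Comparing F with the critical value and comparing the p-value with alpha always lead to the same decision; both are shown only to make the rule explicit.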
# (d) Testing whether the mean satisfaction for building 1
# differs significantly from the mean satisfaction
# for the other four buildings

# Choose the contrast weights (they must sum to 0):
# building 1 versus the average of buildings 2 to 5
contrasts(BLDG) <- matrix(c(1, -1/4, -1/4, -1/4, -1/4), nrow = 5, ncol = 1)

# Fit the linear model with this contrast as the first BLDG column.
# The "BLDG1" row of the coefficient table gives the two-sided
# p-value of the t-test for the contrast.
summary(lm(SAT ~ BLDG + SUBJ))$coef["BLDG1", ]
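   Estimate  Std. Error     t value    Pr(>|t|)
-0.64500000  0.28743561 -2.24398082  0.02624094

# Ans (d)
# Since the p-value = 0.0262 is well below the significance level
# alpha = 0.10, we conclude that YES, the mean satisfaction for
# building 1 differs significantly from the mean satisfaction for the
# other four buildings. The negative sign means building 1 is rated lower.
# Note that -0.645 is the regression coefficient on the coded contrast
# column, not the contrast itself. With these weights the estimated
# difference (building 1 minus the average of the other four) is
# -0.645 * 1.25 = -0.806, where 1.25 is the sum of the squared weights.
# The t value and p-value are the same on either scale.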
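
As a cross-check on part (d), the contrast can also be worked out by hand. The sketch below assumes a balanced design with 40 ratings per building, takes the error mean square (4.131 on 156 df) from the RBD table, and uses building means that were computed from the data listed in the question (they do not appear in the output above, so treat them as an input to verify).

```r
# Building means computed from the 200 ratings in the question (assumed input)
bldg_means <- c(4.575, 5.500, 5.500, 5.625, 4.900)

mse      <- 4.131   # Residuals Mean Sq from the RBD ANOVA table
df_resid <- 156     # Residuals Df from the RBD ANOVA table
n_judges <- 40      # ratings per building (one per judge)

# Contrast weights: building 1 versus the average of buildings 2 to 5
w <- c(1, -1/4, -1/4, -1/4, -1/4)
stopifnot(abs(sum(w)) < 1e-12)          # weights must sum to zero

est    <- sum(w * bldg_means)           # estimated contrast
se     <- sqrt(mse * sum(w^2) / n_judges)
t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), df_resid)   # two-sided p-value

# Dividing by sum(w^2) recovers the lm() coefficient scale
coef_scale <- est / sum(w^2)

cat("contrast =", round(est, 4), " se =", round(se, 4),
    " t =", round(t_stat, 3), " p =", round(p_val, 4), "\n")
cat("on the lm() coefficient scale:", round(coef_scale, 4), "\n")
```

The t statistic and p-value should agree with the lm() output, since rescaling a contrast changes its estimate and standard error by the same factor.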