Please use the statistical software R.
Consider the dataset fandango in the fivethirtyeight package:
Identify the top 5 best-rated and top 5 worst-rated movies based on rottentomatoes.
Identify the top 5 best-rated and top 5 worst-rated movies based on the average of the three user scores (rottentomatoes_user, metacritic_user, and imdb).
Visualize the difference between Fandango stars and actual Fandango ratings. Comment on what you see.
Construct a formal test to see whether there is a significant difference between Fandango stars and actual Fandango ratings.
We have written the following R code for this problem:
#########################################################
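# install the package once if it is missing, then load it
if (!requireNamespace("fivethirtyeight", quietly = TRUE)) install.packages("fivethirtyeight")
library(fivethirtyeight)
z <- fandango
"rottentomatoes" %in% names(z)   # check that the Rotten Tomatoes critic score column exists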
# worst 5 (lowest Rotten Tomatoes critic score)
z$film[order(z$rottentomatoes)[1:5]]
# top 5 (highest Rotten Tomatoes critic score)
z$film[order(z$rottentomatoes, decreasing = TRUE)[1:5]]
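# Sketch (top_bottom is a hypothetical helper, not part of the original answer):
# return the k highest- and k lowest-scored films together with their scores,
# so that ties at the cut-off are visible. NA scores are dropped before ranking.
top_bottom <- function(data, score, k = 5) {
  x <- data[[score]]
  best  <- head(order(x, decreasing = TRUE, na.last = NA), k)
  worst <- head(order(x, na.last = NA), k)
  list(top    = data.frame(film = data$film[best],  score = x[best]),
       bottom = data.frame(film = data$film[worst], score = x[worst]))
}
# usage with the data loaded above: top_bottom(z, "rottentomatoes")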
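# average of the three user scores; rottentomatoes_user is on a 0-100 scale
# while metacritic_user and imdb are on a 0-10 scale, so rescale it first
new_average <- rowMeans(cbind(z$rottentomatoes_user / 10, z$metacritic_user, z$imdb))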
# worst 5 based on the new average
z$film[order(new_average)[1:5]]
# top 5 based on the new average
z$film[order(new_average, decreasing = TRUE)[1:5]]
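names(z)   # list the available columns
# plot both series against the film index; ylim covers both so neither is clipped
plot(z$fandango_stars, col = "red", pch = 16,
     ylim = range(z$fandango_stars, z$fandango_ratingvalue),
     xlab = "Film (row index)", ylab = "Rating (0-5)",
     main = "Comparison of Fandango stars and actual Fandango ratings")
points(z$fandango_ratingvalue, col = "blue", pch = 16)
legend("bottomleft", legend = c("Fandango stars", "Fandango rating value"),
       col = c("red", "blue"), pch = 16)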
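# Sketch (plot_star_gap is a hypothetical helper): plot the per-film gap between
# the displayed stars and the underlying rating value directly. Both are on a
# 0-5 scale and the rating value has one decimal, so the gap is rounded to 0.1
# and tabulated. If the stars are only ever rounded up, every bar sits at a gap >= 0.
plot_star_gap <- function(stars, rating) {
  gap <- round(stars - rating, 1)
  barplot(table(gap), xlab = "Fandango stars minus rating value",
          ylab = "Number of films", main = "Gap between stars and rating value")
  invisible(gap)   # return the gaps so they can be summarised further
}
# usage with the data loaded above: plot_star_gap(z$fandango_stars, z$fandango_ratingvalue)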
# paired t-test: each film contributes one (stars, rating value) pair
t.test(z$fandango_stars, z$fandango_ratingvalue, paired = TRUE)
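# Sketch (paired_tests is a hypothetical helper): the differences are discrete
# and bounded (multiples of 0.1), so normality is questionable; a Wilcoxon
# signed-rank test is a common nonparametric cross-check alongside the paired
# t-test. exact = FALSE requests the normal approximation up front; with tied or
# zero differences R would fall back to it anyway.
paired_tests <- function(x, y) {
  c(t        = t.test(x, y, paired = TRUE)$p.value,
    wilcoxon = wilcox.test(x, y, paired = TRUE, exact = FALSE)$p.value)
}
# usage with the data loaded above: paired_tests(z$fandango_stars, z$fandango_ratingvalue)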
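Here the p-value is essentially 0 (reported as 0.000), far below 0.05, so we reject the null hypothesis that the mean difference is zero and conclude that there is a significant difference between Fandango stars and actual Fandango ratings.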