Find a dataset online, and get a feel for it by performing some EDA. Produce a single plot which you think captures an interesting aspect of the data, and comment on it. (If you wish to use R, there are many datasets already built in, e.g. in the package 'MASS'; if you are not using R, datasets are readily available: a simple Google search for 'sample datasets' yields numerous results, for example.)
Solution:
R code:
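library(MASS)
data()  # list the datasets available in the attached packages

# HumanBodyTemp is assumed to be a data frame with a numeric column `temp`
# that is already available in the session.
print(HumanBodyTemp)

# Summary statistics for temp
sum(HumanBodyTemp$temp)
mean(HumanBodyTemp$temp)
sd(HumanBodyTemp$temp)
length(HumanBodyTemp$temp)
median(HumanBodyTemp$temp)
fivenum(HumanBodyTemp$temp)  # min, lower hinge, median, upper hinge, max

# Values flagged as outliers by the boxplot rule (empty if there are none)
outlier_values <- boxplot.stats(HumanBodyTemp$temp)$out

# Boxplot annotated with the outlier values
boxplot(HumanBodyTemp$temp, main = "Body temperature", boxwex = 0.1)
mtext(paste("Outliers:", paste(outlier_values, collapse = ", ")), cex = 0.6)

# Plain boxplot and histogram
boxplot(HumanBodyTemp$temp, main = "Boxplot of temp")
hist(HumanBodyTemp$temp, main = "Histogram of temp")

# Shapiro-Wilk normality test (its p-value is quoted in the interpretation)
shapiro.test(HumanBodyTemp$temp)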
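The exercise asks for a single plot. One possible choice, sketched here rather than taken from the original solution, is a histogram on the density scale with a fitted normal curve and markers for the mean and median. The vector `temp` below is a simulated stand-in (its size, mean, and standard deviation are assumptions), not the real HumanBodyTemp measurements; replace it with `HumanBodyTemp$temp` when that data frame is available.

```r
# Stand-in data: simulated values, NOT the real HumanBodyTemp measurements.
set.seed(42)
temp <- rnorm(100, mean = 98.5, sd = 0.68)

# Histogram on the density scale so that a density curve can be overlaid.
h <- hist(temp, freq = FALSE, breaks = 12,
          main = "temp (simulated stand-in)", xlab = "temp")

# Normal density using the sample mean and standard deviation.
curve(dnorm(x, mean = mean(temp), sd = sd(temp)), add = TRUE, lwd = 2)

# Mark the sample mean (dashed) and median (dotted).
abline(v = mean(temp), lty = 2)
abline(v = median(temp), lty = 3)
legend("topright", legend = c("normal fit", "mean", "median"),
       lty = c(1, 2, 3), lwd = c(2, 1, 1), bty = "n")
```

If the bars follow the curve closely and the mean and median lines nearly coincide, the plot supports a roughly symmetric, bell-shaped distribution.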
Interpretation:
The dataset has one numerical variable, temp.
mean = 98.524
median = 98.6
standard deviation = 0.6777905
No outliers are visible in the boxplot. The five-number summary from the boxplot is:
Minimum = 97.4
Q1 = 98.0
Q2 (median) = 98.6
Q3 = 99.0
Maximum = 100.0
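The "no outliers" reading can be checked by hand from the five numbers above. With its default `coef = 1.5`, `boxplot.stats` flags points lying more than 1.5 times the box length beyond the box. The sketch below assumes the reported Q1 and Q3 are the hinges returned by `fivenum`; the variable names are illustrative.

```r
# Values reported in the five-number summary above.
five <- c(min = 97.4, q1 = 98.0, median = 98.6, q3 = 99.0, max = 100.0)

# Length of the box (spread of the middle half of the data).
iqr <- five[["q3"]] - five[["q1"]]

# Fences of the 1.5 * IQR rule.
lower_fence <- five[["q1"]] - 1.5 * iqr
upper_fence <- five[["q3"]] + 1.5 * iqr

# TRUE when even the extreme observations lie inside the fences.
no_outliers <- five[["min"]] >= lower_fence && five[["max"]] <= upper_fence

print(c(lower = lower_fence, upper = upper_fence))
print(no_outliers)
```

The fences come out at 96.5 and 100.5, and both the minimum (97.4) and the maximum (100.0) lie inside them, which agrees with the empty outlier list.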
From the histogram and shapiro.test:
p = 0.7001
Since p > 0.05, we fail to reject the null hypothesis of normality.
The variable temp is therefore consistent with a normal distribution.
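The normality check can be sketched as follows: a Shapiro-Wilk test, a decision at the 5% level, and a normal Q-Q plot as a visual companion. The data here are a simulated stand-in (size, mean, and standard deviation are assumptions), so the p-value will not match the 0.7001 reported above; with the real data, `temp` would be `HumanBodyTemp$temp`.

```r
# Stand-in data: simulated values, NOT the real HumanBodyTemp measurements.
set.seed(42)
temp <- rnorm(100, mean = 98.5, sd = 0.68)

# Shapiro-Wilk test; the null hypothesis is that the data are normal.
sw <- shapiro.test(temp)
print(sw)

# Decision at the 5% significance level.
alpha <- 0.05
if (sw$p.value > alpha) {
  message("Fail to reject normality at the 5% level")
} else {
  message("Reject normality at the 5% level")
}

# Normal Q-Q plot: points close to the line suggest approximate normality.
qqnorm(temp, main = "Normal Q-Q plot (simulated stand-in)")
qqline(temp)
```

A large p-value does not prove normality; it only means the test found no evidence against it, which is why the Q-Q plot is a useful second look.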