Find a data set on the internet. Some suggested search terms: Free Data Sets, Medical Data Sets, Education Data Sets.
We obtained a data set of monthly road accidents in Uttar
Pradesh covering 14 years (2001-2014), which gives
14 x 12 = 168 data points.
The data are available at:
https://www.kaggle.com/pratimtalukdar/road-accidents-in-indian-states-2001-2014
Information about our data:
Head of the data (Monthly Road Accidents in Uttar Pradesh, 2001-2014):
January 2001     1695
February 2001    1737
March 2001       1652
April 2001       1663
May 2001         1733
June 2001        1027
Tail of the data (Monthly Road Accidents in Uttar Pradesh, 2001-2014):
January 2014     2086
February 2014    2017
March 2014       1824
April 2014       2023
May 2014         2170
June 2014        2325
Summary of the data (Monthly Road Accidents in Uttar Pradesh, 2001-2014):
Min              794
1st Quartile     1248
Median           1628
Mean             1634
3rd Quartile     2031
Max              2554
Checking Presence of Trend and Seasonality in the Data
Testing for Presence of Trend in the Model:
We first check whether the series contains a trend, using the
relative ordering test.
H0: No trend in the model,
against H1: Trend is present in the model.
R: number of discordant pairs, i.e. pairs (i, j) with i < j and
y_i > y_j, in a series of length n.
Under H0, $E(R) = \frac{n(n-1)}{4}$.
If R > E(R): indication of a falling trend.
If R < E(R): indication of a rising trend.
R is related to Kendall's $\tau$, the rank correlation coefficient:
$\tau = 1 - \frac{4R}{n(n-1)}$
Under H0, $E(\tau) = 0$ and $\mathrm{Var}(\tau) = \frac{2(2n+5)}{9n(n-1)}$.
Test statistic: $Z = \frac{\tau - E(\tau)}{\sqrt{\mathrm{Var}(\tau)}} \sim N(0,1)$ (asymptotically).
Test criterion:
Reject H0 if the observed $|Z| > Z_{\alpha/2}$ at the $\alpha$ level of significance.
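The procedure above can be sketched in R as follows. This is a minimal illustration, not the original analysis code: the function names count_discordant and relative_ordering_stats, the alpha argument, and the toy series are all assumptions introduced here, and ties are simply not counted as discordant.

```r
# Count discordant pairs: pairs (i, j) with i < j and y[i] > y[j].
# Ties (y[i] == y[j]) are not counted as discordant in this sketch.
count_discordant <- function(y) {
  n <- length(y)
  r <- 0
  for (i in seq_len(n - 1)) {
    r <- r + sum(y[i] > y[(i + 1):n])
  }
  r
}

# Relative ordering statistics computed from R (r) and the series length n.
# alpha is a hypothetical argument for the two-sided significance level.
relative_ordering_stats <- function(r, n, alpha = 0.05) {
  exp_r   <- n * (n - 1) / 4                      # E(R) under H0
  tau     <- 1 - 4 * r / (n * (n - 1))            # Kendall's tau
  var_tau <- 2 * (2 * n + 5) / (9 * n * (n - 1))  # Var(tau) under H0
  z       <- tau / sqrt(var_tau)                  # E(tau) = 0 under H0
  list(R = r, E_R = exp_r, tau = tau, var_tau = var_tau, z = z,
       reject_H0 = abs(z) > qnorm(1 - alpha / 2))
}

# Made-up toy series, for illustration only (not the accident data).
toy <- c(12, 15, 11, 18, 20, 17, 23, 25)
res <- relative_ordering_stats(count_discordant(toy), length(toy))
print(res)
```

Splitting the pair count from the statistics keeps the second function usable when only R and n are known.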
Observed values: R = 2613 and E(R) = 7014. Since R < E(R), there
is an indication of a rising trend in our series.
|Z| = 12.07334 > 1.96 (Z0.025), and hence we reject the null
hypothesis at the 5% level of significance.
We conclude that a trend is present in the model.
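As a check on the arithmetic, the reported values can be reproduced from n = 168 and R = 2613. This worked calculation assumes that the test statistic divides by the standard deviation of tau, and intermediate values are rounded:

```latex
\begin{align*}
E(R) &= \frac{168 \cdot 167}{4} = 7014, \\
\tau &= 1 - \frac{4 \cdot 2613}{168 \cdot 167}
      = 1 - \frac{10452}{28056} \approx 0.62746, \\
\mathrm{Var}(\tau) &= \frac{2(2 \cdot 168 + 5)}{9 \cdot 168 \cdot 167}
      = \frac{682}{252504} \approx 0.0027009, \\
Z &= \frac{\tau - 0}{\sqrt{\mathrm{Var}(\tau)}}
      \approx \frac{0.62746}{0.051971} \approx 12.07.
\end{align*}
```

The result agrees with the reported |Z| = 12.07334 up to rounding.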
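R Code
####----Time Series Analysis of Road Accidents in Uttar Pradesh----####
getwd()
# file.choose() opens a file picker on any platform (choose.files() is Windows-only)
up <- read.csv(file.choose())
attach(up)  # makes the columns of 'up' (such as racci, used below) available by name
head(up)
tail(up)
summary(up)
# Monthly series running from January 2001 to December 2014
up1 = ts(up, start = c(2001, 1), end = c(2014, 12), frequency = 12)
D = decompose(up1)  # split into trend, seasonal and random components
plot.ts(up1, ylab = "Monthly Road Accidents in UP (2001-2014)", xlab = "Years")
plot(D)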
#------Test for Randomness (Turning Point Test)--------#
#------Null hypothesis: series is purely random------#
tp_tst = function(y)
{
  n = length(y)
  q_ij = 0  # number of turning points (local peaks and troughs)
  for(i in 2:(n-1))
  {
    if(((y[i] > y[i+1]) && (y[i] > y[i-1])) || ((y[i] < y[i+1]) && (y[i] < y[i-1])))
      q_ij = q_ij + 1
  }
  cat("Number of turning points:", q_ij, "\n")
  exp_T = (2/3)*(n-2)   # expected number of turning points under H0
  V_T = (16*n-29)/90    # variance of the number of turning points under H0
  tst_stat = (q_ij-exp_T)/sqrt(V_T)
  cat("Test statistic:", tst_stat, "\n")
  z_alpha = qnorm(0.025)  # lower 2.5% point of N(0,1); compared in absolute value
  if(abs(tst_stat) > abs(z_alpha))
  {
    cat("On the basis of the given data, the null hypothesis is rejected\n")
  }
  else
  {
    cat("On the basis of the given data, the null hypothesis is not rejected\n")
  }
}
tp_tst(racci)
####-----Checking presence of trend (Relative Ordering Test)-----####
####-----------------Null Hypothesis: No trend--------------------####