Justify that the jackknife method is a special case of the bootstrap method.
1. The principles of Cross-validation, Jackknife, and Bootstrap are very similar, but the Bootstrap overshadows the others because it is a more thorough procedure: it draws many more sub-samples than the others. The Bootstrap has also been found to give less biased and more consistent results than the Jackknife. Nevertheless, the Jackknife remains useful in Exploratory Data Analysis (EDA) for assessing how each sub-sample affects the model.
2. To further compare the two resampling methods: the Jackknife requires computing the estimator θ̂ n times, where each computation is based on (n − 1) observations. The Jackknife should not be applied if the estimator θ̂n(X) is too discontinuous as a function of the Xi, or if θ̂n(X) depends on only a few values in X.
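The leave-one-out recipe above can be sketched in a few lines of Python (a minimal illustration; the function name and the bias and standard-error formulas are the standard jackknife ones, not taken from this text):

```python
import numpy as np

def jackknife_estimates(x, estimator):
    """Compute the n leave-one-out replicates of an estimator,
    plus the standard jackknife estimates of bias and standard error."""
    x = np.asarray(x)
    n = len(x)
    theta_hat = estimator(x)  # estimate on the full sample
    # n recomputations, each based on (n - 1) observations
    replicates = np.array([estimator(np.delete(x, i)) for i in range(n)])
    theta_dot = replicates.mean()
    bias = (n - 1) * (theta_dot - theta_hat)
    se = np.sqrt((n - 1) / n * np.sum((replicates - theta_dot) ** 2))
    return replicates, bias, se

# Example: jackknife the sample mean (a smooth, well-behaved estimator)
x = np.array([2.0, 4.0, 6.0, 8.0])
reps, bias, se = jackknife_estimates(x, np.mean)
```

For the sample mean the jackknife bias is exactly 0 and the jackknife standard error coincides with the usual s/√n, which illustrates why the method is trusted for smooth estimators.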
3. That the Jackknife is asymptotically equivalent to the Bootstrap is shown inventively by Efron and Tibshirani in their monograph (ibid.). They treat sampling from blocks of data as equivalent to defining a multinomial distribution on the sample n-tuple with equal probabilities for each sample point. In the case n = 3, the set of possible points is described as a triangle: each point can be selected from the set of all possible resamples of the three elements, and is graphically represented as points along the three edges of the triangle. The graphic below is reproduced from the Efron and Tibshirani paper presented in 1986.
It shows the domain of the sample functional: the ideal bootstrap sample corresponds to the surface attained over the domain points on the simplex, and the jackknife samples define a hyperplane approximating that bootstrap surface.
Define P0 = (1/n, . . . , 1/n)^T and U = (U1, U2, . . . , Un)^T such that the elements of U sum to 0. Then T(P0) is the original sample statistic, and T(P(i)) is the jackknife replicate for sample point i:
P(i) = (1/(n − 1), . . . , 0, . . . , 1/(n − 1))^T, where the single 0 appears in position i.
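These resampling vectors are easy to write down explicitly (a small illustration with n = 4; the helper name P_jack is ours, not Efron and Tibshirani's notation):

```python
import numpy as np

n = 4
P0 = np.full(n, 1.0 / n)  # P0 = (1/n, ..., 1/n)^T, the original sample

def P_jack(i, n):
    """Resampling vector for the i-th jackknife replicate:
    equal mass 1/(n - 1) on every point except the deleted one."""
    p = np.full(n, 1.0 / (n - 1))
    p[i] = 0.0
    return p

P1 = P_jack(0, n)  # (0, 1/3, 1/3, 1/3): first observation deleted
```

Each P_jack(i, n) is itself a point on the simplex Sn, which is why the n jackknife replicates pick out a hyperplane through the bootstrap surface.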
A linear statistic T(P*), where T is a functional of a resampling vector P* (each element in [0, 1], elements summing to 1), has the following form:

T(P*) = c0 + (P* − P0)^T U.

The linear statistic defines a hyperplane over the simplex Sn. The following result states that, for any statistic, the jackknife estimate of the variance of T(P*) is almost the same as the bootstrap estimate for a certain linear approximation to T(P*).
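This result can be checked numerically for the sample mean, which is exactly linear in P* (a sketch under that assumption; to avoid Monte Carlo noise we use the closed-form ideal bootstrap variance of the mean rather than actual resampling):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=30)
n = len(x)

# Jackknife variance estimate of the sample mean
reps = np.array([np.delete(x, i).mean() for i in range(n)])
var_jack = (n - 1) / n * np.sum((reps - reps.mean()) ** 2)

# Ideal (all-resamples) bootstrap variance of the sample mean has the
# closed form sigma_hat^2 / n, with sigma_hat^2 = sum((x - x.mean())**2) / n
var_boot = np.sum((x - x.mean()) ** 2) / n ** 2

# For this exactly linear statistic the two estimates differ only by the
# constant factor n / (n - 1), so they agree as n grows
ratio = var_jack / var_boot
```

The fixed ratio n/(n − 1) → 1 as n → ∞ is one concrete sense in which the jackknife hyperplane reproduces the bootstrap surface asymptotically.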