Integrated Nested Laplace Approximation (INLA):
INLA is a deterministic paradigm for Bayesian inference in latent Gaussian models (LGMs), introduced by Rue et al. (2009).
INLA relies on a combination of analytical approximations and efficient numerical integration schemes to achieve highly accurate deterministic approximations to posterior quantities of interest.
The main benefit of using INLA instead of Markov chain Monte Carlo (MCMC) techniques for LGMs is computational; INLA is fast even for large, complex models. Moreover, being a deterministic algorithm, INLA does not suffer from slow convergence and poor mixing.
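For intuition about the analytical approximations mentioned above, the basic building block is the Laplace approximation, which replaces a density by a Gaussian matched to its mode and curvature. The sketch below is a one-dimensional illustration in plain Python only. It is not the nested scheme of Rue et al. (2009) and not R-INLA code; the Beta example and the function name `laplace_approx_beta` are hypothetical choices made for illustration.

```python
import math


def laplace_approx_beta(a, b):
    """Gaussian (Laplace) approximation to a Beta(a, b) density.

    Returns the mean and standard deviation of the approximating Gaussian.
    Only valid when the density has an interior mode, i.e. a > 1 and b > 1.
    """
    if a <= 1 or b <= 1:
        raise ValueError("an interior mode requires a > 1 and b > 1")
    # Mode of the Beta(a, b) density.
    mode = (a - 1) / (a + b - 2)
    # Second derivative of the log-density (a-1)*log(p) + (b-1)*log(1-p) at the mode.
    curvature = -(a - 1) / mode**2 - (b - 1) / (1 - mode) ** 2
    # The Gaussian variance is the negative inverse curvature.
    sd = math.sqrt(-1.0 / curvature)
    return mode, sd


# Illustrative values only (assumed, not taken from any survey).
mode, sd = laplace_approx_beta(8, 4)
```

Because the approximation requires no sampling, it returns the same answer on every run, which is the sense in which such methods are deterministic.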
Small Area Estimation (SAE):
The goal of small area estimation (SAE) is to obtain reliable estimates of health outcomes for areas or domains where few or no samples are available.
SAE investigates how to obtain such area-specific characteristics from survey data that cover more than the area of interest, typically by using spatial smoothing methods.
Here we describe a spatial predictive model-based approach to SAE for a binary health outcome in a complex survey with given sampling weights.
We assume that the sampling weights of the sampled individuals are the only information available about the survey design.
The goal is to estimate the prevalence of the health outcome for all small areas in the spatial domain.
A hierarchical Bayesian model is used in which the health outcomes are regressed on the sampling weights.
A nonparametric regression on the weights is used to minimise possible bias of the regression function.
Additionally, both unstructured and structured spatial random effects are introduced to model the geographical distribution of the health outcomes.
The population distribution of the sampling weights is also unknown, so the weights themselves must be modelled in order to make predictions.
Notation:
Let Yik be a binary health outcome for individual i in small area k (i = 1, …, Nk and k = 1, …, K) with Nk the population size in area k.
We assume that Nk is known for each area. A sample of size nk is drawn from each area k, where some of the nk may be zero. Denote the sampled values by yik.
Let $N = \sum_{k=1}^{K} N_k$ and $n = \sum_{k=1}^{K} n_k$ represent the total population and sample size, respectively. We focus on estimating the true prevalence, Pk, in each area k, namely
$$P_k = \frac{1}{N_k} \sum_{i=1}^{N_k} Y_{ik}. \tag{1}$$
Let Rik denote the binary variable indicating whether the ith individual in area k is sampled (Rik = 1) or not (Rik = 0). We use sk to indicate the set of sampled individuals in area k and s′k for those that are not sampled.
To reflect the sampling design, weights wik are attached to each respondent’s outcome. The weights are proportional to the inverse probability of inclusion in the sample for unit i in area k. These weights can reflect the complex survey design, post-stratification adjustments, or a combination of both.
We further assume that all sampled individuals respond to the survey. A typical dataset has the structure presented in Table 1. Throughout this article, we use the normalized weights, denoted by $\tilde{w}_{ik}$, defined by
$$\tilde{w}_{ik} = \frac{n_k \, w_{ik}}{\sum_{j \in s_k} w_{jk}}. \tag{2}$$
The weights are called normalized because they sum up to the sample size nk in area k.
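The normalization can be checked numerically. The following is a minimal sketch in plain Python, assuming the weights are rescaled within each area so that they sum to that area's sample size nk, as stated above; the function name `normalize_weights` and the example values are hypothetical and are not taken from the article's data.

```python
from collections import defaultdict


def normalize_weights(areas, weights):
    """Rescale sampling weights so that, within each area, they sum to n_k.

    `areas` and `weights` are parallel lists with one entry per sampled individual.
    """
    totals = defaultdict(float)  # sum of raw weights per area
    counts = defaultdict(int)    # sample size n_k per area
    for k, w in zip(areas, weights):
        totals[k] += w
        counts[k] += 1
    # Normalized weight: n_k * w_ik / (sum of raw weights in area k).
    return [counts[k] * w / totals[k] for k, w in zip(areas, weights)]


# Hypothetical example: three respondents in area 1, two in area 2.
areas = [1, 1, 1, 2, 2]
weights = [10.0, 30.0, 20.0, 5.0, 15.0]
normalized = normalize_weights(areas, weights)
```

In this example the normalized weights in area 1 sum to 3 and those in area 2 sum to 2, matching the respective sample sizes.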
Table 1
Structure of datasets used in this article.
Response | Area | Sample weight |
---|---|---|
y11 | 1 | w11 |
y21 | 1 | w21 |
⋮ | ⋮ | ⋮ |
y12 | 2 | w12 |
⋮ | ⋮ | ⋮ |
Proposed Methods:
The Bayesian hierarchical model for the outcomes and the multinomial model for the weights, both described below, are fitted using the integrated nested Laplace approximations (INLA) approach of Rue et al. (2009).
Hierarchical model:
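A predictive model-based approach proposed by Royall (1970) is used to specify an estimator for Pk. The estimator is given by
$$\hat{P}_k = \frac{1}{N_k} \left( \sum_{i \in s_k} y_{ik} + \sum_{i \in s'_k} \hat{y}_{ik} \right), \tag{3}$$
where $\hat{y}_{ik}$ denotes the model-based prediction of the outcome for a non-sampled individual.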
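A predictive estimator of this kind combines the observed outcomes of the sampled individuals with model predictions for the non-sampled ones. The sketch below illustrates the arithmetic in plain Python under that assumption; the function name `predictive_prevalence` and the example numbers are hypothetical, and in practice the predictions would come from the fitted model rather than being fixed by hand.

```python
def predictive_prevalence(sampled_outcomes, predicted_nonsampled):
    """Predictive estimate of the prevalence in a single area.

    sampled_outcomes: observed 0/1 outcomes for the sampled individuals.
    predicted_nonsampled: predicted outcomes (e.g. probabilities) for the
        individuals in the area who were not sampled.
    """
    # Population size of the area: sampled plus non-sampled individuals.
    n_pop = len(sampled_outcomes) + len(predicted_nonsampled)
    if n_pop == 0:
        raise ValueError("the area must contain at least one individual")
    return (sum(sampled_outcomes) + sum(predicted_nonsampled)) / n_pop


# Hypothetical area with 10 individuals: 3 sampled, 7 predicted by a model.
estimate = predictive_prevalence([1, 0, 1], [0.4] * 7)
```

When an area has no sampled individuals, the first list is empty and the estimate rests entirely on the model predictions.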
We now extend these ideas to small area estimation.
The normalized sampling weights are used as a covariate in the model for the observed outcomes yik. We employ Bayesian hierarchical models consisting of three stages.
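One possible way to write down such a three-stage model is sketched below. This is an assumption-laden illustration based only on the description given earlier (binary outcomes, a nonparametric function of the normalized weights, and unstructured plus structured spatial effects). The symbols $\beta_0$, $f$, $u_k$, $v_k$ and $\theta$, the logit link, and the use of $\tilde{w}_{ik}$ for the normalized weight are notational choices made here and may differ from the specification actually used.

```latex
% Stage 1: observation model for the binary outcomes
y_{ik} \mid p_{ik} \sim \mathrm{Bernoulli}(p_{ik})

% Stage 2: latent model linking the success probability to the weights
% and to the area-level spatial effects
\operatorname{logit}(p_{ik}) = \beta_0 + f(\tilde{w}_{ik}) + u_k + v_k

% f    : smooth (nonparametric) function of the normalized weight
% u_k  : unstructured spatial random effect for area k
% v_k  : structured spatial random effect for area k

% Stage 3: prior distributions for the intercept and the hyperparameters
\beta_0 \sim \pi(\beta_0), \qquad \theta \sim \pi(\theta)
% theta collects the hyperparameters (e.g. precisions) of f, u and v
```

Written this way, the second stage is a latent Gaussian field whenever $f$, $u_k$ and $v_k$ are given Gaussian priors, which is the model class INLA is designed for.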
Implementation:
INLA yields a computationally convenient alternative to Markov chain Monte Carlo (MCMC) techniques. This method combines Laplace approximations and numerical integration in a very efficient manner to carry out a Bayesian analysis.
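The structured spatial effect follows an intrinsic conditional autoregressive (ICAR) model, which requires a linear constraint for identifiability. Sampling under this constraint is achieved by using the intrinsic Gaussian Markov random field representation of the ICAR model together with that linear constraint.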
Sampling from the posterior distributions obtained from INLA is done via the inla.posterior.sample() function.