
Maximum Likelihood Approach for Longitudinal Models with Nonignorable Missing Data Mechanism Using Fractional Imputation

Abdallah S. A. Yaseen1, Ahmed M. Gad2, Abeer S. Ahmed1

1The National Centre for Social and Criminological Research, Cairo, Egypt

2Statistics Department, Faculty of Economics and Political Science, Cairo University, Egypt

Abstract

In longitudinal studies data are collected for the same set of units on two or more occasions, in contrast to cross-sectional studies where a single outcome is measured for each individual. Some intended measurements may not be available for some units, resulting in missing data. When the probability of missingness depends on the missing values, the missing data mechanism is termed nonrandom. One common missingness pattern is dropout, where a missing value is never followed by an observed value. Under nonrandom dropout, the missing data mechanism must be included in the analysis to obtain unbiased estimates. The parametric fractional imputation method is proposed to handle the missingness problem in longitudinal studies and to obtain unbiased estimates in the presence of a nonrandom dropout mechanism. In this setting the jackknife replication method is used to find the standard errors of the fractionally imputed estimates. Finally, the proposed method is applied to a real data set (the mastitis data) in addition to a simulation study.

Cite this article:

  • Abdallah S. A. Yaseen, Ahmed M. Gad, Abeer S. Ahmed. Maximum Likelihood Approach for Longitudinal Models with Nonignorable Missing Data Mechanism Using Fractional Imputation. American Journal of Applied Mathematics and Statistics. Vol. 4, No. 3, 2016, pp 59-66. https://pubs.sciepub.com/ajams/4/3/1



1. Introduction

The defining characteristic of longitudinal studies is that sample units are measured repeatedly over time. That is, data are collected for the same set of units for two or more occasions. Missing values are not uncommon with longitudinal data.

Missing data mechanisms can be classified according to the process causing the missingness, as defined by Little and Rubin [17]. These include missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). The missing not at random mechanism is also termed the nonignorable missing data mechanism. In this case the missing data mechanism must be included in the analysis in order to obtain unbiased estimates.

Another important classification is the missingness pattern: dropout versus intermittent missingness. In the dropout pattern a subject who leaves the study at some time point does not appear again, so a missing value is never followed by an observed value, whereas in the intermittent pattern a missing value may be followed by an observed value.

Handling missing data requires jointly modeling the longitudinal outcome and the missing data process. There are several approaches to parametric modeling of the two. The first is the selection model [6]. Selection models are the better choice when the interest is in inference about the marginal distribution of the response, which is why we choose such models in this article. The second is the pattern mixture model [19]. The third is the shared parameter model [8]. For more details, refer to Molenberghs and Fitzmaurice [22].

The stochastic EM (SEM) algorithm, suggested by Celeux and Diebolt [2], was developed to facilitate the E-step of the EM algorithm. The stochastic EM algorithm has been extended to longitudinal studies by Gad and Ahmed [9]. Other alternatives include the stochastic approximation EM (SAEM) algorithm [5] and the Monte Carlo EM (MCEM) algorithm [25]. Booth and Hobert [1] used an automated Monte Carlo EM algorithm to compute the E-step of the EM algorithm. A disadvantage of the MCEM algorithm is that the generated values are updated at each iteration, which requires heavy computation and slows convergence. In addition, convergence is not guaranteed for a fixed Monte Carlo sample size [26].

Thus, the MCEM algorithm has been developed further using parametric fractional imputation to facilitate the expectation step; this also speeds up convergence and guarantees that convergence occurs [14, 15, 16, 27].

Kim and Kim [16] applied parametric fractional imputation in the context of cross-sectional studies to deal with the missingness problem in the case of a nonignorable missing data mechanism. Yang et al. [27] generalized the approach to deal with the nonignorable missing data mechanism in longitudinal studies using the shared parameter model.

The aim of this article is to develop parametric fractional imputation to handle nonignorable dropout in the context of longitudinal studies using the selection model of Diggle and Kenward [6]. In addition, the Jackknife replication method is used to obtain the standard errors of the fractionally imputed estimates. The performance of the proposed method is evaluated using a simulation study, and the proposed methods are applied to a real data set (the mastitis data). The rest of the article is organized as follows. In Section 2 the basic notation is introduced. In Section 3 the selection model for longitudinal data is introduced. The developed parametric fractional imputation method is described in Section 4. Section 5 is devoted to the proposed Jackknife method for evaluating the standard errors of the estimates. A simulation study is presented in Section 6 to evaluate the performance of the proposed methods. In Section 7 the proposed techniques are applied to the mastitis data. Finally, Section 8 is devoted to the conclusion.

2. Notations

Let y_i1, ..., y_in be the sequence of response outcomes and x_ij be the p-vector of fully observed covariates for the j-th measurement of the i-th subject, made at time t_j, i = 1, ..., m, j = 1, ..., n. Let t = (t_1, ..., t_n)' denote the times at which the measurements are taken; these are supposed common to all subjects. The set of responses for the i-th subject is gathered into an n-vector Y_i = (y_i1, ..., y_in)'. The variable Y_i is assumed to be normally distributed with mean X_i β and variance-covariance matrix V_i, i.e.

Y_i ~ N(X_i β, V_i),

where X_i is the n × p matrix representing the covariates, β is a p-vector of unknown parameters, and V_i is the covariance matrix of dimension n × n whose (j, k)-element, v_jk, represents the covariance between y_ij and y_ik.

The response variable can be modeled using the general linear model

Y_i = X_i β + ε_i,   (1)

where ε_i is assumed to follow a multivariate normal distribution. That is,

ε_i ~ N(0, V_i).

The responses for all subjects are collected in an mn-vector, Y = (Y_1', ..., Y_m')'. The covariance matrix of Y is V. Because the measurements from each subject are assumed correlated with one another but uncorrelated with the measurements from other subjects, the matrix V is block diagonal with non-zero blocks V_i. The matrix V_i may be unstructured, containing n(n + 1)/2 parameters, or it may have a parametric structure, in which case it is a function of a vector of unknown parameters; see Diggle et al. [7].

In the missing data context the response Y_i, which represents the intended observations, can be partitioned into two sub-vectors, Y_i = (Y_i,obs, Y_i,mis), where Y_i,obs denotes the observed measurements of the i-th subject and Y_i,mis denotes the missing observations. A binary variable r_ij is assumed to represent the missing data process, parameterized by ψ. The indicator r_ij equals 1 if y_ij is observed and 0 if y_ij is missing, and is assumed to follow a Bernoulli distribution with probability p_ij. The r_ij's of the i-th subject are arranged in a vector R_i = (r_i1, ..., r_in)'. The complete data of subject i can then be written as (Y_i, R_i). Let δ_i be an indicator of completeness that equals one if the i-th individual has complete measurements and zero otherwise. Let ℓ(·) denote the log-likelihood function of a given parameter.

3. Selection Model for Incomplete Longitudinal Data

Under the selection model of Diggle and Kenward [6], the joint distribution of the response variable and the missingness indicator can be written as

f(Y_i, R_i | X_i, θ, ψ) = f(Y_i | X_i, θ) f(R_i | Y_i, X_i, ψ),

where θ is a vector of parameters describing the response variable Y_i, X_i is a fully observed matrix of covariates (design matrix), and ψ is a vector of parameters describing the response indicator R_i.

As defined by Little and Rubin [17], the mechanism is missing completely at random (MCAR) if R_i is independent of both Y_i,obs and Y_i,mis, i.e.,

f(R_i | Y_i,obs, Y_i,mis, X_i, ψ) = f(R_i | X_i, ψ).

The missing data mechanism is missing at random (MAR) if R_i is independent of Y_i,mis conditionally on Y_i,obs, i.e.,

f(R_i | Y_i,obs, Y_i,mis, X_i, ψ) = f(R_i | Y_i,obs, X_i, ψ).

Otherwise, the missing data mechanism is missing not at random (MNAR).

Following Diggle and Kenward [6], the probability of dropout is modeled by a logistic model depending on the measurement at the time of dropout d, y_id; the previous measurement, y_i,d-1; and the unknown parameter vector ψ = (ψ0, ψ1, ψ2)'. That is, writing p_d for the probability that subject i drops out at time d given the measurement history, the logistic model for the dropout process can be expressed as

logit(p_d) = ψ0 + ψ1 y_id + ψ2 y_i,d-1.
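As a concrete illustration, the dropout probability under a logistic model of this form can be computed as follows (a minimal sketch; the function and argument names are ours, not the authors'):

```python
import math

def dropout_prob(y_d, y_prev, psi0, psi1, psi2):
    """Probability of dropout at time d under the logistic model
    logit(p) = psi0 + psi1 * y_d + psi2 * y_prev, where y_d is the
    (possibly unobserved) measurement at the dropout time and y_prev
    is the preceding measurement."""
    eta = psi0 + psi1 * y_d + psi2 * y_prev
    return 1.0 / (1.0 + math.exp(-eta))
```

Note that when psi1 is nonzero the probability depends on the unobserved value y_d itself, which is exactly what makes the mechanism nonignorable.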

4. Maximum Likelihood Estimation for Longitudinal Data with Missing Values Using Parametric Fractional Imputation

The complete-data log-likelihood function of θ and ψ, ℓ(θ, ψ), can be any function proportional to

log f(Y | X, θ) + log f(R | Y, X, ψ).

If there are missing values, the observed density function can be written as

f(Y_obs, R | X, θ, ψ) = ∫ f(Y_obs, Y_mis | X, θ) f(R | Y_obs, Y_mis, X, ψ) dY_mis,

and the observed log-likelihood function of θ and ψ, ℓ_obs(θ, ψ), will be any function proportional to log f(Y_obs, R | X, θ, ψ).

Instead of maximizing ℓ_obs(θ, ψ) to get the maximum likelihood estimators of θ and ψ, Louis [20] obtained the MLE by maximizing the expected complete-data log-likelihood

Q(θ, ψ | θ^(t), ψ^(t)) = E[ℓ(θ, ψ) | Y_obs, R, θ^(t), ψ^(t)].   (2)

This is because maximizing the observed log-likelihood function requires an explicit form for the distribution of the observed data, which is hard to calculate since it involves integrating over the distribution of the missing data. A suitable solution is to use the distribution of the complete data and maximize its expectation given the observed data as a proxy, avoiding the calculation of the observed log-likelihood function.

The EM algorithm can be applied at the (t + 1)-th iteration by calculating Q(θ, ψ | θ^(t), ψ^(t)) in the E-step. In the M-step, θ^(t+1) and ψ^(t+1) are chosen to maximize the Q-function, i.e.

(θ^(t+1), ψ^(t+1)) = arg max Q(θ, ψ | θ^(t), ψ^(t)).

However, calculating the conditional expectation in (2) is cumbersome and time consuming. Thus, numerical approximation is needed. The MCEM approximates the Q-function in the E-step but the generated values are changed in each iteration and the convergence is not guaranteed.

Parametric fractional imputation (PFI) improves on the MCEM using the idea of fractional weights: the generated values do not change from iteration to iteration; only the fractional weights are updated, which guarantees convergence and accelerates its rate. Kim and Kim [16] applied parametric fractional imputation in cross-sectional studies. We develop parametric fractional imputation for the longitudinal context with nonrandom dropout, using the selection model of Diggle and Kenward [6] and the general linear model in (1).

The Parametric fractional imputation (PFI) algorithm can be conducted in the following steps.

(1) Generate imputed values for the missing data Y_i,mis. Kim and Kim [16] and Yang et al. [27] recommended generating the imputed values from an initial density with the same support as the density of the response variable. We recommend generating the imputed values from the conditional distribution of the missing data given the observed data, the response indicator and initial parameter estimates, f(Y_i,mis | Y_i,obs, R_i; θ^(0), ψ^(0)), which has the same support as the density of the outcome variable and takes the dropout process into account. Unfortunately, this distribution does not have a standard form and it is not possible to simulate from it directly. Hence, an accept-reject procedure can be used to overcome this problem and to mimic the dropout process: the imputed values are generated from f(Y_i,mis | Y_i,obs; θ^(0)) instead, and then each generated value is accepted or rejected. Assuming normality, the conditional distribution f(Y_i,mis | Y_i,obs; θ^(0)) is also normal, with mean

μ_mis + Σ_mis,obs Σ_obs,obs^(-1) (Y_i,obs − μ_obs)

and covariance matrix

Σ_mis,mis − Σ_mis,obs Σ_obs,obs^(-1) Σ_obs,mis,

where μ_obs, μ_mis, Σ_obs,obs, Σ_mis,obs, Σ_obs,mis and Σ_mis,mis are the corresponding partitions of the mean vector X_i β and the covariance matrix V_i.

(2) Given the imputed values y*^(1)_i,mis, ..., y*^(M)_i,mis for the vector of missing values of individual i, Y_i,mis, and the current parameter estimates (θ^(t), ψ^(t)), the j-th replicate of the complete data for individual i is the vector Y*^(j)_i = (Y_i,obs, y*^(j)_i,mis), for j = 1, ..., M, with joint density f(Y*^(j)_i | X_i; θ^(t)); the components of y*^(j)_i,mis are the imputed values for individual i at the missing time points. Given Y*^(j)_i and the current estimates θ^(t) and ψ^(t), a fractional weight w^(t)_ij is assigned at the t-th iteration to each replicate and can be calculated as

w^(t)_ij ∝ f(Y*^(j)_i | X_i; θ^(t)) f(R_i | Y*^(j)_i; ψ^(t)) / f(y*^(j)_i,mis | Y_i,obs; θ^(0)),

normalized so that the weights for each individual sum to one, where f(R_i | Y*^(j)_i; ψ^(t)) is the probability of the observed missingness pattern for the j-th replicate given the current estimate ψ^(t).

(3) Using the fractional weights, w^(t)_ij, and the imputed vectors, Y*^(j)_i, the Monte Carlo approximation of (2) is given by

Q*(θ | θ^(t)) = Σ_i Σ_j w^(t)_ij log f(Y*^(j)_i | X_i; θ)

and

Q*(ψ | ψ^(t)) = Σ_i Σ_j w^(t)_ij log f(R_i | Y*^(j)_i; ψ).
It is worth noting that this step corresponds to the E-step in the EM algorithm.

(4) Update the parameter estimates in two sub-steps: the normal step and the logistic step. In the normal step, the maximum likelihood estimate of θ is obtained using an appropriate optimization procedure, for example the Jennrich-Schluchter algorithm [13]. In the logistic step, the maximum likelihood estimates of the dropout parameters ψ in the logistic model of Section 3 are obtained using iteratively reweighted least squares [3, 21].

(5) Repeat steps (2) to (4) until convergence.

It is worth noting that the imputed values are not regenerated at each iteration; only the fractional weights are updated. Thus, the rate of convergence is fast and convergence is guaranteed. For a sufficiently large number of imputed values M, the final estimates are asymptotically equivalent to the ML estimates [16].
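To make the mechanics of the algorithm concrete, the following toy sketch applies the PFI idea to estimating the mean of a normal sample under MNAR missingness. It is an illustrative assumption throughout, not the authors' implementation: the model is univariate, the dropout parameters psi are treated as known (whereas the article estimates them in the logistic step), and all names are ours.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def norm_pdf(y, mu, sd):
    z = (y - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def pfi_mean(y_obs, n_miss, psi, theta0=0.0, sd=1.0, M=300, iters=100, seed=2):
    """Toy PFI for the mean theta of N(theta, sd^2) responses when
    P(missing | y) = logistic(psi0 + psi1*y) (MNAR); psi and sd are
    treated as known.  Step (1): draw M imputed values once, using
    accept-reject so the draws mimic the missingness process.  Steps
    (2)-(4): update only the fractional weights and the estimate; the
    imputed values are never regenerated."""
    rng = random.Random(seed)
    # Step (1): accept-reject from N(theta0, sd^2); the missingness
    # probability is bounded by 1, so it serves as acceptance probability.
    imputed = []
    while len(imputed) < M:
        y = rng.gauss(theta0, sd)
        if rng.random() < logistic(psi[0] + psi[1] * y):
            imputed.append(y)
    theta = theta0
    for _ in range(iters):
        # Step (2): fractional weights proportional to
        # f(y; theta_t) * P(miss | y) / (proposal density of the draws).
        # The accept-reject proposal already carries the P(miss | y)
        # factor, so here the ratio simplifies to a density ratio.
        w = [norm_pdf(y, theta, sd) / norm_pdf(y, theta0, sd) for y in imputed]
        s = sum(w)
        w = [x / s for x in w]
        # Steps (3)-(4): weighted complete-data MLE of the mean.
        total = sum(y_obs) + n_miss * sum(wi * y for wi, y in zip(w, imputed))
        theta = total / (len(y_obs) + n_miss)
    return theta
```

With psi1 < 0, low responses are more likely to be missing, so the PFI estimate is pulled below the mean of the observed values, as it should be under MNAR.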

Several techniques depend on the idea of weights, such as sampling importance resampling [24]. However, the way the weights are calculated is different in the proposed method. Moreover, the aim of sampling importance resampling is to sample from difficult distributions, which is not the purpose of the proposed method.

5. Standard Error Estimation

The standard errors of the estimated parameters, for the fractionally imputed estimator, can be obtained using a replication method such as the Jackknife or the bootstrap. Kim and Kim [16] used the Jackknife method to estimate the standard errors of the estimates. The Jackknife method can be conducted as follows:

(1) Generate m samples, each of size m − 1, from the original sample by deleting one individual in each sample systematically, i.e. the first sample contains individuals 2, 3, ..., m, the second sample contains individuals 1, 3, ..., m, and so on.

(2) In each of the generated samples, calculate the fractionally imputed estimator θ̂^(k), for k = 1, ..., m.

(3) Estimate the standard error of θ̂ using the formula

SE_jack(θ̂) = [ ((m − 1)/m) Σ_k (θ̂^(k) − θ̂^(.))² ]^(1/2),

where

θ̂^(.) = (1/m) Σ_k θ̂^(k).
The Jackknife method is used to obtain the estimated standard errors of the estimated parameters, applied as in Kim and Kim [16]. The only difference is that, in generating the k-th sample, the whole vector of observed data for the i-th individual, Y_i, is omitted, instead of deleting a single observation.
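A minimal sketch of the delete-one-subject jackknife standard error formula above (the estimation routine that produces the leave-one-out estimates is abstracted away):

```python
def jackknife_se(loo_estimates):
    """Jackknife standard error from the m leave-one-subject-out
    estimates: SE^2 = ((m - 1) / m) * sum_k (theta_k - theta_bar)^2."""
    m = len(loo_estimates)
    theta_bar = sum(loo_estimates) / m
    ss = sum((t - theta_bar) ** 2 for t in loo_estimates)
    return ((m - 1) / m * ss) ** 0.5
```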

6. The Simulation Study

The aim of this simulation study is to judge the performance of the proposed method. B = 5000 Monte Carlo samples were generated for each setting. Sample sizes are chosen as 30, 50, and 100, with five time points for each individual. This choice covers small, moderate and large sample sizes.

The response variable was simulated from the model Y_i ~ N(X_i β, V_i), where X_i is a design matrix and the vector β is of length 3. The logistic regression model used for the mastitis data is adopted here.

The covariance matrix is left unstructured, which means that there are 15 covariance parameters. Different covariance structures were also tried, such as the compound symmetric and the exponential structures; for definitions of these structures see Diggle et al. [7]. Data were generated to meet the assumptions of multivariate normality, the assumed covariance model and the missingness model. The missing data are generated by a technique similar to that of Kim and Kim [16]: a binary random variable is generated from the Bernoulli distribution with parameter equal to the probability of missingness for the specified value, and the value is omitted if its associated binary variable equals one. The logistic model used to describe the dropout process is

logit(p_id) = ψ0 + ψ1 y_id + ψ2 y_i,d-1.
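The value-deletion scheme just described can be sketched as follows, under monotone dropout (once a value is flagged missing, the rest of the subject's vector is discarded); the names are illustrative:

```python
import math
import random

def apply_dropout(y, psi, rng):
    """Return the observed part of the response vector y.  At each time
    j >= 2 a Bernoulli indicator is drawn with probability
    p_j = logistic(psi0 + psi1*y[j] + psi2*y[j-1]); success triggers
    dropout, so the values from time j onward are deleted."""
    for j in range(1, len(y)):
        eta = psi[0] + psi[1] * y[j] + psi[2] * y[j - 1]
        p = 1.0 / (1.0 + math.exp(-eta))
        if rng.random() < p:
            return y[:j]  # monotone missingness: dropout from time j on
    return y
```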

Under this setup, the vector of parameters is , where , and and

These parameters are fixed at the values , and for low missingness rate and for high missingness rate respectively. The simulation is conducted under two missingness rates (percentage of individuals with missing data over the 5000 replications). The low missingness rate with missingness percentage ranges from 13% to 17%; and high missingness rate with missingness percentage ranges from 40% to 50%.

The parameter estimates have been obtained using the following methods:

(1) Multiple imputation with M imputed values (MI).

(2) Fractional regression nearest neighbor imputation (FRNNI).

(3) Parametric fractional imputation with M = 10 (PFI).

The choice of ten replicates for the PFI is to test whether the proposed method can compete with the other techniques at a modest number of replicates, and to simplify the calculations. As shown by Yang et al. [27], the more replicates are used, the better the estimates obtained; it is therefore expected that increasing the number of replicates gives more precise estimates. For multiple imputation, we use the predictive mean matching method described in Grannell and Murphy [12]. For the fractional regression nearest neighbor imputation, we apply the method described in Paik and Larsen [23].

The results are shown in Table 1 - Table 6. The multiple imputation (MI) estimates of the mean parameters have small relative bias, but the covariance parameters have large relative bias, in the case of the low missingness rate. For the high missingness rate, the MI estimates are seriously biased compared with the other methods.
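The relative bias percentages reported in the tables are, presumably, computed in the usual Monte Carlo way; a one-function sketch (our naming):

```python
def relative_bias_pct(estimates, true_value):
    """Monte Carlo relative bias in percent:
    100 * (mean of the estimates - true value) / true value."""
    mean_est = sum(estimates) / len(estimates)
    return 100.0 * (mean_est - true_value) / true_value
```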

Table 1. The relative bias percentage for the simulation study at n=30, high missingness rate; PFI=parametric fractional imputation, FRNNI=fractional regression nearest neighbor imputation, MI=multiple imputation

Table 2. The relative bias percentage for the simulation study at n=50, high missingness rate; PFI=parametric fractional imputation, FRNNI=fractional regression nearest neighbor imputation, MI=multiple imputation

The fractional regression nearest neighbor (FRNNI) imputation leads to reasonable estimates for low missingness rate. It leads to relatively biased estimates, especially for the covariance model, in the case of high missingness rate. This can be noticed clearly for small and moderate sample sizes.

The parametric fractional imputation (PFI) estimates are relatively unbiased for most parameters regardless of the missingness rate and the sample size. The covariance estimates have a small bias that decreases as the response rate and the sample size increase. In general, the bias ranges from small to moderate and decreases with larger sample sizes. Using the proper weights produces estimates with small bias. Overall, the PFI estimates have lower bias compared with the other two methods; indeed, the parametric fractional imputation method approximates the maximum likelihood estimates when the number of replicates is very large.

Table 3. The relative bias percentage for the simulation study at n=100, high missingness rate; PFI=parametric fractional imputation, FRNNI=fractional regression nearest neighbor imputation, MI=multiple imputation

Table 4. The relative bias percentage for the simulation study at n=30, low missingness rate; PFI=parametric fractional imputation, FRNNI= fractional regression nearest neighbor imputation, MI=multiple imputation

Table 5. The relative bias percentage for the simulation study at n=50, low missingness rate; PFI=parametric fractional imputation, FRNNI=fractional regression nearest neighbor imputation, MI=multiple imputation

Hence, based on the simulation results, we can conclude that the parametric fractional imputation (PFI) method provides reasonable estimates of the parameters in the case of nonrandom dropout, even with small sample sizes.

Table 6. The relative bias percentage for the simulation study at n=100, low missingness rate; PFI=parametric fractional imputation, FRNNI= fractional regression nearest neighbor imputation, MI=multiple imputation

7. Application (Mastitis Data)

Figure 1. The first year yield vs. the second year yield of mastitis data

Mastitis is an infection of the udder that causes a reduction in the milk yield of infected animals. The data set contains the total milk yield, in thousands of litres, for 107 cows from a single herd in two successive years. Twenty-seven cows became infected in the second year and, as a result, their second-year observations, although recorded, are treated as missing. The aim is to compare the average milk yield in the two years. The data set was analyzed by Diggle and Kenward [6], who concluded that the missingness type is MNAR; they suggested using the Nelder-Mead simplex algorithm to find the parameter estimates. The data were also analyzed by Gad and Kenward [10], who used the stochastic EM algorithm to obtain the parameter estimates. Figure 1 shows a scatter plot of the first year yield against the second year yield.

It is clear from the graph that there is a strong positive correlation between the two yields, and that there are two cows with high milk yield in the second year and low milk yield in the first year.

Figure 2 shows the profile lines of the completers, i.e. cows with data available for both years. The graph shows a general increase in the measurements from the first year to the second year.

Figure 2. Milk yields for the completers (80 cows)

The data are analyzed using the linear model

Y_i = μ + ε_i,

where Y_i is a vector containing the two observations for the i-th animal, μ = (μ1, μ2)', and μ1 and μ2 are the averages of the response in the first and second year respectively. The compound symmetric covariance structure is chosen for the covariance matrix. In the compound symmetric structure, the covariance matrix, V, takes the form

V = σ² I_n + ν² J_n,

where I_n is the identity matrix of order n and J_n is the n × n matrix all of whose elements are ones. Hence, there are two covariance parameters, σ² and ν².
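The compound symmetric matrix described above is simple to construct; a small sketch in pure Python (no matrix library):

```python
def compound_symmetric(n, sigma2, nu2):
    """Build V = sigma2 * I_n + nu2 * J_n, where I_n is the identity
    and J_n is the n x n matrix of ones, as nested lists."""
    return [[sigma2 * (i == j) + nu2 for j in range(n)] for i in range(n)]
```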

The unstructured covariance structure is also used for the covariance matrix. In this case three covariance parameters need to be estimated: the two variances and the covariance. The parameter estimates are almost identical to those under the compound symmetric structure, so only the results for the unstructured covariance are shown.
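For the mastitis data the imputation step of Section 4 reduces to the bivariate case: the missing second-year yield is drawn from its conditional normal distribution given the observed first-year yield. A sketch of the conditional moments (variable names are ours):

```python
def conditional_moments_2d(mu1, mu2, v11, v12, v22, y1_obs):
    """Conditional distribution of Y2 given Y1 = y1_obs for a bivariate
    normal with means (mu1, mu2) and covariance entries v11, v12, v22:
    mean = mu2 + (v12 / v11) * (y1_obs - mu1),
    var  = v22 - v12**2 / v11."""
    cond_mean = mu2 + (v12 / v11) * (y1_obs - mu1)
    cond_var = v22 - v12 ** 2 / v11
    return cond_mean, cond_var
```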

Without loss of generality, we assume that the dropout process depends on the measurement at the dropout time (the second year), the previous measurement and the unknown parameters ψ = (ψ0, ψ1, ψ2)'. The dropout process is modeled as

logit{pr(y_i2 missing | y_i1, y_i2)} = ψ0 + ψ1 y_i2 + ψ2 y_i1.

The PFI is applied to the data with M = 10, and the standard errors are calculated using the Jackknife replication method. The results are shown in Table 7. The predictive mean matching MI and the FRNNI are not applicable to this kind of data, where all the subjects share the same covariates; we therefore apply MI to this data set using the propensity score method described in Grannell and Murphy [12]. For the sake of comparison between the proposed method and the previous analyses, Table 7 also includes the results of Diggle and Kenward [6] and Gad and Kenward [10].

Table 7. The PFI estimates and their standard errors for the mastitis data, together with the MI estimates, the Diggle-Kenward estimates and the Gad-Kenward estimates

The results in Table 7 show that the average of the second year yield is larger than the average of the first year yield, and both estimates are statistically significant. A closer look at the dropout parameters shows that the probability of missingness has a negative relation with the second observation. This is natural, because infection with mastitis reduces the milk yield, and it also supports the MNAR assumption. One of the two dropout-slope estimates is slightly bigger in absolute value than the other, and both estimates are statistically significant: the Z-tests of the corresponding null hypotheses are significant at the 5% level. For MI, the estimates of the mean parameters are reasonable, but one of the estimates appears to be underestimated. Our results are similar to those of Diggle and Kenward [6] and Gad and Kenward [10] for the mean and covariance estimates. There are slight differences in the estimates of the missingness parameters, which may be due to the use of a different estimation approach and a different dropout model.

8. Conclusion

Parametric fractional imputation is proposed as an innovative tool for parameter estimation in the presence of missing values. If parametric fractional imputation is used to construct the score function, the solution to the imputed score equation is approximately the maximum likelihood estimator. The PFI is superior to the MCEM or SEM in the sense that the imputed values are not regenerated at each iteration, which guarantees convergence and accelerates its rate. Variance estimation can be carried out using a replication method such as the Jackknife or the bootstrap. The simulation results show that the proposed techniques provide reasonable estimates. However, one limitation of this technique is that its accuracy, like that of other variants of the EM algorithm, depends heavily on the assumptions of the selection model, which may not be entirely correct. Kim and Yu [18] proposed a semiparametric approach to fractional imputation for use when the assumptions of the assumed model are suspect. Further research is recommended on this topic, especially for repeated measures data, but this is beyond the scope of this article.

References

[1] Booth, J. G. and Hobert, J. P. (1999) Maximizing generalized linear models with an automated Monte Carlo EM algorithm, Journal of the Royal Statistical Society B, 61, 265-285.

[2] Celeux, G. and Diebolt, J. (1985) The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem, Computational Statistics Quarterly, 2, 73-82.

[3] Collett, D. (1991) Modelling Binary Data, Chapman and Hall, London.

[4] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society B, 39, 1-38.

[5] Delyon, B., Lavielle, M. and Moulines, E. (1999) Convergence of a stochastic approximation version of the EM algorithm, The Annals of Statistics, 27(1), 94-128.

[6] Diggle, P. J. and Kenward, M. G. (1994) Informative drop-out in longitudinal data analysis, Applied Statistics, 43, 49-93.

[7] Diggle, P. J., Liang, K. Y. and Zeger, S. L. (1994) Analysis of Longitudinal Data, Oxford Science, Oxford, UK.

[8] Follmann, D. and Wu, M. (1995) An approximate generalized linear model with random effects for informative missing data, Biometrics, 51, 151-168.

[9] Gad, A. M. and Ahmed, A. S. (2006) Analysis of longitudinal data with intermittent missing values using the stochastic EM algorithm, Computational Statistics & Data Analysis, 50, 2702-2714.

[10] Gad, A. M. and Kenward, M. G. (2001) The stochastic EM algorithm and sensitivity analysis for nonrandom dropout models, Proceedings of the 12th Conference for Statistics and Computer Modelling in Human and Social Sciences, Faculty of Economics and Political Science, Cairo University, Cairo, Egypt.

[11] Gad, A. M. and Youssif, N. A. (2006) Linear mixed models for longitudinal data with nonrandom dropouts, Journal of Data Science, 4(4), 447-460.

[12] Grannell, A. and Murphy, H. (2011) Using multiple imputation to adjust for survey non-response, Proceedings of the Sixth ASC International Conference, University of Bristol, UK.

[13] Jennrich, R. I. and Schluchter, M. D. (1986) Unbalanced repeated-measures models with structured covariance matrices, Biometrics, 42, 805-820.

[14] Kim, J. K. (2011) Parametric fractional imputation for missing data analysis, Biometrika, 98, 119-132.

[15] Kim, J. K. and Fuller, W. (2008) Parametric fractional imputation for missing data analysis, Proceedings of the Section on Survey Research Methods, Joint Statistical Meetings, pp. 158-169.

[16] Kim, J. Y. and Kim, J. K. (2012) Parametric fractional imputation for nonignorable missing data, Journal of the Korean Statistical Society, 41, 291-303.

[17] Little, R. J. and Rubin, D. B. (1987) Statistical Analysis with Missing Data, John Wiley & Sons, New York.

[18] Kim, J. K. and Yu, C. L. (2009) A semiparametric approach to fractional imputation for nonignorable missing data, Survey Research Methods Proceedings, 2603-2610.

[19] Little, R. J. (1993) Pattern-mixture models for multivariate incomplete data, Journal of the American Statistical Association, 88, 125-134.

[20] Louis, T. A. (1982) Finding the observed information matrix when using the EM algorithm, Journal of the Royal Statistical Society B, 44, 226-233.

[21] McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, Chapman and Hall, London.

[22] Molenberghs, G. and Fitzmaurice, G. (2009) in Longitudinal Data Analysis, Fitzmaurice, G., Davidian, M., Verbeke, G. and Molenberghs, G. (editors), Chapman & Hall/CRC, Taylor & Francis Group, USA, ch. 17.

[23] Paik, M. and Larsen, M. D. (2006) Fractional regression nearest neighbor imputation, Proceedings of the Joint Statistical Meetings, American Statistical Association, Alexandria, VA.

[24] Rubin, D. B. (1987) A noniterative sampling/importance resampling alternative to the data augmentation algorithm for creating a few imputations when fractions of missing information are modest: the SIR algorithm (discussion of Tanner and Wong), Journal of the American Statistical Association, 82, 543-546.

[25] Wei, G. C. G. and Tanner, M. A. (1990) A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithm, Journal of the American Statistical Association, 85, 699-704.

[26] Yang, X., Li, J. and Shoptaw, S. (2008) Imputation-based strategies for clinical trial longitudinal data with nonignorable missing values, Statistics in Medicine, 27, 2826-2849.

[27] Yang, S., Kim, J. K. and Zhu, Z. (2012) Parametric fractional imputation using adjusted profile likelihood for linear mixed models with nonignorable missing data, Proceedings of the Section on Survey Research Methods, Joint Statistical Meetings, pp. 4366-4376.
 