Comparison of Four Methods for Handing Missing Data in Longitudinal Data Analysis through a Simulation Study
Read full paper at:
http://www.scirp.org/journal/PaperInformation.aspx?PaperID=52855#.VKs-f8nQrzE
Missing data occur frequently in longitudinal data analysis, and many
methods have been proposed in the literature to handle them. Complete
case (CC) analysis, mean substitution (MS), last observation carried
forward (LOCF), and multiple imputation (MI) are the four methods most
frequently used in practice. In real-world data analysis, missing data
can be missing completely at random (MCAR), missing at random (MAR), or
missing not at random (MNAR), depending on the reasons the data are
missing. In this paper, simulations under various conditions (different
missing mechanisms, missing rates, and slope sizes) were conducted to
evaluate the performance of the four methods, using bias, root mean
squared error (RMSE), and 95% coverage probability as evaluation
criteria. The results showed that LOCF has the largest bias and the
poorest 95% coverage probability in most cases under both the MAR and
MCAR missing mechanisms. Hence, LOCF should not be used in longitudinal
data analysis. Under the MCAR mechanism, the CC and MI methods perform
equally well. Under the MAR mechanism, MI has the smallest bias, the
smallest RMSE, and the best 95% coverage probability. Therefore, either
CC or MI is appropriate under MCAR, while MI is the more reliable and
better-grounded statistical method under MAR.
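To make the comparison concrete, the following is a minimal sketch of how such a simulation might be set up. It is not the paper's actual design: the data-generating model (random intercept plus a common linear slope), the monotone MCAR dropout scheme, the pooled least-squares slope estimate, and all function names are assumptions chosen for illustration. MI and the 95% coverage probability are left out to keep the example short; only CC, MS, and LOCF are compared on bias and RMSE.

```python
import numpy as np

def simulate(n_subjects=200, n_times=5, slope=1.0, miss_rate=0.3, rng=None):
    """Simulate longitudinal data with monotone MCAR dropout (assumed design)."""
    rng = np.random.default_rng(rng)
    t = np.arange(n_times, dtype=float)
    # Random intercept per subject + common slope + measurement noise.
    y = (rng.normal(0, 1, (n_subjects, 1)) + slope * t
         + rng.normal(0, 1, (n_subjects, n_times)))
    # MCAR: dropout does not depend on y. A subject drops out with
    # probability miss_rate, at a uniformly chosen visit >= 1.
    drops = rng.random(n_subjects) < miss_rate
    drop_time = rng.integers(1, n_times, n_subjects)
    missing = drops[:, None] & (t[None, :] >= drop_time[:, None])
    return t, np.where(missing, np.nan, y)

def complete_case(y):
    """CC: keep only subjects with no missing visits."""
    return y[~np.isnan(y).any(axis=1)]

def mean_substitution(y):
    """MS: replace each missing value with the observed mean at that visit."""
    return np.where(np.isnan(y), np.nanmean(y, axis=0), y)

def locf(y):
    """LOCF: carry each subject's last observed value forward."""
    out = y.copy()
    for j in range(1, out.shape[1]):
        gap = np.isnan(out[:, j])
        out[gap, j] = out[gap, j - 1]
    return out

def ols_slope(t, y):
    """Pooled least-squares slope of y on visit time."""
    tt = np.broadcast_to(t, y.shape)
    ok = ~np.isnan(y)
    return np.polyfit(tt[ok], y[ok], 1)[0]

def evaluate(n_reps=200, slope=1.0, seed=0):
    """Return {method: (bias, rmse)} of the slope estimate over n_reps datasets."""
    rng = np.random.default_rng(seed)
    methods = {"CC": complete_case, "MS": mean_substitution, "LOCF": locf}
    estimates = {name: [] for name in methods}
    for _ in range(n_reps):
        t, y = simulate(slope=slope, rng=rng)
        for name, handle in methods.items():
            estimates[name].append(ols_slope(t, handle(y)))
    results = {}
    for name, values in estimates.items():
        err = np.array(values) - slope
        results[name] = (err.mean(), np.sqrt((err ** 2).mean()))
    return results

if __name__ == "__main__":
    for name, (bias, rmse) in evaluate().items():
        print(f"{name:5s} bias={bias:+.3f} rmse={rmse:.3f}")
```

Under these assumptions, carrying the last value forward flattens each dropout's trajectory, so with a positive true slope the LOCF estimate should be pulled toward zero, whereas the CC estimate should stay close to the true slope under MCAR. Swapping the dropout rule for one that depends on previously observed values would give a MAR variant of the same sketch.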
Cite this paper
Zhu, X. (2014) Comparison of Four Methods for
Handing Missing Data in Longitudinal Data Analysis through a Simulation
Study. Open Journal of Statistics, 4, 933-944. doi: 10.4236/ojs.2014.411088.