Research Article
Open Access Peer-reviewed

Effects of Random Sampling Methods on Maximum Likelihood Estimates of a Simple Logistic Regression Model

Oshada Senaweera1,2, Prasanna S. Haddela1, Gayan Dharmarathne2

1Department of Information Technology, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka

2Department of Statistics, University of Colombo, Colombo, Sri Lanka

American Journal of Applied Mathematics and Statistics. 2021, 9(1), 28-37. DOI: 10.12691/ajams-9-1-5
Received December 21, 2020; Revised January 23, 2021; Accepted January 31, 2021

Abstract

The paper investigates the comparative effects of several random sampling methods on the maximum likelihood estimates of a simple logistic regression model. The study uses simulated data: logistic populations with pre-defined parameter values, generated using Monte Carlo methods. The sampling techniques are Simple Random Sampling (SRS) and six variations of Stratified Sampling, of which two are single-stage Stratified Sampling and four are choice-based (two-phase) Stratified Sampling. Parameter estimates arising under each sampling technique were compared using three performance measures: bias, standard error, and the percentage of models that are feasibly estimated. The simulation-based analysis found that choice-based sampling with proportional allocation in both phases is the best-suited sampling technique for parameter estimation of a simple logistic regression model.
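
The kind of simulation summarised in the abstract can be illustrated with a minimal Python sketch. It covers only the SRS case and is not the authors' code. The covariate distribution, the parameter values, the population and sample sizes, the number of replications, and all function names are assumptions made for illustration. Standard error is taken here as the empirical standard deviation of the estimates across replications, and a model counts as "feasibly estimated" when a Newton-Raphson fit converges; both definitions are likewise assumptions.

```python
import numpy as np


def simulate_population(n, beta0, beta1, rng):
    """Simulate a population from a simple logistic model with one covariate."""
    x = rng.normal(size=n)  # assumed covariate distribution (standard normal)
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    y = rng.binomial(1, p)
    return x, y


def fit_logistic_mle(x, y, max_iter=50, tol=1e-8):
    """Newton-Raphson MLE of (beta0, beta1); returns None if the fit fails."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)            # score vector
        hess = X.T @ (X * w[:, None])   # observed information matrix
        try:
            step = np.linalg.solve(hess, grad)
        except np.linalg.LinAlgError:
            return None
        if not np.all(np.isfinite(step)):
            return None
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            return beta
    return None  # no convergence, e.g. under separation


def srs_performance(x, y, true_beta, sample_size, n_reps, rng):
    """Bias, empirical SE and % of feasible fits over repeated simple random samples."""
    estimates = []
    for _ in range(n_reps):
        idx = rng.choice(x.size, size=sample_size, replace=False)
        beta_hat = fit_logistic_mle(x[idx], y[idx])
        if beta_hat is not None:
            estimates.append(beta_hat)
    feasible_pct = 100.0 * len(estimates) / n_reps
    estimates = np.array(estimates)
    bias = estimates.mean(axis=0) - true_beta
    se = estimates.std(axis=0, ddof=1)
    return bias, se, feasible_pct


rng = np.random.default_rng(0)
true_beta = np.array([-1.0, 0.5])  # assumed pre-defined parameter values
x, y = simulate_population(100_000, true_beta[0], true_beta[1], rng)
bias, se, feasible_pct = srs_performance(x, y, true_beta, 500, 200, rng)
print("bias:", bias, "SE:", se, "feasible %:", feasible_pct)
```

A stratified or choice-based design could be compared on the same footing by replacing the `rng.choice` draw with the corresponding selection rule, although choice-based samples generally also require an adjustment to the estimator that this sketch does not cover.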

Keywords:

Monte-Carlo simulations, random sampling, logistic regression, maximum likelihood estimates