This paper addresses variable selection in sparse logistic regression models with errors in covariates. We propose a corrected score Lasso method, which combines the weighted score Lasso approach with a projected gradient descent algorithm, to handle the challenges posed by measurement errors. The weighted score Lasso introduces a correction-amenable score function, enabling direct extension to measurement error scenarios through subsequent score correction. Our method bridges the gap between rigorous measurement error correction and practical high-dimensional implementation, establishing a framework extensible to other generalized linear models with exponential family structure. Numerical studies demonstrate the superior performance of the corrected score Lasso in error correction scenarios, highlighting its potential as a robust tool for high-dimensional data analysis with measurement error.
Within the modern framework of statistical inference, the logistic regression model serves as the theoretical cornerstone for binary classification analysis, with its canonical form defined as:
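$$P(y_i = 1 \mid x_i) = \frac{\exp(x_i^\top \beta^*)}{1 + \exp(x_i^\top \beta^*)}, \quad i = 1, \dots, n, \tag{1}$$
where the independent Bernoulli response variable $y_i \in \{0, 1\}$ is associated with deterministic covariates $x_i \in \mathbb{R}^p$ through an unknown $s$-sparse parameter vector $\beta^* \in \mathbb{R}^p$ (i.e., $\beta^*$ has only $s$ non-zero components). The focus of the research is on statistical inference in high-dimensional settings ($p \gg n$), which essentially constitutes a variable selection (or model selection) problem.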
To address the issue of variable selection, a variety of well-established regularization methods have been developed. Starting from the Lasso framework 1, regularization methods have evolved to include nonconvex penalties such as SCAD 2, the elastic net 3, the group Lasso 4, and the Dantzig selector 5. These methodologies are extensively detailed in the authoritative monograph 6.
Moreover, adaptations of these methodologies to sparse logistic regression models have garnered considerable interest. Notably, Zou 7 introduced the adaptive Lasso, while Loh and Wainwright 8 explored regularized M-estimators incorporating nonconvex penalties like SCAD, MCP, and capped $\ell_1$ norms. Yin 9 presented the weighted score Lasso method, and Zhong et al. 10 proposed a penalized weighted score function approach to tackle group sparsity in data. In the context of robust estimation, Cornilly et al. 11 put forward a method based on the elastic net framework, and Basu et al. 12 developed a robust estimation technique for the adaptive Lasso.
However, all the work mentioned so far implicitly assumes that the covariates are observed without error, an assumption that seldom holds in practice. Common examples include sensor network data 13, gene expression microarray data 14 and high-throughput sequencing data 15. In such cases, the unobserved covariate $x_i$ is related to the observed covariate $w_i$ through an additive measurement error model
$$w_i = x_i + u_i, \tag{2}$$
where $u_i$ represents the measurement error. In classical regression models, simply replacing $x_i$ with $w_i$ leads to a naive estimator, which is well known to be inconsistent and biased (see 16 for a comprehensive review). Consequently, the variable selection methods discussed earlier are no longer valid in the presence of measurement error.
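The bias of the naive estimator can be seen in a small low-dimensional experiment. The following is a minimal sketch, not the paper's simulation design: the single covariate, the true coefficient of 1, the error standard deviation of 1, and the helper name `fit_logistic_newton` are all illustrative assumptions.

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=25):
    """Unpenalized logistic regression (no intercept) via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                        # score of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n, sigma_u = 20_000, 1.0               # illustrative sample size and error level
beta_true = np.array([1.0])            # illustrative true coefficient

X = rng.normal(size=(n, 1))                        # true covariate
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))
W = X + sigma_u * rng.normal(size=(n, 1))          # error-contaminated covariate

beta_clean = fit_logistic_newton(X, y)   # fit on the true covariate
beta_naive = fit_logistic_newton(W, y)   # naive fit on the noisy covariate
print(beta_clean, beta_naive)
```

In this setup the naive coefficient is pulled toward zero relative to the fit on the true covariate, which is the attenuation effect that motivates a correction.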
To mitigate the impact of measurement error, several correction methods for variable selection have been proposed in the literature. These include the Matrix Uncertainty (MU) selector 17, the improved MU selector 18, regularized M-estimators 19, the orthogonal matching pursuit algorithm 20, the Convex Conditioned Lasso (CoCoLasso) 21 and the Block coordinate Descent Convex Conditioned Lasso (BDCoCoLasso) algorithm 22. However, most of these methods focus on linear models with measurement error. For high-dimensional generalized linear models, Sørensen et al. 23 and Sørensen et al. 24 proposed the generalized matrix uncertainty selector and the conditional score Lasso, respectively. Chen 25 developed BOOME, a boosting algorithm for measurement error in logistic regression and probit models. In this paper, we propose a corrected score Lasso method to study sparse logistic regression with errors in covariates, leveraging the weighted score Lasso proposed by Yin 9 and the projected gradient descent algorithm suggested by Loh and Wainwright 19.
The remainder of this paper is organized as follows. In Section 2, we review the weighted score Lasso for sparse logistic regression without errors in covariates. Subsequently, in Section 3, we extend this approach to sparse logistic regression with errors in covariates and introduce our corrected score Lasso method. Section 4 presents numerical results obtained through simulations.
Notation: Write
,
and
. Denote by
the non-zero coordinate of
and let
be the number of non-zero elements of
. For a vector
, we define
and
. For function
, we denote by
its gradient at
.
![]() |
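The sparse logistic regression estimator can be obtained through $\ell_1$-penalized loss minimization:
$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left[ \log\left(1 + \exp(x_i^\top \beta)\right) - y_i x_i^\top \beta \right] + \lambda \|\beta\|_1 \right\}, \tag{3}$$
where $\lambda > 0$ serves as a tuning parameter. Writing $p_i(\beta) = \exp(x_i^\top \beta) / (1 + \exp(x_i^\top \beta))$, the solution to this Lasso problem satisfies the Karush-Kuhn-Tucker (KKT) conditions:
$$\frac{1}{n} \sum_{i=1}^{n} x_i \left( y_i - p_i(\hat\beta) \right) = \lambda \hat z,$$
where $\hat z$ represents the subgradient of $\|\hat\beta\|_1$, whose $j$-th component is the sign of $\hat\beta_j$ if $\hat\beta_j \neq 0$ and can be any value belonging to $[-1, 1]$ when $\hat\beta_j = 0$. A critical challenge arises in parameter selection: the optimal $\lambda$ must satisfy $\lambda \geq c \, \| \frac{1}{n} \sum_{i=1}^{n} x_i ( y_i - p_i(\beta^*) ) \|_\infty$ for some constant $c$. However, the score function evaluated at $\beta^*$:
$$\frac{1}{n} \sum_{i=1}^{n} x_i \left( y_i - p_i(\beta^*) \right)$$
contains a random component $y_i - p_i(\beta^*)$ with variance $p_i(\beta^*)(1 - p_i(\beta^*))$, introducing dependence on the true value $\beta^*$ (unknown) in the selection of $\lambda$.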
To address this fundamental difficulty, Yin 9 proposed a weighted score Lasso approach through innovative weighting of the score function components. The weighted score Lasso solves:
![]() |
where
is a predefined positive weight function. The specific weight function:
![]() | (4) |
yields the weighted score function:
![]() | (5) |
This formulation leads to the weighted score Lasso estimator, defined as the solution to
![]() | (6) |
where the weighted loss function takes the form:
![]() |
A key advantage of this weighted approach lies in its correction-amenable score function, enabling direct extension to measurement error scenarios through subsequent score correction. The weighted formulation maintains convexity while decoupling the selection of $\lambda$ from dependence on $\beta^*$, resolving the technical challenge present in the standard logistic regression Lasso.
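The decoupling can be illustrated numerically. The weight function (4) is not reproduced in this text, so the sketch below rests on an explicit assumption: the weight is the reciprocal standard deviation of the response, $1/\sqrt{p(1-p)}$, which turns the residual $y - p$ into $y e^{-a/2} - (1-y) e^{a/2}$ with $a = x^\top \beta$. The function names and the exact normalization are hypothetical and may differ from those in Yin 9.

```python
import numpy as np

def weighted_loss(beta, X, y):
    """Assumed weighted loss: mean of y*exp(-a/2) + (1-y)*exp(a/2), a = X @ beta."""
    a = X @ beta
    return float(np.mean(y * np.exp(-a / 2) + (1 - y) * np.exp(a / 2)))

def weighted_loss_grad(beta, X, y):
    """Gradient of weighted_loss; equals -1/2 times the assumed weighted score."""
    a = X @ beta
    r = y * np.exp(-a / 2) - (1 - y) * np.exp(a / 2)   # weighted residual
    return -(X.T @ r) / (2 * len(y))

# Moments of the weighted residual at the true parameter, for several values
# of the linear predictor a. Here p = P(y = 1) under the logistic model.
a = np.linspace(-3.0, 3.0, 13)
p = 1.0 / (1.0 + np.exp(-a))
mean_r = p * np.exp(-a / 2) - (1 - p) * np.exp(a / 2)   # E[r]
second_moment_r = p * np.exp(-a) + (1 - p) * np.exp(a)  # E[r^2]
print(np.round(mean_r, 12), np.round(second_moment_r, 12))
```

Under this assumed weight the residual has mean zero and unit variance for every value of the linear predictor, so the noise level of the score no longer depends on the unknown parameter. The loss is a sum of exponentials of linear functions of the parameter, hence convex.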
Building upon the weighted score framework, we now address the critical challenge of covariate measurement error through innovative methodological extensions. Consider the logistic regression model (1) augmented with an additive measurement error structure:
$$w_i = x_i + u_i, \quad i = 1, \dots, n, \tag{7}$$
where $w_i$ represents the error-contaminated covariates and $u_i$ follows a Normal distribution $N(0, \Sigma_u)$. Direct substitution of $x_i$ with $w_i$ in standard Lasso implementations produces biased naive estimators, a well-documented phenomenon in the measurement error literature 16. This bias persists in high-dimensional settings, necessitating specialized correction mechanisms.
![]() |
Since the score is unbiased, it follows that the corrected score, if it exists, is also unbiased. Therefore, corrected scores yield consistent estimators. Although Stefanski 26 established the non-existence of conventional corrected scores for logistic regression, our weighted score formulation enables correction through strategic exploitation of the measurement error structure.
Given a corrected score function
, it is natural to consider the following optimization problem:
![]() | (8) |
Although the above solution seems very natural, the optimization program (8) is fundamentally different from the optimization program (6). When the dataset is corrupted by measurement errors, the corrected loss function is usually not convex. Program (8) is therefore nonconvex, whereas program (6) is convex. To overcome this difficulty, we extend the algorithm proposed by Loh and Wainwright 19 for linear models with measurement error to program (8).
Under Gaussian measurement errors, we establish the pivotal moment relationships:
![]() |
![]() |
Owing to the conditional independence of $w_i$ and $y_i$ given $x_i$, we can obtain a corrected score function based on (5), which has the form
![]() | (9) |
The corresponding corrected loss function becomes:
![]() |
This formulation preserves the essential unbiasedness property
under perfect covariate observation. When $p < n$, the estimator for $\beta^*$ can be obtained by minimizing the corrected loss function using the standard gradient descent method. However, when $p > n$, without additional regularization, minimizing the corrected loss function is an ill-posed problem because it does not have a unique solution.
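The idea behind the correction can be checked by simulation. For Gaussian errors $u \sim N(0, \Sigma_u)$ and any fixed vector $c$, the moment generating function gives $E[\exp(c^\top u)] = \exp(c^\top \Sigma_u c / 2)$, so exponential terms evaluated at the noisy covariates can be rescaled to have the right conditional mean. The sketch below assumes the exponential form of the weighted loss used for illustration earlier (a term $y e^{-a/2} + (1-y) e^{a/2}$), which may differ in normalization from the paper's equations; `corrected_loss` and all numeric settings are hypothetical.

```python
import numpy as np

def weighted_loss(beta, X, y):
    """Assumed weighted loss: mean of y*exp(-a/2) + (1-y)*exp(a/2), a = X @ beta."""
    a = X @ beta
    return float(np.mean(y * np.exp(-a / 2) + (1 - y) * np.exp(a / 2)))

def corrected_loss(beta, W, y, Sigma_u):
    """Rescale the loss on noisy covariates W by exp(-beta' Sigma_u beta / 8).

    With c = +/- beta/2, E[exp(c'u)] = exp(beta' Sigma_u beta / 8), so this
    factor removes the inflation caused by the Gaussian measurement error.
    """
    shrink = np.exp(-(beta @ Sigma_u @ beta) / 8.0)
    return float(shrink * weighted_loss(beta, W, y))

rng = np.random.default_rng(1)
n, sigma_u = 200_000, 0.5                 # illustrative settings
beta = np.array([1.0, -1.0])
Sigma_u = sigma_u**2 * np.eye(2)

X = rng.normal(size=(n, 2))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta))))
W = X + sigma_u * rng.normal(size=(n, 2))

clean = weighted_loss(beta, X, y)               # uses the unobserved X
naive = weighted_loss(beta, W, y)               # plugs in W without correction
corrected = corrected_loss(beta, W, y, Sigma_u)
print(clean, naive, corrected)
```

The corrected value tracks the loss computed from the true covariates, while the naive plug-in is inflated. The correction factor is not convex in the parameter, which is consistent with the nonconvexity discussed above.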
To address the dual challenges of nonconvex optimization and high dimensionality ($p \gg n$), we adapt the projected gradient descent framework 19 to our corrected score context. The proposed corrected score Lasso estimator solves:
![]() | (10) |
where
constrains the parameter space to ensure feasibility of
. The projection step onto the
-ball
combines naturally with
-regularization to enforce sparsity while maintaining computational tractability. The projected gradient descent algorithm generates a sequence
of iterates by the recursion:
![]() |
where
denotes projection onto the
-ball of radius
, and
is the step size parameter. As Loh and Wainwright 19 showed, if
is properly chosen, the above iterates converge with high probability to a vector extremely close to any global minimizers of the program (10).
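The recursion can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the projection uses the sorting-based method for the $\ell_1$-ball, the step-size convention (`step` multiplies the gradient) is an assumption, and the quadratic objective in the usage example is a stand-in for the corrected loss.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto {b : ||b||_1 <= radius}, radius > 0."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]              # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - radius) / k > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)   # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def projected_gradient_descent(grad, beta0, radius, step, n_iter=500):
    """Iterate beta <- Proj(beta - step * grad(beta)) over the l1-ball."""
    beta = project_l1_ball(np.asarray(beta0, dtype=float), radius)
    for _ in range(n_iter):
        beta = project_l1_ball(beta - step * grad(beta), radius)
    return beta

# Usage on a toy objective 0.5 * ||beta - c||^2, whose constrained minimizer
# is the projection of c onto the ball.
c = np.array([3.0, 1.0])
beta_hat = projected_gradient_descent(lambda b: b - c, np.zeros(2), radius=2.0, step=0.5)
print(project_l1_ball(c, 2.0), beta_hat)
```

The projection soft-thresholds the coordinates, so iterates on the boundary of the ball are typically sparse, which is how the constraint enforces variable selection.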
In this section, we use simulated datasets to investigate the finite-sample performance of the proposed procedures. The performance of an estimator
is assessed by the Mean Absolute Error (MAE)
![]() |
To assess the ability of the variable selection methods to recover the sparsity pattern, we record “TP” and “FP”, which denote the number of true positives and the number of false positives, respectively.
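These criteria can be computed as in the sketch below. The displayed MAE formula is not reproduced in this text, so the code assumes the average absolute coordinate-wise difference between the estimate and the true parameter; the function name `selection_metrics` is hypothetical.

```python
import numpy as np

def selection_metrics(beta_hat, beta_true):
    """Return MAE (assumed: mean absolute coordinate error), TP and FP counts."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    selected = beta_hat != 0          # variables kept by the estimator
    active = beta_true != 0           # variables in the true support
    return {
        "MAE": float(np.mean(np.abs(beta_hat - beta_true))),
        "TP": int(np.sum(selected & active)),    # selected and truly active
        "FP": int(np.sum(selected & ~active)),   # selected but truly inactive
    }

metrics = selection_metrics([0.8, 0.0, 0.1, 0.0], [1.0, -1.0, 0.0, 0.0])
print(metrics)
```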
We generate 100 datasets from model (1), each consisting of n=200 observations. We set
and
, for
or 1000, respectively. The matrix
has i.i.d. rows
, for
. We consider two models for the covariance matrix
, component independent (
) and autoregressive (
). The response
is binomially distributed with mean
. A measurement matrix
is generated with the rows of
i.i.d. distributed Normal
where
or
.
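A data-generating step of this kind can be sketched as follows. Several specifics are not recoverable from the text and are placeholders here: the true coefficient vector (five leading coefficients equal to 1), the autoregressive correlation of 0.5, the assumption that the autoregressive covariance has entries $\rho^{|j-k|}$, and the function name `simulate`.

```python
import numpy as np

def simulate(n=200, p=500, s=5, rho=0.5, sigma_u=0.2, seed=0):
    """Draw (X, W, y, beta) from a sparse logistic model with additive error.

    rho = 0 gives independent components; rho > 0 gives an AR(1)-type
    covariance with entries rho**|j-k| (an assumed form).
    """
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    beta[:s] = 1.0                                    # placeholder sparse signal
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    L = np.linalg.cholesky(Sigma)
    X = rng.normal(size=(n, p)) @ L.T                 # rows ~ N(0, Sigma)
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    y = rng.binomial(1, prob)                         # Bernoulli responses
    W = X + sigma_u * rng.normal(size=(n, p))         # observed, contaminated
    return X, W, y, beta

X, W, y, beta = simulate()
print(X.shape, W.shape, y.mean())
```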
We use the R package “glmnet” to solve program (3), where two tuning parameters are chosen via 10-fold cross-validation: one (denoted $\lambda_{\min}$) corresponding to the minimum deviance, and one (denoted $\lambda_{1se}$) corresponding to the “one-standard-error” rule 27. We apply the weighted score Lasso estimator (denoted “WsLasso”) to solve program (6), where the settings of
,
and
follow Yin 9. For comparison, we also consider the “glmnet” and “WsLasso” estimators based on the corrupted data $W$. Our corrected score Lasso estimator (denoted “CsLasso”) can be computed using the efficient projection algorithm proposed by Duchi et al. 28. CsLasso requires an initial estimator
which was given by glmnet (with
) estimate based on
. CsLasso also requires knowledge of
or a loss function for choosing the constraint parameter
Since
is impossible to know beforehand and CsLasso lacks a well-defined loss function, the elbow rule 17 is used to select the constraint parameter
from 39 equally spaced values in
. Figure 1 shows the number of nonzero coefficients plotted against the value of
, and the elbow rule amounts to selecting
where the curve begins to flatten.
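The elbow rule is applied by inspecting the plot, but a simple automated stand-in can be sketched. The heuristic below is an assumption and not the procedure of 17: after rescaling both axes to the unit interval, it picks the grid point farthest from the straight line joining the first and last points of the curve. The function name `elbow_index` and the example counts are hypothetical.

```python
import numpy as np

def elbow_index(radii, n_nonzero):
    """Index of the grid point farthest from the chord of the rescaled curve.

    Assumes the radii are increasing and the first and last counts differ.
    """
    x = np.asarray(radii, dtype=float)
    y = np.asarray(n_nonzero, dtype=float)
    x = (x - x[0]) / (x[-1] - x[0])       # rescale both axes to [0, 1]
    y = (y - y[0]) / (y[-1] - y[0])
    # After rescaling, the chord is the line y = x; |y - x| is proportional
    # to the perpendicular distance from it.
    return int(np.argmax(np.abs(y - x)))

radii = np.arange(7.0)                     # hypothetical constraint grid
counts = [0, 5, 10, 15, 16, 17, 18]        # hypothetical nonzero counts
best = elbow_index(radii, counts)
print(best, radii[best])
```

In the hypothetical curve the count rises by five per step up to the fourth grid point and by one per step afterwards, so the selected point is where the curve begins to flatten.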
Tables 1 to 4 quantify the error-correction advantage of CsLasso over both glmnet (with
and
) and WsLasso. The comparative analysis reveals three critical advantages:
1. Enhanced Selection Accuracy: CsLasso achieves higher true positives (TPs) than glmnet with
while simultaneously reducing false positives (FPs). Although WsLasso demonstrates the lowest FPs, its substantially compromised TPs reveal excessive conservatism. In contrast, CsLasso maintains greater detection power without sacrificing specificity.
2. Measurement Error Resilience: Under escalating measurement error (
from 0.2 to 0.5), CsLasso exhibits superior stability, with a smaller reduction in TPs than glmnet and WsLasso. While greater
leads to a smaller TP for all methods, CsLasso shows only a slight reduction in TP and an increase in FP. This robustness stems from the measurement error correction built into CsLasso's optimization framework.
3. Estimation Precision Dominance: CsLasso has the lowest MAE in almost all cases. In high-error conditions (
), this advantage becomes even more pronounced, with CsLasso's MAE being significantly lower than those of glmnet and WsLasso.
These findings collectively demonstrate the superior performance of CsLasso in error correction scenarios, highlighting its potential as a robust tool for high-dimensional data analysis with measurement error.
This paper focuses on variable selection in sparse logistic regression models when covariates are subject to measurement error. The key contribution of this study is the development of the corrected score Lasso (CsLasso) method, which effectively addresses the challenges posed by measurement errors in covariates.
The CsLasso method builds upon the weighted score Lasso framework, introducing a correction-amenable score function that allows for direct extension to measurement error scenarios through subsequent score correction. In the error-free setting, the weighted formulation maintains convexity while decoupling the regularization parameter selection from dependence on the true parameter value, thereby resolving a technical challenge present in standard logistic regression Lasso methods.
The numerical studies conducted in this paper demonstrate the superior performance of CsLasso in error correction scenarios. CsLasso achieves higher true positives (TPs) than glmnet with
while simultaneously reducing false positives (FPs). It also exhibits superior stability under escalating measurement error, with a smaller reduction in TPs than glmnet and WsLasso. Furthermore, CsLasso achieves the lowest Mean Absolute Error (MAE) in almost all cases, with this advantage becoming even more pronounced in high-error conditions.
The methodology proposed in this paper bridges the gap between rigorous measurement error correction and practical high-dimensional implementation. It establishes a framework that is extensible to other generalized linear models with exponential family structure, highlighting its potential as a robust tool for high-dimensional data analysis with measurement error.
Future research could explore the application of the CsLasso method to other types of generalized linear models and further investigate its theoretical properties in ultra-high dimensional settings. Additionally, the development of more efficient algorithms for implementing CsLasso in large-scale data analysis could be a valuable direction for future work.
The authors’ work was supported by the Educational Commission of Jiangxi Province of China (GJJ211403).
[1] Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 267-288, 1996.
[2] Fan, J. and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96 (456), 1348-1360, 2001.
[3] Zou, H. and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2), 301-320, 2005.
[4] Yuan, M. and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (1), 49-67, 2006.
[5] Candes, E. and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 2313-2351, 2007.
[6] Bühlmann, P. and S. Van De Geer. Statistics for high-dimensional data: methods, theory and applications. Springer Science & Business Media, 2011.
[7] Zou, H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101 (476), 1418-1429, 2006.
[8] Loh, P.-L. and M. J. Wainwright. Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. In Advances in Neural Information Processing Systems, pp. 476-484, 2013.
[9] Yin, Z. Variable selection for sparse logistic regression. Metrika 83 (7), 821-836, 2020.
[10] Zhong, M., Z. Yin, and Z. Wang. Variable selection for sparse logistic regression with grouped variables. Mathematics 11 (24), 4979, 2023.
[11] Cornilly, D., L. Tubex, S. Van Aelst, and T. Verdonck. Robust and sparse logistic regression. Advances in Data Analysis and Classification 18 (3), 663-679, 2024.
[12] Basu, A., A. Ghosh, M. Jaenada, and L. Pardo. Robust adaptive lasso in high-dimensional logistic regression. Statistical Methods & Applications 33 (5), 1217-1249, 2024.
[13] Feng, J., S. Megerian, and M. Potkonjak. Model-based calibration for sensor networks. In Sensors, Proceedings of IEEE, Volume 2, pp. 737-742, 2003.
[14] Purdom, E. and S. P. Holmes. Error distribution for gene expression data. Statistical Applications in Genetics and Molecular Biology 4 (1), 2005.
[15] Benjamini, Y. and T. P. Speed. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Research 40 (10), e72-e72, 2012.
[16] Carroll, R. J., D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu. Measurement error in nonlinear models: a modern perspective. CRC Press, 2006.
[17] Rosenbaum, M., A. B. Tsybakov, et al. Sparse recovery under matrix uncertainty. The Annals of Statistics 38 (5), 2620-2651, 2010.
[18] Rosenbaum, M. and A. B. Tsybakov. Improved matrix uncertainty selector. 9, 276-291, 2013.
[19] Loh, P.-L. and M. J. Wainwright. High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. In Advances in Neural Information Processing Systems, pp. 2726-2734, 2011.
[20] Chen, Y. and C. Caramanis. Orthogonal matching pursuit with noisy and missing data: Low and high dimensional results. arXiv preprint arXiv:1206.0823, 2012.
[21] Datta, A. and H. Zou. CoCoLasso for high-dimensional error-in-variables regression. Annals of Statistics 45 (6), 2400-2426, 2017.
[22] Escribe, C., T. Lu, J. Keller-Baruch, V. Forgetta, B. Xiao, J. B. Richards, S. Bhatnagar, K. Oualkacha, and C. M. Greenwood. Block coordinate descent algorithm improves variable selection and estimation in error-in-variables regression. Genetic Epidemiology 45 (8), 874-890, 2021.
[23] Sørensen, Ø., A. Frigessi, and M. Thoresen. Measurement error in lasso: Impact and likelihood bias correction. Statistica Sinica, 809-829, 2015.
[24] Sørensen, Ø., K. H. Hellton, A. Frigessi, and M. Thoresen. Covariate selection in high-dimensional generalized linear models with measurement error. Journal of Computational and Graphical Statistics 27 (4), 739-749, 2018.
[25] Chen, L.-P. Variable selection and estimation for misclassified binary responses and multivariate error-prone predictors. Journal of Computational and Graphical Statistics 33 (2), 407-420, 2024.
[26] Stefanski, L. A. Unbiased estimation of a nonlinear function of a normal mean with application to measurement error models. Communications in Statistics-Theory and Methods 18 (12), 4335-4358, 1989.
[27] Friedman, J., T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33 (1), 1-22, 2010.
[28] Duchi, J., S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pp. 272-279, 2008. ACM.
Published with license by Science and Education Publishing, Copyright © 2025 Zanhua Yin and Zhichao Wang
This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/