Variable Selection for Sparse Logistic Regression Model with Errors in Covariates

Zanhua Yin; Zhichao Wang

doi:10.12691/ajams-13-2-1

Research Article

Open Access Peer-reviewed


American Journal of Applied Mathematics and Statistics. 2025, 13(2), 24-29. DOI: 10.12691/ajams-13-2-1

Received March 01, 2025; Revised April 01, 2025; Accepted April 08, 2025

Abstract

This paper addresses variable selection in the sparse logistic regression model with errors in covariates. We propose a corrected score Lasso method, which combines the weighted score Lasso approach with a projected gradient descent algorithm, to handle the challenges posed by measurement error. The weighted score Lasso uses a correction-amenable score function, which extends directly to measurement error scenarios through a subsequent score correction. Our method bridges the gap between rigorous measurement error correction and practical high-dimensional implementation, and it establishes a framework extensible to other generalized linear models with exponential family structure. Numerical studies demonstrate the superior performance of the corrected score Lasso in error correction scenarios, highlighting its potential as a robust tool for high-dimensional data analysis with measurement error.

Keywords: Corrected score; Lasso; logistic regression model; measurement error; sparse

1. Introduction

In modern statistical inference, the logistic regression model is the theoretical cornerstone of binary classification analysis, with canonical form

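$$P(y_i = 1 \mid x_i) = \pi_i(\beta^*), \qquad \pi_i(\beta) = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)}, \qquad i = 1, \dots, n, \tag{1}$$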

where the independent Bernoulli responses $y_i \in \{0, 1\}$ are associated with deterministic covariates $x_i \in \mathbb{R}^p$ through an unknown sparse parameter vector $\beta^* \in \mathbb{R}^p$ (i.e., $\beta^*$ has only $s$ non-zero components). We focus on statistical inference in high-dimensional settings ($p \gg n$), which is essentially a variable selection (or model selection) problem.

To address variable selection, a variety of well-established regularization methods have been developed. Starting from the Lasso ¹, they include the non-convex SCAD penalty ², the elastic net ³, the group Lasso ⁴ and the Dantzig selector ⁵; these methods are treated in detail in the monograph ⁶.

Adaptations of these methods to sparse logistic regression have also attracted considerable interest. Notably, Zou ⁷ introduced the adaptive Lasso, while Loh and Wainwright ⁸ studied regularized M-estimators with non-convex penalties such as SCAD, MCP and capped-$\ell_1$ penalties. Yin ⁹ presented the weighted score Lasso method, and Zhong et al. ¹⁰ proposed a penalized weighted score function approach for group sparsity. For robust estimation, Cornilly et al. ¹¹ put forward a method based on the elastic net framework, and Basu et al. ¹² developed a robust adaptive Lasso.

However, all the work mentioned so far implicitly assumes that the covariates are observed without error. This assumption is seldom true in practice; common examples include sensor network data ¹³, gene expression microarray data ¹⁴ and high-throughput sequencing data ¹⁵. In such cases, the unobserved covariate $x_i$ is related to the observed covariate $w_i$ through an additive measurement error model

$$w_i = x_i + u_i, \qquad i = 1, \dots, n, \tag{2}$$

where $u_i$ represents the measurement error. In classical regression models, simply replacing $x_i$ with $w_i$ leads to a naive estimator, which is well known to be biased and inconsistent (see ¹⁶ for a comprehensive review). Consequently, the variable selection methods discussed above are no longer valid in the presence of measurement error.
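The bias of the naive estimator is easy to see even in one dimension. The following Python sketch (NumPy only) fits an unpenalized logistic regression by Newton-Raphson twice, once on the true covariate $x$ and once on the error-prone $w = x + u$. The helper `fit_logistic`, the sample size, the slope 1.5 and the error variance 0.5 are illustrative assumptions, not settings from this paper.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Unpenalized logistic regression with an intercept, fitted by Newton-Raphson."""
    Z = np.column_stack([np.ones(len(y)), X])        # prepend an intercept column
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(Z @ beta)))       # fitted probabilities
        grad = Z.T @ (y - pi)                         # score of the log-likelihood
        hess = Z.T @ (Z * (pi * (1.0 - pi))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(0)
n, slope, sigma_u2 = 20000, 1.5, 0.5                 # illustrative values only
x = rng.standard_normal(n)                            # true (unobserved) covariate
w = x + np.sqrt(sigma_u2) * rng.standard_normal(n)    # observed covariate, model (2)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-slope * x))).astype(float)

slope_clean = fit_logistic(x[:, None], y)[1]          # uses the true covariate
slope_naive = fit_logistic(w[:, None], y)[1]          # naive: w substituted for x
print(f"fit on x: {slope_clean:.2f}   naive fit on w: {slope_naive:.2f}")
```

With these settings the fit on $x$ lands near the true slope, while the naive slope is clearly attenuated toward zero.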

To mitigate the impact of measurement error, several correction methods for variable selection have been proposed, including the matrix uncertainty (MU) selector ¹⁷, the improved MU selector ¹⁸, regularized M-estimators ¹⁹, the orthogonal matching pursuit algorithm ²⁰, the Convex Conditioned Lasso (CoCoLasso) ²¹ and the Block coordinate Descent Convex Conditioned Lasso (BDCoCoLasso) algorithm ²². However, most of these methods focus on linear models with measurement error. For high-dimensional generalized linear models, Sørensen et al. ²³ and Sørensen et al. ²⁴ proposed the generalized matrix uncertainty selector and the conditional score Lasso, respectively. Chen ²⁵ developed BOOME, a boosting algorithm for measurement error in logistic regression and probit models. In this paper, we propose a corrected score Lasso method for sparse logistic regression with errors in covariates, building on the weighted score Lasso of Yin ⁹ and the projected gradient descent algorithm of Loh and Wainwright ¹⁹.

The remainder of this paper is organized as follows. In Section 2, we review the weighted score Lasso for sparse logistic regression without errors in covariates. Subsequently, in Section 3, we extend this approach to sparse logistic regression with errors in covariates and introduce our corrected score Lasso method. Section 4 presents numerical results obtained through simulations.

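Notation: Write $X = (x_1, \dots, x_n)^\top$, $W = (w_1, \dots, w_n)^\top$ and $y = (y_1, \dots, y_n)^\top$. Denote by $\mathcal{S} = \{j : \beta^*_j \neq 0\}$ the set of non-zero coordinates of $\beta^*$ and let $s = |\mathcal{S}|$ be the number of non-zero elements of $\beta^*$. For a vector $v = (v_1, \dots, v_p)^\top$, we define $\|v\|_1 = \sum_{j=1}^p |v_j|$ and $\|v\|_\infty = \max_j |v_j|$. For a function $f$, we denote by $\nabla f(\beta)$ its gradient at $\beta$.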

2. Weighted Score Lasso

The sparse logistic regression model can be estimated by $\ell_1$-penalized minimization of the negative log-likelihood:

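$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \Big\{ \ell(\beta) + \lambda \|\beta\|_1 \Big\}, \qquad \ell(\beta) = -\frac{1}{n}\sum_{i=1}^n \Big\{ y_i x_i^\top\beta - \log\big(1 + \exp(x_i^\top\beta)\big) \Big\}, \tag{3}$$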

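where $\lambda > 0$ is a tuning parameter. The solution $\hat\beta$ of this Lasso problem satisfies the Karush-Kuhn-Tucker (KKT) conditions

$$S(\hat\beta) = \lambda \hat z, \qquad S(\beta) = -\nabla \ell(\beta) = \frac{1}{n}\sum_{i=1}^n x_i \{ y_i - \pi_i(\beta) \},$$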

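where $\hat z$ is a subgradient of $\|\beta\|_1$ at $\hat\beta$: $\hat z_j = \operatorname{sign}(\hat\beta_j)$ if $\hat\beta_j \neq 0$, and $\hat z_j$ can be any value in $[-1, 1]$ if $\hat\beta_j = 0$. A critical challenge arises in choosing $\lambda$, which must satisfy $\lambda \ge c\,\|S(\beta^*)\|_\infty$ for some constant $c$. However, the score function evaluated at $\beta^*$,

$$S(\beta^*) = \frac{1}{n}\sum_{i=1}^n x_i \{ y_i - \pi_i(\beta^*) \},$$

contains the random components $y_i - \pi_i(\beta^*)$ with variances $\pi_i(\beta^*)\{1 - \pi_i(\beta^*)\}$, so the choice of $\lambda$ depends on the unknown true value $\beta^*$.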
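The KKT conditions can be checked numerically for any candidate solution. The sketch below (NumPy; the names `logistic_score` and `kkt_violation` are hypothetical) returns the largest KKT violation for the unpenalized loss in (3). It also illustrates a standard consequence of the conditions: $\hat\beta = 0$ solves (3) exactly when $\lambda \ge \|S(0)\|_\infty$.

```python
import numpy as np


def logistic_score(beta, X, y):
    """S(beta) = (1/n) sum_i x_i {y_i - pi_i(beta)} = -grad l(beta) for the loss in (3)."""
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y - pi) / len(y)


def kkt_violation(beta_hat, X, y, lam, tol=1e-8):
    """Largest violation of S(beta_hat) = lam * z_hat, with z_hat a subgradient of ||beta||_1."""
    s = logistic_score(beta_hat, X, y)
    active = np.abs(beta_hat) > tol
    # nonzero coordinates: S_j must equal lam * sign(beta_hat_j)
    v_active = np.abs(s[active] - lam * np.sign(beta_hat[active]))
    # zero coordinates: z_j may lie anywhere in [-1, 1], i.e. |S_j| <= lam
    v_zero = np.maximum(np.abs(s[~active]) - lam, 0.0)
    return max(v_active.max(initial=0.0), v_zero.max(initial=0.0))


rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = rng.binomial(1, 0.5, size=100).astype(float)
lam_max = np.abs(logistic_score(np.zeros(5), X, y)).max()
print(kkt_violation(np.zeros(5), X, y, lam_max))        # beta = 0 solves (3) for lam >= lam_max
print(kkt_violation(np.zeros(5), X, y, 0.5 * lam_max))  # positive: beta = 0 is no longer optimal
```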

To address this fundamental difficulty, Yin ⁹ proposed a weighted score Lasso, which reweights the components of the score function. The weighted score Lasso solves

$$\frac{1}{n}\sum_{i=1}^n \omega(x_i^\top\hat\beta)\, x_i \{ y_i - \pi_i(\hat\beta) \} = \lambda \hat z,$$

where $\omega(\cdot)$ is a predefined positive weight function. The specific weight function

$$\omega(t) = \frac{1 + e^{t}}{e^{t/2}} = e^{t/2} + e^{-t/2} \tag{4}$$

yields the weighted score function:

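$$S_\omega(\beta) = \frac{1}{n}\sum_{i=1}^n x_i \Big\{ y_i e^{-x_i^\top\beta/2} - (1 - y_i) e^{x_i^\top\beta/2} \Big\}. \tag{5}$$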
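The decoupling can be verified directly. Take the weight to be $\omega(t) = e^{t/2} + e^{-t/2} = e^{-t/2}(1 + e^{t})$, write $t = x_i^\top\beta$ and $\pi(t) = e^{t}/(1 + e^{t})$, and let $y \sim \mathrm{Bernoulli}(\pi(t))$ as at $\beta = \beta^*$. The first line turns the ordinary residual into the bracket in (5); the next two show that this bracket has mean zero and unit variance for every $t$ (the cross term vanishes because $y(1 - y) = 0$). The unweighted residual $y - \pi(t)$, by contrast, has variance $\pi(t)\{1 - \pi(t)\}$, which depends on $t$.

$$
\begin{aligned}
\omega(t)\{y - \pi(t)\} &= e^{-t/2}(1 + e^{t})\,y - e^{-t/2}(1 + e^{t})\,\frac{e^{t}}{1 + e^{t}} = y\,e^{-t/2} - (1 - y)\,e^{t/2},\\
E\big\{y\,e^{-t/2} - (1 - y)\,e^{t/2}\big\} &= \frac{e^{t}}{1 + e^{t}}\,e^{-t/2} - \frac{1}{1 + e^{t}}\,e^{t/2} = 0,\\
E\big\{y\,e^{-t/2} - (1 - y)\,e^{t/2}\big\}^2 &= \frac{e^{t}}{1 + e^{t}}\,e^{-t} + \frac{1}{1 + e^{t}}\,e^{t} = 1.
\end{aligned}
$$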

This formulation leads to the weighted score Lasso estimator, defined as the solution to

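$$\hat\beta_\omega = \arg\min_{\beta \in \mathbb{R}^p} \Big\{ L_\omega(\beta) + \lambda\|\beta\|_1 \Big\}, \tag{6}$$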

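where the weighted loss function takes the form

$$L_\omega(\beta) = \frac{2}{n}\sum_{i=1}^n \Big\{ y_i e^{-x_i^\top\beta/2} + (1 - y_i) e^{x_i^\top\beta/2} \Big\},$$

so that $\nabla L_\omega(\beta) = -S_\omega(\beta)$.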

A key advantage of this weighted approach is its correction-amenable score function, which extends directly to measurement error scenarios through a subsequent score correction. The weighted formulation remains convex while decoupling the selection of $\lambda$ from $\beta^*$: at $\beta = \beta^*$, each weighted residual $y_i e^{-x_i^\top\beta^*/2} - (1 - y_i) e^{x_i^\top\beta^*/2}$ has mean zero and unit variance. This resolves the technical difficulty present in the standard logistic regression Lasso.
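A minimal NumPy sketch of the weighted loss and score follows. It assumes $L_\omega(\beta) = \frac{2}{n}\sum_i \{y_i e^{-x_i^\top\beta/2} + (1 - y_i)e^{x_i^\top\beta/2}\}$ and $S_\omega(\beta) = \frac{1}{n}\sum_i x_i\{y_i e^{-x_i^\top\beta/2} - (1 - y_i)e^{x_i^\top\beta/2}\}$. It checks by central finite differences that $S_\omega = -\nabla L_\omega$, and by simulation that the weighted residual has mean 0 and variance 1 at the truth for several values of $x_i^\top\beta^*$. The coefficient vector and seeds are arbitrary illustrative choices.

```python
import numpy as np


def weighted_loss(beta, X, y):
    """L_w(beta) = (2/n) sum_i {y_i exp(-x_i'beta/2) + (1 - y_i) exp(x_i'beta/2)}; convex in beta."""
    t = X @ beta
    return 2.0 * np.mean(y * np.exp(-t / 2) + (1 - y) * np.exp(t / 2))


def weighted_score(beta, X, y):
    """S_w(beta) = (1/n) sum_i x_i {y_i exp(-x_i'beta/2) - (1 - y_i) exp(x_i'beta/2)}."""
    t = X @ beta
    r = y * np.exp(-t / 2) - (1 - y) * np.exp(t / 2)   # weighted residuals
    return X.T @ r / len(y)


rng = np.random.default_rng(2)
n, p = 200, 4
X = rng.standard_normal((n, p))
beta_star = np.array([1.0, -2.0, 0.0, 0.0])           # hypothetical sparse truth
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_star)))).astype(float)

# central finite differences: S_w should equal minus the gradient of L_w
b, h = rng.standard_normal(p), 1e-6
fd = np.array([(weighted_loss(b + h * e, X, y) - weighted_loss(b - h * e, X, y)) / (2 * h)
               for e in np.eye(p)])
grad_gap = np.max(np.abs(fd + weighted_score(b, X, y)))

# at the truth, the weighted residual has mean 0 and variance 1 for any value of t = x_i'beta*
t = np.array([-3.0, 0.0, 2.5])
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-t)), size=(100000, 3))
R = Y * np.exp(-t / 2) - (1 - Y) * np.exp(t / 2)
print(grad_gap, R.mean(axis=0), R.var(axis=0))
```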

3. Corrected Score Lasso

Building on the weighted score framework, we now address covariate measurement error. Consider the logistic regression model (1) with an additive measurement error structure:

$$w_i = x_i + u_i, \qquad u_i \sim N(0, \Sigma_{uu}), \qquad i = 1, \dots, n, \tag{7}$$

where $w_i$ denotes the error-contaminated covariates and $u_i$ follows the normal distribution $N(0, \Sigma_{uu})$, independently of $(x_i, y_i)$. Directly substituting $w_i$ for $x_i$ in standard Lasso implementations produces biased naive estimators, a well-documented phenomenon in the measurement error literature ¹⁶. This bias persists in high-dimensional settings, so specialized correction is needed.

A corrected score is a function of the observed data whose conditional expectation, given the error-free data, equals the original score. Since the score is unbiased, a corrected score, if it exists, is also unbiased, and corrected scores therefore yield consistent estimators. Although Stefanski ²⁶ established that no conventional corrected score exists for logistic regression, our weighted score formulation admits a correction by exploiting the Gaussian measurement error structure.

Given a corrected score function $S^*(\beta)$, with corrected loss $L^*(\beta)$ satisfying $\nabla L^*(\beta) = -S^*(\beta)$, it is natural to consider the following optimization problem:

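$$\tilde\beta = \arg\min_{\beta \in \mathbb{R}^p} \Big\{ L^*(\beta) + \lambda\|\beta\|_1 \Big\}. \tag{8}$$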

Although this formulation seems natural, program (8) is fundamentally different from program (6). When the data are corrupted by measurement error, $L^*$ is usually not convex, so program (8) is nonconvex, whereas program (6) is convex. To overcome this difficulty, we extend the algorithm that Loh and Wainwright ¹⁹ proposed for linear models with measurement error to program (8).

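Under Gaussian measurement errors, we use the following moment relationships: for any fixed $a \in \mathbb{R}^p$,

$$E\{\exp(a^\top w_i) \mid x_i\} = \exp\Big(a^\top x_i + \tfrac{1}{2} a^\top \Sigma_{uu} a\Big), \qquad E\{w_i \exp(a^\top w_i) \mid x_i\} = (x_i + \Sigma_{uu} a)\exp\Big(a^\top x_i + \tfrac{1}{2} a^\top \Sigma_{uu} a\Big).$$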
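The two Gaussian moment identities, $E\{\exp(a^\top w_i) \mid x_i\} = \exp(a^\top x_i + a^\top\Sigma_{uu}a/2)$ and $E\{w_i\exp(a^\top w_i) \mid x_i\} = (x_i + \Sigma_{uu}a)\exp(a^\top x_i + a^\top\Sigma_{uu}a/2)$ for fixed $a$, can be checked by Monte Carlo. The vectors $x$, $a$ and the covariance below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(3)
p, m = 3, 400000                                  # dimension and number of Monte Carlo draws
x = np.array([0.5, -1.0, 0.2])                    # fixed true covariate vector
a = np.array([0.4, 0.1, -0.3])                    # fixed direction
Sigma_uu = 0.3 * np.eye(p)                        # illustrative error covariance

U = rng.multivariate_normal(np.zeros(p), Sigma_uu, size=m)
W = x + U                                         # draws of w = x + u given x
e = np.exp(W @ a)

mgf_exact = np.exp(a @ x + a @ Sigma_uu @ a / 2)  # E{exp(a'w) | x}
first_exact = (x + Sigma_uu @ a) * mgf_exact      # E{w exp(a'w) | x}
mgf_mc = e.mean()
first_mc = (W * e[:, None]).mean(axis=0)
print(mgf_mc / mgf_exact, first_mc - first_exact)
```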

Because $y_i$ and $w_i$ are conditionally independent given $x_i$, applying these identities with $a = \mp\beta/2$ to (5) yields a corrected score function of the form

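$$S^*(\beta) = \frac{1}{n}\sum_{i=1}^n e^{-\beta^\top\Sigma_{uu}\beta/8}\Big\{ y_i \big(w_i + \tfrac{1}{2}\Sigma_{uu}\beta\big) e^{-w_i^\top\beta/2} - (1 - y_i)\big(w_i - \tfrac{1}{2}\Sigma_{uu}\beta\big) e^{w_i^\top\beta/2} \Big\}. \tag{9}$$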

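The corresponding corrected loss function is

$$L^*(\beta) = \frac{2}{n}\sum_{i=1}^n e^{-\beta^\top\Sigma_{uu}\beta/8}\Big\{ y_i e^{-w_i^\top\beta/2} + (1 - y_i) e^{w_i^\top\beta/2} \Big\},$$

which satisfies $\nabla L^*(\beta) = -S^*(\beta)$.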

This corrected loss is unbiased for the error-free weighted loss, $E\{L^*(\beta) \mid x_i, y_i,\ i = 1, \dots, n\} = L_\omega(\beta)$, and it reduces to $L_\omega(\beta)$ when $\Sigma_{uu} = 0$, i.e., under perfect covariate observation. When $p < n$, an estimator of $\beta^*$ can be obtained by minimizing $L^*$ with the standard gradient descent method. However, when $p > n$, optimizing $L^*$ without additional regularization is an ill-posed problem because it does not have a unique solution.
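A NumPy sketch of the corrected loss and score under model (7) follows. It assumes $L^*(\beta) = \frac{2}{n}\sum_i e^{-\beta^\top\Sigma_{uu}\beta/8}\{y_i e^{-w_i^\top\beta/2} + (1 - y_i)e^{w_i^\top\beta/2}\}$ and $S^* = -\nabla L^*$. The script checks two properties. First, averaging $L^*(\beta)$ over many independent error draws reproduces the error-free weighted loss. Second, $S^*$ matches minus the finite-difference gradient of $L^*$. The responses are drawn arbitrarily, since the unbiasedness identity holds for any fixed $y$. All numerical settings are illustrative.

```python
import numpy as np


def weighted_loss(beta, X, y):
    """Error-free weighted loss L_w(beta)."""
    t = X @ beta
    return 2.0 * np.mean(y * np.exp(-t / 2) + (1 - y) * np.exp(t / 2))


def corrected_loss(beta, W, y, Sigma_uu):
    """Corrected loss L*(beta): weighted loss on W times exp(-beta' Sigma_uu beta / 8)."""
    t = W @ beta
    c = np.exp(-beta @ Sigma_uu @ beta / 8)
    return 2.0 * c * np.mean(y * np.exp(-t / 2) + (1 - y) * np.exp(t / 2))


def corrected_score(beta, W, y, Sigma_uu):
    """Corrected score S*(beta) = -grad L*(beta)."""
    t = W @ beta
    c = np.exp(-beta @ Sigma_uu @ beta / 8)
    half = Sigma_uu @ beta / 2
    pos = y * np.exp(-t / 2)            # y_i exp(-w_i'beta/2)
    neg = (1 - y) * np.exp(t / 2)       # (1 - y_i) exp(w_i'beta/2)
    return c * ((W + half).T @ pos - (W - half).T @ neg) / len(y)


rng = np.random.default_rng(5)
n, p, sigma_u2 = 200, 4, 0.3
X = rng.standard_normal((n, p))
beta = np.array([0.5, -0.5, 0.0, 0.2])   # point at which the losses are compared
y = rng.binomial(1, 0.5, size=n).astype(float)
Sigma_uu = sigma_u2 * np.eye(p)

# unbiasedness: average L* over fresh measurement errors, holding X and y fixed
draws = [corrected_loss(beta, X + np.sqrt(sigma_u2) * rng.standard_normal((n, p)), y, Sigma_uu)
         for _ in range(2000)]
print(np.mean(draws), weighted_loss(beta, X, y))   # the two values should nearly agree

# gradient check on one contaminated data set
W = X + np.sqrt(sigma_u2) * rng.standard_normal((n, p))
h = 1e-6
fd = np.array([(corrected_loss(beta + h * e, W, y, Sigma_uu)
                - corrected_loss(beta - h * e, W, y, Sigma_uu)) / (2 * h) for e in np.eye(p)])
print(np.max(np.abs(fd + corrected_score(beta, W, y, Sigma_uu))))   # close to 0
```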

To address the dual challenges of nonconvexity and high dimensionality ($p \gg n$), we adapt the projected gradient descent framework ¹⁹ to the corrected score setting. The proposed corrected score Lasso estimator solves

$$\hat\beta = \arg\min_{\|\beta\|_1 \le R} L^*(\beta), \tag{10}$$

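where the radius $R > 0$ constrains the parameter space; choosing $R \ge \|\beta^*\|_1$ ensures that $\beta^*$ is feasible. The projection onto the $\ell_1$-ball enforces sparsity in the same way as $\ell_1$-regularization while keeping the computation tractable. The projected gradient descent algorithm generates a sequence of iterates $\{\beta^t\}_{t \ge 0}$ by the recursion

$$\beta^{t+1} = \Pi_R\Big(\beta^t - \frac{1}{\eta}\nabla L^*(\beta^t)\Big),$$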

where $\Pi_R$ denotes the Euclidean projection onto the $\ell_1$-ball of radius $R$ and $\eta > 0$ is the step size parameter. As Loh and Wainwright ¹⁹ showed for the corresponding corrected programs in linear models, if $\eta$ is properly chosen, the iterates converge with high probability to a vector extremely close to any global minimizer; we apply the same scheme to program (10).
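The recursion can be sketched in NumPy as follows, assuming the corrected loss $L^*(\beta) = \frac{2}{n}\sum_i e^{-\beta^\top\Sigma_{uu}\beta/8}\{y_i e^{-w_i^\top\beta/2} + (1 - y_i)e^{w_i^\top\beta/2}\}$ with $\nabla L^* = -S^*$. The $\ell_1$-ball projection follows the sort-based method of Duchi et al. ²⁸. One deliberate change from the fixed-$\eta$ scheme: here $\eta$ is doubled (backtracking) until a sufficient-decrease condition holds, so the sketch does not need the smoothness constant of $L^*$. The data, $\beta^*$, the starting point $\beta^0 = 0$ and the oracle radius $R = \|\beta^*\|_1$ are hypothetical; function names are illustrative.

```python
import numpy as np


def project_l1_ball(v, R):
    """Euclidean projection of v onto {b : ||b||_1 <= R}, R > 0 (sort-based, Duchi et al. 2008)."""
    if np.abs(v).sum() <= R:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                  # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - R)[0][-1]      # last index kept in the support
    theta = (css[rho] - R) / (rho + 1.0)          # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def corrected_loss(beta, W, y, Sigma_uu):
    t = W @ beta
    c = np.exp(-beta @ Sigma_uu @ beta / 8)
    return 2.0 * c * np.mean(y * np.exp(-t / 2) + (1 - y) * np.exp(t / 2))


def corrected_score(beta, W, y, Sigma_uu):
    t = W @ beta
    c = np.exp(-beta @ Sigma_uu @ beta / 8)
    half = Sigma_uu @ beta / 2
    pos, neg = y * np.exp(-t / 2), (1 - y) * np.exp(t / 2)
    return c * ((W + half).T @ pos - (W - half).T @ neg) / len(y)


def cs_lasso_pgd(W, y, Sigma_uu, R, beta0, eta=1.0, n_iter=1000, tol=1e-8):
    """Projected gradient descent beta <- Pi_R(beta - grad L*(beta) / eta), with grad L* = -S*."""
    beta = project_l1_ball(np.asarray(beta0, dtype=float), R)
    loss = corrected_loss(beta, W, y, Sigma_uu)
    for _ in range(n_iter):
        grad = -corrected_score(beta, W, y, Sigma_uu)
        for _ in range(60):                       # backtracking: double eta until sufficient decrease
            new = project_l1_ball(beta - grad / eta, R)
            d = new - beta
            new_loss = corrected_loss(new, W, y, Sigma_uu)
            if new_loss <= loss + grad @ d + 0.5 * eta * (d @ d):
                break
            eta *= 2.0
        else:
            return beta                           # no acceptable step found
        if np.max(np.abs(d)) < tol:
            return new
        beta, loss = new, new_loss
    return beta


rng = np.random.default_rng(4)
n, p, sigma_u2 = 200, 50, 0.2
beta_star = np.zeros(p)
beta_star[:3] = [1.5, -1.5, 1.0]                  # hypothetical sparse truth
X = rng.standard_normal((n, p))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_star)))).astype(float)
W = X + np.sqrt(sigma_u2) * rng.standard_normal((n, p))
Sigma_uu = sigma_u2 * np.eye(p)
beta_hat = cs_lasso_pgd(W, y, Sigma_uu, R=np.abs(beta_star).sum(), beta0=np.zeros(p))
print(np.round(beta_hat[:6], 2))
```

Every accepted step satisfies the sufficient-decrease test, so the corrected loss never increases along the iterates and each iterate stays inside the $\ell_1$-ball.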

4. Numerical Studies

In this section, we use simulated datasets to investigate the finite-sample performance of the proposed procedure. The performance of an estimator $\hat\beta$ is assessed by its mean absolute error (MAE) with respect to $\beta^*$.

To assess how well each method recovers the sparsity pattern of $\beta^*$, we record "TP" and "FP", the numbers of true positives and false positives, respectively.
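The three summary measures can be computed per replicate as below. The exact MAE formula is not stated in the text. This sketch assumes $\mathrm{MAE} = \frac{1}{p}\sum_j |\hat\beta_j - \beta^*_j|$ for a single replicate, with the reported numbers then averaged over replicates. It also treats a coefficient as selected when its magnitude exceeds a small tolerance; that tolerance is part of the assumption.

```python
import numpy as np


def selection_metrics(beta_hat, beta_star, tol=1e-8):
    """Return (MAE, TP, FP) for one fitted coefficient vector.

    Assumes MAE = (1/p) sum_j |beta_hat_j - beta_star_j|; a coefficient is
    counted as selected when its magnitude exceeds tol.
    """
    selected = np.abs(beta_hat) > tol
    truth = beta_star != 0
    mae = float(np.mean(np.abs(beta_hat - beta_star)))
    tp = int(np.sum(selected & truth))        # true signals that were selected
    fp = int(np.sum(selected & ~truth))       # null coordinates that were selected
    return mae, tp, fp


mae, tp, fp = selection_metrics(np.array([1.2, 0.0, 0.3, 0.0]), np.array([1.0, -1.0, 0.0, 0.0]))
print(mae, tp, fp)   # MAE is about 0.375, with one true positive and one false positive
```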

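We generate 100 datasets from model (1), each consisting of $n = 200$ observations. The true coefficient vector $\beta^*$ and its sparsity level $s$ are set for each of $p = 500$ and $p = 1000$. The matrix $X$ has i.i.d. rows $x_i$, $i = 1, \dots, n$, with covariance matrix $\Sigma_x$. We consider two models for the covariance matrix: independent components ($\Sigma_{x,jk} = \mathbb{1}\{j = k\}$) and autoregressive ($\Sigma_{x,jk} = 0.5^{|j-k|}$). The response $y_i$ is Bernoulli distributed with mean $\pi_i(\beta^*)$. The measurement matrix $W = X + U$ is generated with the rows of $U$ i.i.d. $N(0, \Sigma_{uu})$, $\Sigma_{uu} = \sigma_u^2 I$, where $\sigma_u^2 = 0.2$ or $0.5$.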
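One replicate of this design can be generated as follows. Gaussian rows of $X$ are an assumption of the sketch. The $\beta^*$ in the example call is hypothetical, since the values of $\beta^*$ and $s$ used in the study are not given here. The autoregressive case uses $\rho = 0.5$, as in Tables 3 and 4.

```python
import numpy as np


def simulate(n, p, beta_star, sigma_u2, rho=None, rng=None):
    """Draw (X, W, y) for one replicate: Gaussian rows of X (assumed), Bernoulli y, W = X + U.

    rho=None gives Sigma_x = I_p; otherwise Sigma_x[j, k] = rho ** |j - k|.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rho is None:
        Sigma_x = np.eye(p)
    else:
        idx = np.arange(p)
        Sigma_x = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), Sigma_x, size=n)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_star)))).astype(float)
    U = np.sqrt(sigma_u2) * rng.standard_normal((n, p))   # rows of U are N(0, sigma_u2 * I_p)
    return X, X + U, y


beta_star = np.zeros(500)
beta_star[:5] = 1.0                               # hypothetical beta*, not the study's values
X, W, y = simulate(200, 500, beta_star, sigma_u2=0.2, rho=0.5, rng=np.random.default_rng(7))
print(X.shape, W.shape, y.mean())
```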

Figure 1. The elbow rule for corrected score Lasso, when the covariance matrix $\Sigma_x$ has entries $\Sigma_{x,jk} = \mathbb{1}\{j = k\}$, $j, k = 1, \dots, p$, and $\sigma_u^2 = 0.2$, with $n = 200$ and $p = 500$

We use the R package "glmnet" to solve program (3), with two tuning parameters chosen by 10-fold cross-validation: one (denoted $\lambda_{\min}$) corresponding to the minimum deviance, and one (denoted $\lambda_{1se}$) given by the "one-standard-error" rule ²⁷. We apply the weighted score Lasso estimator (denoted "WsLasso") to solve program (6), with tuning settings following ⁹. For comparison, we compute the glmnet and WsLasso estimators from the corrupted data $W$. Our corrected score Lasso estimator (denoted "CsLasso") is computed with the efficient $\ell_1$-ball projection algorithm of ²⁸. CsLasso requires an initial estimator $\beta^0$, which we take to be the glmnet estimate based on $W$ (with a cross-validated $\lambda$). CsLasso also requires knowledge of $\|\beta^*\|_1$, or a loss function, for choosing the constraint parameter $R$. Since $\|\beta^*\|_1$ cannot be known beforehand and CsLasso lacks a well-defined loss function, the elbow rule ¹⁷ is used to select $R$ from 39 equally spaced values in a prespecified interval. Figure 1 plots the number of nonzero coefficients against $R$; the elbow rule amounts to selecting the value of $R$ at which the curve begins to flatten.
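In the study, the elbow is read off Figure 1 by eye. As an automatic stand-in, the sketch below picks the grid point farthest from the straight line joining the first and last points of the rescaled curve of support size versus $R$. This maximum-distance-to-chord rule is a common knee-detection heuristic and is our assumption, not the authors' procedure. The function name and the toy curve are hypothetical.

```python
import numpy as np


def elbow_point(R_grid, n_nonzero):
    """Return the grid value of R farthest from the chord joining the curve's end points."""
    x = np.asarray(R_grid, dtype=float)
    y = np.asarray(n_nonzero, dtype=float)
    # rescale both axes to [0, 1] so the distance does not depend on units
    xs = (x - x[0]) / (x[-1] - x[0])
    span = y.max() - y.min()
    ys = (y - y.min()) / span if span > 0 else np.zeros_like(y)
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    # perpendicular distance of every point to the chord
    dist = np.abs(dy * (xs - xs[0]) - dx * (ys - ys[0])) / np.hypot(dx, dy)
    return x[np.argmax(dist)]


# synthetic curve: support size grows with R, then flattens at 10 nonzero coefficients
R_grid = np.linspace(0.25, 9.75, 39)               # 39 equally spaced values
counts = np.minimum(4 * R_grid, 10).round()
print(elbow_point(R_grid, counts))
```

In practice the counts would come from running CsLasso at each grid value of $R$; the heuristic only formalizes "where the curve begins to flatten".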

Table 1, Table 2, Table 3 and Table 4 quantify the benefit of error correction, comparing CsLasso with glmnet (with $\lambda_{\min}$ and $\lambda_{1se}$) and WsLasso. The comparison reveals three advantages:

1. Enhanced Selection Accuracy: CsLasso achieves more true positives (TPs) than glmnet with a cross-validated $\lambda$ while also reducing false positives (FPs). Although WsLasso has the fewest FPs, its substantially lower TPs indicate excessive conservatism. In contrast, CsLasso retains greater detection power without sacrificing specificity.

Table 1. Results from the simulation study, when the covariance matrix $\Sigma_x$ has entries $\Sigma_{x,jk} = \mathbb{1}\{j = k\}$, $j, k = 1, \dots, p$, and $\Sigma_{uu} = \sigma_u^2 I$, $p = 500$. Reported numbers are the averages and standard errors (shown in parentheses)

2. Measurement Error Resilience: As the measurement error increases ($\sigma_u^2$ from 0.2 to 0.5), CsLasso is more stable, with a smaller reduction in TPs than glmnet and WsLasso. Although a larger $\sigma_u^2$ lowers the TPs of all methods, CsLasso shows only a slight reduction in TPs, together with an increase in FPs. We attribute this robustness to the measurement error correction built into CsLasso's optimization framework.

Table 2. Results from the simulation study, when the covariance matrix $\Sigma_x$ has entries $\Sigma_{x,jk} = \mathbb{1}\{j = k\}$, $j, k = 1, \dots, p$, and $\Sigma_{uu} = \sigma_u^2 I$, $p = 1000$. Reported numbers are the averages and standard errors (shown in parentheses)

3. Estimation Precision: CsLasso has the lowest MAE in almost all cases. Under high measurement error ($\sigma_u^2 = 0.5$), this advantage becomes even more pronounced, with CsLasso's MAE significantly lower than those of glmnet and WsLasso.

Table 3. Results from the simulation study, when the covariance matrix $\Sigma_x$ has entries $\Sigma_{x,jk} = 0.5^{|j-k|}$, $j, k = 1, \dots, p$, and $\Sigma_{uu} = \sigma_u^2 I$, $p = 500$. Reported numbers are the averages and standard errors (shown in parentheses)

These findings collectively demonstrate the superior performance of CsLasso in error correction scenarios, highlighting its potential as a robust tool for high-dimensional data analysis with measurement error.

Table 4. Results from the simulation study, when the covariance matrix $\Sigma_x$ has entries $\Sigma_{x,jk} = 0.5^{|j-k|}$, $j, k = 1, \dots, p$, and $\Sigma_{uu} = \sigma_u^2 I$, $p = 1000$. Reported numbers are the averages and standard errors (shown in parentheses)

5. Conclusion

This paper studies variable selection in the sparse logistic regression model when covariates are subject to measurement error. The key contribution is the corrected score Lasso (CsLasso) method, which addresses the challenges posed by measurement errors in covariates.

CsLasso builds on the weighted score Lasso framework, whose correction-amenable score function extends directly to measurement error scenarios through a subsequent score correction. The weighted formulation is convex and decouples the choice of the regularization parameter from the true parameter value, thereby resolving a technical difficulty of standard logistic regression Lasso methods.

The numerical studies demonstrate the superior performance of CsLasso in error correction scenarios. CsLasso achieves more true positives (TPs) than glmnet with a cross-validated $\lambda$ while reducing false positives (FPs). It is also more stable under increasing measurement error, with a smaller reduction in TPs than glmnet and WsLasso. Furthermore, CsLasso achieves the lowest mean absolute error (MAE) in almost all cases, and this advantage becomes more pronounced under high measurement error.

The methodology proposed in this paper bridges the gap between rigorous measurement error correction and practical high-dimensional implementation. It establishes a framework that is extensible to other generalized linear models with exponential family structure, highlighting its potential as a robust tool for high-dimensional data analysis with measurement error.

Future research could explore the application of the CsLasso method to other types of generalized linear models and further investigate its theoretical properties in ultra-high dimensional settings. Additionally, the development of more efficient algorithms for implementing CsLasso in large-scale data analysis could be a valuable direction for future work.

ACKNOWLEDGEMENTS

The authors’ work was supported by the Educational Commission of Jiangxi Province of China (GJJ211403).

References

[1]	Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 267-288, 1996.

[2]	Fan, J. and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96 (456), 1348-1360, 2001.

[3]	Zou, H. and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2), 301-320, 2005.

[4]	Yuan, M. and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (1), 49-67, 2006.

[5]	Candes, E. and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 2313-2351, 2007.

[6]	Bühlmann, P. and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Science & Business Media, 2011.

[7]	Zou, H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101 (476), 1418-1429, 2006.

[8]	Loh, P.-L. and M. J. Wainwright. Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. In Advances in Neural Information Processing Systems, pp. 476-484, 2013.

[9]	Yin, Z. Variable selection for sparse logistic regression. Metrika 83 (7), 821-836, 2020.

[10]	Zhong, M., Z. Yin, and Z. Wang. Variable selection for sparse logistic regression with grouped variables. Mathematics 11 (24), 4979, 2023.

[11]	Cornilly, D., L. Tubex, S. Van Aelst, and T. Verdonck. Robust and sparse logistic regression. Advances in Data Analysis and Classification 18 (3), 663-679, 2024.

[12]	Basu, A., A. Ghosh, M. Jaenada, and L. Pardo. Robust adaptive lasso in high-dimensional logistic regression. Statistical Methods & Applications 33 (5), 1217-1249, 2024.

[13]	Feng, J., S. Megerian, and M. Potkonjak. Model-based calibration for sensor networks. In Proceedings of IEEE Sensors, Volume 2, pp. 737-742, 2003.

[14]	Purdom, E. and S. P. Holmes. Error distribution for gene expression data. Statistical Applications in Genetics and Molecular Biology 4 (1), 2005.

[15]	Benjamini, Y. and T. P. Speed. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Research 40 (10), e72, 2012.

[16]	Carroll, R. J., D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu. Measurement Error in Nonlinear Models: A Modern Perspective. CRC Press, 2006.

[17]	Rosenbaum, M. and A. B. Tsybakov. Sparse recovery under matrix uncertainty. The Annals of Statistics 38 (5), 2620-2651, 2010.

[18]	Rosenbaum, M. and A. B. Tsybakov. Improved matrix uncertainty selector. 9, 276-291, 2013.

[19]	Loh, P.-L. and M. J. Wainwright. High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. In Advances in Neural Information Processing Systems, pp. 2726-2734, 2011.

[20]	Chen, Y. and C. Caramanis. Orthogonal matching pursuit with noisy and missing data: Low and high dimensional results. arXiv preprint arXiv:1206.0823, 2012.

[21]	Datta, A. and H. Zou. CoCoLasso for high-dimensional error-in-variables regression. The Annals of Statistics 45 (6), 2400-2426, 2017.

[22]	Escribe, C., T. Lu, J. Keller-Baruch, V. Forgetta, B. Xiao, J. B. Richards, S. Bhatnagar, K. Oualkacha, and C. M. Greenwood. Block coordinate descent algorithm improves variable selection and estimation in error-in-variables regression. Genetic Epidemiology 45 (8), 874-890, 2021.

[23]	Sørensen, Ø., A. Frigessi, and M. Thoresen. Measurement error in lasso: Impact and likelihood bias correction. Statistica Sinica, 809-829, 2015.

[24]	Sørensen, Ø., K. H. Hellton, A. Frigessi, and M. Thoresen. Covariate selection in high-dimensional generalized linear models with measurement error. Journal of Computational and Graphical Statistics 27 (4), 739-749, 2018.

[25]	Chen, L.-P. Variable selection and estimation for misclassified binary responses and multivariate error-prone predictors. Journal of Computational and Graphical Statistics 33 (2), 407-420, 2024.

[26]	Stefanski, L. A. Unbiased estimation of a nonlinear function of a normal mean with application to measurement error models. Communications in Statistics - Theory and Methods 18 (12), 4335-4358, 1989.

[27]	Friedman, J., T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33 (1), 1-22, 2010.

[28]	Duchi, J., S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the ℓ1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pp. 272-279. ACM, 2008.

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Cite this article:

Normal Style

Zanhua Yin, Zhichao Wang. Variable Selection for Sparse Logistic Regression Model with Errors in Covariates. American Journal of Applied Mathematics and Statistics. Vol. 13, No. 2, 2025, pp 24-29. https://pubs.sciepub.com/ajams/13/2/1

MLA Style

Yin, Zanhua, and Zhichao Wang. "Variable Selection for Sparse Logistic Regression Model with Errors in Covariates." American Journal of Applied Mathematics and Statistics 13.2 (2025): 24-29.

APA Style

Yin, Z. , & Wang, Z. (2025). Variable Selection for Sparse Logistic Regression Model with Errors in Covariates. American Journal of Applied Mathematics and Statistics, 13(2), 24-29.

Chicago Style

Yin, Zanhua, and Zhichao Wang. "Variable Selection for Sparse Logistic Regression Model with Errors in Covariates." American Journal of Applied Mathematics and Statistics 13, no. 2 (2025): 24-29.
