Performance of Log-Beta Log-Logistic Regression Model

Mahmoud Riad Mahmoud1, Naglaa A. Morad2, Moshera A. M. Ahmad2

American Journal of Applied Mathematics and Statistics

1Department of Mathematical Statistics, Institute of Statistical Studies and Research, Cairo University

2Department of Applied Statistics and Econometrics, Institute of Statistical Studies and Research, Cairo University

Abstract

For the log-beta log-logistic regression model, we derive the appropriate matrices for assessing the local influence of observations on the parameter estimates under a case-weight perturbation scheme. Using a set of real data, the global and local influence of individual observations on the stated model is assessed. In addition, various simulation studies are performed, for different parameter settings, sample sizes, and censoring percentages, to assess the performance of the log-beta log-logistic regression model, and the empirical distribution of the martingale residuals is compared with the normal distribution. These studies suggest that the martingale residuals are approximately normally distributed.

Cite this article:

  • Mahmoud Riad Mahmoud, Naglaa A. Morad, Moshera A. M. Ahmad. Performance of Log-Beta Log-Logistic Regression Model. American Journal of Applied Mathematics and Statistics. Vol. 4, No. 3, 2016, pp 74-86. http://pubs.sciepub.com/ajams/4/3/3

1. Introduction

Frequently in parametric regression models in survival analysis, there are covariates whose values are related to the lifetime under study. The main objective in such cases is usually to estimate the relationship between the lifetime and the explanatory variables and to test its significance. For example, the length of time it takes an employee to retire from a given job may be affected by variables such as the employee's age, experience, education, etc. Another example of this situation was discussed by [10], in which survival times for 65 multiple myeloma patients were recorded and related to a number of factors, such as the level of hemoglobin in the blood, the white blood cell count at diagnosis, sex, and age. The main problem in this type of study is to identify which concomitant variables are strongly related to survival time. To examine the relationship between the lifetime and the concomitant variables, we use a regression model in which the lifetime has a distribution that depends on the concomitant variables. This involves specifying a model for the distribution of T, given x (which may include censored observations), where T represents the lifetime and x is a vector of regressor variables for an individual.

In the last decade, a new class of models has been proposed for this type of data, in which covariates are related to the lifetime. [17] proposed a location-scale regression model based on the logarithm of an extended Weibull distribution, which has the ability to deal with bathtub-shaped failure rate functions. [9] showed that the log-exponentiated Weibull regression model for interval-censored data represents a parametric family that includes other regression models broadly used in lifetime data analysis. [13] proposed a log-β-Birnbaum–Saunders regression model that can be applied to censored data and used effectively in survival analysis. However, there are few practical regression models for this type of failure rate function. We therefore study a log-beta log-logistic regression model, which we hope will find wide use in lifetime analysis. After modeling, it is important to check the assumptions of the model and to conduct a robustness study to detect influential or extreme observations that can distort the results of the analysis. This paper therefore examines the performance of the log-beta log-logistic regression model by various methods.

The paper is organized as follows. Section 2 reviews regression failure models. Section 3 gives the background of the log-beta log-logistic regression model. Section 4 discusses sensitivity analysis, and Section 5 introduces the curvature calculations for the log-beta log-logistic regression model. In Section 6 we present some simulation studies and analyze a real data set.

2. Review

[7] gave an early discussion of parametric regression models in survival analysis where there are covariates whose values are related to the lifetime; they presented a method of estimating the survival distribution when the survival times are assumed to follow simple exponential distributions, with a different parameter for each patient, the parameter associated with each patient's distribution being functionally related to the concomitant variates. [18] generalized the work of [7], extending the statistical model to permit maximum likelihood (ML) estimation of the parameters of the linear regression when not all patients in a follow-up study have died by the end of the study. [10, 11, 15] discussed this approach and introduced exponential, Weibull, and gamma regression models for both complete and censored data. Recently, this approach has been discussed by many authors; see, for example, [1, 5, 8, 12, 14, 16], who introduced the exponentiated Weibull, log-Burr XII, log-modified Weibull, log-generalized inverse Weibull, log-beta exponentiated Weibull, and log-beta log-logistic regression models.

3. Background of the Log-beta Log-logistic (LBLLogistic) Regression Model

The beta log-logistic distribution with positive parameters a, b, α, and δ, BLLog(a, b, α, δ), assumes that the lifetime T has density function

f(t) = \frac{\delta}{\alpha\, B(a,b)} \left(\frac{t}{\alpha}\right)^{\delta a - 1} \left[1 + \left(\frac{t}{\alpha}\right)^{\delta}\right]^{-(a+b)}, \quad t > 0, (1)

where B(a, b) is the beta function, a, b, and δ > 0 are shape parameters, and α > 0 is a scale parameter. The survival function corresponding to a random variable T with BLLog density is given by

S(t) = 1 - I_{G(t)}(a,b), \quad G(t) = \frac{(t/\alpha)^{\delta}}{1 + (t/\alpha)^{\delta}},

where I_x(a, b) denotes the incomplete beta function ratio, and the associated hazard rate function takes the form

h(t) = \frac{f(t)}{S(t)} = \frac{\delta\,(t/\alpha)^{\delta a - 1}\left[1 + (t/\alpha)^{\delta}\right]^{-(a+b)}}{\alpha\, B(a,b)\left[1 - I_{G(t)}(a,b)\right]}. (2)
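For concreteness, the BLLog density, survival, and hazard functions can be sketched numerically through the beta-G construction with a log-logistic baseline. This is our own reconstruction from the definitions above, not the authors' code; the parameter values are illustrative.

```python
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.special import beta as beta_fn

def bllog_pdf(t, a, b, alpha, delta):
    """BLLog density: f(t) = delta/(alpha*B(a,b)) * (t/alpha)^(delta*a-1)
    * [1 + (t/alpha)^delta]^-(a+b)."""
    u = (t / alpha) ** delta
    return (delta / (alpha * beta_fn(a, b))
            * (t / alpha) ** (delta * a - 1) * (1.0 + u) ** (-(a + b)))

def bllog_sf(t, a, b, alpha, delta):
    """Survival: S(t) = 1 - I_{G(t)}(a, b), with G the log-logistic cdf."""
    G = 1.0 / (1.0 + (t / alpha) ** (-delta))
    return 1.0 - beta_dist.cdf(G, a, b)

def bllog_hazard(t, a, b, alpha, delta):
    """Hazard: h(t) = f(t) / S(t), as in (2)."""
    return bllog_pdf(t, a, b, alpha, delta) / bllog_sf(t, a, b, alpha, delta)

# Sanity check: with a = b = 1 the BLLog density reduces to the plain
# log-logistic density (delta/alpha)(t/alpha)^(delta-1)/[1+(t/alpha)^delta]^2.
t = 1.7
loglogistic_pdf = 2.0 * t / (1.0 + t ** 2) ** 2   # alpha = 1, delta = 2
print(np.isclose(bllog_pdf(t, 1.0, 1.0, 1.0, 2.0), loglogistic_pdf))
```

The reduction at a = b = 1 is a convenient check that the beta-G layer was coded correctly before using the general case.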

Recently, [12] suggested a regression model based on the BLLog distribution described in (1), the so-called LBLLogistic regression model, represented by

y_i = \mathbf{x}_i^{T}\boldsymbol{\beta} + \sigma z_i, \quad i = 1, \dots, n, (3)

where y_i is the response variable, x_i^T = (x_{i1}, x_{i2}, ..., x_{ip}) is the vector of explanatory variables, β = (β_1, ..., β_p)^T, σ > 0, and z_i is the random error. Note that Y = log(T) follows the log-beta log-logistic (LBLLog) distribution. The density function of Y can be written as

f(y) = \frac{1}{\sigma\, B(a,b)} \exp\!\left(a\,\frac{y-\mu}{\sigma}\right) \left[1 + \exp\!\left(\frac{y-\mu}{\sigma}\right)\right]^{-(a+b)}, (4)

where a, b, σ > 0, −∞ < μ < ∞, and −∞ < y < ∞, with μ = log(α) and σ = 1/δ.

Further, after a suitable transformation, we define the standardized random variable Z = (Y − μ)/σ with density function

f(z) = \frac{1}{B(a,b)}\, \frac{e^{az}}{(1 + e^{z})^{a+b}}, \quad -\infty < z < \infty. (5)

The survival function takes the form

S(z) = 1 - I_{e^{z}/(1+e^{z})}(a,b), (6)

and the associated hazard rate function takes the form

h(z) = \frac{f(z)}{S(z)}. (7)

In terms of y, model (3) is referred to as a log-location-scale model.
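Model (3) with the error density (5) can be simulated directly, using the fact that W = e^Z/(1 + e^Z) follows a Beta(a, b) law (equivalent to the survival form in (6)), so Z = logit(W) has density (5). A minimal sketch with illustrative parameter values:

```python
import numpy as np
from scipy.special import digamma

def rlbllog(n, a, b, mu, sigma, rng):
    """Draw Y = mu + sigma*Z from the LBLLog location-scale model (3),
    where Z = logit(W) with W ~ Beta(a, b) has the standard density (5)."""
    w = rng.beta(a, b, size=n)          # W ~ Beta(a, b)
    z = np.log(w / (1.0 - w))           # Z = logit(W)
    return mu + sigma * z

rng = np.random.default_rng(0)
y = rlbllog(100_000, a=2.0, b=3.0, mu=1.0, sigma=0.5, rng=rng)

# For W ~ Beta(a, b): E[log W] = psi(a) - psi(a+b), E[log(1-W)] = psi(b) - psi(a+b),
# hence E[Z] = psi(a) - psi(b) and E[Y] = mu + sigma*(psi(a) - psi(b)).
print(np.isclose(y.mean(), 1.0 + 0.5 * (digamma(2.0) - digamma(3.0)), atol=0.01))
```

The digamma identity gives a cheap sanity check on the sampler before it is used inside a larger simulation.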

Consider a sample (y_1, x_1), ..., (y_n, x_n) of n independent observations, where each observed response is defined by y_i = min{log(t_i), log(c_i)}. We assume noninformative censoring, i.e., the observed lifetimes and censoring times are independent. Let F and C be the sets of indices of individuals for which y_i is a log-lifetime and a log-censoring time, respectively. The log-likelihood function for the vector of parameters θ = (a, b, σ, β^T)^T from model (3) has the form

l(\theta) = \sum_{i \in F} \log f(y_i) + \sum_{i \in C} \log S(y_i),

where f(y_i) is the density function (4) and S(y_i) is the survival function (6) of Y_i. The log-likelihood function for θ reduces to

l(\theta) = -r \log\!\left[\sigma B(a,b)\right] + \sum_{i \in F} \left[a z_i - (a+b)\log\!\left(1 + e^{z_i}\right)\right] + \sum_{i \in C} \log\!\left[1 - I_{e^{z_i}/(1+e^{z_i})}(a,b)\right], (8)

where r is the number of uncensored observations (failures) and z_i = (y_i − x_i^T β)/σ. The MLE of θ can be obtained by maximizing the log-likelihood function (8).
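As a sketch, the log-likelihood (8) can be coded and maximized numerically. The authors used Mathcad and Mathematica; the scipy-based version below, and the simulated right-censored data it is fitted to, are our own illustration under assumed parameter values.

```python
import numpy as np
from scipy.special import betaln, betainc
from scipy.optimize import minimize

def negloglik(params, y, X, status):
    """Negative of (8). params = (log a, log b, log sigma, beta);
    status[i] = 1 for a failure, 0 for a right-censored observation."""
    a, b, sigma = np.exp(params[:3])
    beta_coef = params[3:]
    z = (y - X @ beta_coef) / sigma
    log_f = (-np.log(sigma) - betaln(a, b) + a * z
             - (a + b) * np.logaddexp(0.0, z))        # stable log(1 + e^z)
    w = 1.0 / (1.0 + np.exp(-z))                      # e^z / (1 + e^z)
    log_S = np.log(1.0 - betainc(a, b, w))            # log S(y_i), from (6)
    return -np.sum(np.where(status == 1, log_f, log_S))

# Simulated data: one covariate plus intercept, illustrative values only.
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w = rng.beta(2.0, 2.0, size=n)
y_full = X @ np.array([0.3, -0.23]) + 0.8 * np.log(w / (1.0 - w))
c = rng.uniform(0, np.exp(np.quantile(y_full, 0.9)), size=n)
y = np.minimum(y_full, np.log(c))
status = (y_full <= np.log(c)).astype(int)

res = minimize(negloglik, x0=np.zeros(5), args=(y, X, status),
               method="Nelder-Mead", options={"maxiter": 4000})
print(res.fun < negloglik(np.zeros(5), y, X, status))
```

Parameterizing a, b, and σ on the log scale keeps the optimizer unconstrained while enforcing positivity.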

The asymptotic covariance matrix of the MLE θ̂ can be approximated by the inverse of the (p+3)×(p+3) observed information matrix −L̈(θ), where L̈(θ) = ∂²l(θ)/∂θ ∂θ^T; the required second derivatives are obtained by differentiating (8) with respect to the components of θ.

4. Sensitivity Analysis

After fitting a model, it is important to check its assumptions and to detect possible extreme or influential observations. We therefore discuss influence diagnostics based on case deletion, in which the influence of the ith observation on the parameter estimates is evaluated by removing it from the analysis. We consider the following measures.

4.1. Generalized Cook Distance

[2] proposed measuring the "distance" between the estimate θ̂_{(i)}, computed without the ith observation, and the corresponding full-data estimate θ̂ by calculating the F statistic for the "hypothesis" that θ = θ̂_{(i)}. This statistic is recalculated for each observation i = 1, ..., n, although the resulting values should not literally be interpreted as F tests. The suggested measure of the distance of θ̂_{(i)} from θ̂, the generalized Cook distance, is

GD_i(\theta) = \left(\hat{\theta}_{(i)} - \hat{\theta}\right)^{T} \left[-\ddot{L}(\hat{\theta})\right] \left(\hat{\theta}_{(i)} - \hat{\theta}\right). (9)
4.2. Likelihood Displacement

Measures of the influence of the ith observation on the ML estimate θ̂ can be based on the sample influence curve SIC_i = θ̂ − θ̂_{(i)}, where θ̂_{(i)} denotes the ML estimate of θ computed without the ith observation. While this idea is straightforward, it may be computationally expensive to implement, since n + 1 ML estimates are needed, each of which may require several iterations. [4] derived a general measure from the use of contours of the log-likelihood function to order observations by influence. If l(θ) is the log-likelihood based on the complete data, the likelihood displacement is defined as

LD_i(\theta) = 2\left[l(\hat{\theta}) - l(\hat{\theta}_{(i)})\right], (10)

which can be compared with quantiles of the chi-square distribution with p + 3 degrees of freedom, where p + 3 is the dimension of θ.
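As an illustration of (10), here is a sketch of the likelihood displacement LD_i computed on a simple normal-linear model, where the n case-deleted ML refits are cheap and closed-form. The toy data and the planted outlier are our own example, standing in for the LBLLog likelihood the paper uses.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
y[5] += 8.0                              # plant one gross outlier at case 5

def normal_loglik(beta, sigma2):
    """Full-data normal log-likelihood at (beta, sigma2)."""
    r = y - X @ beta
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - 0.5 * np.sum(r ** 2) / sigma2

def ml_fit(y_sub, X_sub):
    """Closed-form normal ML estimates: OLS beta, mean squared residual."""
    beta = np.linalg.lstsq(X_sub, y_sub, rcond=None)[0]
    sigma2 = np.mean((y_sub - X_sub @ beta) ** 2)
    return beta, sigma2

beta_hat, s2_hat = ml_fit(y, X)
l_full = normal_loglik(beta_hat, s2_hat)

LD = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, s2_i = ml_fit(y[keep], X[keep])
    # LD_i = 2 [ l(theta_hat) - l(theta_hat_(i)) ], both on the FULL data
    LD[i] = 2.0 * (l_full - normal_loglik(b_i, s2_i))

print(int(np.argmax(LD)))  # the planted outlier should give the largest LD_i
```

Because l(θ̂) maximizes the full-data likelihood, every LD_i is nonnegative, and an index plot of LD_i flags the influential case.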

4.3. Local Influence Approach

Removing an observation from the analysis deletes all the information on that data point at once, so it is difficult to determine whether the point influences some specific aspect of the model. The local influence approach offers a solution to this problem: one investigates how the results of the analysis change under small perturbations of the model.

The basic idea in influence analysis, as presented by [4], is "to introduce small perturbations in the problem formulation, and then monitor how the perturbations change the outcome of the analysis. The important questions in designing methods for influence analysis are the choices of the perturbation scheme, the particular aspect of an analysis to monitor, and the method of measurement".

The numerous influence diagnostics that depend on case deletion can be regarded as global measures, since they are designed to measure total change at various corners of the perturbation space Ω ⊂ R^n, where n is the sample size.

[3] indicated that single-case deletion diagnostics can be computationally intensive and suffer from a form of masking, while group deletion methods are not easily implemented or well understood. [3] therefore developed a methodology that is relatively easy to use for identifying groups of observations that may require special attention. It can be summarized as follows.

Let θ be the (p+3)×1 vector of unknown parameters and θ̂ the ML estimate of θ obtained by maximizing l(θ), where l(θ) represents the log-likelihood of the postulated model for the observed data (the unperturbed log-likelihood). Introduce perturbations into the model via an n×1 vector ω ∈ Ω, where Ω represents the set of relevant perturbations; let l(θ|ω) denote the log-likelihood of the perturbed model, and let θ̂_ω denote the ML estimate under l(θ|ω). Let d be a fixed nonzero direction of unit length in R^n, and let Δ be the (p+3)×n matrix, depending on the perturbation scheme, whose jith element is

\Delta_{ji} = \frac{\partial^2 l(\theta|\omega)}{\partial\theta_j\,\partial\omega_i},

evaluated at θ = θ̂ and at the vector of no perturbation ω = ω_0.

The normal curvature for θ in the direction d under the postulated model is given by

C_d(\theta) = 2\left|\,d^{T}\Delta^{T}\left[-\ddot{L}(\hat{\theta})\right]^{-1}\Delta\, d\,\right|. (11)

The extremes C_max = max_d C_d and C_min = min_d C_d are two possible options. C_max equals twice the largest eigenvalue of the matrix B = Δ^T[−L̈(θ̂)]^{-1}Δ, and d_max is the corresponding eigenvector. The index plot of |d_max| may indicate how to perturb the postulated model to obtain the greatest local change in the likelihood displacement: if the ith element of d_max is relatively large, perturbations in the weight of the ith observation may lead to substantial changes in the results of the analysis, so that observation is relatively influential and should be investigated to identify the source of the sensitivity.
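Extracting C_max and d_max from (11) is a standard symmetric eigenproblem. The sketch below uses random stand-in matrices of the right shapes rather than a fitted model, purely to show the linear algebra.

```python
import numpy as np

rng = np.random.default_rng(3)
p3, n = 5, 30                             # (p+3) parameters, n observations
A = rng.normal(size=(p3, p3))
neg_hess = A @ A.T + p3 * np.eye(p3)      # stand-in for -L_ddot: symmetric PD
Delta = rng.normal(size=(p3, n))          # stand-in perturbation matrix

# B = Delta^T [-L_ddot]^{-1} Delta is symmetric positive semidefinite,
# so C_d = 2 |d^T B d| is maximized over unit d at the top eigenvector.
B = Delta.T @ np.linalg.inv(neg_hess) @ Delta
eigval, eigvec = np.linalg.eigh(B)        # eigenvalues in ascending order
C_max = 2.0 * eigval[-1]
d_max = eigvec[:, -1]                     # unit-length direction of maximal curvature

C_at_dmax = 2.0 * abs(d_max @ B @ d_max)
print(np.isclose(C_at_dmax, C_max))
```

In practice, one then inspects an index plot of |d_max| over i = 1, ..., n, exactly as described above.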

5. Curvature Calculations for Log-beta Log-logistic Regression Model

As mentioned above, [3] proposed a general framework to detect influential observations by evaluating how sensitive the analysis is to small perturbations introduced into the model. Some authors have investigated the evaluation of local influence in survival analysis models: for example, [6] adapted the method of local influence to regression analysis with censoring, and [9] studied the problem of evaluating local influence in the log-exponentiated Weibull regression model with censored data. We introduce a similar methodology to detect influential data points in the LBLLogistic regression model with censored data.

Next, for the case-weight perturbation scheme, we calculate the matrix

\Delta = \left(\Delta_{ji}\right)_{(p+3)\times n}, \quad \Delta_{ji} = \frac{\partial^2 l(\theta|\omega)}{\partial\theta_j\,\partial\omega_i}, \quad j = 1, \dots, p+3, \; i = 1, \dots, n,

considering the model defined in (3) and its log-likelihood function given by (8). Let ω = (ω_1, ω_2, ..., ω_n)^T be the vector of weights. The case-weight perturbed log-likelihood function takes the form

l(\theta|\omega) = \sum_{i \in F} \omega_i \log f(y_i) + \sum_{i \in C} \omega_i \log S(y_i),

where 0 ≤ ω_i ≤ 1 and ω_0 = (1, ..., 1)^T. Let us denote Δ_i = (Δ_{1i}, ..., Δ_{(p+3)i})^T, and let F and C be the sets of indices of individuals for which y_i is a log-lifetime and a log-censoring time, respectively. Because l(θ|ω) is linear in each ω_i, the elements of Δ_i reduce to first derivatives:

\Delta_{ji} = \frac{\partial \log f(y_i)}{\partial\theta_j} \;\; \text{for } i \in F, \qquad \Delta_{ji} = \frac{\partial \log S(y_i)}{\partial\theta_j} \;\; \text{for } i \in C,

evaluated at θ = θ̂. In particular, for an uncensored observation i ∈ F, writing z_i = (y_i − x_i^T β)/σ, the elements of Δ_i with respect to a and b are

\frac{\partial \log f(y_i)}{\partial a} = \psi(a+b) - \psi(a) + z_i - \log\!\left(1 + e^{z_i}\right), \qquad \frac{\partial \log f(y_i)}{\partial b} = \psi(a+b) - \psi(b) - \log\!\left(1 + e^{z_i}\right),

where ψ(·) is the digamma function. The element with respect to σ can be shown to be

\frac{\partial \log f(y_i)}{\partial \sigma} = -\frac{1}{\sigma}\left[1 + a z_i - (a+b)\,\frac{z_i e^{z_i}}{1 + e^{z_i}}\right].

The elements with respect to β_j, for j = 1, ..., p (components 4, ..., p+3 of Δ_i), may be expressed as

\frac{\partial \log f(y_i)}{\partial \beta_j} = -\frac{x_{ij}}{\sigma}\left[a - (a+b)\,\frac{e^{z_i}}{1 + e^{z_i}}\right].

For a censored observation i ∈ C, the corresponding derivatives of log S(y_i) involve derivatives of the incomplete beta function ratio in (6) and can be evaluated numerically.

6. Application

Applications of the results are provided using simulated and real data. The required numerical evaluations were carried out using the Mathcad and Mathematica programs.

6.1. Simulations Study (1)

In order to assess the performance of the estimation of the parameters of the LBLLogistic regression model, various simulation studies were performed for different settings of sample sizes, censoring percentages, and parameter values. The lifetimes T_1, ..., T_n were generated from the BLLog distribution given in (1) and transformed to log-lifetimes y_i = log(t_i), following model (3) with μ_i = x_i^T β and σ = 1/δ. The fixed covariates were generated from a gamma distribution with parameters (0.25, 35). The stochastic components representing the errors in the model were generated from (5) for different values of a and b. The censoring times C_1, ..., C_n were generated from a uniform distribution (0, τ), where τ was chosen to achieve censoring percentages of 0, 0.10, or 0.30. The lifetimes considered in each fit were taken as min{T_i, C_i}. We generated 1,000 samples from the model for different values of n = 20, 50, and 100, different values of σ = 0.8, 1.8, and 5, different values of a = 0.5, 1.08, 1.8, 3, 5, 8, and 10, and different values of b = 0.5, 0.8, 1, 1.1, 1.8, 2, 4, and 5. For each sample, a LBLLogistic regression model was fitted by the maximum likelihood method. From these 1,000 parameter estimates, the bias, the mean square error (MSE), and the relative root MSE were calculated. The results are summarized in Table 1.

It may be noticed that:

- The bias and MSE are small when a > b.

- The bias and MSE are very large when a < b.

- The bias and MSE decrease as n increases.

- The bias and MSE increase as the censoring percentage increases.
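One ingredient of the simulation design above is choosing the upper bound τ of the Uniform(0, τ) censoring distribution to hit a target censoring percentage. A sketch of that calibration (the bisection on τ is our own implementation choice, not the paper's):

```python
import numpy as np

def gen_lifetimes(n, a, b, mu, sigma, rng):
    """BLLog lifetimes via T = exp(mu + sigma * logit(W)), W ~ Beta(a, b)."""
    w = rng.beta(a, b, size=n)
    return np.exp(mu + sigma * np.log(w / (1.0 - w)))

def calibrate_tau(T, target):
    """For C ~ Uniform(0, tau), P(C < T) = E[min(T, tau)] / tau, which is
    decreasing in tau; bisect to hit the target censoring fraction."""
    lo, hi = 1e-9, 100.0 * T.max()
    for _ in range(80):
        tau = 0.5 * (lo + hi)
        frac = np.mean(np.minimum(T, tau)) / tau
        if frac > target:
            lo = tau          # too much censoring -> need a larger tau
        else:
            hi = tau
    return tau

rng = np.random.default_rng(5)
T = gen_lifetimes(5000, a=10.0, b=1.0, mu=0.3, sigma=0.8, rng=rng)
tau = calibrate_tau(T, target=0.30)
C = rng.uniform(0, tau, size=T.size)
print(abs(np.mean(C < T) - 0.30) < 0.03)
```

Once τ is fixed, the observed responses are y_i = min{log(T_i), log(C_i)} with the censoring indicator 1{T_i ≤ C_i}, exactly as in Section 3.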

6.2. Simulations Study (2)

In order to investigate the form of the empirical distribution of the martingale residuals for different settings of n and censoring percentages, several simulation studies were performed; the results are displayed graphically in Figure 1 – Figure 6. We assumed sample sizes of 30, 50, and 100. The log-lifetimes log(T_1), ..., log(T_n) were generated from the LBLLogistic regression model given in (3), with a = 10, b = 1, different values of σ = 0.8, 1.8, and 5, β_0 = 0.3, and β_1 = −0.23, with the covariate generated from a gamma distribution with parameters (35, 0.25). The censoring times C_1, ..., C_n were generated from a uniform distribution (0, τ), where τ was adjusted until censoring percentages of 0, 10, and 30% were reached. The lifetimes considered in each fit were taken as min{T_i, C_i}. For each setting of n, we generated 1,000 samples under the LBLLogistic regression model (3).

For each fit, the martingale residuals were calculated and stored; the residuals were then displayed in probability plots.
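The martingale residual for a parametric survival model takes the form r_Mi = δ_i + log Ŝ(y_i), with δ_i the failure indicator; for the LBLLog model, Ŝ comes from (6). A sketch on uncensored toy data, with the parameter values standing in for fitted estimates:

```python
import numpy as np
from scipy.special import betainc

def martingale_residuals(y, X, status, a, b, sigma, beta_coef):
    """r_Mi = status_i + log S_hat(y_i), with the survival S from (6)."""
    z = (y - X @ beta_coef) / sigma
    w = 1.0 / (1.0 + np.exp(-z))          # e^z / (1 + e^z)
    S = 1.0 - betainc(a, b, w)
    return status + np.log(S)

# Toy data generated at the true parameters, with no censoring.
rng = np.random.default_rng(6)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
wbeta = rng.beta(2.0, 2.0, size=n)
y = X @ np.array([0.3, -0.23]) + 0.8 * np.log(wbeta / (1.0 - wbeta))
status = np.ones(n, dtype=int)
rM = martingale_residuals(y, X, status, 2.0, 2.0, 0.8, np.array([0.3, -0.23]))

# With no censoring at the true parameters, S(Y) ~ Uniform(0, 1), so
# E[r_Mi] = 1 + E[log U] = 1 - 1 = 0.
print(abs(rM.mean()) < 0.1)
```

These are the residuals whose empirical distribution is compared with the normal distribution in the probability plots.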

Table 1. The estimates, bias, MSE, and relative root MSE for log-beta log-logistic regression model

Figure 1. Normal probability plots for martingale residuals (rMi). Sample sizes n = 30, 50, and 100; percentage of right-censored = 0, 10, and 30; σ = 0.8
Figure 2. Histogram and smooth-curve plots for martingale residuals (rMi). Sample sizes n = 30, 50, and 100; percentage of right-censored = 0, 10, and 30; σ = 0.8
Figure 3. Normal probability plots for martingale residuals (rMi). Sample sizes n = 30, 50, and 100; percentage of right-censored = 10, 30, and 50; σ = 1.8
Figure 4. Histogram and smooth-curve plots for martingale residuals (rMi). Sample sizes n = 30, 50, and 100; percentage of right-censored = 10, 30, and 50; σ = 1.8
Figure 5. Normal probability plots for martingale residuals (rMi). Sample sizes n = 30, 50, and 100; percentage of right-censored = 10, 30, and 50; σ = 5
Figure 6. Histogram and smooth-curve plots for martingale residuals (rMi). Sample sizes n = 30, 50, and 100; percentage of right-censored = 10, 30, and 50; σ = 5

From Figure 1 – Figure 6 we can draw the following interpretations:

- The empirical distribution of the martingale residuals is in good agreement with the normal distribution.

- As the sample size increases, the agreement between the empirical distribution of the martingale residuals and the normal distribution improves.

- As 1/σ increases, the agreement between the empirical distribution of the martingale residuals and the normal distribution improves.

6.3. Application: Myeloma Data

[12] fitted the LBLLogistic regression model to the data set given in [10]. The aim of the present study is to examine the performance of the LBLLogistic regression model on these data. We used Mathcad and Mathematica to compute the case-deletion measures GDi(θ) and LDi(θ) defined in (9) and (10). The resulting influence-measure index plots are displayed in Figure 7 and Figure 8. These plots show that cases 29, 40, 44, and 48 are possible influential observations.

Figure 7. The index plot of GDi(θ) on the myeloma data
Figure 8. The index plot of LDi(θ) on the myeloma data

Impact of the detected influential observations

The diagnostic analysis detected four influential observations (cases 29, 40, 44, and 48). Observation 29 corresponds to a patient with one of the largest blood urea nitrogen measurements and one of the highest ages. Observation 40 corresponds to a patient with one of the largest blood urea nitrogen, age, and serum calcium measurements at diagnosis. Observation 44 corresponds to a patient with one of the lowest hemoglobin and serum calcium measurements at diagnosis. Observation 48 corresponds to a patient with one of the largest hemoglobin measurements at diagnosis. To reveal the impact of these four observations on the parameter estimates, the model was refitted under several scenarios. First, each of the four observations was eliminated individually. Second, sets of the potentially influential observations were removed together from the original data set (set "A"). Table 2 provides the relative change of each estimate after each set I of observations is removed, together with the corresponding p-value, for the sets I1 = {#29}, I2 = {#40}, I3 = {#44}, I4 = {#48}, I5 = {#29, #40}, I6 = {#29, #44}, I7 = {#29, #48}, I8 = {#29, #40, #44}, I9 = {#29, #40, #48}, I10 = {#40, #44, #48}, and I11 = {#29, #40, #44, #48}.

Table 2. Estimates, their p-values, and relative changes for the corresponding sets

The figures in Table 2 indicate that the estimates of the LBLLogistic regression model are not highly sensitive to the deletion of the outstanding observations, except for one estimate, which became nonsignificant. In general, the significance of the parameter estimates does not change after removing the sets. Hence, there are no inferential changes after removing the observations singled out in the diagnostic plots.

7. Concluding Remarks

An appropriate matrix for assessing local influence was obtained. We presented various simulation studies to assess the performance of the parameter estimation for the LBLLogistic regression model and noticed that the bias, MSE, and relative root MSE decrease when a > b and when the sample size increases, whereas they increase when a < b, when the sample size decreases, and when the censoring percentage increases. Various simulation studies were also performed to investigate the form of the empirical distribution of the martingale residuals, and we noticed that the martingale residuals are approximately normally distributed. Finally, we analyzed a data set as an application of influence diagnostics in the LBLLogistic regression model; although the diagnostic plots detected some possible influential observations, their deletion did not cause substantial changes in the results.

References

[1] Carrasco, J. M. F., Ortega, E. M. M., and Paula, G. A. Log-modified Weibull regression models with censored data: sensitivity and residual analysis. Computational Statistics and Data Analysis. 52 (2008). 4021-4039.

[2] Cook, R. D. Detection of influential observation in linear regression. Technometrics. 19. 1. (1977). 15-18.

[3] Cook, R. D. Assessment of local influence. Journal of the Royal Statistical Society. Series B (Methodological). 48. 2. (1986). 133-169.

[4] Cook, R. D. and Weisberg, S. Residuals and Influence in Regression. Chapman and Hall. London. (1982).

[5] Cordeiro, G. M., Gomes, A. E., Silva, G. O., and Ortega, E. M. M. The beta exponentiated Weibull distribution. Journal of Statistical Computation and Simulation. 83. 1. (2013). 114-138.

[6] Escobar, L. A. and Meeker, W. Q. Assessing influence in regression analysis with censored data. Biometrics. 48 (1992). 507-528.

[7] Feigl, P., and Zelen, M. Estimation of exponential survival probabilities with concomitant information. Biometrics. 21. 4. (1965). 826-838.

[8] Gusmao, F. R. S., Ortega, E. M. M., and Cordeiro, G. M. The generalized inverse Weibull distribution. Statistical Papers. 52 (2011). 591-619.

[9] Hashimoto, E. M., Ortega, E. M. M., Cancho, V. G., and Cordeiro, G. M. The log-exponentiated Weibull regression model for interval-censored data. Computational Statistics and Data Analysis. 54 (2010). 1017-1035.

[10] Krall, J., Uthoff, V., and Harley, J. A step-up procedure for selecting variables associated with survival. Biometrics. 31 (1975). 49-57.

[11] Lawless, J. F. Statistical Models and Methods for Lifetime Data. John Wiley & Sons. New York. (1982).

[12] Mahmoud, M. R., EL-Sheikh, A. A., Morad, N. A., and Ahmad, M. A. M. Log-beta log-logistic regression model. International Journal of Sciences: Basic and Applied Research. 22. 2. (2015). 389-405.

[13] Ortega, E. M. M., Cordeiro, G. M., and Lemonte, A. J. A log-linear regression model for the β-Birnbaum–Saunders distribution with censored data. Computational Statistics and Data Analysis. 56 (2012). 698-718.

[14] Ortega, E. M. M., Cancho, V. G., and Bolfarine, H. Influence diagnostics in exponentiated-Weibull regression models with censored data. SORT. 30. 2. (2006). 171-192.

[15] Prentice, R. L. Exponential survival with censoring and explanatory variables. Biometrika. 60 (1973). 279-288.

[16] Silva, G. O., Ortega, E. M. M., Cancho, V. G., and Barreto, M. L. Log-Burr XII regression models with censored data. Computational Statistics and Data Analysis. 52 (2008). 3820-3842.

[17] Silva, G. O., Ortega, E. M. M., and Cordeiro, G. M. A log-extended Weibull regression model. Computational Statistics and Data Analysis. 53 (2009). 4482-4489.

[18] Zippin, C., and Armitage, P. Use of concomitant variables and incomplete survival information in the estimation of an exponential survival parameter. Biometrics. (1966). 665-672.