
Application of Binary Logistic Regression in Assessing Risk Factors Affecting the Prevalence of Toxoplasmosis

Mutangi Kudakwashe1, Kasim Mohammed Yesuf2

1Department of Mathematics, Harare Institute of Technology, P O box BE277, Belvedere, Harare, Zimbabwe

2Kasim Mohammed Yesuf, Department of Statistics, University of Gondar, P O Box 196, Gondar, Ethiopia


Toxoplasmosis is a parasitic disease caused by the protozoan parasite Toxoplasma gondii (T. gondii). The parasite infects warm-blooded animals, including humans, and especially those whose immunity has been compromised. Modes of transmission range from living in unhygienic conditions and contact with cat faeces to contact with raw meat or the eating of raw meat, as commonly practised in Ethiopia. Binary logistic regression was used to determine the risk factors affecting the prevalence of toxoplasmosis in HIV/AIDS patients. Significant risk factors were detected using the Wald and likelihood ratio tests. The selected model was then subjected to diagnostic checks to assess its fit using the Hosmer-Lemeshow test as well as the Pearson and deviance goodness-of-fit tests. The results showed that patients living under unhygienic conditions, older patients, and illiterate and less educated patients were the most affected by toxoplasmosis. Prevalence was higher in urban areas than in rural areas, possibly because of the high population density in urban areas.

Cite this article:

  • Kudakwashe, Mutangi, and Kasim Mohammed Yesuf. "Application of Binary Logistic Regression in Assessing Risk Factors Affecting the Prevalence of Toxoplasmosis." American Journal of Applied Mathematics and Statistics 2.6 (2014): 357-363.
  • Kudakwashe, M. , & Yesuf, K. M. (2014). Application of Binary Logistic Regression in Assessing Risk Factors Affecting the Prevalence of Toxoplasmosis. American Journal of Applied Mathematics and Statistics, 2(6), 357-363.
  • Kudakwashe, Mutangi, and Kasim Mohammed Yesuf. "Application of Binary Logistic Regression in Assessing Risk Factors Affecting the Prevalence of Toxoplasmosis." American Journal of Applied Mathematics and Statistics 2, no. 6 (2014): 357-363.


1. Introduction

T. gondii is a protozoan parasite that is endemic world-wide [16]. It has been estimated that about one third of the human population is infected with this parasite [6]. Toxoplasmosis is arguably the most opportunistic disease in HIV-infected persons. There is wide geographic variation in the prevalence of latent toxoplasma infection [21].

Toxoplasmosis, if not properly managed, has the capacity to cause morbidity and mortality in immune-compromised patients. The disease generally results from reactivation of latent infection [12]. Although T. gondii can cause disease with multiple sites of infection, one organ may be preferentially infected. About 4% of all cases of pneumonia in AIDS patients can be attributed to pulmonary toxoplasmosis, which is the second or third most frequent form of toxoplasmosis after toxoplasmic encephalitis [12]. Toxoplasmosis is normally asymptomatic in healthy individuals but can cause abortion in pregnant patients. The overall risk of maternal-fetal transmission in HIV-negative women who acquire primary toxoplasma infection during pregnancy is 29% [10]. Among congenital infections, it is estimated that 10-13% of the babies will have visual handicaps.

Acute cerebral toxoplasmosis is the most common cause of focal neurologic disorder in AIDS patients. If not detected and treated promptly, cerebral toxoplasmosis may cause significant morbidity and mortality. All HIV-infected persons should be made aware of the non-pharmacologic and medical prophylaxis for T. gondii infection, and sero-positive patients should receive either primary or secondary prophylaxis for toxoplasmosis [11].

The sero-prevalence of anti-toxoplasma IgG antibody among HIV-positive and HIV-negative participants was determined in a study in Addis Ababa, Ethiopia. The overall prevalence rate of latent T. gondii infection was found to be 90%. The high sero-prevalence of latent toxoplasma infection among the study population seems reasonable, as raw or insufficiently cooked meat is consumed in various cultural food outlets. Cats are also abundant and the climatic conditions favour the survival of the parasite [16]. In Ethiopia, the prevalence of IgG antibodies to Toxoplasma gondii has been determined by an enzyme-linked immunosorbent assay (ELISA). One thousand and sixteen sera collected in six different geographical regions were analysed. Antibody titres > 15 IU/ml were detected in 74.4% of the specimens, and titres exceeding 200 IU/ml in one third of the ELISA-positive sera [8].

The association between the socio-economic characteristics of a society and the provision of health care has become a matter of interest among several authors. Inequalities have been reported, with particular contrasts between rural and urban areas [19]. Levels of urban health are sometimes viewed as being worse than in rural areas, but contrary evidence does also exist [4]. Health is believed to be influenced by ecological factors, social environment, genetics and individual characteristics. Urban and rural differences in mortality are determined by many factors, such as differences in lifestyle, ecological situations, access to health services, and unequal distribution of incomes and resources [17]. Rural people are less likely to protect themselves against HIV and, if they become ill, they are less likely to receive adequate care. Poverty, which is widespread in rural areas, leads to poor nutrition and poor health, which make a person more vulnerable to HIV-related death [17].

Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response variable and one or more explanatory variables. Sometimes the outcome variable is discrete with two or more possible values. Binary logistic regression has become the standard method of analysis in situations where the outcome variable is dichotomous. Early uses of binary logistic regression were in the biomedical sciences, but the past twenty years have also seen much use in the social sciences and marketing [1].

Logistic regression is an increasingly popular statistical technique used to model the probability of discrete (binary or multinomial) outcomes. When properly applied, logistic regression analyses yield very powerful insights into which variables are more or less likely to predict an event outcome in a population of interest. Modelling a binary response variable using normal linear regression introduces substantial bias into parameter estimates.

1.1. Statement of the Problem

In 2008, an estimated 1.7 million deaths (2%) in Ethiopia were due to HIV/AIDS-related illness in the 15-49 age group, which is also the productive age group. In 2010 the figure rose to 1.9 million deaths in the same age group [5]. Toxoplasma infection in humans may be acquired by eating undercooked infected meat containing Toxoplasma cysts, ingestion of oocysts from faecally contaminated hands or food, organ transplantation or blood transfusion, transplacental transmission and accidental inoculation of tachyzoites. In Ethiopia, there is a practice of raw meat eating which leaves the population exposed to toxoplasmosis.

The study in [2] focused on prevention and on the factors that increase the chances of contracting toxoplasmosis, giving less attention to improving the livelihoods of HIV-positive patients on ART. Several other researchers did not assess the factors that influence the survival status of HIV-positive persons on ART who might be exposed to toxoplasmosis. Earlier researchers also tended to concentrate on factors affecting toxoplasmosis without employing rigorous statistical analysis coupled with adequate model diagnostic checks. A binary logistic regression model was therefore used to investigate the risk factors that affect toxoplasmosis prevalence in HIV/AIDS patients. This, together with the factors already mentioned, is the motivation for this research.

1.2. Significance of the Study

This study will result in the following benefits to society (health professionals, academics, researchers and the general public):

•  It will give a statistical method useful in determining the risk factors affecting the prevalence of toxoplasmosis.

•  It will provide the risk factors themselves.

•  It will promote effective planning and policy on prevention, control and intervention strategies for toxoplasmosis.

•  Its results will be used as a basis for further studies on toxoplasmosis prevalence in populations with the same characteristics as the one involved in this study.

1.3. Objectives of the Study

•  To find out the risk factors that affect the prevalence of toxoplasmosis.

•  To identify the best binary logistic model for the data using the logit, probit (normit) link functions and the complementary log log (cloglog) link function.

•  To assess the model for adequacy using existing diagnostic tools.

2. Materials and Methods

2.1. Data Collection

The data was obtained from Jimma University Research and publications office. The study included 120 subjects from all HIV/AIDS patients who came to Mettu Karl hospital laboratory for CD4 counts and ART monitoring during the study period. Mettu town is the capital of Illu Ababor zone about 600 km south west of Addis Ababa. The data was collected by assessing the patient’s socio-demographic and nutritional conditions. A questionnaire was used. The physician also collected the relevant clinical data of manifested toxoplasmosis before sending the patient for sample collection for the documentation of the clinical aspect of the study. A cross-sectional study with systematic random sampling was used. Direct Agglutination Test was used to detect toxoplasma antibodies in the subject’s serum.

2.2. Variables Included in the Study

Table 1. Independent variables involved in the study

2.2.1. Dependent Variable

The dependent variable was the toxoplasmosis status of an HIV/AIDS patient (positive = 1, negative = 0). It is denoted TOXO.

2.2.2. Independent Variables

Sixteen variables assumed to influence the prediction of positive/negative toxoplasmosis status were considered.

2.3. Methodology

Since our dependent variable is dichotomous with several independent variables, logistic regression is preferred over multiple regression and discriminant analysis. Apart from being mathematically flexible and easy to use, logistic regression results in a biologically meaningful interpretation [9]. In this section we discuss logistic regression with the logit, probit and complementary log log link functions.

2.3.1. Binary Logistic Regression Model

The binary logistic regression is discussed under three link functions (2.3.2, 2.3.3 and 2.3.4) as stated above

2.3.2. Logit Model

The logit model is described as follows:

$$\operatorname{logit}[P(Y=1)] = \ln\left(\frac{P(Y=1)}{1-P(Y=1)}\right) = \beta_0 + \sum_{j} \beta_j X_{jK}$$

where $X_{jK}$ is the risk factor, K is the category of the factor, $\beta_0$ is the intercept of the logit model, $\beta_j$ is the estimated coefficient for each risk factor j for the logit model and P(Y=1) is the probability that the ith patient will develop toxoplasmosis.

2.3.3. The Probit (Normit) Model

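Using the probit link function, the binary model is described through the inverse of the standard normal distribution function $\Phi$:

$$\Phi^{-1}[P(Y=1)] = \beta_0 + \sum_{j} \beta_j X_{jK}$$

where $\beta_0$ is the intercept, $\beta_j$ is the coefficient of risk factor j and $X_{jK}$ is the risk factor at category K.
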
2.3.4. Complementary Log Log (cloglog) Model

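The cloglog link function results in a model described as follows:

$$\ln\left[-\ln\left(1-P(Y=1)\right)\right] = \beta_0 + \sum_{j} \beta_j X_{jK}$$

where $\beta_0$ is the intercept, $\beta_j$ is the coefficient of risk factor j and $X_{jK}$ is the risk factor at category K.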
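The three link functions of Sections 2.3.2 to 2.3.4 can be compared numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative and are not taken from the study, which used R.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def logit(p: float) -> float:
    """Logit link: log-odds of p."""
    return math.log(p / (1.0 - p))


def probit(p: float) -> float:
    """Probit (normit) link: inverse standard normal CDF of p."""
    return _STD_NORMAL.inv_cdf(p)


def cloglog(p: float) -> float:
    """Complementary log log link."""
    return math.log(-math.log(1.0 - p))


def inv_logit(eta: float) -> float:
    """Probability implied by linear predictor eta under the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta: float) -> float:
    """Probability implied by eta under the probit link."""
    return _STD_NORMAL.cdf(eta)


def inv_cloglog(eta: float) -> float:
    """Probability implied by eta under the cloglog link."""
    return 1.0 - math.exp(-math.exp(eta))
```

At a linear predictor of zero the logit and probit links both give a probability of 0.5, whereas the cloglog link gives 1 - exp(-1), roughly 0.632. This reflects the asymmetry of the cloglog link about 0.5, which is one reason the three links can fit the same data differently.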

2.4. Model Building Strategies (Variable Selection)

The procedure used was to fit an intercept-only model (null model) and then fit a model with more parameters (fitted model). The following hypothesis was then tested for significance using the likelihood ratio test statistic $G = -2\ln(L_0/L_1)$, where $L_0$ is the maximised value of the likelihood for the null model and $L_1$ is the maximised value for the fitted model:

$H_0: \beta_j = 0$ for all j versus $H_1: \beta_j \neq 0$ for at least one j.

Rejection of the null hypothesis would mean that at least one of the parameters is different from zero. The Wald test was used to test the statistical significance of each of the parameters once the null hypothesis is rejected.
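As an illustration of the two tests, the sketch below computes a likelihood ratio statistic with its chi-square p-value and a Wald z-test for a single coefficient. It uses only the Python standard library. The closed-form chi-square tail used here is valid for even degrees of freedom only, which is a simplification chosen for this sketch, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # current term (x/2)^i / i!, starting at i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_null: float, loglik_fitted: float, df: int):
    """Return (G, p-value) where G = -2 (ln L0 - ln L1)."""
    g = -2.0 * (loglik_null - loglik_fitted)
    return g, chi2_sf_even_df(g, df)


def wald_test(estimate: float, std_error: float):
    """Return (z, two-sided p-value) for H0: coefficient = 0."""
    z = estimate / std_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```

For example, with made-up log-likelihoods of -70 for the null model and -50 for the fitted model, `likelihood_ratio_test(-70.0, -50.0, 16)` gives G = 40 on 16 degrees of freedom, which is significant at the 1% level.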

One problem with any bivariate approach is that it ignores the possibility that a collection of variables, each of which is weakly associated with the outcome, can become an important predictor of the outcome. In the R package, stepwise logistic regression begins by selecting the strongest candidate predictor and then tests additional candidate risk factors (predictors) one at a time for inclusion in the model. The procedure offers several methods for stepwise selection of the best risk factors to include in the model, based on the Akaike Information Criterion (AIC). The logit, probit and cloglog link functions were employed.
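The idea of AIC-based forward selection can be sketched as follows. This is a generic illustration in Python, not the R routine used in the study: `aic_of` is a hypothetical user-supplied function that fits a model on a given set of variables and returns its AIC, and only forward steps are considered.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def forward_stepwise(
    candidates: Iterable[str],
    aic_of: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedy forward selection that minimises AIC.

    Starts from the intercept-only model (empty variable set) and, at each
    step, adds the candidate whose inclusion gives the lowest AIC. Stops as
    soon as no candidate lowers the AIC any further.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best_aic = aic_of(frozenset())
    while remaining:
        # Score every model obtained by adding one more candidate.
        scored = [(aic_of(frozenset(selected + [c])), c) for c in remaining]
        trial_aic, trial_var = min(scored)
        if trial_aic >= best_aic:
            break  # no improvement, keep the current model
        selected.append(trial_var)
        remaining.remove(trial_var)
        best_aic = trial_aic
    return selected, best_aic
```

In practice `aic_of` would wrap a logistic regression fit with the chosen link function. Because the search is greedy, it is not guaranteed to find the subset with the globally smallest AIC.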

2.5. Interpretation of Parameters

The model is monotone, with its direction depending on the sign of β: P(Y=1) increases as X increases if β is positive and decreases otherwise. For the logit model, the odds are an exponential function of X. Thus

$$\frac{P(Y=1)}{1-P(Y=1)} = \exp(\beta_0 + \beta X) = e^{\beta_0}\left(e^{\beta}\right)^{X}$$

is the odds of developing toxoplasmosis for the patient. For every unit increase in X the odds increase multiplicatively by $e^{\beta}$. The probit and the cloglog models do not provide an estimate of the odds ratio.
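The multiplicative effect follows directly from the exponential form of the odds under the logit model. Writing odds(X) for P(Y=1)/(1-P(Y=1)) at covariate value X, with intercept $\beta_0$ and slope $\beta$, a short derivation is:

```latex
\frac{\operatorname{odds}(X+1)}{\operatorname{odds}(X)}
  = \frac{\exp\left(\beta_0 + \beta (X+1)\right)}{\exp\left(\beta_0 + \beta X\right)}
  = e^{\beta}
```

As a purely hypothetical example, a coefficient of $\beta = 0.5$ would correspond to an odds ratio of $e^{0.5} \approx 1.65$, that is, about a 65% increase in the odds for each unit increase in X.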

2.6. Assessing the Fit of the Model

In assessing the quality of the fitted models, Pearson's chi-square statistic and the likelihood ratio statistic ($G^2$), which are based on the comparison of the fitted and observed counts, were used. If the chi-square is significant, the variable is considered to be significant in the model. The Hosmer-Lemeshow test was also employed. The likelihood ratio test statistic $G = -2\ln(L_0/L_1)$ was also used, where $L_0$ is the maximised value of the likelihood function for the null model and $L_1$ is the maximised value of the likelihood function for the fitted model.
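For reference, the Pearson and deviance statistics can be computed from observed outcomes and fitted probabilities as in the sketch below. It assumes ungrouped binary data (one 0/1 outcome per patient) and fitted probabilities strictly between 0 and 1. The function names are illustrative and the inputs in the usage note are made up.

```python
import math
from typing import Sequence


def pearson_chi_square(y: Sequence[int], p: Sequence[float]) -> float:
    """Sum of squared Pearson residuals (y - p)^2 / (p (1 - p))."""
    return sum((yi - pi) ** 2 / (pi * (1.0 - pi)) for yi, pi in zip(y, p))


def deviance(y: Sequence[int], p: Sequence[float]) -> float:
    """Residual deviance, -2 times the binary log-likelihood.

    For ungrouped 0/1 data the saturated model has log-likelihood zero,
    so the deviance reduces to -2 ln L of the fitted model.
    """
    loglik = sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )
    return -2.0 * loglik
```

With two observations `y = [1, 0]` and fitted probabilities `p = [0.5, 0.5]`, the Pearson statistic is 2 and the deviance is 4 ln 2. Each statistic is then compared with a chi-square distribution on the residual degrees of freedom.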

3. Results

3.1. Summary Statistics
3.1.1. Socio-economic Factors

About 72.5% of patients with no access to tap water tested positive for toxoplasmosis, with 27.5% testing negative. Among farmers, 87.5% tested positive for toxoplasmosis, while only 25% of merchants had the same result. Patients living in 1-2 rooms were highly exposed, with 69.2% of them testing positive.

3.1.2. Demographic and Health Factors

Out of 120 patients considered in the study, 68 (59.6%) tested positive for toxoplasmosis, with 76.5% of males testing positive compared to 52.5% of females. The 18-23 year age group was more exposed than other age groups, with 69.2% of them testing positive. Illiterate and semi-literate people showed high exposure, with 70.6% and 75% respectively testing positive for toxoplasmosis. Family sizes of 3 and below showed high proportions, with 61.1% testing positive. Among patients with high fever, 63.2% also tested positive for toxoplasmosis. Living under unhygienic conditions exposed patients to toxoplasmosis, with 81.2% of patients without a compound latrine testing positive. Of the patients who lived with cats, 62.5% also tested positive.

3.1.3. Risk Behaviour Factors

Patients who ate raw meat had a higher exposure to toxoplasmosis, with 67.9% of them testing positive. Patients who had the habit of washing their hands and fruit before eating had a lower risk, with 48.8% and 46.9% respectively testing positive.

3.2. Results on Tests of Association between the Independent Variables and the Dependent Variable (TOXO)

Each of the independent variables was tested for association with the dependent variable (toxoplasmosis), taking one variable at a time. The Pearson chi-square ($X^2$) and the likelihood ratio ($G^2$) tests of association were used. The results are in Table 2 below:

Table 2. Tests of association between toxoplasmosis and each of the independent variables

The results above indicate that the following variables influence toxoplasmosis in a significant way: SEX, OCC, EDUC, ROOMS, TAPWATER, CPDLATRINE, HOUSETOILET, CAT, HAND and the practice of washing fruits.
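The two association statistics used above can be computed from a contingency table of a risk factor against TOXO status, as in this minimal sketch. The function name is illustrative and the table in the usage note is hypothetical, not data from the study.

```python
import math
from typing import List, Sequence, Tuple


def association_tests(table: Sequence[Sequence[float]]) -> Tuple[float, float, int]:
    """Return (Pearson X^2, likelihood ratio G^2, degrees of freedom).

    `table` holds observed counts, one row per category of the risk
    factor and one column per TOXO status. Expected counts are computed
    under independence as row total * column total / grand total.
    """
    row_totals: List[float] = [sum(row) for row in table]
    col_totals: List[float] = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G^2
                g2 += 2.0 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, g2, df
```

For a hypothetical 2 x 2 table `[[20, 10], [10, 20]]`, this gives $X^2 = 20/3 \approx 6.67$ and $G^2 \approx 6.80$ on 1 degree of freedom. The two statistics are usually close when expected counts are not small.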

3.3. Results from Multiple Logistic Regression Analysis
3.3.1. Variables Identified by the Stepwise Selection Using the Logit Link Function

Table 3. Variables identified by the stepwise procedure using the logit link function

3.3.2. Variables identified by Stepwise Selection Using the Probit Link Function

Table 4. Variables identified by stepwise procedure using the Probit link function

Table 5. Variables identified by stepwise selection using the cloglog link function

3.3.3. Parameter Estimates

Table 6. The estimated coefficients for the covariates in the logit and the probit models

All the significant parameters were included in their respective models and then checked for adequacy. The AIC for the logit model was 91.419 while that for the probit model was 90.724.

Table 7. Parameter estimates using the cloglog link function

3.4. Final Selected Model

In selecting the best model, the selection criteria involved choosing the distribution and link function as well as which covariates to include in the model. The model with the smallest AIC was selected as the best model.
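The comparison can be sketched as below. The AIC values for the logit and probit models are those reported in Section 3.3.3. The cloglog value is not taken from Table 8: it is a back-of-envelope figure obtained from the deviance and degrees of freedom quoted in Section 3.5, under the assumptions that the data are ungrouped binary outcomes (so that AIC equals the residual deviance plus twice the number of parameters) and that the model has 113 - 97 + 1 = 17 parameters including the intercept.

```python
from typing import Dict


def aic_from_deviance(residual_deviance: float, n_parameters: int) -> float:
    """AIC for ungrouped binary data: deviance + 2 * number of parameters.

    Relies on the saturated log-likelihood being zero for 0/1 outcomes,
    so that the residual deviance equals -2 ln L of the fitted model.
    """
    return residual_deviance + 2 * n_parameters


def best_by_aic(aic_by_model: Dict[str, float]) -> str:
    """Name of the model with the smallest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


aic_values = {
    "logit": 91.419,   # reported in Section 3.3.3
    "probit": 90.724,  # reported in Section 3.3.3
    # Assumption-based estimate, not a value quoted from the paper's tables.
    "cloglog": aic_from_deviance(55.438, 17),
}
selected = best_by_aic(aic_values)
```

Under these assumptions the estimated cloglog AIC is about 89.44, which would make it the smallest of the three.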

Table 8. AIC values for the three models used in this study

3.5. Model Adequacy Checking

A goodness-of-fit test (G) was used to assess the overall fit of the selected cloglog model. G is the chi-square difference between the null model (with intercept only) and the model containing one or more predictors. For our selected cloglog model, the test had df = 16 and p < 0.001, indicating a significant increase in the likelihood and hence that the fitted model improves on the null model. G is the difference between the deviances of the null (M1) and full (M2) models, $G = D(M_1) - D(M_2)$. The deviance of the cloglog model had df = 97 and that of the null model had df = 113, giving df = 113 - 97 = 16 for G, as already stated. In the Pearson and deviance tests, the larger the p-value, the better the fit to the data. The Pearson and deviance test statistics were 67.78558 and 55.438 respectively, with corresponding p-values of 0.98944 and 0.99977. There is therefore no evidence to suggest that the data did not come from a population that follows a logistic regression model. There was also no evidence of over-dispersion, as the estimated dispersion parameter was 0.6988.
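The reported dispersion parameter can be checked from the other figures in this section. The sketch below assumes that the dispersion was estimated in the usual way, as the Pearson chi-square statistic divided by the residual degrees of freedom, which the text does not state explicitly.

```python
def dispersion(pearson_chi_square: float, residual_df: int) -> float:
    """Estimated dispersion: Pearson chi-square over residual df.

    Values well above 1 suggest over-dispersion; values near or below 1
    do not.
    """
    return pearson_chi_square / residual_df


# Figures quoted in the text: Pearson statistic 67.78558, residual df 97.
phi = dispersion(67.78558, 97)

# Degrees of freedom of the likelihood ratio test: null df minus residual df.
lr_df = 113 - 97
```

Rounded to four decimal places, `phi` is 0.6988, which agrees with the reported value, and `lr_df` is 16.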

The plot of leverages shows points with high leverage, that is, points whose leverage exceeds the cut-off value. Points 9, 15 and 22, among others, have high leverage.

3.6. Discussion of Results

This study showed an overall 59.6% seroprevalence of toxoplasmosis for the one hundred and twenty patients at Mettu Karl hospital included in the study. This is lower than the prevalence reported in Jimma town (83.6%) [20] and in central Ethiopia (81.4%) [7], but higher than that reported in Nigeria, where 242 HIV-positive patients were tested and an overall seroprevalence of 41.3% was reported [15].

Women were less affected, with 52.5% seroprevalence, in sharp contrast with other studies: seropositivity was higher in females (42.8%) than in males (39.2%) in a study in Nigeria [15]. In another study, toxoplasma seropositivity was not associated with age, sex, ART status or CD4 count [14]. In a study involving 330 sera in Addis Ababa, toxoplasma infection was higher in males (93.6%) than in females (86.6%), supporting the findings of this study [16].

Among farmers, 87.5% were seropositive, possibly because of their constant contact with contaminated soil and failure to wash their hands before eating. This is higher than the 73.6% reported by [20]. About 72.5% of the patients with no access to tap water tested positive for toxoplasmosis, compared to 75.58% who tested positive at Douala hospital in Cameroon [13].

One of the risk factors was cat presence. Of those who lived with cats, 62.5% tested positive for toxoplasmosis, a figure lower than the 72.6% reported by [13] but higher than that reported in [18] in a study involving Thai pregnant women. Other similar results were reported by [7, 15, 20]. The practice of raw meat eating is quite common in Ethiopia. The 67.9% seroprevalence for raw meat eaters in this study compares well with the 73.3% seroprevalence for meat eaters in the Douala study in Cameroon [13]. There was, however, no significant association between toxoplasmosis and the habit of raw meat eating in a study by [20].

A model with the cloglog link function was selected as the best predictive model in this study. Although the use of logistic regression is mentioned in other studies [7, 20], none of the authors, to the best of our knowledge, compared the three link functions (probit, logit and cloglog) in order to arrive at a suitable model for predicting the risk factors for toxoplasmosis.

4. Conclusions

The binary logistic regression with a complementary log log (cloglog) link function was the best model for the Toxoplasma data from Mettu Karl hospital in South West Ethiopia. The risk factors which significantly affect Toxoplasma infection of HIV/AIDS patients are age, patient's place of residence, educational level, marital status, number of family members in the house, number of rooms in the house, tap water, presence or absence of a house toilet and the practice of washing hands. Other factors such as gender, occupation, cat presence and the practice of raw meat eating were insignificant. Older and less educated patients were the most affected by Toxoplasma gondii. The situation was the same for married patients and those with no house toilets. The absence of house toilets and tap water means less hygienic conditions at home, thus exposing the inhabitants to T. gondii infection. Recommendations on the prevention of human toxoplasmosis have been suggested by [3], and new research could perhaps focus on finding means to reduce the spread of T. gondii to humans. To the best of our knowledge, the prevalence of infection among newborn babies whose mothers were infected with T. gondii is unknown in Ethiopia, and this needs further research.


Acknowledgement

The authors of this article would like to thank the Jimma University research and publications office for making available the data used in this research. We also thank Harare Institute of Technology for the use of their computers.

Statement of Competing Interests

The authors of this study have no competing interests.

List of Abbreviations


References

[1]  Agresti, A, Categorical data analysis, Gainesville, Florida, 2002.
[2]  Alemtsehai, A. and Eshetu, W, Association of HIV infection with some selected factors and modelling the chance of contracting HIV: The case of Hawassa and its surrounding. Journal of Ethiopian Statistical Association, 15. 27-48. 2006.
[3]  Bresciani, K.D.S, Galvao, A.L.B, de Vsconcellos, A.L, Soares, J.A, de Matos, L.V.S, Pierucci, J.C, Neto, L.S, Rodrigues, T.O, Navarro, I.T, Gomes, J.F and da Costa, A.J, Relevant aspects of human toxoplasmosis. Research Journal of Infectious Diseases. 2013. [online].
[4]  Buve, A, Carael, M and Hayes, R.J, Multi-center study on factors determining differences in rate of spread of HIV in sub-Saharan Africa: Methods and prevalence of HIV infection. AIDS, 15 (4). 5-14. 2001.
[5]  Ethiopia Demographic Health Survey (EDHS). Central Statistical Agency, Addis Ababa, Ethiopia. September 2006.
[6]  Fong, M.Y, Wong, K.T, Rohela, M, Tan, L.H, Adeeba, K, Lee, Y.Y and Lau, Y.L, Unusual manifestation of cutaneous toxoplasmosis in an HIV-positive patient. Tropical Biomedicine, 27 (3). 447-450. 2010.
[7]  Gebremedhin, E.Z, Abebe, A.H, Tessema, T.S, Tullu, K.D, Medhin, G, Vitale, M, Di Marco, V, Cox, E and Dorny, P, Sero-epidemiology of Toxoplasma gondii infection in women of child bearing age in central Ethiopia. BMC Infectious Diseases, 13: 101. February 2013. [online]. Available: [Accessed July 25, 2014].
[8]  Guebre-Xabier, M, Nurilign, A, Gebre-Hiwot, A, Hailu, A, Sissay, Y, Getachew, E and Frommel, D, Sero-epidemiological survey of Toxoplasma gondii infection in Ethiopia. Ethiopian Medical Journal, 31 (3). 201-208. 1993.
[9]  Hosmer, D.W and Lemeshow, S, Applied logistic regression, 2nd edition, John Wiley & Sons Inc, New York, 2000.
[10]  Jara, M, Hsu, H.W, Eaton, R.B and Demaria, A.Jr, Epidemiology of congenital toxoplasmosis identified by population based newborn screening in Massachusetts. Paediatr Infect Disease Journal, 20 (12). 1132-5. 2001.
[11]  Jayawardena, S, Singh, S, Burzyantseya, O and Clarke, H, Cerebral toxoplasmosis in adult patients with HIV infection. Hospital Physician, 44 (7). 17-24. 2008.
[12]  Luft, B.J and Remmington, J.S, AIDS commentary. Toxoplasmosis encephalitis. Journal of Infectious Diseases, 157 (1). 1-6. 1988.
[13]  Njunda, A.L, Assob, J.C.N, Nsagha, D.S, Kamga, H.L, Nde, P.F and Yugah, V.C, Seroprevalence of Toxoplasma gondii infection among pregnant women in Cameroon. Journal of Public Health in Africa, 2 (e24). 98-101. 2011.
[14]  Ogoina, D, Onyemelukwe, G.C, Musa, B.O and Obiako, R.O, Seroprevalence of IgG and IgM antibodies to toxoplasma infection in healthy and HIV positive adults from Northern Nigeria. Journal Infect Dev Ctries, 7 (5). 398-403. 2013.
[15]  Okwuzu, J.O, Odunukwe, N.N, Ezechi, O.C, Gbajabiamila, T.A, Musa, A.Z, Ezeobi, P.M, Oke, B.A, Somefun, T, Okoye, R, Onyeitu, C.C, Adetunji, M and Otubanjo, A.O, Toxoplasma gondii infection in HIV/AIDS: prevalence and risk factors. African Journal of Clinical and Experimental Microbiology, 15 (2). 97-102. 2014.
[16]  Techalew, S, Mekashaw, T, Endale, T, Belete, T and Ashenafi, T, Seroprevalence of latent Toxoplasma gondii infection among HIV-infected and HIV-uninfected people in Addis Ababa, Ethiopia: A comparative cross sectional study. BMC Research Notes, 2: 13. 2009. [online]. Available: [Accessed July 31, 2014].
[17]  Verheij, R.A, Explaining urban-rural variations in health: A review of interactions between individual and environment. Social Science and Medicine, 42 (6). 923-35. 1996.
[18]  Wanachiwanawin, D, Sutthent, R, Chokephaibulkit, K, Mahakittikun, V, Ongrotchanakun, J and Monkong, N, Toxoplasma gondii antibodies in HIV and non-HIV infected Thai pregnant women. Asia and Pacific Journal of Allergy and Immunology, 19 (4). 291-293. 2001.
[19]  Williams, C.C and Windebank, J, Helping people to help themselves: policy lessons from a study of deprived urban neighbourhoods in Southampton. Journal of Social Science Policy, 29 (3). 355-373. 2000.
[20]  Zemene, E, Yewhalaw, D, Abera, S, Belay, T, Samuel, A and Zeynudin, A, Seroprevalence of Toxoplasma gondii and associated risk factors among pregnant women in Jimma town, South Western Ethiopia. BMC Infectious Diseases, 12: 337. December 2012. [online]. Available: [Accessed July 25, 2014].
[21]  Zuffrey, J, Sugar, A, Rudaz, P, Bille, J, Glauser, M.P and Chave, J.P, Prevalence of latent toxoplasma and serologic diagnosis of acute infection in HIV-positive patients. European Journal of Clinical Microbiology and Infectious Diseases, 12 (8). 591-5. 1993.