Identification of Causal Effect with the Non-Compliance and Its EM Algorithm
1College of Science, China University of Petroleum, Beijing, P.R. China
2School of Basic Medical Sciences, Capital Medical University, Beijing, China
Many practical studies in biology, medicine, behavioral science and the social sciences seek to establish causal relationships between treatments and outcomes, rather than mere associations. In this paper, we use a graphical model to describe a causal model with non-compliance and study its identification. For an unidentifiable model, we introduce covariates that are always observed so that the model becomes identifiable. We then give a condition for the identifiability of the causal graphical model and prove it mathematically. Finally, we give an algorithm for estimating the identifiable average causal effect of the accepted treatment on the outcome, and illustrate the method and algorithm with an example.
Keywords: Rubin causal model, instrumental variables, graphical model, identification, non-compliance, EM algorithm, average causal effect
American Journal of Medical Sciences and Medicine, 2013 1 (4),
Received March 12, 2013; Revised June 07, 2013; Accepted June 15, 2013
Copyright © 2014 Science and Education Publishing. All Rights Reserved.
Cite this article:
- Xiaotong, Li, and Li Sichen. "Identification of Causal Effect with the Non-Compliance and Its EM Algorithm." American Journal of Medical Sciences and Medicine 1.4 (2013): 55-61.
1. Introduction
Many practical studies in biology, medicine, behavioral science and the social sciences seek to establish causal relationships between treatments and outcomes, rather than mere associations. The only generally accepted approach for inferring causality requires that the received treatments be randomized. In many cases, however, it is not possible to randomize the received treatments. For example, in clinical trials, even if the assignment to treatment is random, some units may not comply with the assignment. Because of non-compliance, estimating the causal effect of treatment on outcomes becomes complicated. The standard intention-to-treat (ITT) analysis focuses on the causal effect of the assigned treatment, rather than the causal effect of the treatment actually accepted. To study the average causal effect (ACE) of the treatment actually accepted, Imbens and Angrist, and Angrist, Imbens and Rubin, suggested the instrumental variables (IV) method. An IV estimator can be embedded in Rubin’s causal model under some reasonable assumptions. Under these assumptions, an IV estimator gives the average causal effect for compliers, but not the ACE for the whole population. Without these assumptions, an IV estimator is simply the ratio of two intention-to-treat estimates and has no interpretation as an average causal effect. Imbens and Rubin used likelihood-based Bayesian inference for causal estimands with non-compliance and studied the two assumptions of the instrumental variables method, namely the exclusion restriction and monotonicity. Without these two assumptions, the region over which the likelihood function attains its maximum can be large, and the causal effect becomes unidentifiable. Hirano, Imbens, Rubin and Zhou extended the method of Imbens and Rubin to allow for pre-treatment variables and the use of “weakly identifiable models”. Here “weakly identifiable” means that the posterior distribution is proper even though there is no unique maximum likelihood estimate.
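The remark that an IV estimator is a ratio of intention-to-treat estimates can be made concrete with a minimal Python sketch. It assumes binary Z, D and Y; the function name `wald_estimate` and the toy records are illustrative assumptions, not data or code from this paper.

```python
from statistics import mean

def wald_estimate(records):
    """IV (Wald) estimate from (z, d, y) triples with binary z.

    Numerator: ITT effect of the assignment Z on the outcome Y.
    Denominator: ITT effect of the assignment Z on the received treatment D.
    """
    y1 = mean(y for z, d, y in records if z == 1)
    y0 = mean(y for z, d, y in records if z == 0)
    d1 = mean(d for z, d, y in records if z == 1)
    d0 = mean(d for z, d, y in records if z == 0)
    itt_y = y1 - y0
    itt_d = d1 - d0
    if itt_d == 0:
        raise ValueError("assignment has no effect on received treatment")
    return itt_y, itt_d, itt_y / itt_d

# Hypothetical toy data: (assigned z, received d, outcome y)
toy = (
    [(1, 1, 1)] * 6 + [(1, 1, 0)] * 2 + [(1, 0, 0)] * 2    # assigned to treatment
    + [(0, 0, 1)] * 3 + [(0, 0, 0)] * 5 + [(0, 1, 1)] * 2  # assigned to control
)
itt_y, itt_d, iv = wald_estimate(toy)
print(itt_y, itt_d, iv)
```

Whether the resulting ratio can be read as a causal effect for compliers depends on the exclusion restriction and monotonicity assumptions discussed above; the code only performs the arithmetic.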
All of these methods give the average causal effect only for a subpopulation of interest, not for the whole population. To study the average causal effect of the treatment actually accepted, Pearl gave upper and lower bounds on the average causal effect for the population. However, the bounds are wide because the average causal effect for a population with non-compliance is unidentifiable, so they do not provide a good evaluation of the population average causal effect. Graphical models are widely used to describe independence between variables; see Whittaker, Lauritzen, and Cox and Wermuth. In this paper, we use a graphical model to describe the causal model and study its identification. We introduce covariates that are always observed into the model so that the model becomes identifiable. We then give the condition for identification and prove it mathematically. Finally, we give the maximum likelihood estimate (MLE) of the average causal effect of the treatment on the outcome, and its EM algorithm, for the identifiable model, and illustrate the method and algorithm with an example. In Section 2 we introduce notation and assumptions. In Section 3 we discuss the identification of parameters in the causal graphical model and its identifiability condition. In Section 4 we use an example to give the MLE of the ACE of a drug for high blood pressure (D) on the outcome (Y), together with estimates of the ITT effect and of the ACE of D on Y when random assignment is ignorable. The example shows that if the covariate “age” is not incorporated into the model, the model is not identifiable; that is, the average causal effect for the population cannot be obtained. In Section 5 we give the conclusion.
2. Assumptions and Notations
Suppose we wish to evaluate the causal effect of a new drug (D) on a health outcome Y in a population of patients. Our goal is to estimate the causal effect of the accepted treatment (taking or not taking the medicine) on the outcome Y. We assume that patients either take the medicine or do not, and that partial doses are not allowed. We also assume that whether a patient takes the medicine is not controlled by the researchers, i.e. the accepted treatment is not random and is non-ignorable, whereas the assigned treatment Z is random. Let z_i denote the binary randomized treatment assignment of patient i: z_i = 1 if subject i is assigned to the treatment group and z_i = 0 if subject i is assigned to the control group. Let d_i denote the binary accepted treatment of patient i: d_i = 1 if subject i actually receives the treatment, and d_i = 0 if subject i does not. In an ideal situation, i.e. when all patients comply with the assigned treatment, the causal effect of the treatment D on the outcome Y equals the ITT effect. In reality, however, d_i is not always equal to z_i. For instance, a patient may refuse the medicine for fear of side effects, or may take the wrong medicine. In such cases patients do not comply with the assigned treatment, which is called non-compliance. Now let Y_i(z, d) denote the binary outcome of subject i when assigned to z and actually receiving d; then Y_i(z, d) has missing values. For example, Y_i(0, 0) is observed if z_i = 0 and d_i = 0, while Y_i(0, 1), Y_i(1, 0) and Y_i(1, 1) are missing.
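The missing-data pattern described above can be sketched in a few lines of Python. Indexing the four potential outcomes by the pair (z, d) and the helper names are assumptions made for illustration: each patient reveals exactly one of the four potential outcomes, and the other three are missing.

```python
from itertools import product

# All (z, d) pairs: assigned treatment z and accepted treatment d.
CELLS = list(product((0, 1), repeat=2))

def observed_cell(z_i, d_i):
    """The only potential outcome revealed for a patient with (z_i, d_i)."""
    return (z_i, d_i)

def missing_cells(z_i, d_i):
    """The potential outcomes that stay unobserved for that patient."""
    return [cell for cell in CELLS if cell != (z_i, d_i)]

# Hypothetical patients given as (z_i, d_i)
patients = [(0, 0), (0, 1), (1, 1)]
pattern = {p: missing_cells(*p) for p in patients}
for p, miss in pattern.items():
    print("observed", observed_cell(*p), "missing", miss)
```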
3. Identifiability of Causal Graphical Model and Its Identifiable Condition
First we give the definition of the graphical model. Let G = (V, E) denote an undirected graph, where V is the set of nodes denoting variables and E is the set of undirected edges between these nodes. The absence of an edge between a pair of nodes means that the corresponding variables are independent conditional on the other variables. Identifiability of graphical models of non-response mechanisms was discussed in [8-9]. We consider the set of all probability distributions that satisfy the graphical model G. Following Fitzmaurice, Laird and Zahner, we define identifiability for both the parameters of a distribution and a graphical model.
Definition 1. The parameter θ of a distribution is not statistically identifiable if there exists another parameter θ′ ≠ θ such that the distributions of the observed data are the same for θ and θ′, that is,
$$P(y_{\mathrm{obs}}; \theta) = P(y_{\mathrm{obs}}; \theta'),$$
where $y_{\mathrm{obs}}$ denotes the observed data.
Definition 2. A graphical model G is identifiable if the parameter of every distribution in G is identifiable.
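A toy example may help to show what Definition 1 rules out. The sketch below is not the model of this paper: it assumes a single binary outcome Y that is observed with a probability depending on Y itself, and exhibits two different (hypothetical) parameter values that give the same observed-data distribution.

```python
import math

def observed_distribution(p, r1, r0):
    """Observed-data law for a binary Y with outcome-dependent missingness.

    p  = P(Y = 1)
    r1 = P(Y is observed | Y = 1)
    r0 = P(Y is observed | Y = 0)
    Returns (P(Y=1, observed), P(Y=0, observed), P(Y missing)).
    """
    obs1 = p * r1
    obs0 = (1 - p) * r0
    return (obs1, obs0, 1 - obs1 - obs0)

# Two distinct hypothetical parameter vectors (p, r1, r0)
theta_a = (0.5, 0.6, 0.8)
theta_b = (0.4, 0.75, 2 / 3)

dist_a = observed_distribution(*theta_a)
dist_b = observed_distribution(*theta_b)
same = all(math.isclose(a, b) for a, b in zip(dist_a, dist_b))
print(dist_a, dist_b, same)
```

The toy model has three free parameters but the observed-data distribution has only two free probabilities, so data alone cannot separate `theta_a` from `theta_b`. An always-observed covariate adds observed cells, which is the idea pursued in this section.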
Because compliance is incomplete, we call the non-compliance non-ignorable when the accepted treatment is related to the outcome; see the graphical model G1 in Figure 1. Let D and Z be binary variables denoting the accepted treatment and the assigned treatment, respectively. Let X denote the observed covariate, and let denote the binary outcome when . For simplicity, we denote by .
The graphical model G1 shown in Figure 1 is non-identifiable. In this case we introduce the binary covariate X, which can always be observed, such that the model becomes identifiable when, i.e. is conditionally independent of given . This is shown as model G2 in Figure 1.
Lemma 1. The joint probability of the graphical model G2 in Figure 1 is identifiable if and only if is not independent of , i.e. .
Proof: Given , the conditional probability of the graphical model can be written as
So we have
Multiplying both sides by , and for , we have:
It is equivalent to:
Since , and can be identified from the data, the parameter can be identified by solving the following equation
So the parameters are identified, such that = / is identified. This shows that the joint probability of the graphical model G2 in Figure 1 is identifiable.
Lemma 1 tells us that the graphical model is identifiable, consequently is identifiable.
Next we introduce a binary covariate that is always observed and that is independent of given , i.e. . Here the graphical model is identifiable, as shown in Figure 2.
Theorem 1. In the graphical model shown in Figure 2, the joint conditional probability is identifiable if is independent of given and , i.e. .
Proof: In Figure 2, the joint conditional probability given is as follows:
The second equation above is due to .
Using Lemma 1, is identifiable if , so is identifiable if .
By Theorem 1, the graphical model shown in Figure 2 is identifiable. Here is conditionally identifiable. Thus we say that has a causal explanation.
4. Estimating the Average Causal Effect of the Accepted Treatment D on the Outcome Y Using the EM Algorithm
In Section 3 we introduced the covariate X, which can always be observed, and proved that the graphical model in Figure 2 is identifiable if , and that is conditionally identifiable when is given. We can therefore obtain the average causal effect of the accepted treatment on the outcome. Let x be the value of the covariate X, z the value of the assigned treatment Z, d the value of the accepted treatment D and y the value of the outcome Y. Let Y_i(z, d) be the binary outcome of patient i when the assigned treatment is z and the accepted treatment is d, where z, d = 0, 1. For example, Y_i(0, 0) is the binary health outcome of patient i who complies with the assigned control treatment, i.e. patient i is assigned to the control group and does not take the medicine. Similarly, Y_i(0, 1) denotes the binary health outcome of patient i who does not comply with the assigned control treatment and actually takes the medicine. Y_i(1, 0) denotes the health outcome of patient i who does not comply with the assigned treatment and actually does not take the medicine. Y_i(1, 1) is the outcome of patient i who complies with the assigned treatment and takes the medicine.
4.1. Conditional Marginal Likelihood Functions of Complete Data
Given , the patient is a complier if . Here is observed. Otherwise the patient is a non-complier if . Thus is missing. Here is not at random (because is ignorable if is missing at random). Now the complete data are (z=0, = 0, , ) and (z=0, = 1, ). Let denote the conditional marginal likelihood function of the complete data. Thus
Given , the patient is a non-complier if . Here is observed; the patient is a complier if . Thus is missing. Here the complete data are (z=0, = 1, , ) and (z=0, = 0, ). Let denote the conditional marginal likelihood function of the complete data. We have
Given , the patient is a non-complier if . Here is observable; the patient is a complier if . Thus is missing. Here the complete data are (z=1, = 0, , ) and (z=1, = 1, ). Let be their conditional marginal likelihood function. We have
Given , the patient is a non-complier if . Thus is observable; the patient is a complier and is missing if . Thus the complete data are (z=1, = 1, , ) and (z=1, = 0, ). Let denote its conditional marginal likelihood function. Thus we have
We use the condition of in the second equation above.
Step E: Let be the estimated frequency when both and can be observed and. We denote the estimated frequency by when both and can be observed and.
Repeat the above steps until the convergence is achieved. Then we have an estimator of
So we have
Replacing by in the process above, we have . Replacing by , we have . Replacing by and , respectively, in the above process, we have .
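As a rough illustration of how an E-step and an M-step alternate for incomplete contingency tables, the following Python sketch runs EM on a deliberately simplified problem. It is not the algorithm of this section: it ignores Z, treats a binary Y as sometimes missing, and assumes that the always-observed binary covariate X is independent of the missingness indicator given Y. The function name `em_nonignorable` and the counts are hypothetical.

```python
def em_nonignorable(obs, mis, n_iter=5000, tol=1e-12):
    """EM for a 2x2 table (X, Y) in which Y is sometimes missing.

    obs[x][y]: count of units with X = x and Y = y observed.
    mis[x]:    count of units with X = x and Y missing.
    Assumed model (an illustration, not the paper's): missingness of Y
    depends on Y only, so X is independent of missingness given Y.
    Returns the estimated complete table est[x][y] of expected counts.
    """
    # Start by splitting each missing count evenly between Y = 0 and Y = 1.
    imp = [[mis[x] / 2.0, mis[x] / 2.0] for x in (0, 1)]
    for _ in range(n_iter):
        # M-step: under the assumed model the fitted P(X=x, Y=y, missing)
        # is proportional to full[x][y] * (missing share among units with Y=y).
        full = [[obs[x][y] + imp[x][y] for y in (0, 1)] for x in (0, 1)]
        share = []
        for y in (0, 1):
            total_y = full[0][y] + full[1][y]
            missing_y = imp[0][y] + imp[1][y]
            share.append(missing_y / total_y)
        fit = [[full[x][y] * share[y] for y in (0, 1)] for x in (0, 1)]
        # E-step: redistribute each missing count over Y in proportion to fit.
        new = [[mis[x] * fit[x][y] / (fit[x][0] + fit[x][1]) for y in (0, 1)]
               for x in (0, 1)]
        delta = max(abs(new[x][y] - imp[x][y]) for x in (0, 1) for y in (0, 1))
        imp = new
        if delta < tol:
            break
    return [[obs[x][y] + imp[x][y] for y in (0, 1)] for x in (0, 1)]

# Hypothetical counts built from a known complete table [[480, 80], [120, 320]]
obs = [[288, 72], [72, 288]]   # obs[x][y], Y observed
mis = [200, 80]                # mis[x], Y missing
est = em_nonignorable(obs, mis)
p_y1 = (est[0][1] + est[1][1]) / (sum(mis) + sum(map(sum, obs)))
print(est, p_y1)
```

The toy counts were chosen with a strong association between X and Y. If X and Y were independent, the assumed model would not be identifiable and the iteration would have no unique limit, which mirrors the role of the covariate in Section 3.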
Let denote the average causal effect of D on Y when . Then
We denote the value of Y by when (). Let denote the ACE of the actually accepted treatment D on the outcome Y. We have the following theorem.
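One common way to pass from covariate-specific effects to a population effect is to average them over the distribution of X. The sketch below assumes this standardization formula, ACE = sum over x of ACE(x) P(X = x); the function name `population_ace` and the numbers are hypothetical and are not estimates from this paper.

```python
def population_ace(ace_by_x, p_x):
    """Average covariate-specific effects over the distribution of X.

    ace_by_x[x]: average causal effect within the stratum X = x.
    p_x[x]:      P(X = x); the values must sum to 1.
    """
    if abs(sum(p_x.values()) - 1.0) > 1e-9:
        raise ValueError("p_x must sum to 1")
    return sum(ace_by_x[x] * p_x[x] for x in p_x)

# Hypothetical stratum effects and stratum shares for a binary covariate
ace = population_ace({0: 0.25, 1: 0.35}, {0: 0.6, 1: 0.4})
print(ace)
```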
Our goal here is to study the curative effect of a new drug on high blood pressure. Let Z denote the binary random assignment: Z=1 means the patient was assigned to the treatment group (taking the medicine) and Z=0 means the patient was assigned to the control group (not taking the medicine). Let D denote the accepted treatment: D=1 means the patient actually takes the medicine, and D=0 means the patient actually does not. Let Y denote the status of the patient's diastolic pressure: Y=1 means the diastolic pressure dropped and Y=0 means it did not. X is an observed covariate representing age, which has a direct effect on the outcome Y: X=1 means the patient is under the age of 45 and X=0 means the patient is over the age of 45. Y(Z, D)=0 denotes that the patient’s diastolic pressure does not drop when the assigned treatment is Z and the accepted treatment is D, and Y(Z, D)=1 denotes that it drops. For example, Y(0, 0)=0 means that the diastolic pressure does not drop when the patient is assigned to the control group, which does not take the medicine, and complies with the assignment. The other combinations of Z, D and Y are interpreted in the same way. Table 1 shows the observed frequencies in a trial in which 216 of 436 patients were randomly assigned to the treatment group (Z=1) and the remaining 220 to the control group (Z=0). Because of non-compliance, in the control group (Z=0) Y(0, 0) is observed for compliers but missing for non-compliers (see Table 1(a)), while Y(0, 1) is observed for non-compliers but missing for compliers (see Table 1(b)). The treatment group (Z=1) is handled similarly (see Table 1(c) and (d)).
Here we assume , X is associated with D and D is not ignorable. Our calculation follows the EM algorithm given in Section 4.2. Using the observed data in Table 1, we then obtain estimates of all the probabilities when the covariate X (age) is introduced into the model. The results are as follows:
Thus we have:
Similarly we have:
This result shows that the ACE of D on Y is 0.2880, so the new drug is effective for patients with high blood pressure. In general, the ACE is unidentifiable, but it becomes identifiable when we introduce the covariate X (age), where age is associated with Y and . We can therefore obtain the estimate of the average causal effect of actually taking the medicine on patients’ blood pressure.
We also obtain the upper and lower bounds of the ACE (denoted by α) of the population with non-compliance using the method in Pearl .
Although this method gives upper and lower bounds of α of the population, α is unidentifiable without the condition of.
We also obtain the ITT (Intent-to-treat):
Here it is the causal effect of the assigned treatment Z on the outcome Y. It is smaller than α because of non-compliance: some patients assigned to the treatment group do not take the medicine (for example, because they worry about its side effects), and some patients assigned to the control group take the medicine.
When Z is ignorable, we obtained
This result is greater than α and thus exaggerates the effect of the medicine. The reason is that the higher a patient's blood pressure, the more likely the patient is to take the medicine.
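The contrast between an ITT comparison and a comparison by treatment actually taken can be illustrated with a small Python sketch. The frequencies are hypothetical (they are not the data of Table 1), and the "as-treated" contrast, which groups patients by D and ignores the randomization, is an assumed reading of the estimate discussed above.

```python
def risk_difference(counts, by):
    """Difference in P(Y = 1) between groups 1 and 0 of one variable.

    counts maps (z, d, y) to a frequency. Use by=0 to group by the
    assignment Z (an ITT contrast) or by=1 to group by the received
    treatment D (an as-treated contrast that ignores randomization).
    """
    rate = {}
    for g in (0, 1):
        n = sum(c for key, c in counts.items() if key[by] == g)
        events = sum(c for key, c in counts.items()
                     if key[by] == g and key[2] == 1)
        rate[g] = events / n
    return rate[1] - rate[0]

# Hypothetical frequencies keyed by (z, d, y)
counts = {
    (1, 1, 1): 60, (1, 1, 0): 20, (1, 0, 1): 5, (1, 0, 0): 15,
    (0, 0, 1): 30, (0, 0, 0): 50, (0, 1, 1): 15, (0, 1, 0): 5,
}
itt = risk_difference(counts, by=0)
as_treated = risk_difference(counts, by=1)
print(itt, as_treated)
```

In these made-up counts the patients who take the medicine differ systematically from those who do not, so the as-treated contrast is larger than the ITT contrast; neither number is by itself the ACE of D on Y.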
This example shows that if the observed covariate age is not introduced into the model, the ACE of the actually accepted treatment (taking the medicine) on the outcome (high blood pressure) with non-compliance is not identifiable.
5. Conclusion
In this paper, we described the causal model with non-compliance using a graphical model and defined its identifiability. By introducing an observed covariate into the model, we obtained a method that turns an unidentifiable causal graphical model into an identifiable one. We also gave a condition under which the causal effect with non-compliance is identifiable and proved it mathematically. We introduced an EM algorithm for the ACE of the actually accepted treatment on the outcome with non-compliance. Finally, we applied our method to an example, in which we gave the maximum likelihood estimate (MLE) of the ACE of a drug for high blood pressure D on the health outcome Y, together with estimates of the ITT effect and of the ACE of D on Y when random assignment is ignorable. Note that if the observed covariate age, which is associated with Y and , is not incorporated into the model, the average causal effect of the actually accepted treatment D on the health outcome Y is not identifiable with non-compliance.
References
[1] Angrist, J. D., Imbens, G. W. & Rubin, D. B., “Identification of Causal Effects Using Instrumental Variables,” JASA, Vol. 91, No. 34, 444-472, 1996.
[2] Imbens, G. W. & Rubin, D. B., “Bayesian Inference for Causal Effects in Randomized Experiments with Noncompliance,” Annals of Statistics, 25, 305-327, 1997a.
[3] Keisuke Hirano, G. W. Imbens, D. B. Rubin & Xiao Hua Zhou, “Estimating the Effect of an Influenza Vaccine in an Encouragement Design,” Biometrics, 1997.
[4] Balke, A. A. & Pearl, J., “Nonparametric Bounds on Causal Effects from Partial Compliance Data,” Technical Report No. 199, Cognitive Systems Lab, UCLA Computer Science, Los Angeles, CA, 1993.
[5] Whittaker, J., Graphical Models in Applied Multivariate Statistics, John Wiley & Sons, 1990.
[6] Lauritzen, S. L., Graphical Models, Oxford: Oxford University Press, 1996.
[7] Cox, D. R. & Wermuth, N., Multivariate Dependencies: Models, Analysis and Interpretation, London: Chapman & Hall, 1996.
[8] Wen Qing Ma, Zhi Geng & Xiao Tong Li, “Identification of Graphical Chain Models for Nonignorable Nonresponse in Longitudinal Studies,” The Seventh Japan-China Symposium on Statistics, 2000.
[9] Wen Qing Ma, Zhi Geng & Xiao Tong Li, “Identification of Nonresponse Mechanisms for Two-Way Contingency Tables,” Behaviormetrika, Vol. 30, No. 2, 1-20, 2003.
[10] Fitzmaurice, G. M., Laird, N. M. & Zahner, G. E. P., “Multivariate Logistic Models for Incomplete Binary Responses,” Journal of the American Statistical Association, 91, 99-108, 1996.
[11] Holland, P. W. & Rubin, D. B., “Causal Inference in Retrospective Studies,” Evaluation Review, Vol. 12, 203-231, 1988.
[12] Daniel F. Heitjan, “Causal Inference in a Clinical Trial: A Comparative Example,” Controlled Clinical Trials, 20, 309-318, 1999.
[13] Stuart G. Baker, William F. Rosenberger & Rebecca Dersimonian, “Closed-Form Estimates for Missing Counts in Two-Way Contingency Tables,” Statistics in Medicine, Vol. 11, 643-657, 1992.
[14] G. F. V. Glonek, “On Identifiability in Models for Incomplete Binary Data,” Statistics & Probability Letters, 41, 191-197, 1999.
[15] R. L. Chambers & A. H. Welsh, “Log-Linear Models for Survey Data with Non-ignorable Non-response,” J. R. Statist. Soc. B, 55, No. 1, 157-170, 1993.
[16] Balke, A. & Pearl, J., “Bounds on Treatment Effects from Studies with Imperfect Compliance,” Journal of the American Statistical Association, 92, 1172-1176, 1997.
[17] Rosenbaum, P. R. & Rubin, D. B., “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, Vol. 70, 41-45, 1983.
[18] Pearl, J., “Causal Inference from Indirect Experiments,” Artificial Intelligence in Medicine, Vol. 7, 516-582, 1995.
[19] Rubin, D. B., “Bayesian Inference for Causal Effects: The Role of Randomization,” Ann. Statist., Vol. 6, 34-68, 1978.
[20] Rubin, D. B., “Estimating Causal Effects of Treatments in Randomized and Non-randomized Studies,” J. Educ. Psychol., Vol. 66, 688-701, 1974.
[21] Pearl, J., “Causal Diagrams for Empirical Research,” Biometrika, Vol. 82, No. 4, 669-710, 1995.