Identification of Causal Effect with the Non-Compliance and Its EM Algorithm
1College of Science, China University of Petroleum in Beijing, P.R. China
2School of Basic Medical Sciences, Capital Medical University, Beijing, China
Abstract
Many practical studies in biology, medicine, behavioral science and the social sciences seek to establish causal relationships between treatments and outcomes, rather than mere associations. In this paper, we use a graphical model to describe a causal model and study its identification. For an unidentifiable model, we introduce covariates that are always observed into the model so that it becomes identifiable. We then give an identifiability condition for the causal graphical model and prove it mathematically. Finally, we give an algorithm for estimating the identifiable average causal effect of the accepted treatment on the outcome, and illustrate the method and algorithm with an example.
Keywords: Rubin causal model, instrumental variables, graphical model, identification, non-compliance, EM algorithm, average causal effect
American Journal of Medical Sciences and Medicine, 2013 1 (4),
pp 55-61.
DOI: 10.12691/ajmsm-1-4-2
Received March 12, 2013; Revised June 07, 2013; Accepted June 15, 2013
Copyright © 2013 Science and Education Publishing. All Rights Reserved.
Cite this article:
- Xiaotong, Li, and Li Sichen. "Identification of Causal Effect with the Non-Compliance and Its EM Algorithm." American Journal of Medical Sciences and Medicine 1.4 (2013): 55-61.
1. Introduction
Many practical studies in biology, medicine, behavioral science and the social sciences seek to establish causal relationships between treatments and outcomes, rather than mere associations. The only generally accepted approach for inferring causality requires that the received treatments be randomized. In many cases, however, it is not possible to randomize the received treatments. For example, even if assignment to treatment is random in a clinical trial, some units may not comply with the assignment. Because of non-compliance, estimating the causal effect of treatment on outcomes becomes complicated. The standard intention-to-treat (ITT) analysis focuses on the causal effect of the assigned treatment, rather than the causal effect of the treatment actually accepted. To study the average causal effect (ACE) of the actually accepted treatment, Imbens and Angrist, and Angrist, Imbens and Rubin [1], suggested the instrumental variables (IV) method. Under some reasonable assumptions, an IV estimator can be embedded in Rubin's causal model. An IV estimator gives the average causal effect for compliers, but does not give the ACE for the population. Without these assumptions, an IV estimator is simply the ratio of two intention-to-treat estimates and has no interpretation as an average causal effect. Imbens and Rubin [2] used likelihood-based Bayesian inference for causal estimates with non-compliance and studied the two assumptions of the instrumental variables method, namely the exclusion restriction and monotonicity. Without these two assumptions, the set of parameter values maximizing the likelihood would be large and the causal effect would become unidentifiable. Hirano, Imbens, Rubin and Zhou [3] extended the method of Imbens and Rubin [2] to allow for pre-treatment variables and the use of "weakly identifiable models". Weak identification means that the posterior distribution is proper although there is no unique maximum likelihood estimate.
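The statement that an IV estimator is a ratio of intention-to-treat estimates can be illustrated with a minimal Python sketch. The function name `wald_iv_estimate` and the toy data are hypothetical and are not taken from the paper; the sketch only computes the ratio of the ITT effect of Z on Y to the ITT effect of Z on D.

```python
def wald_iv_estimate(z, d, y):
    """Ratio of the ITT effect of Z on Y to the ITT effect of Z on D.

    z, d, y are equal-length sequences of 0/1 values (assignment,
    accepted treatment, outcome). Hypothetical helper for illustration.
    """
    def mean(values):
        return sum(values) / len(values)

    # Split outcomes and accepted treatments by assignment arm.
    y1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y0 = [yi for zi, yi in zip(z, y) if zi == 0]
    d1 = [di for zi, di in zip(z, d) if zi == 1]
    d0 = [di for zi, di in zip(z, d) if zi == 0]

    itt_y = mean(y1) - mean(y0)  # ITT effect on the outcome
    itt_d = mean(d1) - mean(d0)  # ITT effect on the accepted treatment
    if itt_d == 0:
        raise ValueError("assignment has no effect on accepted treatment")
    return itt_y / itt_d


# Toy data (assumed, not from the paper): 4 units per arm.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 0, 0, 0, 1]
y = [1, 1, 0, 0, 0, 1, 0, 0]
estimate = wald_iv_estimate(z, d, y)
```

Under the exclusion restriction and monotonicity this ratio is interpreted as the average causal effect for compliers; without them it is only a ratio of two ITT estimates.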
All of these methods give the average causal effect only for a subpopulation of interest, not for the whole population. To study the average causal effect of the actually accepted treatment, Pearl [4] gave upper and lower bounds on the average causal effect for the population. However, the bounds are wide because the average causal effect for a population with non-compliance is unidentifiable, so they do not give a good evaluation of the population average causal effect. Graphical models are widely used to describe independence relations between variables; see Whittaker [5], Lauritzen [6] and Cox and Wermuth [7]. In this paper, we use a graphical model to describe the causal model and study its identification. We introduce covariates that are always observed into the model so that the model becomes identifiable. We then give the condition for identification and prove it mathematically. Finally, we give the maximum likelihood estimate (MLE) of the average causal effect of the treatment on the outcome and its EM algorithm for the identifiable model, and we illustrate the method and algorithm with an example. In Section 2 some notation and assumptions are introduced. In Section 3 we discuss the identification of parameters in the causal graphical model and its identifiability condition. In Section 4 we give the MLE of the ACE of the outcome (Y) on a drug for high blood pressure (D); through an example we also give the estimate of the ITT and of the ACE of Y on D when random assignment is ignorable. This example shows that if we do not incorporate the covariate "age" into the model, the model is not identifiable; that is, we cannot obtain the average causal effect for the population. In Section 5 we give the conclusion.
2. Assumptions and Notations
In a hypothetical study, we evaluate the causal effect of a new drug (D) on a health outcome Y in a population of N units. Our goal is to estimate the causal effect of the accepted treatment D (taking or not taking the medicine) on the outcome Y. We assume that patients either take the medicine or do not, and that partial doses are not allowed. We also assume that whether a patient takes the medicine is not controlled by the researchers, i.e. the accepted treatment is not random and is non-ignorable, whereas the assigned treatment Z is random. Let Z_i denote the binary randomized treatment assignment of patient i; Z_i = 1 if subject i is assigned to the treatment group and Z_i = 0 if subject i is assigned to the control group. Let
D_i(z) denote the binary accepted treatment of patient i under assignment z; D_i(z) = 1 if subject i actually takes the medicine, and D_i(z) = 0 if subject i does not. In an ideal situation D_i(z) = z, i.e. patients comply completely with the assigned treatment, and the causal effect of the treatment D on the outcome Y coincides with the ITT effect. In reality, however, D_i(z) is not always equal to z. For instance, a patient may refuse the medicine for fear of side effects, or may take the wrong medicine. In this case the patient does not comply with the assigned treatment, which is called non-compliance. Now, let
Y_i(z, d) denote the binary outcome if subject i is assigned to z and actually receives the treatment d. Y_i(z, d) then has missing values. For example, Y_i(1, 1) is observed if Z_i = 1 and D_i = 1, while Y_i(0, 0), Y_i(0, 1), and Y_i(1, 0) are missing. The absence of an edge between a pair of nodes means that the corresponding variables in this pair are independent conditional on the other variables.
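The missing-data pattern of the potential outcomes described in this section can be made concrete with a small Python sketch. The function name `split_potential_outcomes` is hypothetical; it only enumerates which of the four potential outcomes indexed by (z, d) is observed for a unit and which three are missing.

```python
from itertools import product


def split_potential_outcomes(z, d):
    """Return the observed (z, d) index and the list of missing ones.

    Each unit has four potential outcomes indexed by the assigned
    treatment z and the accepted treatment d (both 0 or 1), but only
    the one matching the unit's actual (z, d) is observed.
    """
    all_pairs = list(product((0, 1), repeat=2))
    observed = (z, d)
    missing = [pair for pair in all_pairs if pair != observed]
    return observed, missing


# A unit assigned to treatment that actually takes the medicine.
observed, missing = split_potential_outcomes(1, 1)
```

For every unit exactly one potential outcome is observed, which is why the estimation problem is treated as an incomplete-data problem.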
3. Identifiability of Causal Graphical Model and Its Identifiable Condition
First we give the definition of the graphical model. Let G = (V, E) denote an undirected graph, where V is the set of nodes denoting variables and E is the set of undirected edges between these nodes. The absence of an edge between a pair of nodes means that the corresponding variables are independent conditional on the other variables [9]. Identifiability of graphical models of non-response mechanisms was discussed in [8, 9]. We consider the set of all possible probability distributions that satisfy the graphical model G. Following Fitzmaurice, Laird and Zahner [10], we define identifiability both for the parameters of a distribution and for a graphical model.
Definition 1. The parameter θ of a distribution is not statistically identifiable if there exists another parameter θ' ≠ θ such that the distributions of the observed data are the same for θ and θ', that is,
P(observed data; θ) = P(observed data; θ').
Definition 2. A graphical model G is identifiable if the parameter of every distribution in G is identifiable.
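The two definitions can be illustrated with a toy non-response model, sketched below in Python. The model and all names are assumptions made for illustration, not the paper's causal model: a binary outcome Y is observed only when a response indicator R equals 1, and the response probability may depend on Y. Two different parameter values then produce exactly the same distribution of the observed data, so the parameter is not identifiable.

```python
from fractions import Fraction as F


def observed_distribution(p, r1, r0):
    """Observed-data distribution of a toy non-ignorable non-response model.

    p  = P(Y = 1)
    r1 = P(R = 1 | Y = 1), r0 = P(R = 1 | Y = 0)
    Y is seen only when R = 1, so the observed cells are
    (Y = 1, R = 1), (Y = 0, R = 1) and (R = 0).
    """
    seen_y1 = p * r1
    seen_y0 = (1 - p) * r0
    not_seen = p * (1 - r1) + (1 - p) * (1 - r0)
    return (seen_y1, seen_y0, not_seen)


# Two distinct (hypothetical) parameter values ...
theta_a = (F(1, 2), F(4, 5), F(3, 5))
theta_b = (F(3, 5), F(2, 3), F(3, 4))

# ... that induce the same observed-data distribution.
dist_a = observed_distribution(*theta_a)
dist_b = observed_distribution(*theta_b)
```

Exact fractions are used so that the equality of the two observed distributions can be checked without rounding error. Restrictions such as conditional independence given an always-observed covariate are what remove this kind of ambiguity.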
Because compliance is incomplete, we call the non-compliance non-ignorable when the accepted treatment is related to the outcome Y; see the graphical model G1 in Figure 1. Let D and Z be binary variables denoting the accepted treatment and the assigned treatment, respectively. Let X denote the observed covariate and Y(z, d) the binary outcome when Z = z and D = d. For simplicity we write Y for Y(z, d).
The graphical model G1 shown in Figure 1 is non-identifiable [5]. In this case we introduce a binary covariate X that can always be observed, such that the model becomes identifiable when
, i.e.
is conditionally independent of
given
. This is shown as model G2 in Figure 1.
Lemma 1. The joint probability of the graphical model G2 in Figure 1 is identifiable if and only if
is not independent of
i.e.
.
Proof: Given,the conditional probability of graphical model
can be written as
![]() |
So we have
let
That means:
![]() |
Multiplying both sides by , and for
,we have:
![]() |
It is equivalent to:
![]() |
Since , and
can be identified from data, the parameter
can be identified by solving the following equation
![]() |
So the parameters are identified such that
=
/
is identified. This shows that the joint probability of the graphical model G2 in Figure 1 is identifiable.
Lemma 1 tells us that the graphical model is identifiable, consequently
is identifiable.
Next we introduce a binary covariate X that is always observed and such that
is independent of
given
i.e.
. Here the graphical model is identifiable, as shown in Figure 2.
Theorem 1. In the graphical model shown in Figure 2, the joint conditional probability is identifiable if
is independent of
given
and
,i.e.
.
Proof: In Figure 2,the joint conditional probability given is as follows:
![]() |
The second equation above is due to.
Using Lemma 1, is identifiable if
,so we can obtain
is identifiable if
.
By Theorem 1, the graphical model shown in Figure 2 is identifiable. Here is conditionally identifiable. Thus we say that
has a causal explanation.
4. Estimating Average Causal Effect of Accepted Treatment D to Outcome Y Using EM Algorithm
In Section 3 we introduced the covariate X, which can always be observed, and proved that the graphical model in Figure 2 is identifiable if
, and that
is conditionally identifiable when
is given. So we can obtain the average causal effect of the accepted treatment D on the outcome Y. Let x be the value of the covariate X, z the value of the assigned treatment Z, d the value of the accepted treatment D and y the value of the outcome Y. Let Y_i(z, d) be the binary outcome of patient i when the assigned treatment is Z = z and the accepted treatment is D = d, where z, d = 0, 1. For example, Y_i(0, 0) is the binary health outcome of patient i who complies with the assigned control treatment, i.e. patient i is assigned to the control group and does not take the medicine. Similarly, Y_i(0, 1) denotes the binary health outcome of patient i who does not comply with the assigned control treatment and actually takes the medicine. Y_i(1, 0) denotes the health outcome of patient i who does not comply with the assigned treatment and actually does not take the medicine. Y_i(1, 1) is the outcome of patient i who complies with the assigned treatment and takes the medicine.
Giventhe patient is compliable if
. Here
is observed. Otherwise the patient is non-compliance if
. Thus
is missing. Here
is not at random (because
is ignorable if
is missing at random). Now the complete data are (z = 0, d = 0, y, x) and (z = 0, d = 1, x). Let
denote the conditional marginal likelihood function of the complete data. Thus
![]() |
Given, the patient is not compliable if
.Here
is observed; the patient is compliance if
. Thus
is missing. Here the complete data are (z = 0, d = 1, y, x) and (z = 0, d = 0, x). Let
denote the conditional marginal likelihood function of the complete data. We have
![]() |
Given, the patient is not compliable if
. Here
is observable; the patient is compliable if
. Thus
is missing. Here the complete data are (z = 1, d = 0, y, x) and (z = 1, d = 1, x). We let
be their conditional marginal likelihood function. We have
![]() |
Given, the patient is not compliable if
. Thus
is observable; the patient is compliable and
is missing if
. Thus the complete data are (z = 1, d = 1, y, x) and (z = 1, d = 0, x). Let
denote their conditional marginal likelihood function. Thus we have
![]() |
We have
![]() |
![]() |
![]() |
![]() |
We use the condition of in the second equation above.
Step E: Let be the estimated frequency when both
and
can be observed and
. We denote the estimated frequency by
when both
and
can be observed and
.
![]() |
![]() |
Step M:
![]() |
Where
![]() |
![]() |
Repeat the above steps until convergence is achieved. We then have an estimator of
So we have
![]() |
Replacing by
in the process above, we have
. Replacing
by
, we have
. Replacing
by
and
, respectively, in the above process, we have
.
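The alternation between Step E and Step M can be sketched in Python for a deliberately simplified setting. This is not the paper's model: the sketch assumes a 2x2 table of X by Y in which Y is missing for some units and is missing at random given X, whereas the paper treats non-ignorable non-compliance. The function name `em_partially_classified` and the counts are hypothetical.

```python
def em_partially_classified(full_counts, partial_counts, n_iter=200, tol=1e-12):
    """EM estimate of the joint probabilities P(X = x, Y = y).

    full_counts[x][y]  : units with both X and Y observed
    partial_counts[x]  : units with X observed and Y missing
    Simplifying assumption: Y is missing at random given X.
    """
    total = sum(sum(row) for row in full_counts) + sum(partial_counts)
    p = [[0.25, 0.25], [0.25, 0.25]]  # uniform starting value
    for _ in range(n_iter):
        # Step E: allocate the partially classified counts to the cells
        # in proportion to the current conditional probabilities P(y | x).
        expected = [[0.0, 0.0], [0.0, 0.0]]
        for x in (0, 1):
            row_total = p[x][0] + p[x][1]
            for y in (0, 1):
                expected[x][y] = (full_counts[x][y]
                                  + partial_counts[x] * p[x][y] / row_total)
        # Step M: re-estimate cell probabilities from the completed table.
        new_p = [[expected[x][y] / total for y in (0, 1)] for x in (0, 1)]
        change = max(abs(new_p[x][y] - p[x][y])
                     for x in (0, 1) for y in (0, 1))
        p = new_p
        if change < tol:  # stop once the estimates no longer move
            break
    return p


# Hypothetical counts, not the paper's Table 1.
full = [[30, 10], [20, 40]]
partial = [20, 30]
p_hat = em_partially_classified(full, partial)
```

In this simplified setting the fixed point has a closed form, P(x, y) = P(x) times the observed proportion of y among fully classified units with that x, which gives a convenient check on the iteration.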
Let denote the averaged causal effect of Y on D when
. Then
![]() |
We denote the value of Y bywhen
(
). Let
denote the ACE of outcome Y on actual accepted treatment D. We have the following theorem.
Theorem 2.
Proof: Since
![]() |
We have
![]() |
![]() |
Our goal here is to study the curative effect of a new drug on high blood pressure. We denote the binary random assignment by Z: Z=1 means the patient was assigned to the treatment group (taking the medicine), and Z=0 means the patient was assigned to the control group (not taking the medicine). Let D denote the accepted treatment: D=1 means the patient actually takes the medicine, and D=0 means the patient actually does not take the medicine. Let Y denote the patient's diastolic pressure status: Y=1 means the patient's diastolic pressure dropped, and Y=0 means it did not drop. X is an observed covariate which represents age, and X has a direct effect on the outcome Y: X=1 means that the patient is under the age of 45, and X=0 means that the patient is over the age of 45. Y(z, d) = 0 denotes that the patient's diastolic pressure does not drop when the assigned treatment is Z = z and the actual accepted treatment is D = d, and Y(z, d) = 1 denotes that it drops. For example, Y(0, 0) = 0 means that the patient's diastolic pressure does not drop when the patient is assigned to the control group, which does not take the medicine, and the patient complies with the assignment. Similarly, we can interpret Y(0, 0) = 1, Y(0, 1) = 0, Y(0, 1) = 1, Y(1, 0) = 0, Y(1, 0) = 1, Y(1, 1) = 0, Y(1, 1) = 1, etc. Table 1 shows the observed frequencies in a randomized trial, in which 216 out of 436 patients were randomly assigned to the treatment group (Z=1), while the remaining 220 were assigned to the control group (Z=0). Because of non-compliance, in the control group (Z=0) either the diastolic pressure status Y of compliers is observed while that of non-compliers is missing (see Table 1(a)), or Y of non-compliers is observed while that of compliers is missing (see Table 1(b)). The treatment group (Z=1) is similar (see Table 1(c) and (d)).
Where we assume , X is associated with D and D is not ignorable. Our calculation follows the EM algorithm given in Section 4.2. We then obtain estimates of all the probabilities from the observed data in Table 1 when the covariate X (age) is introduced into the model. The results are as follows:
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Thus we have:
![]() |
![]() |
Similarly we have:
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
This result shows that the ACE of Y on D is 0.2880, so the new drug is effective for patients with high blood pressure. In general, the ACE is unidentifiable. It becomes identifiable when we introduce the covariate X (age), which is associated with Y and D. We can therefore obtain the estimate of the average causal effect of actually taking the medicine on patients' blood pressure.
We also obtain the upper and lower bounds of the ACE (denoted by α) of the population with non-compliance using the method in Pearl [4].
![]() |
![]() |
That is,
![]() |
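As a rough illustration of how bounds on α arise from partial compliance data, the Python sketch below computes the so-called natural bounds. These are valid under random assignment and the exclusion restriction but are in general not the sharp bounds of [4], and the sketch assumes that the joint distribution of (Y, D) given Z is fully available. The function name `natural_ace_bounds` and the probabilities are hypothetical and are not the paper's data.

```python
def natural_ace_bounds(p_yd_given_z):
    """Natural (worst-case) bounds on the ACE of D on Y.

    p_yd_given_z[z][(y, d)] = P(Y = y, D = d | Z = z).
    Assumes Z is randomized and affects Y only through D.
    """
    p1 = p_yd_given_z[1]
    p0 = p_yd_given_z[0]

    # E[Y(d=1)]: known for units with D = 1 in the Z = 1 arm; units with
    # D = 0 contribute anything between 0 and their total probability.
    lower_y1 = p1[(1, 1)]
    upper_y1 = p1[(1, 1)] + p1[(0, 0)] + p1[(1, 0)]

    # E[Y(d=0)]: the same argument applied to the Z = 0 arm.
    lower_y0 = p0[(1, 0)]
    upper_y0 = p0[(1, 0)] + p0[(0, 1)] + p0[(1, 1)]

    return lower_y1 - upper_y0, upper_y1 - lower_y0


# Hypothetical conditional distributions with 20% non-compliance per arm.
probs = {
    1: {(1, 1): 0.5, (0, 1): 0.3, (1, 0): 0.1, (0, 0): 0.1},
    0: {(1, 0): 0.3, (0, 0): 0.5, (1, 1): 0.1, (0, 1): 0.1},
}
lower, upper = natural_ace_bounds(probs)
```

The width of these bounds equals the total non-compliance rate, P(D = 0 | Z = 1) + P(D = 1 | Z = 0), so they collapse to a point only under perfect compliance.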
Although this method gives upper and lower bounds of α of the population, α is unidentifiable without the condition of.
We also obtain the ITT (Intent-to-treat):
![]() |
Here the ITT is the causal effect of the assigned treatment Z on the outcome Y. It is smaller than α because of non-compliance: some patients assigned to the treatment group do not take the medicine (for example, because they worry about its side effects), and some patients assigned to the control group do take it.
When Z is ignorable, we obtained
![]() |
This result is greater than α and exaggerates the effect of the medicine. The reason is that the higher a patient's blood pressure, the more likely the patient is to take the medicine.
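The contrast between comparing patients by assignment and comparing them by the treatment actually taken can be sketched in Python as follows. The helper `risk_difference` and the counts are hypothetical and are not the paper's Table 1; the sketch only shows how the two comparisons are computed from the same table and that they can differ substantially.

```python
def risk_difference(counts, by):
    """P(Y = 1 | group = 1) - P(Y = 1 | group = 0).

    counts[(z, d, y)] holds cell frequencies; `by` is "z" to group by
    assignment (the ITT contrast) or "d" to group by accepted treatment
    (the comparison that treats D as if it were ignorable).
    """
    index = {"z": 0, "d": 1}[by]

    def rate(level):
        total = sum(n for key, n in counts.items() if key[index] == level)
        events = sum(n for key, n in counts.items()
                     if key[index] == level and key[2] == 1)
        return events / total

    return rate(1) - rate(0)


# Hypothetical frequencies with 100 patients per assignment arm.
counts = {
    (1, 1, 1): 60, (1, 1, 0): 20, (1, 0, 1): 5, (1, 0, 0): 15,
    (0, 1, 1): 15, (0, 1, 0): 5, (0, 0, 1): 20, (0, 0, 0): 60,
}
itt = risk_difference(counts, by="z")
as_treated = risk_difference(counts, by="d")
```

Neither contrast equals the ACE in general: the first measures the effect of assignment, and the second mixes the effect of the medicine with the reasons why patients chose to take it.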
This example shows that if we do not introduce the observed covariate, age, into the model, the ACE of the actually accepted treatment (taking the medicine) on the outcome (higher blood pressure) with non-compliance is not identifiable.
5. Conclusion
In this paper, we described the causal model with non-compliance using a graphical model and defined its identifiability. By introducing an observed covariate into the model, we obtained a method that turns an unidentifiable causal graphical model into an identifiable one. We also gave the condition under which the causal effect with non-compliance is identifiable and proved it mathematically. We introduced an EM algorithm for the ACE of the actually accepted treatment on the outcome with non-compliance. Finally, we applied our method to an example. In this example, we gave the maximum likelihood estimate (MLE) of the ACE of the health outcome Y on a drug for high blood pressure D. We also gave the estimate of the ITT and of the ACE of Y on D when random assignment is ignorable. Note that if we do not incorporate the observed covariate age, which is associated with Y and D, into the model, the average causal effect of the actually accepted treatment D on the health outcome Y is not identifiable with non-compliance.
References
[1] Angrist, J. D., Imbens, G. W. & Rubin, D. B., "Identification of Causal Effects Using Instrumental Variables," Journal of the American Statistical Association, Vol. 91, No. 434, 444-472, 1996.
[2] Imbens, G. W. & Rubin, D. B., "Bayesian Inference for Causal Effects in Randomized Experiments with Noncompliance," Annals of Statistics, 25, 305-327, 1997.
[3] Hirano, K., Imbens, G. W., Rubin, D. B. & Zhou, X. H., "Estimating the Effect of an Influenza Vaccine in an Encouragement Design," Biometrics, 1997.
[4] Balke, A. A. & Pearl, J., "Nonparametric Bounds on Causal Effects from Partial Compliance Data," Technical Report No. 199, Cognitive Systems Lab, UCLA Computer Science, Los Angeles, CA, 1993.
[5] Whittaker, J., Graphical Models in Applied Multivariate Statistics, John Wiley & Sons, 1990.
[6] Lauritzen, S. L., Graphical Models, Oxford: Oxford University Press, 1996.
[7] Cox, D. R. & Wermuth, N., Multivariate Dependencies: Models, Analysis and Interpretation, London: Chapman & Hall, 1996.
[8] Ma, W. Q., Geng, Z. & Li, X. T., "Identification of Graphical Chain Models for Nonignorable Nonresponse in Longitudinal Studies," The Seventh Japan-China Symposium on Statistics, 2000.
[9] Ma, W. Q., Geng, Z. & Li, X. T., "Identification of Nonresponse Mechanisms for Two-Way Contingency Tables," Behaviormetrika, Vol. 30, No. 2, 1-20, 2003.
[10] Fitzmaurice, G. M., Laird, N. M. & Zahner, G. E. P., "Multivariate Logistic Models for Incomplete Binary Responses," Journal of the American Statistical Association, 91, 99-108, 1996.
[11] Holland, P. W. & Rubin, D. B., "Causal Inference in Retrospective Studies," Evaluation Review, Vol. 12, 203-231, 1988.
[12] Heitjan, D. F., "Causal Inference in a Clinical Trial: A Comparative Example," Controlled Clinical Trials, 20, 309-318, 1999.
[13] Baker, S. G., Rosenberger, W. F. & DerSimonian, R., "Closed-Form Estimates for Missing Counts in Two-Way Contingency Tables," Statistics in Medicine, Vol. 11, 643-657, 1992.
[14] Glonek, G. F. V., "On Identifiability in Models for Incomplete Binary Data," Statistics and Probability Letters, 41, 191-197, 1999.
[15] Chambers, R. L. & Welsh, A. H., "Log-Linear Models for Survey Data with Non-ignorable Non-response," J. R. Statist. Soc. B, 55, No. 1, 157-170, 1993.
[16] Balke, A. & Pearl, J., "Bounds on Treatment Effects from Studies with Imperfect Compliance," Journal of the American Statistical Association, 92, 1172-1176, 1997.
[17] Rosenbaum, P. R. & Rubin, D. B., "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, Vol. 70, 41-45, 1983.
[18] Pearl, J., "Causal Inference from Indirect Experiments," Artificial Intelligence in Medicine, Vol. 7, 516-582, 1995.
[19] Rubin, D. B., "Bayesian Inference for Causal Effects: The Role of Randomization," Ann. Statist., Vol. 6, 34-68, 1978.
[20] Rubin, D. B., "Estimating Causal Effects of Treatments in Randomized and Non-randomized Studies," J. Educ. Psychol., Vol. 66, 688-701, 1974.
[21] Pearl, J., "Causal Diagrams for Empirical Research," Biometrika, Vol. 82, No. 4, 669-710, 1995.