Introducing causal network motifs: A new approach to identifying heterogeneous spillover effects
This project is a collaborative effort with Yuan Yuan, a PhD student at MIT, and the Facebook Core Data Science team. More information about CDS can be found on the CDS team page.
What the research is:
Randomized experiments, or A/B tests, remain the gold standard for assessing the causal effect of a policy intervention or product change. However, in experimental settings such as social networks, where users interact with and influence one another, the traditional no-interference assumption that credible causal inference relies on can be violated. Existing solutions for network settings account for the proportion or number of a user’s neighbors who are treated. However, most current methods consider only this count of treated neighbors and ignore the local network structure.
Our study offers an approach that accounts for both the local structure of a user’s social network, via network motifs, and the treatment assignment conditions of the user’s neighbors. We propose a two-part approach. First, we introduce causal network motifs: network motifs that characterize the assignment conditions in local ego networks. Second, we propose a tree-based algorithm that identifies different network interference conditions and estimates their average potential outcomes. Our approach can incorporate social network theories such as structural diversity and echo chambers, and it helps establish network interference conditions that are appropriate for each experiment. We test our method on a synthetic network and on a real experiment in a large network; the results show that considering local structure better captures different interference patterns in networks.
For example, Figure 1 shows four network interference conditions that can be captured by combining the local network structure with the treatment assignment. The first two are simply the cases in which all neighbors are treated or none are, followed by important network interference conditions suggested by structural diversity and complex contagion. In the structural diversity and echo chamber settings, the ego nodes in (c) and (d) each have half of their neighbors treated but have very different local structures, so the ego’s outcome may differ between the two. We do not know in advance which of these factors explains most of the variance in the outcome.
Figure 1: Examples of network interference conditions in various local network structures. A star represents the user (ego) and circles represent the user’s friends. A solid circle indicates that the friend is in the treatment group, and a hollow circle indicates that the friend is in the control group. Shading on a star indicates that the ego may be either treated or in control.
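To make the contrast between (c) and (d) concrete, here is a minimal Python sketch with two hypothetical ego networks (not the exact graphs drawn in Figure 1). In both, the ego has four alters and two of them are treated, so the usual fraction-of-treated-neighbors feature is identical; the only difference is whether the treated alters are tied to each other. The function names are illustrative only.

```python
# Hypothetical ego networks: one ego with four alters, two of them treated.
# Ego-alter edges are implicit; only alter-alter edges are listed as (u, v) pairs.

def fraction_treated(alters, treated):
    """Share of the ego's neighbors that are treated (the usual exposure feature)."""
    return sum(a in treated for a in alters) / len(alters)

def treated_treated_ties(alter_edges, treated):
    """Number of alter-alter edges whose two endpoints are both treated."""
    return sum(u in treated and v in treated for u, v in alter_edges)

alters = ["a", "b", "c", "d"]
treated = {"a", "b"}

# Layout 1: the two treated alters know each other (clustered, echo-chamber-like).
clustered = [("a", "b"), ("c", "d")]
# Layout 2: each treated alter is tied only to a control alter (more diverse).
diverse = [("a", "c"), ("b", "d")]

for name, edges in [("clustered", clustered), ("diverse", diverse)]:
    print(name, fraction_treated(alters, treated), treated_treated_ties(edges, treated))
# clustered 0.5 1
# diverse 0.5 0
```

Both layouts give a treated fraction of 0.5, but only the first contains a tie between two treated alters, which is the kind of difference a count-based exposure definition cannot see.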
Existing approaches to network interference leave researchers many degrees of freedom, for example in choosing the threshold that defines an exposure condition. Our approach instead provides a simple way to specify exposure conditions automatically. Researchers no longer need to define exposure conditions a priori, and the conditions generated by the algorithm are tailored to the given data and experiment. We believe that methodological innovation for addressing network interference in A/B tests will remain an important area of development, and that combining network motifs with treatment assignment conditions offers a useful way to detect heterogeneous network interference effects.
How it works:
Our study provides a two-step solution that automatically identifies different exposure conditions while addressing concerns about selection bias, as detailed in the section following Figure 2. First, for an A/B test in a network, we construct network motif features that include treatment assignment conditions (i.e., causal network motifs), enabling a fine-grained characterization of the local network structure and of possible interference conditions. Second, using this characterization as input, we develop a tree-based algorithm that performs clustering and defines the set D of exposure conditions, rather than leaving practitioners to specify it.
Causal network motifs differ from traditional network motifs in two main ways. First, we focus on 1-hop ego networks, which contain the ego node and its direct neighbors; the methods generalize to n-hop ego networks for n > 1. Second, we consider the treatment assignment conditions of the user and of their n-hop connections. We use the term “network motifs” for conventional motifs without treatment assignment designations (assignment conditions) and “causal network motifs” for those with assignment conditions. Examples of causal network motifs are shown in Figure 2. We use the counts of these motifs in each n-hop ego network to characterize the exposure condition of each observation.
Figure 2: Examples of causal network motifs. Stars represent egos and circles represent alters. Solid indicates a treated node, hollow indicates a control node, and shading indicates that the node may be either treated or in control. The first pattern in each row is a traditional network motif with no assignment conditions, referred to simply as a network motif; it is followed by the corresponding causal network motifs. Our interference vector is constructed by dividing the count of each causal network motif by the count of the corresponding network motif. The label under each causal network motif gives its designation; for example, an open triad in which one neighbor is treated is labeled 3o-1.
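The counting step can be sketched in Python for dyads and triads in a 1-hop ego network. This is an illustrative sketch, not the released implementation: the function name and input format are hypothetical, the ego’s own assignment is ignored (as the shaded star suggests), and the label “3c-k” for a closed triad with k treated alters is an assumption extrapolated from the “3o-1” convention in the caption.

```python
from collections import Counter
from itertools import combinations

def count_causal_motifs(alters, alter_edges, treated):
    """Count dyad and triad causal network motifs in a 1-hop ego network.

    alters:      the ego's neighbors.
    alter_edges: set of frozenset({u, v}) for edges between two alters.
    treated:     set of treated node IDs (the ego's own status is ignored).
    Labels are "<motif>-<number of treated alters>"; "3c" is an assumed label.
    """
    counts = Counter()
    for a in alters:
        # Every ego-alter edge is a dyad: 2-1 if the alter is treated, else 2-0.
        counts[f"2-{int(a in treated)}"] += 1
    for u, v in combinations(sorted(alters), 2):
        # Ego plus two alters is a closed triad if the alters are linked, else open.
        kind = "3c" if frozenset((u, v)) in alter_edges else "3o"
        counts[f"{kind}-{int(u in treated) + int(v in treated)}"] += 1
    return counts

# Hypothetical ego network: alters a, b, c; only a and b are linked; only a is treated.
counts = count_causal_motifs(["a", "b", "c"], {frozenset(("a", "b"))}, {"a"})
print(dict(counts))  # {'2-1': 1, '2-0': 2, '3c-1': 1, '3o-1': 1, '3o-0': 1}
```

Enumerating alter pairs is quadratic in the ego’s degree, which keeps the sketch simple but may be too slow for very high-degree egos.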
Once we have counted the causal network motifs for each ego node in the network, the next step is to convert the counts into features for the partitioning step described below. Let X_i denote an m-dimensional random vector, called the interference vector. The interference vector has one important requirement: every element must be intervenable, that is, the random treatment assignment must affect the value of each element of the vector. This requirement addresses selection bias when we estimate average potential outcomes.
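One informal way to see what “intervenable” rules out is to redraw the random assignment many times on a fixed ego network and check whether a candidate feature ever changes. In this hypothetical sketch, the ego’s degree is fixed by the graph and never moves, so it is not intervenable, while the share of treated alters does move. The helper names and the Bernoulli(0.5) assignment are assumptions for illustration.

```python
import random

def features(alters, treated):
    """Two candidate exposure features for one ego (hypothetical names)."""
    return {
        "degree": len(alters),  # determined by the graph alone
        "frac_treated": sum(a in treated for a in alters) / len(alters),  # depends on assignment
    }

def is_intervenable(alters, name, draws=200, p=0.5, seed=0):
    """Heuristic: does feature `name` take more than one value across random assignments?"""
    rng = random.Random(seed)
    seen = set()
    for _ in range(draws):
        treated = {a for a in alters if rng.random() < p}  # Bernoulli(p) assignment per alter
        seen.add(features(alters, treated)[name])
    return len(seen) > 1

alters = ["a", "b", "c", "d"]
print(is_intervenable(alters, "degree"), is_intervenable(alters, "frac_treated"))
```

Roughly speaking, a feature like degree is fixed by who the user is, so grouping observations on it would compare different kinds of users rather than different assignment outcomes.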
We construct the interference vector as follows. For each observation, we take the count of each causal network motif (e.g., 2-1, 2-0, …, 3o-2, 3o-1, …) and normalize it by the count of the corresponding network motif (e.g., dyads, open triads, closed triads, …). This way, every element of X_i is intervenable and lies in [0, 1]. Note that for network motifs with many nodes, some observations may contain no instance of the motif, so normalization cannot be performed. In these cases, we can either exclude that network motif from the interference vector or drop those observations if they make up only a very small fraction of the data. Figure 3 illustrates the construction of the interference vector.
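A minimal sketch of the normalization, assuming motif counts are stored in a dictionary keyed by the labels from Figure 2. The “3c-k” labels for closed triads and the `FAMILIES` grouping are assumptions for illustration. Where a motif family has no instances for an observation, the sketch returns NaN instead of choosing between the two remedies above, leaving that decision (drop the motif or drop the observation) to the caller.

```python
import math

# Hypothetical grouping: each network motif and the causal motifs that refine it.
FAMILIES = {
    "dyad": ["2-0", "2-1"],
    "open_triad": ["3o-0", "3o-1", "3o-2"],
    "closed_triad": ["3c-0", "3c-1", "3c-2"],  # "3c" labels are assumed
}

def interference_vector(causal_counts):
    """Divide each causal-motif count by the count of its network motif.

    The network-motif count is the sum over its causal refinements, since every
    instance (e.g. every open triad) has exactly one assignment pattern.
    Returns label -> share in [0, 1], or NaN when the motif never occurs.
    """
    vec = {}
    for labels in FAMILIES.values():
        total = sum(causal_counts.get(lab, 0) for lab in labels)
        for lab in labels:
            vec[lab] = causal_counts.get(lab, 0) / total if total else math.nan
    return vec

x = interference_vector({"2-0": 2, "2-1": 1, "3o-0": 1, "3o-1": 1, "3c-1": 1})
print(x["2-1"], x["3o-1"], x["3c-1"])
```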
Figure 3: An example of an ego network with treatment assignments and the corresponding interference vector. Stars represent egos and circles represent alters. Solid indicates a treated node, hollow indicates a control node, and shading indicates that the node may be either treated or in control.
Our approach then partitions $[0, 1]^{m+1}$ and determines the exposure conditions using decision tree regression. Decision trees can be used for clustering (Liu et al., 2000) and are typically easy to interpret (Quinlan, 1986), which makes them well suited to this partitioning problem. Each leaf of the decision tree corresponds to a unique exposure condition (partition). Compared with standard decision tree regression, our algorithm includes several modifications to account for honest splitting, positivity, and related issues.
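The post does not spell out the modified tree, so the following is only a simplified stand-in in plain Python: a depth-limited regression tree whose splits are chosen on one half of the data and whose leaf values (estimated average potential outcomes) are computed on the other half, which is the core idea of honest splitting. Requiring a minimum number of rows from both halves in every child is a crude proxy for a positivity check, not the criterion used in the paper. All names, parameters, and the synthetic data are hypothetical.

```python
import random

def avg(values):
    """Arithmetic mean of a non-empty iterable."""
    values = list(values)
    return sum(values) / len(values)

def sse(rows):
    """Sum of squared deviations of the outcomes in `rows` from their mean."""
    if not rows:
        return 0.0
    m = avg(y for _, y in rows)
    return sum((y - m) ** 2 for _, y in rows)

def build(struct, est, depth=0, max_depth=2, min_leaf=10):
    """Choose splits on `struct`; fill leaves with outcome means from `est` (honest)."""
    best = None
    if depth < max_depth:
        for j in range(len(struct[0][0])):
            # Candidate thresholds: every observed value except the largest.
            for t in sorted({x[j] for x, _ in struct})[:-1]:
                ls = [r for r in struct if r[0][j] <= t]
                rs = [r for r in struct if r[0][j] > t]
                le = [r for r in est if r[0][j] <= t]
                re_ = [r for r in est if r[0][j] > t]
                # Crude positivity proxy: each child needs enough rows in both halves.
                if min(len(ls), len(rs), len(le), len(re_)) < min_leaf:
                    continue
                score = sse(ls) + sse(rs)
                if best is None or score < best[0]:
                    best = (score, j, t, ls, rs, le, re_)
    if best is None or best[0] >= sse(struct):
        # Leaf = one exposure condition; its value comes only from the estimation half.
        return {"value": avg(y for _, y in est), "n": len(est)}
    _, j, t, ls, rs, le, re_ = best
    return {"feature": j, "threshold": t,
            "left": build(ls, le, depth + 1, max_depth, min_leaf),
            "right": build(rs, re_, depth + 1, max_depth, min_leaf)}

def predict(tree, x):
    """Route x to its leaf and return that leaf's honest outcome estimate."""
    while "value" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]

# Synthetic data: two hypothetical interference-vector elements on a 0.1 grid;
# the outcome jumps by 2 when the first element exceeds 0.5.
rng = random.Random(0)
rows = []
for _ in range(400):
    x = [rng.randint(0, 10) / 10, rng.randint(0, 10) / 10]
    rows.append((x, 1.0 + 2.0 * (x[0] > 0.5) + rng.gauss(0, 0.1)))
# Rows are i.i.d., so cutting the list in half gives a random sample split.
tree = build(rows[:200], rows[200:])
print(tree["feature"], tree["threshold"], round(predict(tree, [0.8, 0.5]), 2))
```

Each leaf plays the role of one exposure condition; because leaf means come from rows that played no part in choosing the splits, they are not biased toward the partition the tree happened to find.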
Why it matters:
Network interference is far more complex than a single indirect effect. To investigate the heterogeneity of indirect effects in experimental data, we offer a two-stage solution: we first use causal network motifs to characterize network interference conditions, and then apply a tree-based algorithm to partition them. Our tree-based algorithm is interpretable in terms of which exposure conditions matter for defining potential outcomes, addresses selection bias and positivity issues, and avoids incorrect standard errors through honest splitting.
Practitioners who apply our approach can gain important insights. For example, they may learn how to leverage social contagion to promote products when the number of promotions is limited. Researchers can also identify key network interference conditions that have not been theorized for a given experimental setting.
Read the full paper:
Causal Network Motifs: Identifying Heterogeneous Spillover Effects in A/B Tests
Check out our open source implementation on GitHub.
See our presentation at The Web Conference 2021.
Bing Liu, Yiyuan Xia, and Philip S. Yu. 2000. Clustering through decision tree construction. In CIKM. 20–29.
J. Ross Quinlan. 1986. Induction of decision trees. Machine Learning 1, 1 (1986), 81–106.