Learning Identifiable Factorized Causal Representations of Cellular Responses
Abstract
The study of cells and their responses to genetic or chemical perturbations promises to accelerate the discovery of therapeutics targets. However, designing adequate and insightful models for such data is difficult because the response of a cell to perturbations essentially depends on contextual covariates (e.g., genetic background or type of the cell). There is therefore a need for models that can identify interactions between drugs and contextual covariates. This is crucial for discovering therapeutics targets, as such interactions may reveal drugs that affect certain cell types but not others. We tackle this problem with a novel Factorized Causal Representation (FCR) learning method, an identifiable deep generative model that reveals causal structure in single-cell perturbation data from several cell lines. FCR learns multiple cellular representations that are disentangled, comprised of covariate-specific (Z_x), treatment-specific (Z_t) and interaction-specific (Z_tx) representations. Based on recent advances of non-linear ICA theory, we prove the component-wise identifiability of Z_tx and block-wise identifiability of Z_t and Z_x. Then, we present our implementation of FCR, and empirically demonstrate that FCR outperforms state-of-the-art baselines in various tasks across four single-cell datasets.
Type
Publication
Advances in Neural Information Processing Systems