Learning Identifiable Factorized Causal Representations of Cellular Responses

Nov 1, 2024

Haiyi Mao

Romain Lopez

Kai Liu

Jan-Christian Huetter

David Richmond

Panayiotis V. Benos

Lin Qiu

Abstract

The study of cells and their responses to genetic or chemical perturbations promises to accelerate the discovery of therapeutic targets. However, designing adequate and insightful models for such data is difficult because the response of a cell to perturbations depends heavily on contextual covariates (e.g., the genetic background or type of the cell). There is therefore a need for models that can identify interactions between drugs and contextual covariates. This is crucial for discovering therapeutic targets, as such interactions may reveal drugs that affect certain cell types but not others. We tackle this problem with a novel Factorized Causal Representation (FCR) learning method, an identifiable deep generative model that reveals causal structure in single-cell perturbation data from several cell lines. FCR learns multiple disentangled cellular representations, composed of covariate-specific (Z_x), treatment-specific (Z_t), and interaction-specific (Z_tx) representations. Building on recent advances in non-linear ICA theory, we prove the component-wise identifiability of Z_tx and the block-wise identifiability of Z_t and Z_x. We then present our implementation of FCR and empirically demonstrate that FCR outperforms state-of-the-art baselines in various tasks across four single-cell datasets.

Type

Journal article

Publication

Advances in Neural Information Processing Systems

Last updated on May 13, 2025