Supervised Contrastive Block Disentanglement

Jan 1, 2025

Taro Makino, Ji Won Park, Natasa Tagasovska, Takamasa Kudo, Paula Coelho, Heming Yao, Jan-Christian Huetter, Ana Carolina Leote, Burkhard Hoeckendorf, Stephen Ra, David Richmond, Kyunghyun Cho*, Aviv Regev*, Romain Lopez*
Abstract
Real-world datasets often combine data collected under different experimental conditions. Although this yields larger datasets, it also introduces spurious correlations that make it difficult to accurately model the phenomena of interest. We address this by learning two blocks of latent variables to independently represent the phenomena of interest and the spurious correlations. The former are correlated with the target variable y and invariant to the environment variable e, while the latter depend on e. The invariance of the phenomena of interest to e is highly sought-after but difficult to achieve on real-world datasets. Our primary contribution is an algorithm called Supervised Contrastive Block Disentanglement (SCBD) that is highly effective at enforcing this invariance. It is based purely on supervised contrastive learning, and scales to real-world data better than existing approaches. We empirically validate SCBD on two challenging problems. The first is domain generalization, where we achieve strong performance on a synthetic dataset, as well as on Camelyon17-WILDS. SCBD introduces a single hyperparameter that controls the degree of invariance to e. When we increase the hyperparameter to strengthen the degree of invariance, there is a monotonic improvement in out-of-distribution performance at the expense of in-distribution performance. The second is a scientific problem of batch correction. Here, we demonstrate the utility of SCBD by learning representations of single-cell perturbations from 26 million Optical Pooled Screening images that are nearly free of technical artifacts induced by the variation across wells.
Type: Publication (arXiv)