Supervised Contrastive Block Disentanglement

Jan 1, 2025

Taro Makino, Ji Won Park, Natasa Tagasovska, Takamasa Kudo, Paula Coelho, Heming Yao, Jan-Christian Huetter, Ana Carolina Leote, Burkhard Hoeckendorf, Stephen Ra, David Richmond, Kyunghyun Cho*, Aviv Regev*, Romain Lopez*
Abstract
Real-world datasets often combine data collected under different experimental conditions. Although this yields larger datasets, it also introduces spurious correlations that make it difficult to accurately model the phenomena of interest. We address this by learning two blocks of latent variables to independently represent the phenomena of interest and the spurious correlations. The former are correlated with the target variable y and invariant to the environment variable e, while the latter depend on e. The invariance of the phenomena of interest to e is highly sought-after but difficult to achieve on real-world datasets. Our primary contribution is an algorithm called Supervised Contrastive Block Disentanglement (SCBD) that is highly effective at enforcing this invariance. It is based purely on supervised contrastive learning, and scales to real-world data better than existing approaches. We empirically validate SCBD on two challenging problems. The first is domain generalization, where we achieve strong performance on a synthetic dataset, as well as on Camelyon17-WILDS. SCBD introduces a single hyperparameter that controls the degree of invariance to e. When we increase the hyperparameter to strengthen the degree of invariance, there is a monotonic improvement in out-of-distribution performance at the expense of in-distribution performance. The second is a scientific problem of batch correction. Here, we demonstrate the utility of SCBD by learning representations of single-cell perturbations from 26 million Optical Pooled Screening images that are nearly free of technical artifacts induced by the variation across wells.
Type: Publication (arXiv)