Practitioners often face the challenge of deploying prediction models in new environments with shifted distributions of covariates and responses. In observational data, distribution shifts are often caused by unobserved confounding factors, which can obfuscate the definition of the best prediction model and shift covariates to unseen domains. To address this issue, we study the domain adaptation problem with observational data, postulating a linear structural causal model to account for endogeneity and unobserved confounding. We leverage exogenous, invariant covariate representations to correct for concept shifts and improve target prediction. We propose a new data-driven representation learning method that optimizes for a lower-dimensional linear subspace and a prediction model confined to that subspace. The method minimizes a non-convex objective, which interpolates between predictability and stability, subject to a Stiefel manifold constraint, using an analog of projected gradient descent. We analyze the optimization landscape and prove that, provided sufficient regularization, nearly all local optima align with an invariant linear subspace resilient to distribution shifts. The method incurs a nearly ideal gap between target and source risk. Empirical investigations on real-world data sets validate our method and theory and elucidate the tradeoffs between predictability and stability.
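To make the optimization idea concrete, here is a minimal sketch of projected gradient descent under a Stiefel manifold constraint (matrices with orthonormal columns). It is not the objective or algorithm from the paper. The toy objective, the matrices `A` (standing in for predictability) and `B` (standing in for instability), the weight `gamma`, and both function names are hypothetical choices made only for illustration.

```python
import numpy as np


def project_to_stiefel(M):
    """Return the nearest matrix with orthonormal columns (polar factor via thin SVD)."""
    W, _, Vt = np.linalg.svd(M, full_matrices=False)
    return W @ Vt


def projected_gradient_descent(A, B, k, gamma, step=0.05, iters=500, seed=0):
    """Minimize the toy objective f(U) = -tr(U^T A U) + gamma * tr(U^T B U)
    over p-by-k matrices U with U^T U = I.

    A and B are assumed symmetric. This objective is a hypothetical stand-in
    for a predictability/stability tradeoff, not the one used in the paper.
    """
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    # Random feasible starting point on the Stiefel manifold.
    U = project_to_stiefel(rng.standard_normal((p, k)))
    C = A - gamma * B
    for _ in range(iters):
        grad = -2.0 * C @ U                       # Euclidean gradient of f at U
        U = project_to_stiefel(U - step * grad)   # gradient step, then project back
    return U
```

For example, with `A = np.diag([5., 4., 3., 2., 1.])` and `B = np.diag([10., 0., 0., 0., 0.])`, the first coordinate is the most predictive but also the only unstable one. Calling `projected_gradient_descent(A, B, k=2, gamma=0.0)` recovers the span of the first two coordinates, while `gamma=1.0` moves the solution to the span of the second and third coordinates, which mimics how a larger stability weight steers the learned subspace away from unstable directions.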


Kulunu Dharmakeerthi, YoonHaeng Hur, and Tengyuan Liang. 2024. “Learning When the Concept Shifts: Confounding, Invariance, and Dimension Reduction.” arXiv:2406.15904.
