Denoising Diffusions with Optimal Transport: Localization, Curvature, and Multi-Scale Complexity

Adding noise is easy; what about denoising? Diffusion is easy; what about reverting a diffusion? We provide a fine-grained analysis of the diffuse-then-denoise process. We discover a notion of multi-scale curvature complexity that collectively determines the success and failure modes of probabilistic diffusion models.
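
For orientation, a standard diffuse-then-denoise pair of equations (an Ornstein–Uhlenbeck forward process and its reverse-time companion driven by the score; a generic formulation, not necessarily the exact normalization analyzed in the paper):

```latex
% Forward (noising) diffusion started from the data distribution p_0
dX_t = -X_t\,dt + \sqrt{2}\,dB_t, \qquad X_0 \sim p_0, \quad t \in [0, T].
% Reverse-time (denoising) diffusion, driven by the score \nabla \log p_t
dY_s = \bigl(Y_s + 2\,\nabla \log p_{T-s}(Y_s)\bigr)\,ds + \sqrt{2}\,d\bar{B}_s, \qquad Y_0 \sim p_T.
```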

November 2024 · Tengyuan Liang, Kulunu Dharmakeerthi, Takuya Koriyama

Randomization Inference When N Equals One

A statistical theory for N-of-1 experiments, where a unit serves as its own control and treatment in different time windows.
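
Purely as a hypothetical illustration of the flavor of randomization inference in this setting, the sketch below runs a generic permutation test that treats the time windows as exchangeable under the null; the paper's theory goes beyond this toy version:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical N-of-1 data: one unit observed over 20 time windows,
# alternating between treatment (1) and control (0).
assignment = np.array([1, 0] * 10)
outcome = rng.normal(loc=0.3 * assignment, scale=1.0)   # toy outcomes with a small effect

def effect(assign, y):
    """Difference in mean outcomes between treated and control windows."""
    return y[assign == 1].mean() - y[assign == 0].mean()

observed = effect(assignment, outcome)

# Randomization null: re-randomize the window assignments and recompute the statistic.
null_stats = np.array([effect(rng.permutation(assignment), outcome) for _ in range(10_000)])
p_value = (np.abs(null_stats) >= np.abs(observed)).mean()
print(f"observed effect = {observed:.3f}, randomization p-value = {p_value:.3f}")
```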

September 2024 · Tengyuan Liang, Benjamin Recht

A Convexified Matching Approach to Imputation and Individualized Inference

We introduce a new convexified matching method for missing value imputation and individualized inference inspired by computational optimal transport.

July 2024 · YoonHaeng Hur, Tengyuan Liang

Learning When the Concept Shifts: Confounding, Invariance, and Dimension Reduction

Confounding can obfuscate the definition of the best prediction model (concept shift) and shift covariates to domains yet unseen (covariate shift). Therefore, a model maximizing prediction accuracy in the source environment could suffer a significant accuracy drop in the target environment. We propose a new domain adaptation method for observational data in the presence of confounding, and characterize the stability and predictability tradeoff by leveraging a structural causal model.

June 2023 · Kulunu Dharmakeerthi, YoonHaeng Hur, Tengyuan Liang

Detecting Weak Distribution Shifts via Displacement Interpolation

Detecting weak, systematic distribution shifts and quantitatively modeling individual, heterogeneous responses to policies or incentives have found increasing empirical applications in social and economic sciences. We propose a model for weak distribution shifts via displacement interpolation, drawing from the optimal transport theory.
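
For reference, displacement (McCann) interpolation between two distributions μ_0 and μ_1 in its standard optimal-transport form (the paper's model of weak shifts may parameterize this differently):

```latex
% Optimal transport (Brenier) map T pushing \mu_0 forward to \mu_1
T = \arg\min_{S:\, S_{\#}\mu_0 = \mu_1} \int \|x - S(x)\|^2 \, d\mu_0(x),
% Displacement interpolation: move each unit of mass a fraction t of the way along T
\mu_t = \bigl((1-t)\,\mathrm{id} + t\,T\bigr)_{\#}\,\mu_0, \qquad t \in [0, 1].
```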

May 2023 · YoonHaeng Hur, Tengyuan Liang

Blessings and Curses of Covariate Shifts: Adversarial Learning Dynamics, Directional Convergence, and Equilibria

Blessings and curses of covariate shifts, directional convergence, and the connection to experimental design.

December 2022 · Tengyuan Liang

High-dimensional Asymptotics of Langevin Dynamics in Spiked Matrix Models

We study Langevin dynamics for recovering the planted signal in the spiked matrix model. We provide a path-wise characterization of the overlap between the output of the Langevin algorithm and the planted signal. This overlap is characterized in terms of a self-consistent system of integro-differential equations, usually referred to as the Crisanti-Horner-Sommers-Cugliandolo-Kurchan (CHSCK) equations in the spin glass literature.
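
Schematically, the objects in play are the rank-one spiked model, a Langevin diffusion on its landscape, and the overlap that is being characterized (the normalizations below are illustrative assumptions):

```latex
% Rank-one spiked matrix model: planted signal x^* plus symmetric Gaussian (GOE) noise W
Y = \frac{\lambda}{N}\, x^{\star} (x^{\star})^{\top} + W, \qquad x^{\star} \in \mathbb{R}^{N}.
% Langevin dynamics on a landscape H_Y built from Y, at inverse temperature \beta
dx_t = -\nabla H_Y(x_t)\,dt + \sqrt{2/\beta}\,dB_t.
% Tracked quantity: overlap between the Langevin iterate and the planted signal
m_N(t) = \tfrac{1}{N}\,\langle x_t, x^{\star}\rangle.
```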

April 2022 · Tengyuan Liang, Subhabrata Sen, Pragya Sur

Online Learning to Transport via the Minimal Selection Principle

Motivated by robust dynamic resource allocation in operations research, we study the Online Learning to Transport (OLT) problem where the decision variable is a probability measure, an infinite-dimensional object. We draw connections between online learning, optimal transport, and partial differential equations through an insight called the minimal selection principle, originally studied in the Wasserstein gradient flow setting by Ambrosio et al. (2005).

February 2022 · Wenxuan Guo, YoonHaeng Hur, Tengyuan Liang, Christopher Ryan

Reversible Gromov-Monge Sampler for Simulation-Based Inference

Motivated by the seminal work on distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference.

September 2021 · YoonHaeng Hur, Wenxuan Guo, Tengyuan Liang

Universal Prediction Band via Semi-Definite Programming

This paper proposes a computationally efficient method to construct nonparametric, heteroscedastic prediction bands for uncertainty quantification.

March 2021 · Tengyuan Liang

Interpolating Classifiers Make Few Mistakes

This paper provides elementary analyses of the regret and generalization of minimum-norm interpolating classifiers.

January 2021 · Tengyuan Liang, Benjamin Recht

Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks

We utilize a connection between compositional kernels and branching processes via Mehler’s formula to study deep neural networks. This new probabilistic insight provides a novel perspective on the mathematical role of activation functions in compositional neural networks. We study the unscaled and rescaled limits of the compositional kernels and explore the different phases of the limiting behavior as the compositional depth increases.
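
As an illustration of a compositional kernel recursion, here is a minimal sketch using the standard ReLU dual kernel (the degree-one arc-cosine kernel); the paper treats general activations through Mehler’s formula and branching processes, so this is only a special case:

```python
import numpy as np

def relu_dual(rho):
    """Normalized ReLU dual kernel: the correlation of ReLU(U), ReLU(V)
    when (U, V) are standard normals with correlation rho."""
    rho = np.clip(rho, -1.0, 1.0)
    return (np.sqrt(1.0 - rho**2) + rho * (np.pi - np.arccos(rho))) / np.pi

def compositional_kernel(rho0, depth):
    """Iterate the dual kernel `depth` times: the kernel of a depth-`depth` composition."""
    rho = rho0
    for _ in range(depth):
        rho = relu_dual(rho)
    return rho

# Input correlations get squeezed toward 1 as the compositional depth grows,
# one concrete instance of the depth-limit behavior studied in the paper.
for depth in [1, 2, 5, 10, 50]:
    print(depth, compositional_kernel(0.2, depth))
```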

April 2020 · Tengyuan Liang, Hai Tran-Bach

A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers

This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data, taking statistical and computational perspectives.
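
For concreteness, one standard way to write the minimum-L1-norm interpolated classifier on separable data (the object named in the title; the paper’s scaling conventions may differ):

```latex
\hat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \|\theta\|_1
\quad \text{subject to} \quad y_i \,\langle x_i, \theta \rangle \ge 1, \qquad i = 1, \dots, n.
```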

February 2020 · Tengyuan Liang, Pragya Sur

On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels

We study the risk of minimum-norm interpolants of data in Reproducing Kernel Hilbert Spaces. Our upper bounds on the risk have a multiple-descent shape, and empirical evidence supports our finding that minimum-norm interpolants in RKHS can exhibit this unusual non-monotonicity in sample size.

August 2019 · Tengyuan Liang, Alexander Rakhlin, Xiyu Zhai

Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits

What are the provable benefits of the adaptive representation learned by neural networks compared to the pre-specified fixed-basis representation in the classical nonparametric literature? We answer this question via a dynamic reproducing kernel Hilbert space (RKHS) approach indexed by the training process of neural networks.

January 2019 · Xialiang Dou, Tengyuan Liang

Deep Neural Networks for Estimation and Inference

Can deep neural networks with standard architectures estimate treatment effects and perform downstream uncertainty quantification tasks?

September 2018 · Max H. Farrell, Tengyuan Liang, Sanjog Misra

Just Interpolate: Kernel Ridgeless Regression Can Generalize

In the absence of explicit regularization, an interpolating kernel machine can fit the training data perfectly and, at the same time, still generalize well on test data. We isolate a phenomenon of implicit regularization for minimum-norm interpolated solutions.
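
A minimal sketch of minimum-norm (ridgeless) kernel interpolation, with a Gaussian kernel and a pseudo-inverse as illustrative choices:

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gram matrix of the Gaussian (RBF) kernel between rows of A and rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def ridgeless_fit(X, y, bandwidth=1.0):
    """Minimum-norm interpolant: f(x) = K(x, X) K(X, X)^+ y, with no explicit regularization."""
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.pinv(K) @ y           # pseudo-inverse handles ill-conditioning
    return lambda X_new: gaussian_kernel(X_new, X, bandwidth) @ alpha

# Toy usage: the fit passes through the training data (up to numerical error),
# yet can still be evaluated at new points.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
f_hat = ridgeless_fit(X, y)
print(np.max(np.abs(f_hat(X) - y)))         # ~ 0: interpolation of the training data
```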

August 2018 · Tengyuan Liang, Alexander Rakhlin

Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability

We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adopting some techniques from Berglund and Gentz (2003).
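
A minimal sketch of the discrete-time Langevin algorithm on a toy non-convex empirical risk (the model, step size, and temperature below are illustrative assumptions, not the paper’s setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-convex ERM problem: 1-d regression with a saturating link.
X = rng.normal(size=200)
theta_star = 2.0
y = np.tanh(theta_star * X) + 0.1 * rng.normal(size=200)

def emp_risk_grad(theta):
    """Gradient of the empirical squared loss (1/n) * sum (tanh(theta * x) - y)^2."""
    resid = np.tanh(theta * X) - y
    return 2.0 * np.mean(resid * (1.0 - np.tanh(theta * X) ** 2) * X)

# Discrete-time Langevin iteration: gradient step plus Gaussian noise of size sqrt(2 * eta / beta).
eta, beta = 1e-2, 50.0
theta = -3.0                                  # start far from the planted value
for _ in range(5000):
    theta = theta - eta * emp_risk_grad(theta) + np.sqrt(2.0 * eta / beta) * rng.normal()

print(f"final iterate: {theta:.3f} (planted value {theta_star})")
```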

February 2018 · Belinda Tzen, Tengyuan Liang, Maxim Raginsky

How Well Generative Adversarial Networks Learn Distributions

This paper studies the rates of convergence for learning distributions implicitly with the adversarial framework and Generative Adversarial Networks (GANs), which subsume Wasserstein, Sobolev, MMD GAN, and Generalized/Simulated Method of Moments (GMM/SMM) as special cases. We study a wide range of parametric and nonparametric target distributions under a host of objective evaluation metrics. We investigate how to obtain valid statistical guarantees for GANs through the lens of regularization.
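
The common template behind these special cases is the integral probability metric (adversarial) distance between the generated distribution μ_θ and the target ν, stated here in generic form:

```latex
d_{\mathcal{F}}(\mu_\theta, \nu) \;=\; \sup_{f \in \mathcal{F}}
\Bigl| \mathbb{E}_{X \sim \mu_\theta} f(X) \;-\; \mathbb{E}_{Y \sim \nu} f(Y) \Bigr|.
% \mathcal{F} = 1-Lipschitz functions gives Wasserstein-1; a Sobolev ball gives Sobolev GAN;
% an RKHS unit ball gives MMD.
```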

December 2017 · Tengyuan Liang

Statistical Inference for the Population Landscape via Moment Adjusted Stochastic Gradients

Modern statistical inference tasks often require iterative optimization methods to compute the solution. Convergence analysis from an optimization viewpoint only tells us how well the solution is approximated numerically but overlooks the sampling nature of the data. We introduce moment-adjusted stochastic gradient descent, a new stochastic optimization method for statistical inference.

December 2017 · Tengyuan Liang, Weijie J. Su