A Convexified Matching Approach to Imputation and Individualized Inference
We introduce a new convexified matching method for missing value imputation and individualized inference inspired by computational optimal transport.
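As a point of reference only, the sketch below illustrates the generic idea of matching-based imputation through entropic optimal transport: rows with a missing coordinate are softly matched to fully observed rows via a Sinkhorn coupling on the remaining coordinates, and the missing entries are filled in by barycentric projection. All names and parameter choices are illustrative assumptions; this is not the convexified formulation developed in the paper.

```python
# Illustrative sketch (not the paper's convexified method): impute a column
# by Sinkhorn matching on the remaining coordinates + barycentric projection.
import numpy as np

def sinkhorn(C, reg=0.1, n_iter=200):
    """Entropic OT coupling between two uniform discrete measures with cost C."""
    n, m = C.shape
    a, b = np.ones(n) / n, np.ones(m) / m
    K = np.exp(-C / reg)
    v = np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]          # coupling matrix P

def impute_column(X, j, reg=0.1):
    """Fill NaNs in column j of X by OT matching on the other columns."""
    miss = np.isnan(X[:, j])
    obs_rows, miss_rows = X[~miss], X[miss]
    other = [k for k in range(X.shape[1]) if k != j]
    A, B = miss_rows[:, other], obs_rows[:, other]
    C = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    P = sinkhorn(C / C.mean(), reg)
    X_imp = X.copy()
    # Barycentric projection: coupling-weighted average of the donors' column j.
    X_imp[miss, j] = (P @ obs_rows[:, j]) / P.sum(axis=1)
    return X_imp

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] += X[:, 0]                               # make column 2 predictable
X[rng.random(200) < 0.3, 2] = np.nan             # 30% missingness in column 2
print(impute_column(X, 2)[:5])
```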
Confounding can obfuscate the definition of the best prediction model (concept shift) and shift covariates to domains yet unseen (covariate shift). Therefore, a model maximizing prediction accuracy in the source environment could suffer a significant accuracy drop in the target environment. We propose a new domain adaptation method for observational data in the presence of confounding, and characterize the stability and predictability tradeoff by leveraging a structural causal model.
A statistical theory for N-of-1 experiments, where a unit serves as its own control and treatment in different time windows.
Detecting weak, systematic distribution shifts and quantitatively modeling individual, heterogeneous responses to policies or incentives have found increasing empirical applications in social and economic sciences. We propose a model for weak distribution shifts via displacement interpolation, drawing from the optimal transport theory.
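For reference, the displacement interpolation used here is McCann's standard construction from optimal transport: if T is the optimal transport map pushing μ₀ forward onto μ₁, the interpolating path is

```latex
\mu_t \;=\; \bigl((1-t)\,\mathrm{Id} \;+\; t\,T\bigr)_{\#}\,\mu_0,
\qquad t \in [0,1],
```

so a small value of t corresponds to a weak but systematic displacement of the population away from μ₀ toward μ₁.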
Blessings and curses of covariate shifts, directional convergence, and the connection to experimental design.
We study Langevin dynamics for recovering the planted signal in the spiked matrix model. We provide a path-wise characterization of the overlap between the output of the Langevin algorithm and the planted signal. This overlap is characterized in terms of a self-consistent system of integro-differential equations, usually referred to as the Crisanti-Horner-Sommers-Cugliandolo-Kurchan (CHSCK) equations in the spin glass literature.
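A minimal simulation sketch of this setup, under illustrative choices (a Rademacher spike at signal strength above the spectral threshold, spherical Langevin updates): the discrete Langevin iterate is run on the quadratic landscape ⟨σ, Yσ⟩ and the overlap with the planted signal is recorded along the path. This is a toy illustration, not the CHSCK characterization itself.

```python
# Langevin dynamics on a rank-one spiked matrix model, tracking the overlap
# with the planted signal. Parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
N, lam, beta, eta, steps = 500, 2.0, 4.0, 1e-3, 5000

x = rng.choice([-1.0, 1.0], size=N)                 # planted signal, |x| = sqrt(N)
W = rng.normal(size=(N, N)); W = (W + W.T) / np.sqrt(2 * N)
Y = (lam / N) * np.outer(x, x) + W                  # observed spiked matrix

sigma = rng.normal(size=N)
sigma *= np.sqrt(N) / np.linalg.norm(sigma)         # start on the sphere

overlaps = []
for _ in range(steps):
    grad = 2 * Y @ sigma                            # gradient of <sigma, Y sigma>
    sigma = sigma + eta * grad + np.sqrt(2 * eta / beta) * rng.normal(size=N)
    sigma *= np.sqrt(N) / np.linalg.norm(sigma)     # project back to the sphere
    overlaps.append(abs(sigma @ x) / N)

print("final overlap with the spike:", overlaps[-1])
```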
Motivated by robust dynamic resource allocation in operations research, we study the Online Learning to Transport (OLT) problem where the decision variable is a probability measure, an infinite-dimensional object. We draw connections between online learning, optimal transport, and partial differential equations through an insight called the minimal selection principle, originally studied in the Wasserstein gradient flow setting by Ambrosio et al. (2005).
Motivated by the seminal work on distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference.
This paper proposes a computationally efficient method to construct nonparametric, heteroscedastic prediction bands for uncertainty quantification.
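For orientation, one common way to build nonparametric, heteroscedastic bands is locally scaled split-conformal prediction; the sketch below is that generic construction, stated only as an assumed baseline and not as the method proposed in the paper.

```python
# Generic locally scaled split-conformal bands (an assumed baseline for
# illustration only; not necessarily the construction proposed in the paper).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def hetero_band(X_tr, y_tr, X_cal, y_cal, X_new, alpha=0.1):
    mu = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)       # mean model
    sig = RandomForestRegressor(random_state=0).fit(                  # local scale
        X_tr, np.abs(y_tr - mu.predict(X_tr)))
    # Calibrate the band width on held-out data using scaled residuals.
    scores = np.abs(y_cal - mu.predict(X_cal)) / np.maximum(sig.predict(X_cal), 1e-6)
    level = min(1.0, np.ceil((1 - alpha) * (len(y_cal) + 1)) / len(y_cal))
    q = np.quantile(scores, level)
    center, half = mu.predict(X_new), np.maximum(sig.predict(X_new), 1e-6)
    return center - q * half, center + q * half
```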
This paper provides elementary analyses of the regret and generalization of minimum-norm interpolating classifiers.
We utilize a connection between compositional kernels and branching processes via Mehler’s formula to study deep neural networks. This new probabilistic insight provides a novel perspective on the mathematical role of activation functions in compositional neural networks. We study the unscaled and rescaled limits of the compositional kernels and explore the different phases of the limiting behavior as the compositional depth increases.
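For reference, the classical identities behind this connection, stated with probabilists' Hermite polynomials (these are standard facts; the branching-process analysis in the paper builds on top of them):

```latex
\sum_{k \ge 0} \frac{\rho^{k}}{k!}\,\mathrm{He}_k(x)\,\mathrm{He}_k(y)
  \;=\; \frac{1}{\sqrt{1-\rho^{2}}}
  \exp\!\left(\frac{2\rho x y - \rho^{2}(x^{2}+y^{2})}{2(1-\rho^{2})}\right)
  \qquad \text{(Mehler's formula)},
```

and if the activation has Hermite expansion σ = Σ_k a_k He_k / √(k!), then for jointly standard Gaussian (X, Y) with correlation ρ,

```latex
\mathbb{E}\bigl[\sigma(X)\,\sigma(Y)\bigr]
  \;=\; \sum_{k \ge 0} a_k^{2}\,\rho^{k} \;=:\; \kappa(\rho),
```

so the depth-ℓ compositional kernel is governed by the ℓ-fold iterate κ∘⋯∘κ, whose limiting behavior as ℓ grows delineates the different phases.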
This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data, taking statistical and computational perspectives.
We study the risk of minimum-norm interpolants of data in Reproducing Kernel Hilbert Spaces. Our upper bounds on the risk are of a multiple-descent shape. Empirical evidence supports our finding that minimum-norm interpolants in RKHS can exhibit this unusual non-monotonicity in sample size.
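A small numerical sketch of the object under study: the minimum-norm ("ridgeless") kernel interpolant f̂(x) = K(x, X) K(X, X)⁺ y, with its test risk traced as the sample size grows. The toy dimensions and bandwidth below are illustrative assumptions; the multiple-descent shape analyzed in the paper is an asymptotic statement and need not appear at this scale.

```python
# Minimum-norm (ridgeless) kernel interpolation and its test risk versus n.
# Toy settings chosen for illustration; not the regimes analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n_test = 10, 1000

def gauss_kernel(A, B, bw=2.0):
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * bw ** 2))

def f_star(X):
    return np.sin(X[:, 0]) + 0.5 * X[:, 1]

X_te = rng.normal(size=(n_test, d))
y_te = f_star(X_te)

for n in [25, 50, 100, 200, 400, 800]:
    X = rng.normal(size=(n, d))
    y = f_star(X) + 0.5 * rng.normal(size=n)          # noisy labels
    alpha = np.linalg.pinv(gauss_kernel(X, X)) @ y    # min-norm interpolant
    risk = np.mean((gauss_kernel(X_te, X) @ alpha - y_te) ** 2)
    print(f"n = {n:4d}   test risk = {risk:.3f}")
```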
What are the provable benefits of the adaptive representation learned by neural networks, compared with the pre-specified fixed-basis representations of the classical nonparametric literature? We answer this question via a dynamic reproducing kernel Hilbert space (RKHS) approach indexed by the training process of neural networks.
Can deep neural networks with standard architectures estimate treatment effects and perform downstream uncertainty quantification tasks?
In the absence of explicit regularization, an interpolating kernel machine can fit the training data perfectly while still generalizing well on test data. We isolate a phenomenon of implicit regularization for minimum-norm interpolated solutions.
We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adopting some techniques from Berglund and Gentz (2003).
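In standard notation, the discrete-time (unadjusted) Langevin iteration on the empirical risk takes the form

```latex
\theta_{k+1} \;=\; \theta_k \;-\; \eta\,\nabla \widehat{R}_n(\theta_k)
  \;+\; \sqrt{\tfrac{2\eta}{\beta}}\;\xi_k,
\qquad \xi_k \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, I_d),
```

with step size η and inverse temperature β; the metastability analysis concerns how long such iterates linger near local minima of the non-convex empirical risk before escaping.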
This paper studies the rates of convergence for learning distributions implicitly with the adversarial framework and Generative Adversarial Networks (GANs), which subsume Wasserstein, Sobolev, MMD GAN, and Generalized/Simulated Method of Moments (GMM/SMM) as special cases. We study a wide range of parametric and nonparametric target distributions under a host of objective evaluation metrics. We investigate how to obtain valid statistical guarantees for GANs through the lens of regularization.
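In the common template of this literature, the discriminator class F induces an integral probability metric and the GAN estimator minimizes it over the generator class G against the empirical measure μ̂_n:

```latex
d_{\mathcal{F}}(\mu, \nu) \;=\; \sup_{f \in \mathcal{F}}
  \bigl|\,\mathbb{E}_{\mu} f - \mathbb{E}_{\nu} f\,\bigr|,
\qquad
\widehat{\nu}_n \;\in\; \operatorname*{arg\,min}_{\nu \in \mathcal{G}}\; d_{\mathcal{F}}(\widehat{\mu}_n, \nu);
```

taking F to be the 1-Lipschitz functions, a Sobolev ball, or the unit ball of an RKHS recovers the Wasserstein, Sobolev, and MMD cases, respectively. (Notation here is generic; the paper's exact assumptions and regularized variants differ in detail.)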
Modern statistical inference tasks often require iterative optimization methods to compute the solution. Convergence analysis from an optimization viewpoint only informs us how well the solution is approximated numerically but overlooks the sampling nature of the data. We introduce the moment-adjusted stochastic gradient descents, a new stochastic optimization method for statistical inference.