BUSN 41918 (PhD): Data, Learning, and Algorithms

This Ph.D.-level course will provide an overview of machine learning and its algorithmic paradigms, and explore recent topics in learning, inference, and decision-making with large data sets. Emphasis will be placed on theoretical insights and algorithmic principles.

January 2024 · Prof. Tengyuan Liang

Randomization Inference When N Equals One

A statistical theory for N-of-1 experiments, where a unit serves as its own control and treatment in different time windows.

October 2023 · Tengyuan Liang, Benjamin Recht

Detecting Weak Distribution Shifts via Displacement Interpolation

Detecting weak, systematic distribution shifts and quantitatively modeling individual, heterogeneous responses to policies or incentives have found increasing empirical applications in the social and economic sciences. We propose a model for weak distribution shifts via displacement interpolation, drawing on optimal transport theory.

May 2023 · YoonHaeng Hur, Tengyuan Liang

Blessings and Curses of Covariate Shifts: Adversarial Learning Dynamics, Directional Convergence, and Equilibria

We study the blessings and curses of covariate shifts, directional convergence of the adversarial learning dynamics, and the connection to experimental design.

December 2022 · Tengyuan Liang

High-dimensional Asymptotics of Langevin Dynamics in Spiked Matrix Models

We study Langevin dynamics for recovering the planted signal in the spiked matrix model. We provide a path-wise characterization of the overlap between the output of the Langevin algorithm and the planted signal. This overlap is characterized in terms of a self-consistent system of integro-differential equations, usually referred to as the Crisanti-Horner-Sommers-Cugliandolo-Kurchan (CHSCK) equations in the spin glass literature.
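The overlap dynamics summarized above can be illustrated with a toy simulation of discretized Langevin dynamics on a small spiked matrix instance. This is only an illustrative sketch, not the paper's CHSCK characterization: the dimension, signal strength, temperature, and step size below are all assumed for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam, beta, dt, steps = 400, 3.0, 4.0, 0.01, 2000

# Spiked matrix model: Y = (lam/n) v v^T + W, with W a GOE-type noise matrix.
v = rng.choice([-1.0, 1.0], size=n)
G = rng.normal(size=(n, n)) / np.sqrt(n)
W = (G + G.T) / np.sqrt(2)
Y = (lam / n) * np.outer(v, v) + W

# Crude discretization of Langevin dynamics, projected back to the sphere
# of radius sqrt(n) after each step:
#   x <- x + dt * beta * Y x + sqrt(2 dt) * xi,   xi ~ N(0, I).
x = rng.normal(size=n)
x *= np.sqrt(n) / np.linalg.norm(x)
for _ in range(steps):
    x = x + dt * beta * (Y @ x) + np.sqrt(2 * dt) * rng.normal(size=n)
    x *= np.sqrt(n) / np.linalg.norm(x)

# Overlap between the Langevin output and the planted signal (in [0, 1]);
# it becomes nontrivial when the signal-to-noise ratio is large enough.
overlap = abs(x @ v) / n
```

With the signal eigenvalue above the bulk edge, the iterate aligns with the planted direction and the overlap stabilizes at a nontrivial value.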

April 2022 · Tengyuan Liang, Subhabrata Sen, Pragya Sur

Reversible Gromov-Monge Sampler for Simulation-Based Inference

Motivated by the seminal work on distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference.

September 2021 · YoonHaeng Hur, Wenxuan Guo, Tengyuan Liang

Universal Prediction Band via Semi-Definite Programming

This paper proposes a computationally efficient method to construct nonparametric, heteroscedastic prediction bands for uncertainty quantification.

March 2021 · Tengyuan Liang

Interpolating Classifiers Make Few Mistakes

This paper provides elementary analyses of the regret and generalization of minimum-norm interpolating classifiers.

January 2021 · Tengyuan Liang, Benjamin Recht

Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks

We utilize a connection between compositional kernels and branching processes via Mehler’s formula to study deep neural networks. This new probabilistic insight provides a novel perspective on the mathematical role of activation functions in compositional neural networks. We study the unscaled and rescaled limits of the compositional kernels and explore the different phases of the limiting behavior as the compositional depth increases.

April 2020 · Tengyuan Liang, Hai Tran-Bach

A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers

This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data, taking statistical and computational perspectives.

February 2020 · Tengyuan Liang, Pragya Sur

On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels

We study the risk of minimum-norm interpolants of data in Reproducing Kernel Hilbert Spaces. Our upper bounds on the risk are of a multiple-descent shape. Empirical evidence supports our finding that minimum-norm interpolants in RKHS can exhibit this unusual non-monotonicity in sample size.

August 2019 · Tengyuan Liang, Alexander Rakhlin, Xiyu Zhai

Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits

What are the provable benefits of the adaptive representation by neural networks compared to the pre-specified fixed-basis representation in the classical nonparametric literature? We answer this question via a dynamic reproducing kernel Hilbert space (RKHS) approach indexed by the training process of neural networks.

January 2019 · Xialiang Dou, Tengyuan Liang

Deep Neural Networks for Estimation and Inference

Can deep neural networks with standard architectures estimate treatment effects and perform downstream uncertainty quantification tasks?

September 2018 · Max H. Farrell, Tengyuan Liang, Sanjog Misra

Just Interpolate: Kernel Ridgeless Regression Can Generalize

In the absence of explicit regularization, an interpolating kernel machine can fit the training data perfectly and, at the same time, still generalize well on test data. We isolate a phenomenon of implicit regularization for minimum-norm interpolated solutions.

August 2018 · Tengyuan Liang, Alexander Rakhlin

How Well Generative Adversarial Networks Learn Distributions

This paper studies the rates of convergence for learning distributions implicitly with the adversarial framework and Generative Adversarial Networks (GANs), which subsume Wasserstein, Sobolev, MMD GAN, and Generalized/Simulated Method of Moments (GMM/SMM) as special cases. We study a wide range of parametric and nonparametric target distributions under a host of objective evaluation metrics. We investigate how to obtain valid statistical guarantees for GANs through the lens of regularization.

December 2017 · Tengyuan Liang

Statistical Inference for the Population Landscape via Moment Adjusted Stochastic Gradients

Modern statistical inference tasks often require iterative optimization methods to compute the solution. Convergence analysis from an optimization viewpoint only informs us how well the solution is approximated numerically but overlooks the sampling nature of the data. We introduce moment-adjusted stochastic gradient descent, a new stochastic optimization method for statistical inference.

December 2017 · Tengyuan Liang, Weijie J. Su