Statistical Inference for the Population Landscape via Moment Adjusted Stochastic Gradients


Modern statistical inference tasks often require iterative optimization methods to approximate the solution. Convergence analysis from optimization tells us only how well the solution is approximated deterministically; it overlooks the sampling nature of the data. Yet because of the randomness in the data, statisticians are keen to provide uncertainty quantification, or confidence, for the answer obtained after a certain number of optimization steps. It is therefore important, yet challenging, to understand the sampling distribution of iterative optimization methods. This paper makes progress in this direction by introducing a new stochastic optimization method for statistical inference: moment-adjusted stochastic gradient descent. We establish a non-asymptotic theory that characterizes the statistical distribution of the iterates while retaining good optimization guarantees. On the statistical front, the theory allows for model misspecification under very mild conditions on the data. On the optimization front, the theory covers both convex and non-convex cases. Remarkably, the moment-adjusting idea, motivated by “error standardization” in statistics, achieves an effect similar to Nesterov’s acceleration in optimization for certain convex problems, such as fitting generalized linear models. We also demonstrate this acceleration effect in the non-convex setting through experiments.
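To make the moment-adjusting idea concrete, here is a minimal illustrative sketch for logistic regression. It assumes the adjustment takes the form of preconditioning each stochastic gradient by the inverse square root of a running estimate of the gradient second-moment matrix (an "error standardization" heuristic); this form, and all names below, are assumptions for illustration, not the paper's exact update rule.

```python
import numpy as np

def moment_adjusted_sgd(X, y, steps=1000, lr=0.3, eps=1e-3, seed=0):
    """Illustrative moment-adjusted SGD for logistic regression.

    Hypothetical adjustment: precondition the stochastic gradient g by
    S^{-1/2}, where S is a running average of g g^T. This standardizes
    the gradient noise, loosely mimicking "error standardization".
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    S = eps * np.eye(d)  # running second-moment estimate of the gradient
    for t in range(1, steps + 1):
        i = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-X[i] @ theta))
        g = (p - y[i]) * X[i]                # stochastic gradient at theta
        S = S + (np.outer(g, g) - S) / t     # running average of g g^T
        # apply S^{-1/2} via eigendecomposition (eps keeps it well-posed)
        w, V = np.linalg.eigh(S + eps * np.eye(d))
        theta -= (lr / np.sqrt(t)) * (V @ ((V.T @ g) / np.sqrt(w)))
    return theta

# Small synthetic demo: data from a true logistic model.
rng = np.random.default_rng(1)
theta_star = np.array([1.0, -1.0, 0.5])
X = rng.standard_normal((200, 3))
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ theta_star))).astype(float)
theta_hat = moment_adjusted_sgd(X, y)
acc = np.mean((X @ theta_hat > 0) == (y > 0.5))
```

The decreasing step size `lr / sqrt(t)` is a standard SGD schedule; the preconditioning is the only ingredient specific to the moment-adjusting idea sketched here.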

Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 81 (2), pp. 431-456.