Talk page

Title:
Deep learning and operator-valued free probability: training and generalization dynamics in high dimensions. Part II

Speaker:
Jeffrey Pennington

Abstract:
One of the distinguishing characteristics of modern deep learning systems is that they typically employ neural network architectures with enormous numbers of parameters, often in the millions and sometimes even in the billions. While this paradigm has recently inspired a broad research effort on the properties of large networks, relatively little work has addressed the fact that these networks are often used to model large, complex datasets, which may themselves contain millions or even billions of constraints. In this talk, I will present a formalism based on operator-valued free probability that enables exact predictions of training and generalization performance in the high-dimensional regime in which both the dataset size and the number of features tend to infinity. The analysis provides one of the first analytically tractable models that captures the effects of early stopping, over/under-parameterization, and explicit regularization, and whose generalization performance exhibits complex non-monotonic behavior as the number of parameters is varied.
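(For readers unfamiliar with the non-monotonic behavior mentioned above, the following is a minimal illustrative sketch, not the speaker's formalism: a random-feature ridge regression whose test error typically rises near the interpolation threshold and falls again in the over-parameterized regime. All dimensions, the noise level, and the ridge strength are assumptions chosen for illustration.)

```python
import numpy as np

# Illustrative sketch only: sweep the number of random features in a ridge
# regression and watch test error behave non-monotonically around the point
# where the feature count matches the training-set size.
rng = np.random.default_rng(0)
d, n_train, n_test = 50, 200, 2000   # input dim, train size, test size (assumed)
ridge = 1e-6                          # near-ridgeless regression (assumed)

# Teacher: noisy linear target on Gaussian inputs.
w_star = rng.standard_normal(d) / np.sqrt(d)
X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
y_tr = X_tr @ w_star + 0.1 * rng.standard_normal(n_train)
y_te = X_te @ w_star

for n_feat in [20, 100, 200, 400, 2000]:   # under- to over-parameterized
    W = rng.standard_normal((d, n_feat)) / np.sqrt(d)       # fixed random projection
    F_tr = np.maximum(X_tr @ W, 0.0)                        # ReLU random features
    F_te = np.maximum(X_te @ W, 0.0)
    # Ridge solution in feature space.
    a = np.linalg.solve(F_tr.T @ F_tr + ridge * np.eye(n_feat), F_tr.T @ y_tr)
    print(n_feat, float(np.mean((F_te @ a - y_te) ** 2)))
```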

Link:
http://scgp.stonybrook.edu/video_portal/video.php?id=4382

Workshop:
Simons Program: Neural networks and the Data Science Revolution: from theoretical physics to neuroscience, and back