Talk page
Title:
Mean field theory of neural networks: From stochastic gradient descent to Wasserstein gradient flows
Speaker:
Abstract:
Modern neural networks contain millions of parameters, and training them requires optimizing a highly non-convex objective. Despite the apparent complexity of this task, practitioners successfully train such models using simple first-order methods such as stochastic gradient descent (SGD). I will survey recent efforts to understand this surprising phenomenon using tools from the theory of partial differential equations. Namely, I will discuss a mean field limit in which the number of neurons becomes large and the SGD dynamics is approximated by a certain Wasserstein gradient flow. [Joint work with Adel Javanmard, Song Mei, Theodor Misiakiewicz, Marco Mondelli, Phan-Minh Nguyen]
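For context, here is a minimal sketch of the standard two-layer mean-field setup studied in this line of work; the notation below is illustrative and not taken from the talk itself. A two-layer network with $N$ neurons and its population risk can be written as
\[
\hat f_N(x;\boldsymbol\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N}\sigma_*(x;\theta_i),
\qquad
R_N(\boldsymbol\theta) \;=\; \mathbb{E}\big[(y-\hat f_N(x;\boldsymbol\theta))^2\big].
\]
As $N\to\infty$ (with a suitably scaled SGD step size), the empirical distribution of the neurons, $\hat\rho^{(N)}_t = \frac{1}{N}\sum_{i=1}^N \delta_{\theta_i(t)}$, is approximated by a probability measure $\rho_t$ that follows the Wasserstein gradient flow of the limiting risk
\[
R(\rho) \;=\; \mathbb{E}[y^2] \;-\; 2\int V(\theta)\,\rho(\mathrm{d}\theta) \;+\; \iint U(\theta,\theta')\,\rho(\mathrm{d}\theta)\,\rho(\mathrm{d}\theta'),
\]
where $V(\theta)=\mathbb{E}[y\,\sigma_*(x;\theta)]$ and $U(\theta,\theta')=\mathbb{E}[\sigma_*(x;\theta)\,\sigma_*(x;\theta')]$, i.e. the PDE
\[
\partial_t \rho_t \;=\; \nabla_\theta\cdot\Big(\rho_t\,\nabla_\theta \tfrac{\delta R}{\delta\rho}(\theta;\rho_t)\Big),
\qquad
\tfrac{\delta R}{\delta\rho}(\theta;\rho) \;=\; -2V(\theta) + 2\int U(\theta,\theta')\,\rho(\mathrm{d}\theta').
\]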
Link:
Workshop: