Emergence of heavy tails in homogenized stochastic gradient descent

Zhe Jiao; Martin Keller-Ressel

Publication: Contribution to journal › Conference article › Contributed › Peer-reviewed

Contributors

Zhe Jiao, Northwestern Polytechnical University Xian (Author)
Martin Keller-Ressel, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden), Chair of Stochastic Analysis and Financial Mathematics (Author)

Abstract

It has repeatedly been observed that loss minimization by stochastic gradient descent (SGD) leads to heavy-tailed distributions of neural network parameters. Here, we analyze a continuous diffusion approximation of SGD, called homogenized stochastic gradient descent, and show in a regularized linear regression framework that it leads to an asymptotically heavy-tailed parameter distribution, even though local gradient noise is Gaussian. We give explicit upper and lower bounds on the tail-index of the resulting parameter distribution and validate these bounds in numerical experiments. Moreover, the explicit form of these bounds enables us to quantify the interplay between optimization hyperparameters and the tail-index. Doing so, we contribute to the ongoing discussion on links between heavy tails and the generalization performance of neural networks as well as the ability of SGD to avoid suboptimal local minima.
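
The mechanism described in the abstract (Gaussian local noise whose amplitude depends on the current parameter value, producing a heavy-tailed stationary distribution) can be illustrated with a minimal sketch. The one-dimensional toy diffusion below, dX = -b X dt + sqrt(sigma2 (X^2 + s^2)) dW, is an assumption chosen for illustration and is not the homogenized SGD equation from the paper. The names `simulate_toy_sde`, `hill_tail_index`, and `theoretical_tail_index`, as well as the parameters `b`, `sigma2`, and `s`, are hypothetical. For this toy model the stationary density is proportional to (x^2 + s^2)^(-(1 + b/sigma2)), so the tail index is 1 + 2b/sigma2, which the sketch compares against a Hill estimate from simulated samples.

```python
import numpy as np


def simulate_toy_sde(b=1.0, sigma2=1.0, s=1.0, dt=0.01, n_steps=2000,
                     n_paths=20000, seed=0):
    """Euler-Maruyama for the toy SDE dX = -b X dt + sqrt(sigma2 (X^2 + s^2)) dW.

    Illustrative stand-in only: the drift pulls X towards 0, and the noise
    is Gaussian at each step but its scale grows with |X|.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    for _ in range(n_steps):
        xi = rng.standard_normal(n_paths)  # Gaussian increments
        x = x - b * x * dt + np.sqrt(sigma2 * (x**2 + s**2) * dt) * xi
    return x


def hill_tail_index(samples, k):
    """Hill estimator of the tail index from the k largest absolute values."""
    a = np.sort(np.abs(samples))[::-1]  # descending order statistics
    return 1.0 / np.mean(np.log(a[:k] / a[k]))


def theoretical_tail_index(b, sigma2):
    """Tail index of the toy model's stationary law (not a bound from the paper)."""
    return 1.0 + 2.0 * b / sigma2


if __name__ == "__main__":
    x = simulate_toy_sde()
    print("Hill estimate:", hill_tail_index(x, k=200))
    print("Toy-model theory:", theoretical_tail_index(1.0, 1.0))
```

In this toy setting, increasing `sigma2` relative to `b` lowers the tail index, that is, it makes the tails heavier. The Hill estimate depends on the choice of `k` and on the discretization step `dt`, so it should only be expected to agree roughly with the theoretical value.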

Details

Original language	English
Pages (from - to)	14066-14092
Number of pages	27
Journal	Advances in Neural Information Processing Systems
Volume	37
Publication status	Published - 2024
Peer-review status	Yes

Conference

Title	38th Conference on Neural Information Processing Systems
Short title	NeurIPS 2024
Event number	38
Duration	9 - 15 December 2024
Website	https://neurips.cc/Conferences/2024 https://neurips.cc/virtual/2024/index.html
Venue	Vancouver Convention Center & Online
City	Vancouver
Country	Canada

External IDs

ORCID	/0000-0003-0913-3363/work/192581534

Keywords

ASJC Scopus subject areas

Computer Networks and Communications
Information Systems
Signal Processing
