Bayesian Sparsification for Deep Neural Networks With Bayesian Model Reduction

Dimitrije Markovic; Karl J. Friston; Stefan J. Kiebel

doi:10.1109/ACCESS.2024.3417219

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Dimitrije Markovic, Chair of Cognitive Computational Neuroscience, Technische Universität Dresden (Author)
Karl J. Friston, VERSES AI, University College London (Author)
Stefan J. Kiebel, Chair of Cognitive Computational Neuroscience, Technische Universität Dresden (Author)

Abstract

Deep learning's immense capabilities are often constrained by the complexity of its models, leading to an increasing demand for effective sparsification techniques. Bayesian sparsification for deep learning emerges as a crucial approach, facilitating the design of models that are both computationally efficient and competitive in terms of performance across various deep learning applications. The state of the art in Bayesian sparsification of deep neural networks combines structural shrinkage priors on model weights with an approximate inference scheme based on stochastic variational inference. However, model inversion of the full generative model is exceptionally computationally demanding, especially when compared to standard deep learning of point estimates. In this context, we advocate for the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning model weights. As a generalization of the Savage-Dickey ratio, BMR allows post-hoc elimination of redundant model weights based on the posterior estimates under a straightforward (non-hierarchical) generative model. Our comparative study highlights the advantages of the BMR method relative to established approaches, which are based on hierarchical horseshoe priors over model weights. We illustrate the potential of BMR across various deep learning architectures, from classical networks like LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
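To make the post-hoc pruning idea concrete, the following is a minimal sketch of Bayesian model reduction for a single layer, assuming a fully factorized (mean-field) Gaussian posterior over the weights, a zero-mean Gaussian prior, and a reduced prior that shrinks each weight's prior variance almost to zero. These modelling choices, the function names `bmr_delta_f` and `prune_mask`, and the `shrink` factor are hypothetical illustrations, not the authors' implementation. For Gaussians, the change in log evidence has a closed form. As the reduced prior variance tends to zero, it approaches the Savage-Dickey ratio ln q(0) - ln p(0).

```python
import numpy as np


def bmr_delta_f(mu, var, prior_mu, prior_var, red_mu, red_var):
    """Closed-form change in log evidence, ln p~(y) - ln p(y), obtained by
    replacing a Gaussian prior N(prior_mu, prior_var) with a reduced prior
    N(red_mu, red_var), given a Gaussian posterior N(mu, var).

    Evaluated elementwise, so every weight of a mean-field posterior is
    scored independently without refitting the network.
    """
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    # Precisions of the posterior, the full prior and the reduced prior.
    p, p0, rp0 = 1.0 / var, 1.0 / prior_var, 1.0 / red_var
    # Reduced posterior precision and mean (Gaussian identities).
    rp = p + rp0 - p0
    if np.any(rp <= 0):
        raise ValueError("reduced posterior precision must be positive")
    rmu = (p * mu + rp0 * red_mu - p0 * prior_mu) / rp
    # Log-normalizer terms minus the quadratic (mean-dependent) terms.
    return 0.5 * np.log(p * rp0 / (p0 * rp)) - 0.5 * (
        p * mu**2 + rp0 * red_mu**2 - p0 * prior_mu**2 - rp * rmu**2
    )


def prune_mask(mu, var, prior_var=1.0, shrink=1e-6):
    """Hypothetical pruning rule: flag a weight when the reduced model,
    whose zero-mean prior variance is shrunk by `shrink`, has higher evidence."""
    delta_f = bmr_delta_f(mu, var, 0.0, prior_var, 0.0, shrink * prior_var)
    return delta_f > 0  # True -> evidence favours switching the weight off


# Usage: posterior means and variances, e.g. from variational inference.
mu = np.array([0.02, 1.5, -0.8, 0.001])
var = np.full(4, 0.01)
print(prune_mask(mu, var).tolist())  # [True, False, False, True]
```

In this sketch the weights with posterior means near zero (relative to their posterior uncertainty) are flagged. The pruning decision only requires the posterior moments and the two priors, which is what makes the reduction post hoc.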

Details

Original language	English
Pages (from-to)	88231-88242
Number of pages	12
Journal	IEEE Access
Volume	12 (2024)
Publication status	Published - 20 June 2024
Peer-review status	Yes

Keywords

Bayesian model reduction, deep neural networks, stochastic variational inference

Library keywords

150 Psychology