Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Abstract
Large Language Models (LLMs) demonstrate exceptional language understanding and generation capabilities by learning from context. Leveraging the strong in-context learning (ICL) abilities of LLMs, prompt-based fine-tuning has proven to be effective for enhancing the adaptability and alignment of LLMs, especially in low-data scenarios. However, the billions of parameters resulting from layer stacking in LLMs present significant computational challenges, limiting the practicality of fine-tuning. To tackle this problem, we explore the application of layer-wise model pruning in prompt-based fine-tuning of LLMs for few-shot learning scenarios. Our approach involves dropping certain model layers and fine-tuning the model with the remaining layers. Surprisingly, we observe that even with fewer layers, LLMs maintain similar or better performance levels, particularly in prompt-based fine-tuning for text classification tasks. Remarkably, in certain cases, models with a single layer outperform their fully layered counterparts. These findings offer valuable insights for future work aimed at mitigating the size constraints of LLMs while preserving their performance, thereby opening avenues for significantly more efficient use of LLMs.
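The operation the abstract describes, dropping a subset of the stacked transformer layers and fine-tuning the model with the remaining ones, can be sketched in a few lines with Hugging Face Transformers. This is a minimal illustration only: the backbone (`roberta-base`), the classification head, and the keep-the-first-k selection rule are assumptions for demonstration, not the paper's reported setup.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Illustrative assumptions: the backbone and the "keep the first k layers"
# rule stand in for whichever model and layer-selection strategy is used.
model_name = "roberta-base"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
full_params = sum(p.numel() for p in model.parameters())

# Drop all but the first k transformer layers; the embeddings and the
# classification head are kept unchanged.
k = 1  # the abstract reports that even single-layer models can be competitive
model.roberta.encoder.layer = torch.nn.ModuleList(model.roberta.encoder.layer[:k])
model.config.num_hidden_layers = k

pruned_params = sum(p.numel() for p in model.parameters())
print(f"layers kept: {k}, params: {full_params:,} -> {pruned_params:,}")

# The slimmed model is then fine-tuned on the few-shot task as usual,
# e.g. with prompt-based fine-tuning for text classification.
```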
Details
| Original language | English |
|---|---|
| Title | International Joint Conference on Neural Networks, IJCNN 2025 - Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| ISBN (electronic) | 9798331510428 |
| Publication status | Published - 2025 |
| Peer-review status | Yes |
Publication series
| Series | Proceedings of the International Joint Conference on Neural Networks |
|---|---|
| ISSN | 2161-4393 |
Conference
| Title | International Joint Conference on Neural Networks 2025 |
|---|---|
| Subtitle | All Neural Network roads lead to Rome |
| Short title | IJCNN 2025 |
| Duration | 30 June - 5 July 2025 |
| Degree of recognition | International event |
| Location | Pontificia Università Gregoriana |
| City | Rome |
| Country | Italy |
External IDs
| ORCID | /0000-0001-5458-8645/work/200631677 |
|---|---|
Keywords
- Efficient Methods for NLP
- Large Language Models
- Layer Pruning