Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers

Shuzhou Yuan; Ercong Nie; Bolei Ma; Michael Färber

doi:10.1109/IJCNN64981.2025.11228532

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Shuzhou Yuan - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden), Chair of Scalable Software Architectures for Data Analytics (ScaDS.AI Dresden/Leipzig) (Author)
Ercong Nie - Ludwig Maximilian University of Munich, Munich Center for Machine Learning (MCML) (Author)
Bolei Ma - Ludwig Maximilian University of Munich, Munich Center for Machine Learning (MCML) (Author)
Michael Färber - Chair of Scalable Software Architectures for Data Analytics (ScaDS.AI Dresden/Leipzig), Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)

Abstract

Large Language Models (LLMs) demonstrate exceptional language understanding and generation capabilities by learning from context. Leveraging the strong in-context learning (ICL) abilities of LLMs, prompt-based fine-tuning has proven to be effective for enhancing the adaptability and alignment of LLMs, especially in low-data scenarios. However, the billions of parameters resulting from layer stacking in LLMs present significant computational challenges, limiting the practicality of fine-tuning. To tackle this problem, we explore the application of layer-wise model pruning in prompt-based fine-tuning of LLMs for few-shot learning scenarios. Our approach involves dropping certain model layers and fine-tuning the model with the remaining layers. Surprisingly, we observe that even with fewer layers, LLMs maintain similar or better performance levels, particularly in prompt-based fine-tuning for text classification tasks. Remarkably, in certain cases, models with a single layer outperform their fully layered counterparts. These findings offer valuable insights for future work aimed at mitigating the size constraints of LLMs while preserving their performance, thereby opening avenues for significantly more efficient use of LLMs.
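The layer-dropping step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The `Layer` class, the `drop_layers`, `forward`, and `count_params` helpers, the parameter counts, and the rule of keeping the first `keep` layers are all assumptions made for illustration; the abstract only states that certain layers are dropped and that the remaining layers are fine-tuned. The fine-tuning step itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


@dataclass(frozen=True)
class Layer:
    """Hypothetical stand-in for one transformer block."""
    scale: float    # toy weight, only used to make the forward pass visible
    n_params: int   # illustrative parameter count for this block

    def __call__(self, hidden: Vector) -> Vector:
        # Toy residual update: h + scale * h
        return [h + self.scale * h for h in hidden]


def drop_layers(layers: Sequence[Layer], keep: int) -> List[Layer]:
    """Return only the first `keep` layers (assumed selection rule)."""
    if not 1 <= keep <= len(layers):
        raise ValueError("keep must be between 1 and the number of layers")
    return list(layers[:keep])


def forward(layers: Sequence[Layer], hidden: Vector) -> Vector:
    """Run the hidden state through each remaining layer in order."""
    for layer in layers:
        hidden = layer(hidden)
    return hidden


def count_params(layers: Sequence[Layer]) -> int:
    """Sum the illustrative parameter counts of the given layers."""
    return sum(layer.n_params for layer in layers)


# A toy 12-layer stack and its single-layer counterpart.
full_model = [Layer(scale=0.5, n_params=1_000) for _ in range(12)]
slim_model = drop_layers(full_model, keep=1)

print(count_params(full_model), count_params(slim_model))
print(forward(slim_model, [1.0, 2.0]))
```

In this toy setting the slimmed stack carries one twelfth of the layer parameters. Whether task performance survives the cut is an empirical question that the paper studies and that this sketch does not address.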

Details

Original language	English
Title of host publication	International Joint Conference on Neural Networks, IJCNN 2025 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
ISBN (electronic)	9798331510428
Publication status	Published - 2025
Peer-reviewed	Yes

Publication series

Series	Proceedings of the International Joint Conference on Neural Networks
ISSN	2161-4393

Conference

Title	International Joint Conference on Neural Networks 2025
Subtitle	All Neural Network roads lead to Rome
Abbreviated title	IJCNN 2025
Duration	30 June - 5 July 2025
Website	https://www.inns.org/ijcnn-home https://2025.ijcnn.org/
Degree of recognition	International event
Location	Pontificia Università Gregoriana
City	Rome
Country	Italy

External IDs

ORCID	/0000-0001-5458-8645/work/200631677

Research Portal of the TU Dresden
