Papers
arxiv:2604.19254

ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning

Published on Apr 21
Submitted by SeanLee on Apr 22
Abstract

ShadowPEFT is a parameter-efficient fine-tuning framework that performs layer-level refinement through depth-shared shadow modules, offering competitive performance with reduced computational overhead compared to traditional low-rank adaptation methods.

AI-generated summary

Parameter-efficient fine-tuning (PEFT) reduces the training cost of full-parameter fine-tuning for large language models (LLMs) by training only a small set of task-specific parameters while freezing the pretrained backbone. However, existing approaches, such as Low-Rank Adaptation (LoRA), achieve adaptation by inserting independent low-rank perturbations directly into individual weights, resulting in a local parameterization of adaptation. We propose ShadowPEFT, a centralized PEFT framework that instead performs layer-level refinement through a depth-shared shadow module. At each transformer layer, ShadowPEFT maintains a parallel shadow state and evolves it repeatedly for progressively richer hidden states. This design shifts adaptation from distributed weight-space perturbations to a shared layer-space refinement process. Since the shadow module is decoupled from the backbone, it can be reused across depth, independently pretrained, and optionally deployed in a detached mode, benefiting edge computing scenarios. Experiments on generation and understanding benchmarks show that ShadowPEFT matches or outperforms LoRA and DoRA under comparable trainable-parameter budgets. Additional analyses on shadow pretraining, cross-dataset transfer, parameter scaling, inference latency, and system-level evaluation suggest that centralized layer-space adaptation is a competitive and flexible alternative to conventional low-rank PEFT.

Community

Paper author Paper submitter

We introduced ShadowPEFT, a new Parameter-Efficient Fine-Tuning (PEFT) paradigm tailored for edge computing scenarios.

Unlike traditional approaches such as LoRA and its variants, which inject trainable parameters directly into the Transformer's weights and therefore require tight coupling with the backbone, ShadowPEFT enhances the frozen base model by adding a lightweight, centralized, pretrainable, and detachable shadow network.

This shadow network operates in parallel with the base model, delivering learned corrections to each decoder layer. Because the shadow module is architecturally decoupled from the backbone, it can be independently trained, stored, and deployed, benefiting edge computing and edge-cloud collaborative computing scenarios.

Paper author Paper submitter

The core starting point of this work is to rethink how mainstream PEFT methods perform adaptation. Methods represented by LoRA typically achieve downstream adaptation by injecting mutually independent low-rank updates into multiple linear layers; mechanistically, this is a relatively distributed, local weight-space parameterization. In this work, we explore another possibility: shifting adaptation from distributed weight perturbations to centralized, layer-level representation refinement.

Building on this idea, we propose ShadowPEFT. On top of a frozen backbone, the framework introduces a shadow network that can be reused across layers and maintains a shadow state that evolves in parallel along the depth dimension. At each layer, the model continually refines the backbone's hidden representations through three steps: Shadow Injection, Base Encoding, and Shadow Update. Compared with the independent corrections that traditional low-rank methods apply to local weights, ShadowPEFT emphasizes a shared, stateful, cross-layer coordinated adaptation mechanism.
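To make the three-step loop concrete, here is a minimal NumPy sketch of the control flow, not the authors' implementation: the module shapes, the additive injection, and the tanh update rule are illustrative assumptions; only the structure (a frozen per-layer backbone plus one depth-shared shadow module carrying a parallel state) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_shadow, n_layers = 16, 4, 3

# Frozen backbone: one weight matrix per layer (a stand-in for a transformer block).
W_base = [rng.normal(0, 0.1, (d_model, d_model)) for _ in range(n_layers)]

# Depth-shared shadow module: ONE set of trainable parameters reused at every layer.
W_inject = rng.normal(0, 0.1, (d_shadow, d_model))            # shadow state -> hidden correction
W_update = rng.normal(0, 0.1, (d_model + d_shadow, d_shadow)) # (hidden, shadow) -> new shadow state

def forward(x):
    h = x
    s = np.zeros(d_shadow)  # parallel shadow state, evolved along depth
    for l in range(n_layers):
        h = h + s @ W_inject                             # 1. Shadow Injection
        h = np.tanh(h @ W_base[l])                       # 2. Base Encoding (frozen)
        s = np.tanh(np.concatenate([h, s]) @ W_update)   # 3. Shadow Update
    return h, s

h, s = forward(rng.normal(size=d_model))
```

Note that the trainable budget here is the shadow module alone (`W_inject` and `W_update`, 144 scalars in this toy setting) while the 768 backbone parameters stay frozen, and that the same `W_inject` / `W_update` pair is applied at every layer, which is what distinguishes this centralized scheme from inserting an independent low-rank pair per linear layer.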

Experiments show that on backbones of different scales, including Qwen3 0.6B / 4B / 8B, ShadowPEFT achieves competitive performance, and better average metrics, under trainable-parameter budgets comparable to LoRA / DoRA. More importantly, because the shadow module is architecturally decoupled from the backbone, it not only participates in full inference in attached mode but also supports detached deployment, opening new possibilities for flexible deployment in edge/cloud scenarios.

We also examined the role of shadow pretraining. The results show that when Qwen3 8B is paired with a pretrained 0.5B shadow model, overall performance improves further; meanwhile, the shadow retains strong standalone capability in detached mode. This suggests that the shadow module is not merely an attached adapter, but can be viewed as a transferable, reusable, and independently deployable functional adaptation unit.

In addition, in parameter-scaling experiments we observed that ShadowPEFT uses additional parameter capacity quite differently from traditional low-rank PEFT methods: compared with LoRA's relatively flat scaling and DoRA's tendency to degrade at larger parameter budgets, ShadowPEFT benefits more stably from larger shadow modules. This suggests that scaling PEFT capacity need not rely solely on rank increases; it can also be achieved by expanding a centralized functional module.

Overall, this work aims to show that PEFT can be understood not only as lightweight parameter injection, but can also be designed as a modular, stateful, detachable, function-level adaptation mechanism.


Interesting breakdown of this paper on arXivLens: https://arxivlens.com/PaperView/Details/shadowpeft-shadow-network-for-parameter-efficient-fine-tuning-9520-dca778f6
Covers the executive summary, detailed methodology, and practical applications.

Paper author

Here is a demo of ShadowPEFT deployed on a #Unitree Go2 robot dog. With a 0.5B shadow model deployed, the dog understands commands on an NVIDIA Jetson Orin GPU and performs actions within two seconds.
A new option for #embodied #AI.
(Speech recognition is done by an iPhone.)



Models citing this paper 11


Datasets citing this paper 1

Spaces citing this paper 0


Collections including this paper 1