This repository contains the code for the paper:
Title: The Surprising Effectiveness of Randomness in LLM Pruning
Authors: Shuyao Xu, Jiayao Liu, Zhenfeng He, Cheng Peng, Weidi Xu
Conference: ICLR 2025 Workshop on Sparsity in LLMs (SLLM)
Paper URL: https://openreview.net/forum?id=YncWrbIxnN
This paper investigates structured pruning of LLMs. We find that random pruning is a surprisingly effective baseline at lower pruning ratios. We propose Random Clustering + Activation L2 Pruning (RC+A), a simple and efficient method that combines randomness with activation magnitude and achieves performance comparable to gradient-based methods while being significantly faster (up to 50x).
This code implements and evaluates various structured pruning techniques for LLM MLP layers, focusing on the effectiveness of randomness. Key implemented methods include:
- Random Pruning
- Activation L2 Pruning
- Taylor Pruning (Gradient-based)
- RC+A (Ours): Random Clustering + Activation L2 Pruning (see the sketch after this list)
- Similarity Clustering + Activation L2 Pruning
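To make the RC+A idea concrete, here is a minimal, illustrative sketch for a single MLP layer. This is not the repository's implementation: the function name, the `group_size` choice, and the clustering details are assumptions made for this example; see `scripts/run_experiment.py` for the actual code.

```python
# Illustrative sketch only -- not the repository's implementation.
# One plausible reading of RC+A for a single MLP layer: randomly cluster the
# intermediate neurons, score each neuron by the L2 norm of its calibration
# activations, and prune the lowest-scoring neurons within each cluster.
import torch

def rcpa_keep_indices(acts: torch.Tensor, ratio: float,
                      group_size: int = 4, seed: int = 0) -> torch.Tensor:
    """acts: (num_tokens, num_neurons) activations from a calibration set.
    ratio: fraction of neurons to prune. Returns indices of neurons to keep."""
    num_neurons = acts.shape[1]
    scores = acts.float().norm(dim=0)                  # activation L2 per neuron
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_neurons, generator=gen)  # random clustering
    keep_per_group = group_size - round(ratio * group_size)
    kept = []
    for start in range(0, num_neurons, group_size):
        group = perm[start:start + group_size]
        top = scores[group].topk(min(keep_per_group, len(group))).indices
        kept.append(group[top])
    return torch.cat(kept).sort().values

# e.g., prune 25% of 11008 neurons using a tiny random "calibration" batch
kept = rcpa_keep_indices(torch.randn(32, 11008), ratio=0.25)
print(kept.numel())  # ~75% of the neurons remain
```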
We recommend using Conda:
conda env create -f environment.yaml
conda activate llm-neuron-compression
(Alternatively, use `pip install -r environment.txt` after ensuring PyTorch with CUDA is installed.)
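To confirm that a CUDA-enabled PyTorch is available before launching experiments, a quick optional check can be run:

```python
# Optional sanity check: verify the PyTorch version and CUDA availability.
import torch
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```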
Experiments are run via `scripts/run_experiment.py`, preferably launched with `accelerate`.
Key Arguments:
- `--model`: Hugging Face model ID (e.g., `"Qwen/Qwen2.5-7B-Instruct"`).
- `--method`: Pruning strategy (see below).
- `--ratio`: Pruning ratio (e.g., `0.25`).
- `--layers`: Comma-separated layer indices (e.g., `"5,6,7,...26"`) or `"all"`.
- `--eval-tasks`: Comma-separated `lm-eval-harness` tasks (e.g., `"wikitext,mmlu,hellaswag"`).
- `--apply-mode prune`: Required for the RC+A and Similarity+ActivationL2 methods.
Methods (`--method`) corresponding to paper results:
- `random-prune`: Random Pruning
- `gradient-magnitude`: Taylor Pruning
- `squared-magnitude`: Activation L2 Pruning
- `weight-l2`: Weight L2 Pruning
- `random-merge`: Use with `--apply-mode prune` for RC+A (Ours)
- `activation-merge`: Use with `--apply-mode prune` for Similarity+ActivationL2 (Modulated-Act)
- `post-activation-merge`: Use with `--apply-mode prune` for Similarity+ActivationL2 (Post-Act)
Example: Running RC+A (Ours) at 25% Pruning on Qwen2.5-7B-Instruct
# Define layers and tasks (adjust as needed)
LAYERS_TO_PRUNE="5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26"
EVAL_TASKS="wikitext,mmlu,arc_easy,arc_challenge,winogrande,hellaswag,openbookqa"
accelerate launch scripts/run_experiment.py \
--model "Qwen/Qwen2.5-7B-Instruct" \
--method random-merge \
--apply-mode prune \
--ratio 0.25 \
--layers $LAYERS_TO_PRUNE \
--dataset bookcorpus --num-calib-samples 10 --calib-seq-len 128 \
--eval-tasks $EVAL_TASKS
(Adapt the `--method` and `--ratio` arguments for other experiments. See `scripts/*.sh` for more examples.)
Evaluation uses `lm-evaluation-harness`. Results (metrics, config, timing) are saved as JSON in `results/<model_name>/<method>/logs/`.
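The saved logs can be gathered with a short script like the sketch below. The glob pattern assumes `<model_name>` and `<method>` each map to a single directory component, and the JSON key layout is not documented here, so the snippet only inspects what each file contains.

```python
# Minimal sketch for gathering result logs. The glob pattern and the JSON
# structure are assumptions, not a documented schema.
import glob
import json

for path in sorted(glob.glob("results/*/*/logs/*.json")):
    with open(path) as f:
        run = json.load(f)
    print(path, list(run.keys()))  # inspect what each log actually contains
```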
If you find this work useful, please cite:
@inproceedings{
xu2025the,
title={The Surprising Effectiveness of Randomness in {LLM} Pruning},
author={Shuyao Xu and Jiayao Liu and Zhenfeng He and Cheng Peng and Weidi Xu},
booktitle={Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference},
year={2025},
url={https://openreview.net/forum?id=YncWrbIxnN}
}