The Surprising Effectiveness of Randomness in LLM Pruning

This repository contains the code for the paper:

Title: The Surprising Effectiveness of Randomness in LLM Pruning

Authors: Shuyao Xu, Jiayao Liu, ZhenFeng He, Cheng Peng, Weidi Xu

Conference: ICLR 2025 Workshop on Sparsity in LLMs (SLLM)

Paper URL: https://openreview.net/forum?id=YncWrbIxnN

Abstract

This paper investigates structured pruning of LLMs. We find that random pruning is a surprisingly effective baseline at lower pruning ratios. We propose Random Clustering + Activation L2 Pruning (RC+A), a simple and efficient method that combines randomness with activation magnitude and achieves performance comparable to gradient-based methods while being significantly faster (up to 50x).

Overview

This code implements and evaluates various structured pruning techniques for LLM MLP layers, focusing on the effectiveness of randomness. Key implemented methods include:

  • Random Pruning
  • Activation L2 Pruning
  • Taylor Pruning (Gradient-based)
  • RC+A (Ours): Random Clustering + Activation L2 Pruning
  • Similarity Clustering + Activation L2 Pruning

Installation

We recommend using Conda:

conda env create -f environment.yaml
conda activate llm-neuron-compression

(Alternatively, use pip install -r environment.txt after ensuring PyTorch with CUDA is installed.)
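
If you take the pip route, the setup might look like the sketch below; the CUDA wheel index URL is an assumption and should be matched to your driver/toolkit.

# Sketch of a pip-based setup (adjust the CUDA wheel index to your system).
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install -r environment.txt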

Usage

Experiments are run via scripts/run_experiment.py, preferably launched with accelerate.

Key Arguments:

  • --model: Hugging Face model ID (e.g., "Qwen/Qwen2.5-7B-Instruct").
  • --method: Pruning strategy (see below).
  • --ratio: Pruning ratio (e.g., 0.25).
  • --layers: Comma-separated layer indices (e.g., "5,6,7,...26") or "all".
  • --eval-tasks: Comma-separated lm-eval-harness tasks (e.g., "wikitext,mmlu,hellaswag").
  • --apply-mode prune: Required for RC+A and Similarity+ActivationL2 methods.

Methods (--method) corresponding to the results reported in the paper (a sketch of a baseline invocation follows this list):

  • random-prune: Random Pruning
  • gradient-magnitude: Taylor Pruning
  • squared-magnitude: Activation L2 Pruning
  • weight-l2: Weight L2 Pruning
  • random-merge: Use with --apply-mode prune for RC+A (Ours)
  • activation-merge: Use with --apply-mode prune for Similarity+ActivationL2 (Modulated-Act)
  • post-activation-merge: Use with --apply-mode prune for Similarity+ActivationL2 (Post-Act)
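
For example, the Activation L2 baseline can be run by swapping in squared-magnitude. The sketch below assumes the remaining flags mirror the RC+A example that follows; --apply-mode prune is only listed as required for the merge-based methods, so it is omitted here.

# Sketch: Activation L2 pruning at 25% on the same layer range as the RC+A example below.
accelerate launch scripts/run_experiment.py \
    --model "Qwen/Qwen2.5-7B-Instruct" \
    --method squared-magnitude \
    --ratio 0.25 \
    --layers "5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26" \
    --dataset bookcorpus --num-calib-samples 10 --calib-seq-len 128 \
    --eval-tasks wikitext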

Example: Running RC+A (Ours) at 25% Pruning on Qwen-7B

# Define layers and tasks (adjust as needed)
LAYERS_TO_PRUNE="5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26"
EVAL_TASKS="wikitext,mmlu,arc_easy,arc_challenge,winogrande,hellaswag,openbookqa"

accelerate launch scripts/run_experiment.py \
    --model "Qwen/Qwen2.5-7B-Instruct" \
    --method random-merge \
    --apply-mode prune \
    --ratio 0.25 \
    --layers "$LAYERS_TO_PRUNE" \
    --dataset bookcorpus --num-calib-samples 10 --calib-seq-len 128 \
    --eval-tasks "$EVAL_TASKS"

(Adapt the --method and --ratio arguments for other experiments. See scripts/*.sh for more examples.)

Evaluation & Results

Evaluation uses lm-evaluation-harness. Results (metrics, config, timing) are saved as JSON in results/<model_name>/<method>/logs/.
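
To sanity-check a run, you can locate and pretty-print the saved JSON files; the directory pattern comes from above, while the individual file names are whatever the script writes.

# List all result files, then pretty-print the first one found.
find results -path "*/logs/*.json"
python -m json.tool "$(find results -path '*/logs/*.json' | head -n 1)"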

Citation

If you find this work useful, please cite:

@inproceedings{xu2025the,
  title={The Surprising Effectiveness of Randomness in {LLM} Pruning},
  author={Shuyao Xu and Jiayao Liu and Zhenfeng He and Cheng Peng and Weidi Xu},
  booktitle={Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference},
  year={2025},
  url={https://openreview.net/forum?id=YncWrbIxnN}
}
