Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer

Official repository for the AAAI 2025 paper Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer.

In summary, SparseViT exploits the distinction between semantic and non-semantic features. Instead of relying on handcrafted feature extractors, the model learns to adaptively extract the non-semantic features that matter most for image manipulation localization. This offers a new way to precisely identify manipulated regions.

Dataset Preparation

(1) SparseViT is trained on the joint dataset used by CAT-Net, so you need to download that combined dataset. It consists of:

CASIA2.0
FantasticReality_v1
IMD_20
tampCOCO

(For more details about these datasets, refer to CAT-Net.)

(2) Dataset organization. We define two types of Dataset classes:

json_dataset: retrieves the input images and their corresponding ground truth from a JSON file containing a list of [image_path, ground_truth_path] pairs, typically formatted like this:

[
    [
        "/Dataset/CASIAv2/Tp/Tp_D_NRN_S_N_arc00013_sec00045_11700.jpg",
        "/Dataset/CASIAv2/Gt/Tp_D_NRN_S_N_arc00013_sec00045_11700_gt.png"
    ],
    [
        "/Dataset/CASIAv2/Au/Au_nat_30198.jpg",
        "Negative"
    ],
    ...
]
Note: "Negative" marks an authentic (untampered) image, which has no ground-truth mask.
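To make the format concrete, here is a minimal sketch of reading such a file. It is not the repository's json_dataset code: the helper name load_json_pairs is hypothetical, and returning None for "Negative" entries (rather than, say, an all-zero mask) is an assumption made for illustration.

```python
import json
from typing import List, Optional, Tuple

# Marker used in the JSON file for authentic images without a ground-truth mask.
NEGATIVE = "Negative"


def load_json_pairs(json_text: str) -> List[Tuple[str, Optional[str]]]:
    """Parse a JSON list of [image_path, gt_path] pairs.

    Hypothetical helper, not the repository's json_dataset class.
    "Negative" entries are returned with gt_path=None (an assumption; the
    real loader may substitute an empty mask instead).
    """
    entries = json.loads(json_text)
    pairs: List[Tuple[str, Optional[str]]] = []
    for entry in entries:
        if not isinstance(entry, list) or len(entry) != 2:
            raise ValueError(f"expected [image_path, gt_path], got {entry!r}")
        image_path, gt_path = entry
        pairs.append((image_path, None if gt_path == NEGATIVE else gt_path))
    return pairs


if __name__ == "__main__":
    example = """[
        ["/Dataset/CASIAv2/Tp/Tp_D_NRN_S_N_arc00013_sec00045_11700.jpg",
         "/Dataset/CASIAv2/Gt/Tp_D_NRN_S_N_arc00013_sec00045_11700_gt.png"],
        ["/Dataset/CASIAv2/Au/Au_nat_30198.jpg", "Negative"]
    ]"""
    for image_path, gt_path in load_json_pairs(example):
        print(image_path, "->", gt_path or "no mask (authentic)")
```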

mani_dataset: automatically loads image and ground-truth pairs from a directory, which should contain:

Tp subdirectory (input images)

Gt subdirectory (ground-truth masks)

Images and ground-truth masks are paired automatically from the directory listings returned by os.listdir().

An example mani_dataset layout is provided in the /images directory.
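A minimal sketch of directory-based pairing follows. It is not the repository's mani_dataset code: the helper name pair_mani_dir is hypothetical, and pairing by sorted position is an assumption (os.listdir() itself returns entries in arbitrary order, so some ordering step is needed).

```python
import os
import tempfile
from typing import List, Tuple


def pair_mani_dir(root: str) -> List[Tuple[str, str]]:
    """Pair each file in root/Tp with a file in root/Gt by sorted position.

    Hypothetical helper: sorting both listings is an assumption, because
    os.listdir() returns entries in arbitrary order.
    """
    tp_dir = os.path.join(root, "Tp")
    gt_dir = os.path.join(root, "Gt")
    tp_files = sorted(os.listdir(tp_dir))
    gt_files = sorted(os.listdir(gt_dir))
    if len(tp_files) != len(gt_files):
        raise ValueError(
            f"{len(tp_files)} files in Tp but {len(gt_files)} files in Gt"
        )
    return [
        (os.path.join(tp_dir, tp), os.path.join(gt_dir, gt))
        for tp, gt in zip(tp_files, gt_files)
    ]


if __name__ == "__main__":
    # Build a throwaway Tp/Gt layout and pair it.
    with tempfile.TemporaryDirectory() as root:
        for sub in ("Tp", "Gt"):
            os.makedirs(os.path.join(root, sub))
            for name in ("img_002.png", "img_001.png"):
                open(os.path.join(root, sub, name), "w").close()
        for image, mask in pair_mani_dir(root):
            print(os.path.basename(image), "<->", os.path.basename(mask))
```

Pairing by position only works if the i-th image and the i-th mask in sorted order belong together. Giving each mask exactly the same file name as its image is one simple way to guarantee that; this is a convenient convention, not a requirement stated by the repository.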

(3) Combined dataset configuration. List all datasets in a single JSON file, with one [DatasetClass, path] entry per dataset:

[
    ["ManiDataset", "/mnt/data0/public_datasets/IML/CASIA2.0"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/FantasticReality_v1/FantasticReality.json"],
    ["ManiDataset", "/mnt/data0/public_datasets/IML/IMD_20_1024"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/sp_COCO_list.json"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/cm_COCO_list.json"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/bcm_COCO_list.json"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/bcmc_COCO_list.json"]
]

Set the data_path parameter in train.sh to the path of this JSON file.
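A malformed entry in this file is easy to miss until training starts. The sketch below is a hypothetical validator, not part of the repository. It checks that each entry is a [type, path] pair whose type is ManiDataset or JsonDataset. As an extra assumption drawn only from the example above, it also checks that JsonDataset paths end in .json.

```python
import json
from collections import Counter
from typing import List, Tuple

# Class names used in the combined configuration shown above.
KNOWN_DATASET_TYPES = {"ManiDataset", "JsonDataset"}


def load_combined_config(config_text: str) -> List[Tuple[str, str]]:
    """Validate a combined-dataset config: a JSON list of [type, path] pairs."""
    entries = json.loads(config_text)
    validated: List[Tuple[str, str]] = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, list) or len(entry) != 2:
            raise ValueError(f"entry {i}: expected [type, path], got {entry!r}")
        dataset_type, path = entry
        if dataset_type not in KNOWN_DATASET_TYPES:
            raise ValueError(f"entry {i}: unknown dataset type {dataset_type!r}")
        # Illustrative extra check (an assumption, not enforced by the repo):
        # JsonDataset entries point at .json files.
        if dataset_type == "JsonDataset" and not path.endswith(".json"):
            raise ValueError(f"entry {i}: JsonDataset path should be a .json file")
        validated.append((dataset_type, path))
    return validated


if __name__ == "__main__":
    example = """[
        ["ManiDataset", "/mnt/data0/public_datasets/IML/CASIA2.0"],
        ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/sp_COCO_list.json"]
    ]"""
    counts = Counter(kind for kind, _ in load_combined_config(example))
    print(dict(counts))
```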

Train setup

1) Set up the coding environment

First, clone the repository:

git clone https://github.com/scu-zjz/SparseViT.git

Our environment:

Ubuntu 20.04.1 LTS

CUDA 11.5 + cuDNN 8.4.0

Python 3.10

PyTorch 2.4

Then install the packages listed in requirements.txt:

pip install -r requirements.txt
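The code was tested with the versions listed above, so a quick check of your own interpreter can save some debugging. The sketch below is a hypothetical helper that uses only the standard library. It reports the Python and PyTorch versions but deliberately leaves out CUDA and cuDNN. With PyTorch installed, torch.version.cuda and torch.backends.cudnn.version() report the versions PyTorch was built with.

```python
import sys
from importlib import metadata
from typing import Dict, Optional

# Versions from the "Our environment" list above (major.minor only).
TESTED = {"python": "3.10", "torch": "2.4"}


def major_minor(version: str) -> str:
    """Reduce a version string such as '2.4.1+cu118' to '2.4'."""
    return ".".join(version.split("+")[0].split(".")[:2])


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the Python and PyTorch versions of the current interpreter."""
    python = f"{sys.version_info.major}.{sys.version_info.minor}"
    return {"python": python, "torch": installed_version("torch")}


if __name__ == "__main__":
    for name, found in environment_report().items():
        expected = TESTED[name]
        if found is None:
            status = "not installed"
        elif major_minor(found) == expected:
            status = "matches"
        else:
            status = f"differs from tested {expected}"
        print(f"{name}: {found} ({status})")
```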

2) Download the UniFormer pretrained weights

Download the pretrained weights from Google Drive and place them in the checkpoint/train/pretrain directory. Then set pretrain_path in train.sh to the location of the UniFormer pretrained model.
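Both train.sh parameters mentioned in this README, data_path and pretrain_path, must point at existing files before training starts. The sketch below is a hypothetical pre-flight check, not part of the repository. It assumes train.sh passes these parameters as --data_path <value> and --pretrain_path <value> flags, so confirm this against the actual script.

```python
import os
import re
from typing import Dict, List

# Parameter names mentioned in this README; the "--name value" flag syntax is
# an assumption about how train.sh passes them to the training script.
PATH_PARAMS = ("data_path", "pretrain_path")


def read_path_params(script_text: str) -> Dict[str, str]:
    """Extract --data_path / --pretrain_path values from a shell script."""
    found: Dict[str, str] = {}
    for name in PATH_PARAMS:
        match = re.search(rf"--{name}[=\s]+(\S+)", script_text)
        if match:
            # Drop surrounding single or double quotes, if any.
            found[name] = match.group(1).strip("'\"")
    return found


def missing_paths(script_text: str) -> List[str]:
    """Return parameters that are absent or point to non-existent paths."""
    params = read_path_params(script_text)
    return [
        name for name in PATH_PARAMS
        if name not in params or not os.path.exists(params[name])
    ]


if __name__ == "__main__":
    script_path = "train.sh"  # run from the repository root
    if os.path.exists(script_path):
        with open(script_path, encoding="utf-8") as f:
            problems = missing_paths(f.read())
        print("all paths found" if not problems else f"check: {', '.join(problems)}")
    else:
        print(f"{script_path} not found; run this from the repository root")
```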

Test setup

1) Set up the coding environment

Same as for training.

2) Download our pretrained checkpoints

Download our pretrained checkpoints from Google Drive and place them in the checkpoint/test directory.

Scripts

Training and testing each take a single command.

For training:

sh train.sh

For testing:

python main_test.py

These scripts cover only basic training and testing. SparseViT is also fully compatible with our IMDL-BenCo framework, so you can train and test it there as well.

Citation

If you find our code useful, please consider citing our paper and giving the repository a star!

@inproceedings{su2025can,
  title={Can we get rid of handcrafted feature extractors? sparsevit: Nonsemantics-centered, parameter-efficient image manipulation localization through spare-coding transformer},
  author={Su, Lei and Ma, Xiaochen and Zhu, Xuekang and Niu, Chaoqun and Lei, Zeyu and Zhou, Ji-Zhe},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={7},
  pages={7024--7032},
  year={2025}
}
