Official repository for the AAAI2025 paper (Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer)

scu-zjz/SparseViT

Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer

Official repository for the AAAI2025 paper Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer [paper] [website].

SparseViT Framework Diagram

In summary, SparseViT leverages the distinction between semantic and non-semantic features, enabling the model to adaptively extract non-semantic features that are more critical for image manipulation localization. This provides a novel approach to precisely identifying manipulated regions.

Dataset Preparation

(1) SparseViT was trained on the CAT-Net joint dataset, so you need to download the datasets it combines. These include:
  • CASIA2.0
  • FantasticReality_v1
  • IMD_20
  • tampCOCO

(For more detailed information about the dataset, you can refer to CAT-Net.)

(2) Dataset organization. We define two types of Dataset classes:

  • json_dataset: reads the input images and their corresponding ground truth from a JSON file in the following format:
[
    [
        "/Dataset/CASIAv2/Tp/Tp_D_NRN_S_N_arc00013_sec00045_11700.jpg",
        "/Dataset/CASIAv2/Gt/Tp_D_NRN_S_N_arc00013_sec00045_11700_gt.png"
    ],
    [
        "/Dataset/CASIAv2/Au/Au_nat_30198.jpg",
        "Negative"
    ],
    ...
]
Note: "Negative" indicates a real image with no ground truth.
  • mani_dataset: automatically loads images and their corresponding ground truth pairs from a directory. The directory must contain:

    Tp subdirectory (for storing input images)

    Gt subdirectory (for storing ground truth)

    Files are paired automatically using the os.listdir() function.

    An example of the mani_dataset layout is provided in the /images directory.
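
To make the two layouts concrete, here is a minimal Python sketch of how each could be read into (image, ground truth) pairs. It is not the repository's implementation: the function names `load_json_pairs` and `load_dir_pairs` are hypothetical, and pairing Tp and Gt files by sorted listing order is an assumption (os.listdir() itself returns entries in arbitrary order, so the sketch sorts both listings first).

```python
import json
import os

NEGATIVE = "Negative"  # marker used in the JSON format for images without a mask


def load_json_pairs(json_path):
    """Read [image_path, gt_path_or_"Negative"] entries from a JSON file."""
    with open(json_path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    pairs = []
    for image_path, gt_path in entries:
        # Represent real images as gt=None instead of the "Negative" string.
        pairs.append((image_path, None if gt_path == NEGATIVE else gt_path))
    return pairs


def load_dir_pairs(root):
    """Pair files in root/Tp with files in root/Gt by sorted listing order.

    Assumption: both listings sort into the same order, so the i-th image
    matches the i-th mask. The repository's own pairing rule may differ.
    """
    tp_dir = os.path.join(root, "Tp")
    gt_dir = os.path.join(root, "Gt")
    images = sorted(os.listdir(tp_dir))
    masks = sorted(os.listdir(gt_dir))
    if len(images) != len(masks):
        raise ValueError(f"{len(images)} images but {len(masks)} masks in {root}")
    return [(os.path.join(tp_dir, i), os.path.join(gt_dir, m))
            for i, m in zip(images, masks)]
```

Both helpers return the same pair structure, so downstream code would not need to care which layout a dataset uses.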

(3) Combined dataset configuration. List each dataset in a single JSON file in the following format:

[
    ["ManiDataset", "/mnt/data0/public_datasets/IML/CASIA2.0"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/FantasticReality_v1/FantasticReality.json"],
    ["ManiDataset", "/mnt/data0/public_datasets/IML/IMD_20_1024"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/sp_COCO_list.json"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/cm_COCO_list.json"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/bcm_COCO_list.json"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/bcmc_COCO_list.json"]
]

Configure the path to the organized JSON file in the data_path parameter within the train.sh file.
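
Before pointing data_path at the combined file, it can help to check that every entry is well formed and that its path exists. The sketch below is one possible check, not part of the repository: `check_combined_config` is a hypothetical helper, and it assumes that a "ManiDataset" entry points to a directory and a "JsonDataset" entry points to a JSON file, as in the example above.

```python
import json
import os

KNOWN_TYPES = {"ManiDataset", "JsonDataset"}  # the two type names used in the example


def check_combined_config(config_path):
    """Return a list of human-readable problems found in the combined config."""
    with open(config_path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    problems = []
    for index, entry in enumerate(entries):
        if not (isinstance(entry, list) and len(entry) == 2):
            problems.append(f"entry {index}: expected [type, path], got {entry!r}")
            continue
        dataset_type, path = entry
        if dataset_type not in KNOWN_TYPES:
            problems.append(f"entry {index}: unknown dataset type {dataset_type!r}")
        # Assumption: ManiDataset points at a directory, JsonDataset at a file.
        elif dataset_type == "ManiDataset" and not os.path.isdir(path):
            problems.append(f"entry {index}: directory not found: {path}")
        elif dataset_type == "JsonDataset" and not os.path.isfile(path):
            problems.append(f"entry {index}: JSON file not found: {path}")
    return problems
```

An empty list means that no problems were detected; otherwise each string names the offending entry by its position in the file.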

Train setup

1) Set up the coding environment
  • First, clone the repository:
git clone https://github.com/scu-zjz/SparseViT.git
  • Our environment:
Ubuntu 20.04.1 LTS

CUDA 11.5 + cuDNN 8.4.0

Python 3.10

PyTorch 2.4
  • Install the dependencies:
pip install -r requirements.txt
2) Download the Uniformer pretrained weights
  • Download the pretrained weights from Google Drive and place them in the checkpoint/train/pretrain directory.
  • Set pretrain_path in the train.sh file to the location of the downloaded Uniformer pretrained weights.
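
The setup steps above can be sanity-checked before launching training. The following is a rough sketch under stated assumptions: `preflight` is a hypothetical helper that is not in the repository, and treating Python 3.10 as a minimum version is an assumption (the README only states the version the authors used). It checks the interpreter version, whether PyTorch is installed, and whether the pretrained weights file exists at the path you plan to use as pretrain_path.

```python
import importlib.util
import os
import sys


def preflight(pretrain_path, min_python=(3, 10)):
    """Collect setup problems before training; an empty list means none found."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    # find_spec only checks that torch is installed; it does not import it.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed (pip install -r requirements.txt)")
    if not os.path.isfile(pretrain_path):
        problems.append(f"pretrained weights not found: {pretrain_path}")
    return problems
```

Call it with the same file path you set as pretrain_path in train.sh and print any returned messages before running the training script.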

Test setup

1) Set up the coding environment
  • Same as in "Train setup".
2) Download our pretrained checkpoints
  • Download our pretrained checkpoints from Google Drive and place them in the checkpoint/test directory.

Scripts

This should be super easy! Simply run

  • For Train
sh train.sh
  • For Test
python main_test.py

This repository provides only the basic training and testing scripts for SparseViT. You can also train and test SparseViT within our IMDL-BenCo framework, with which it is fully compatible.

Citation

If you find our code useful, please consider citing us and giving us a star!

@inproceedings{su2025can,
  title={Can we get rid of handcrafted feature extractors? sparsevit: Nonsemantics-centered, parameter-efficient image manipulation localization through spare-coding transformer},
  author={Su, Lei and Ma, Xiaochen and Zhu, Xuekang and Niu, Chaoqun and Lei, Zeyu and Zhou, Ji-Zhe},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={7},
  pages={7024--7032},
  year={2025}
}
