Despite the rapid development of Multimodal Large Language Models (MLLMs), their potential in Chinese Classical Studies (CCS), a field that plays a vital role in preserving and promoting China’s rich cultural heritage, remains largely unexplored due to the absence of specialized benchmarks. To bridge this gap, we propose MCS-Bench, the first multimodal benchmark designed specifically for CCS and covering multiple subdomains. MCS-Bench spans seven core subdomains (Ancient Chinese Text, Calligraphy, Painting, Oracle Bone Script, Seal, Cultural Relic, and Illustration) with a total of 45 meticulously designed tasks. In an extensive evaluation of 37 representative MLLMs, even the top-performing model (InternVL2.5-78B) achieves an average score below 50, indicating substantial room for improvement. Our analysis reveals significant performance variations across tasks and identifies critical challenges in areas such as Optical Character Recognition (OCR) and cultural context interpretation. MCS-Bench not only establishes a standardized baseline for CCS-focused MLLM research but also provides valuable insights for advancing cultural heritage preservation and innovation in the Artificial General Intelligence (AGI) era.
Figure 2 shows examples from the seven subdomains of MCS-Bench.
The MCS-Bench dataset is available for non-commercial research purposes only. Scholars or organizations interested in using the MCS-Bench dataset must fill out this application form and send it to us by email. In the application, please list or attach one or two papers you have published in the past six years to demonstrate that you (or your team) are conducting research in the field of Chinese Classical Studies. Once we receive and approve your application, we will provide a download link and an extraction password. All users must comply with the usage terms; failure to do so will result in revocation of authorization.
Calculate the metrics from the model outputs:
python calculate_metrics.py
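The README does not describe what `calculate_metrics.py` computes internally. As an illustration only, here is a minimal sketch of one common way a benchmark turns model outputs into per-task scores and an overall average score. It is not the repository's script. The record keys (`task`, `prediction`, `answer`), the exact-match metric, the 0-100 scale, and the unweighted mean over tasks are all hypothetical assumptions, and so are the demo task names.

```python
"""Hypothetical sketch of per-task scoring and averaging.

NOT the repository's calculate_metrics.py: the record format, the
exact-match metric, and the unweighted task average are assumptions.
"""
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def exact_match(prediction: str, answer: str) -> bool:
    # Assumed metric: case- and whitespace-insensitive exact match.
    return prediction.strip().lower() == answer.strip().lower()


def task_scores(records: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Return a 0-100 score per task from records with the hypothetical
    keys 'task', 'prediction', and 'answer'."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for rec in records:
        total[rec["task"]] += 1
        if exact_match(rec["prediction"], rec["answer"]):
            correct[rec["task"]] += 1
    return {task: 100.0 * correct[task] / total[task] for task in total}


def average_score(scores: Mapping[str, float]) -> float:
    # Assumed aggregation: unweighted mean over tasks (macro average).
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical task names and outputs, for demonstration only.
    demo = [
        {"task": "seal_ocr", "prediction": "A", "answer": "A"},
        {"task": "seal_ocr", "prediction": "B", "answer": "C"},
        {"task": "painting_qa", "prediction": " c ", "answer": "C"},
    ]
    per_task = task_scores(demo)
    print(per_task)  # {'seal_ocr': 50.0, 'painting_qa': 100.0}
    print(average_score(per_task))  # 75.0
```

Averaging per-task scores rather than pooling all questions gives each task equal weight regardless of its size. Whether MCS-Bench weights its tasks this way is an assumption here, not something the README states.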
This work is licensed under the MIT License.
The dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Important Notice:
The original data in this dataset were collected from publicly accessible sources such as the Internet, and the copyright remains with the original content providers. The curated and annotated dataset reported in this work is intended for non-commercial use only and is currently licensed exclusively to universities and research institutions. To apply for access, please complete the application form following the instructions on the dataset website. The signature section of the application must be signed by a full-time staff member of a university or research institute. Where possible, please affix an official institutional seal (a seal from a secondary-level unit is acceptable) to speed up review and approval.



