
Commit 18d2528

peijieyu committed
Update.
1 parent 6d0832b commit 18d2528


2 files changed, +26 -2 lines changed


README.md

Lines changed: 13 additions & 1 deletion
@@ -5,7 +5,8 @@
 📖 <a>English</a> •
 <a href="README_ZH.md">中文</a>
 <br>
-🤗 <a href="https://huggingface.co/datasets/tencent/C3-BenchMark">Dataset</a>
+🤗 <a href="https://huggingface.co/datasets/tencent/C3-BenchMark">Dataset</a> •
+📚 <a href="https://arxiv.org/abs/2505.18746">Preprint Paper</a>
 </p>
 
 
@@ -348,3 +349,14 @@ Planner:getWaifuDetails(image_id=778899)
 ```
 
 It should be noted that even though our framework is capable of generating such excellent true multi-turn tasks, the generation of true multi-turn tasks remains very challenging for LLMs. Therefore, as we mentioned earlier, it was through the manual annotation by multiple experts that the accuracy was increased from less than 60% to 100%. This also includes modifying the pseudo multi-turn tasks generated by LLMs into true multi-turn tasks.
+
+## 🔎 Citation
+```
+@article{yu2025c3benchthingsrealdisturbing,
+title={$C^3$-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking},
+author={Peijie Yu and Yifan Yang and Jinjian Li and Zelong Zhang and Haorui Wang and Xiao Feng and Feng Zhang},
+year={2025},
+journal={arXiv preprint arXiv:2505.18746},
+url={https://arxiv.org/abs/2505.18746}
+}
+```

README_ZH.md

Lines changed: 13 additions & 1 deletion
@@ -5,7 +5,8 @@
 📖 <a href="README.md">English</a> •
 <a>中文</a>
 <br>
-🤗 <a href="https://huggingface.co/datasets/tencent/C3-BenchMark">Dataset</a>
+🤗 <a href="https://huggingface.co/datasets/tencent/C3-BenchMark">Dataset</a> •
+📚 <a href="https://arxiv.org/abs/2505.18746">Preprint Paper</a>
 </p>
 
 
@@ -343,3 +344,14 @@ Planner:getWaifuDetails(image_id=778899)
 ```
 
 It should be noted that even though our framework can generate such excellent true multi-turn tasks, generating true multi-turn tasks is still very difficult for LLMs. Therefore, as mentioned earlier, it was only through manual annotation by multiple experts that accuracy was raised from under 60% to 100%; this also includes revising pseudo multi-turn tasks generated by LLMs into true multi-turn tasks.
+
+## 🔎 Citation
+```
+@article{yu2025c3benchthingsrealdisturbing,
+title={$C^3$-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking},
+author={Peijie Yu and Yifan Yang and Jinjian Li and Zelong Zhang and Haorui Wang and Xiao Feng and Feng Zhang},
+year={2025},
+journal={arXiv preprint arXiv:2505.18746},
+url={https://arxiv.org/abs/2505.18746}
+}
+```
