
Mastering the Craft of Data Synthesis for CodeLLMs

Authors: Chen, Meng and Arthur, Philip and Feng, Qianyu and Hoang, Cong Duy Vu and Hong, Yu-Heng and Moghaddam, Mahdi Kazemi and Nezami, Omid and Nguyen, Duc Thien and Tangari, Gioacchino and Vu, Duy and Vu, Thanh and Johnson, Mark and Kenthapadi, Krishnaram and Dharmasiri, Don and Duong, Long and Li, Yuan-Fang

Abstract:

Large language models (LLMs) have shown impressive performance in code understanding and generation, making coding tasks a key focus for researchers due to their practical applications and value as a testbed for LLM evaluation. Data synthesis and filtering techniques have been widely adopted and shown to be highly effective in this context. In this paper, we present a focused survey and taxonomy of these techniques, emphasizing recent advancements. We highlight key challenges, explore future research directions, and offer practical guidance for new researchers entering the field.

Link: Read Paper

Labels: general coding task, empirical study