WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning

Authors: Yu, Zhaojian and Zhang, Xin and Shang, Ning and Huang, Yangyu and Xu, Can and Zhao, Yishujie and Hu, Wenxiang and Yin, Qiufeng

Abstract:

Recent work demonstrates that, after instruction tuning, Code Large Language Models (Code LLMs) can acquire impressive capabilities for addressing a wide range of code-related tasks. However, current instruction tuning methods for Code LLMs mainly focus on the traditional code generation task, resulting in poor performance in complex multi-task scenarios. In this paper, we concentrate on multiple code-related tasks and present WaveCoder, a series of Code LLMs trained with Widespread And Versatile Enhanced instruction data. To enable the models to tackle complex code-related tasks, we propose a method to stably generate diverse, high-quality instruction data from open-source code datasets in multi-task scenarios and obtain CodeOcean, a dataset comprising 19,915 instruction instances across 4 code-related tasks, which is aimed at improving the generalization ability of Code LLMs. Our experiments demonstrate that WaveCoder models significantly outperform other open-source models in generalization across different code-related tasks. Moreover, WaveCoder-Ultra-6.7B achieves state-of-the-art generalization on a wide range of code-related tasks.

Link: Read Paper

Labels: general coding task, code model, code model training, source code model
