Number of papers: 13
- Authors: Lu, Yining and Wang, Dixuan and Li, Tianjian and Jiang, Dongwei and Khudanpur, Sanjeev and Jiang, Meng and Khashabi, Daniel
- Abstract: As LLMs become increasingly prevalent, it is interesting to consider how "creative" these models can be. From cognitive science, creativity consists of at least two key characteristics: convergent thinking (purposefulness to achieve a given goal) and divergent thinking (adaptability to explore new environments or constraints). In this work, we introduce a framework for quantifying LLM creativity that incorporates the two design ingredients: (1) We introduce DENIAL ...
- Labels: code generation, program synthesis, benchmark
- Authors: Liu, Junnan
- Abstract: Translation of natural language to first-order logical formula (NL-FOL) has recently gained significant attention for its critical role in logic-based NLP applications. Some studies attempt to utilize pretrained language models in a sequence-to-sequence manner for the NL-FOL task. However, these methods encounter challenges such as (1) inconsistency between the training and inference phases and (2) the data-intensive and resource-intensive finetuning process. This paper introduces a novel NL-FOL...
- Labels: code generation, program synthesis
- Authors: Wong, Kyle and Amayuelas, Alfonso and Pan, Liangming and Wang, William Yang
- Abstract: Large language models (LLMs) have shown remarkable performance on code generation tasks. A recent use case is iterative code repair, where an LLM fixes an incorrect program by rationalizing about errors and generating new code. Recent works augment the code repair process by integrating modern techniques such as chain-of-thought reasoning or distillation, but only study their benefits on high-resource languages like Python, and ignore low-resource languages like Perl. To address this gap of know...
- Labels: code generation, program repair, empirical study
- Authors: Chen, Meng and Arthur, Philip and Feng, Qianyu and Hoang, Cong Duy Vu and Hong, Yu-Heng and Moghaddam, Mahdi Kazemi and Nezami, Omid and Nguyen, Duc Thien and Tangari, Gioacchino and Vu, Duy and Vu, Thanh and Johnson, Mark and Kenthapadi, Krishnaram and Dharmasiri, Don and Duong, Long and Li, Yuan-Fang
- Abstract: Large language models (LLMs) have shown impressive performance in code understanding and generation, making coding tasks a key focus for researchers due to their practical applications and value as a testbed for LLM evaluation. Data synthesis and filtering techniques have been widely adopted and shown to be highly effective in this context. In this paper, we present a focused survey and taxonomy of these techniques, emphasizing recent advancements. We highlight key challenges, explore f...
- Labels: general coding task, empirical study
- Authors: Hai, Nam Le and Nguyen, Dung Manh and Bui, Nghi D. Q.
- Abstract: CodeLLMs are widely used for code generation, yet their ability to handle repository-level dependencies remains underexplored. We introduce RepoExec, a benchmark for evaluating repository-level code generation, focusing on executability, functional correctness, and dependency utilization. Our study evaluates 18 models, revealing that retaining full dependency context yields the best performance, while smaller context sizes can be misleading. Pretrained LLMs excel in correctness but often reimple...
- Labels: code generation, program synthesis, agent design, prompt strategy, retrieval-augmented generation
- Authors: Maveli, Nickil and Vergari, Antonio and Cohen, Shay B
- Abstract: Code-LLMs, LLMs pre-trained on large code corpora, have shown great progress in learning rich representations of the structure and syntax of code, successfully using it to generate or classify code fragments. At the same time, understanding if they are able to do so because they capture code semantics, and how well, is still an open question. In this paper, we tackle this problem by introducing SeqCoBench, a benchmark for systematically assessing how Code-LLMs can capture code functional equival...
- Labels: static analysis, equivalence checking, empirical study, benchmark
COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis
- Authors: Yang, Weiqing and Wang, Hanbin and Liu, Zhenghao and Li, Xinze and Yan, Yukun and Wang, Shuo and Gu, Yu and Yu, Minghe and Liu, Zhiyuan and Yu, Ge
- Abstract: Code debugging is a vital stage of software development, essential for ensuring the reliability and performance of Large Language Models (LLMs) in the code generation task. Human debugging typically follows a multi-stage process, which includes Bug Localization, Bug Identification, Code Repair, and Code Recognition. However, existing code debugging benchmarks predominantly focus on the Code Repair stage, which offers only a limited perspective on evaluating the debugging capabilities of LLMs. In...
- Labels: program testing, debugging, agent design, code model, code model training
- Authors: Naik, Atharva and Alenius, Marcus and Fried, Daniel and Rose, Carolyn
- Abstract: The task of automated code review has recently gained a lot of attention from the machine learning community. However, current review comment evaluation metrics rely on comparisons with a human-written reference for a given code change (also called a diff). Furthermore, code review is a one-to-many problem, like generation and summarization, with many "valid reviews" for a diff. Thus, we develop CRScore, a reference-free metric to measure dimensions of review quality like conciseness, co...
- Labels: software maintenance and deployment, code review
- Authors: Liu, Xiangyan and Lan, Bo and Hu, Zhiyuan and Liu, Yang and Zhang, Zhicheng and Wang, Fei and Shieh, Michael Qizhe and Zhou, Wenmeng
- Abstract: Large Language Models (LLMs) excel in stand-alone code tasks like HumanEval and MBPP, but struggle with handling entire code repositories. This challenge has prompted research on enhancing LLM-codebase interaction at a repository scale. Current solutions rely on similarity-based retrieval or manual tools and APIs, each with notable drawbacks. Similarity-based retrieval often has low recall in complex tasks, while manual tools and APIs are typically task-specific and require expert knowledge, red...
- Labels: agent design, prompt strategy, retrieval-augmented generation
- Authors: Li, Jierui and Le, Hung and Zhou, Yingbo and Xiong, Caiming and Savarese, Silvio and Sahoo, Doyen
- Abstract: Pretrained on massive amounts of code and text data, large language models (LLMs) have demonstrated remarkable achievements in performing code generation tasks. With additional execution-based feedback, these models can act as agents with capabilities to self-refine and improve generated code autonomously. However, on challenging coding tasks with extremely large search space, current agentic approaches still struggle with multi-stage planning, generating, and debugging. To address this problem,...
- Labels: code generation, program synthesis, agent design, planning
- Authors: Chen, Minyu and Li, Guoqiang and Wu, Ling-I and Liu, Ruibang
- Abstract: Dead code introduces several challenges in software development, such as increased binary size and maintenance difficulties. It can also obscure logical errors and be exploited for obfuscation in malware. For LLM-based code-related tasks, dead code introduces vulnerabilities that can mislead these models, raising security concerns. Although modern compilers and IDEs offer dead code elimination, sophisticated patterns can bypass these tools. A universal approach that includes classification, loca...
- Labels: static analysis, bug detection, program optimization
- Authors: Si, Chenglei and Zhang, Yanzhe and Li, Ryan and Yang, Zhengyuan and Liu, Ruibo and Yang, Diyi
- Abstract: Generative AI has made rapid advancements in recent years, achieving unprecedented capabilities in multimodal understanding and code generation. This can enable a new paradigm of front-end development in which multimodal large language models (MLLMs) directly convert visual designs into code implementations. In this work, we construct Design2Code, the first real-world benchmark for this task. Specifically, we manually curate 484 diverse real-world webpages as test cases and develop a set of ...
- Labels: code generation, program synthesis, benchmark
- Authors: Lee, Jaeseong and Han, Hojae and Kim, Jongyoon and Hwang, Seung-won and Kang, Naun and An, KyungJun and Jang, Sungho
- Abstract: Fine-tuning large language models (LLMs) for code generation is challenging due to computational costs and the underrepresentation of some programming languages (PLs) in pre-training. We propose PLEX, a lottery-ticket based parameter-efficient fine-tuning (PEFT) method that adapts LLMs to both well-supported and underrepresented PLs. During lottery ticket selection, PLEX employs a dual strategy: for well-represented PLs, it leverages the LLM's full parametric knowledge by selecting from full ...
- Labels: code generation, program synthesis, code model, code model training