EMNLP2023

Number of papers: 18

API-Assisted Code Generation for Question Answering on Varied Table Structures

Authors: Cao, Yihan and Chen, Shuyi and Liu, Ryan and Wang, Zhiruo and Fried, Daniel
Abstract: A persistent challenge to table question answering (TableQA) by generating executable programs has been adapting to varied table structures, typically requiring domain-specific logical forms. In response, this paper introduces a unified TableQA framework that: (1) provides a unified representation for structured tables as multi-index Pandas data frames, (2) uses Python as a powerful querying language, and (3) uses few-shot prompting to translate NL questions into Python programs, which are execu...
Link: Read Paper
Labels: code generation, program synthesis

Benchmarking and Improving Text-to-SQL Generation under Ambiguity

Authors: Bhaskar, Adithya and Tomar, Tushar and Sathe, Ashutosh and Sarawagi, Sunita
Abstract: Research in Text-to-SQL conversion has been largely benchmarked against datasets where each text query corresponds to one correct SQL. However, natural language queries over real-life databases frequently involve significant ambiguity about the intended SQL due to overlapping schema names and multiple confusing relationship paths. To bridge this gap, we develop a novel benchmark called AmbiQT with over 3000 examples where each text is interpretable as two plausible SQLs due to lexical and/or str...
Link: Read Paper
Labels: code generation, program synthesis, benchmark

CRUSH4SQL: Collective Retrieval Using Schema Hallucination For Text2SQL

Authors: Kothyari, Mayank and Dhingra, Dhruva and Sarawagi, Sunita and Chakrabarti, Soumen
Abstract: Existing Text-to-SQL generators require the entire schema to be encoded with the user text. This is expensive or impractical for large databases with tens of thousands of columns. Standard dense retrieval techniques are inadequate for schema subsetting of a large structured database, where the correct semantics of retrieval demands that we rank sets of schema elements rather than individual documents. In response, we propose a two-stage process for effective coverage during retrieval. First, we ...
Link: Read Paper
Labels: code generation, program synthesis

CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code

Authors: Zhou, Shuyan and Alon, Uri and Agarwal, Sumit and Neubig, Graham
Abstract: Since the rise of neural natural-language-to-code models (NL→Code) that can generate long expressions and statements rather than a single next-token, one of the major problems has been reliably evaluating their generated output. In this paper, we propose CodeBERTScore: an evaluation metric for code generation, which builds on BERTScore (Zhang et al., 2020). Instead of encoding only the generated tokens as in BERTScore, CodeBERTScore also encodes the natural language input preceding the...
Link: Read Paper
Labels: code generation, code model, code model training, source code model

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

Authors: Wang, Yue and Le, Hung and Gotmare, Akhilesh and Bui, Nghi and Li, Junnan and Hoi, Steven
Abstract: Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks, lacking the flexibility to operate in the optimal architecture for a specific task. Secondly, they often employ a limited set of pretraining objectives which might not be rel...
Link: Read Paper
Labels: general coding task, code model, code model training, source code model

Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting

Authors: Ye, Xi and Durrett, Greg
Abstract: Recent work has shown how to prompt large language models with explanations to obtain strong performance on textual reasoning tasks, i.e., the chain-of-thought paradigm. However, subtly different explanations can yield widely varying downstream task accuracy. Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance. This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fash...
Link: Read Paper
Labels: agent design, prompt strategy, reason with code, empirical study

Exploring Distributional Shifts in Large Language Models for Code Analysis

Authors: Arakelyan, Shushan and Das, Rocktim and Mao, Yi and Ren, Xiang
Abstract: We systematically study how three large language models with code capabilities - CodeT5, Codex, and ChatGPT - generalize to out-of-domain data. We consider two fundamental applications - code summarization, and code generation. We split data into domains following its natural boundaries - by an organization, by a project, and by a module within the software project. We establish that samples from each new domain present all the models with a significant challenge of distribution shift. We study ...
Link: Read Paper
Labels: general coding task, code model, code model training, source code model, empirical study

Generating Data for Symbolic Language with Large Language Models

Authors: Ye, Jiacheng and Li, Chengzu and Kong, Lingpeng and Yu, Tao
Abstract: While large language models (LLMs) bring not only performance but also complexity, recent work has started to turn LLMs into data generators rather than task inferencers, where another affordable task model is trained for efficient deployment and inference. However, such an approach has primarily been applied to natural language tasks, and has not yet been explored for symbolic language tasks with complex structured outputs (e.g., semantic parsing and code generation). In this paper, we propose ...
Link: Read Paper
Labels: code model, code model training, source code model

Improving Transformer-based Program Repair Model through False Behavior Diagnosis

Authors: Kim, Youngkyoung and Kim, Misoo and Lee, Eunseok
Abstract: Research on automated program repairs using transformer-based models has recently gained considerable attention. The comprehension of the erroneous behavior of a model enables the identification of its inherent capacity and provides insights for improvement. However, the current landscape of research on program repair models lacks an investigation of their false behavior. Thus, we propose a methodology for diagnosing and treating the false behaviors of transformer-based program repair models. Sp...
Link: Read Paper
Labels: code generation, program repair

Let’s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs

Authors: Aggarwal, Pranjal and Madaan, Aman and Yang, Yiming and Mausam
Abstract: A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency - poll the LLM multiple times and output the most frequent solution. Existing Self-Consistency techniques always generate a constant number of samples per question, where a better approach will be to non-uniformly distribute the available budget based on the amount of agreement in the samples generated so far. In response, we introduce Adaptive-Consistency, a cost-efficient, model-agn...
Link: Read Paper
Labels: agent design, prompt strategy, sampling and ranking
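
The Self-Consistency baseline described in the abstract above (sample several times, return the most frequent answer) can be sketched in a few lines. The second function illustrates the general idea of stopping early once the samples agree. Its `threshold` and `min_samples` rule is a simplified assumption made for illustration, not the stopping criterion used in the paper, and `sample` is a hypothetical stand-in for a call to an LLM.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def self_consistency(sample: Callable[[], Hashable], n_samples: int) -> Hashable:
    """Fixed-budget baseline: draw n_samples answers, return the most frequent."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    votes = Counter(sample() for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def adaptive_vote(
    sample: Callable[[], Hashable],
    max_samples: int,
    min_samples: int = 3,
    threshold: float = 0.8,
) -> Tuple[Hashable, int]:
    """Stop sampling once the leading answer holds a large enough vote share.

    The share-based rule is an illustrative assumption, not the paper's method.
    Returns the chosen answer and the number of samples actually drawn.
    """
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    votes: Counter = Counter()
    for drawn in range(1, max_samples + 1):
        votes[sample()] += 1
        answer, count = votes.most_common(1)[0]
        if drawn >= min_samples and count / drawn >= threshold:
            return answer, drawn
    # Budget exhausted: fall back to a plain majority vote.
    return votes.most_common(1)[0][0], max_samples
```

When the samples agree from the start, `adaptive_vote` returns after `min_samples` draws. When they disagree, it uses the full budget and behaves like the fixed-budget baseline.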

MiniChain: A Small Library for Coding with Large Language Models

Authors: Rush, Alexander
Abstract: Programming augmented by large language models (LLMs) opens up many new application areas, but also requires care. LLMs are accurate enough, on average, to replace core functionality, yet make basic mistakes that demonstrate a lack of robustness. An ecosystem of prompting tools, from intelligent agents to new programming languages, has emerged with different solutions for patching LLMs with other tools. In this work, we introduce MiniChain, an opinionated tool for LLM augmented programming, wit...
Link: Read Paper
Labels: general coding task

On Sample-Efficient Code Generation

Authors: Han, Hojae and Kim, Yu Jin and Kim, Byoungjip and Lee, Youngwon and Lee, Kyungjae and Lee, Kyungmin and Lee, Moontae and Bae, Kyunghoon and Hwang, Seung-won
Abstract: Large language models often struggle to predict runtime behavior in code generation tasks, leading to a reliance on rejection sampling (best-of-n) to generate multiple code snippets then select the best. Our distinction is reducing sampling costs, without compromising generation quality. We introduce EFFICODE, a novel framework that prioritizes sampling on test problems that models can solve. We show how EFFICODE estimates solvability to optimize computational costs during multiple sampling. Bas...
Link: Read Paper
Labels: code generation, program synthesis
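
For context, the rejection-sampling (best-of-n) baseline that the abstract above refers to can be sketched as follows. This is only the baseline and not EFFICODE itself. Candidates are modeled as plain Python callables and scored by how many input/output tests they pass, and both `pass_count` and `best_of_n` are hypothetical helper names chosen for this sketch.

```python
from typing import Any, Callable, Sequence, Tuple

# A test case pairs positional arguments with the expected return value.
TestCase = Tuple[tuple, Any]


def pass_count(candidate: Callable[..., Any], tests: Sequence[TestCase]) -> int:
    """Count the tests a candidate passes; a raised exception counts as a failure."""
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed


def best_of_n(
    candidates: Sequence[Callable[..., Any]], tests: Sequence[TestCase]
) -> Callable[..., Any]:
    """Return the candidate that passes the most tests (first one wins ties)."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=lambda c: pass_count(c, tests))
```

The cost of this baseline grows linearly with the number of candidates sampled per problem, which is the sampling cost the abstract says the paper aims to reduce.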

Personalized Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

Authors: Chen, Hailin and Saha, Amrita and Hoi, Steven and Joty, Shafiq
Abstract: With the rise of powerful closed-source LLMs (ChatGPT, GPT-4), there is increasing interest in distilling the capabilities of closed-source LLMs to smaller open-source LLMs. Previous distillation methods usually prompt ChatGPT to generate a set of instructions and answers, for the student model to learn. However, such a standard distillation approach neglects the merits and conditions of the student model. Inspired by modern teaching principles, we design a personalised distillation process, in ...
Link: Read Paper
Labels: code generation, program synthesis, code model, code model training, source code model

Prompting with Pseudo-Code Instructions

Authors: Mishra, Mayank and Kumar, Prince and Bhat, Riyaz and Murthy, Rudra and Contractor, Danish and Tamilselvam, Srikanth
Abstract: Prompting with natural language instructions has recently emerged as a popular method of harnessing the capabilities of large language models (LLM). Given the inherent ambiguity present in natural language, it is intuitive to consider the possible advantages of prompting with less ambiguous prompt styles, like pseudo-code. In this paper, we explore if prompting via pseudo-code instructions helps improve the performance of pre-trained language models. We manually create a dataset of pseudo-code p...
Link: Read Paper
Labels: agent design, prompt strategy, reason with code

Question Answering as Programming for Solving Time-Sensitive Questions

Authors: Zhu, Xinyu and Yang, Cheng and Chen, Bei and Li, Siheng and Lou, Jian-Guang and Yang, Yujiu
Abstract: Question answering plays a pivotal role in human daily life because it involves our acquisition of knowledge about the world. However, due to the dynamic and ever-changing nature of real-world facts, the answer can be completely different when the time constraint in the question changes. Recently, Large Language Models (LLMs) have shown remarkable intelligence in question answering, while our experiments reveal that the aforementioned problems still pose a significant challenge to existing LLMs....
Link: Read Paper
Labels: agent design, prompt strategy, reason with code

Ranking LLM-Generated Loop Invariants for Program Verification

Authors: Chakraborty, Saikat and Lahiri, Shuvendu K and Fakhoury, Sarah and Musuvathi, Madanlal and Lal, Akash and Rastogi, Aseem and Senthilnathan, Aditya and Sharma, Rahul and Swamy, Nikhil
Abstract: Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an invariant. To address this issue, we propose a re-ranking approach for the generated results ...
Link: Read Paper
Labels: static analysis, program verification, agent design, prompt strategy, sampling and ranking

Symbolic Planning and Code Generation for Grounded Dialogue

Authors: Chiu, Justin and Zhao, Wenting and Chen, Derek and Vaduguru, Saujas and Rush, Alexander and Fried, Daniel
Abstract: Large language models (LLMs) excel at processing and generating text and code. However, LLMs have had limited applicability in grounded task-oriented dialogue as they are difficult to steer toward task objectives and fail to handle novel grounding. We present a modular and interpretable grounded dialogue system that addresses these shortcomings by composing LLMs with a symbolic planner and grounded code execution. Our system consists of a reader and planner: the reader leverages an LLM to conve...
Link: Read Paper
Labels: code generation, program synthesis, agent design, planning

Towards Low-Resource Automatic Program Repair with Meta-Learning and Pretrained Language Models

Authors: Wang, Weishi and Wang, Yue and Hoi, Steven and Joty, Shafiq
Abstract: Automatic program repair (APR) has gained increasing attention as an essential technique in software development to reduce manual debugging efforts and boost developers’ productivity. Recent advances in deep learning (DL) based models have demonstrated promising results by learning from large-scale bug-fix examples in a data-driven manner. However, in practical scenarios, software bugs have an imbalanced distribution, and the fixing knowledge learned by APR models often only captures the patterns...
Link: Read Paper
Labels: code generation, program repair, code model, code model training, source code model
