An Exploration of Large Language Models in Malicious Source Code Detection

Authors: Xue, Di and Zhao, Gang and Fan, Zhongqi and Li, Wei and Xu, Yahong and Liu, Zhen and Liu, Yin and Yuan, Zhongliang

Abstract:

Embedding malicious code within the software supply chain has become a significant concern in the information technology field. Current methods for detecting malicious code, based on signatures, behavior analysis, and traditional machine learning models, lack result interpretability. This study proposes a novel malicious code detection framework, Mal-LLM, which leverages the cost advantages of traditional machine learning models and the interpretability of LLMs. Initially, traditional machine learning models filter vast amounts of malicious source code in the software supply chain. Subsequently, LLMs analyze and interpret the filtered malicious source code using a customized prompt template incorporating role-playing and chain-of-thought techniques. The feasibility of the Mal-LLM framework is validated through extensive experimental analyses, examining the ambiguity and redundancy of the LLM in the framework, the significance of "experience" and "malicious" prompts, and exploring methods to reduce the cost of using LLMs from an enterprise perspective.
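The two-stage flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the Mal-LLM implementation: the keyword scorer `cheap_score` is a hypothetical stand-in for a trained traditional machine learning model, the token weights and the `threshold` are invented, the prompt wording is an assumption (the paper's actual template is not reproduced here), and `llm` is any caller-supplied function that maps a prompt string to a response string.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for a trained traditional ML classifier.
# The tokens and weights are invented for illustration only.
SUSPICIOUS_TOKENS = {
    "eval(": 0.4,
    "exec(": 0.4,
    "base64": 0.3,
    "socket": 0.2,
    "subprocess": 0.2,
}

# Assumed prompt wording: a role-playing opener ("experienced security
# analyst"), a hint that the code was flagged as malicious, and
# chain-of-thought style numbered steps.
PROMPT_TEMPLATE = (
    "You are an experienced security analyst reviewing code from a "
    "software supply chain.\n"
    "The following code was flagged as potentially malicious.\n"
    "Think step by step: (1) summarize what the code does, "
    "(2) list any suspicious behaviors, (3) give a verdict with reasons.\n\n"
    "Code:\n{code}\n"
)


def cheap_score(source: str) -> float:
    """Return a suspicion score in [0, 1]; higher means more suspicious."""
    score = sum(w for tok, w in SUSPICIOUS_TOKENS.items() if tok in source)
    return min(score, 1.0)


def build_prompt(source: str) -> str:
    """Fill the prompt template with the flagged source code."""
    return PROMPT_TEMPLATE.format(code=source)


def triage(
    samples: Iterable[str],
    llm: Callable[[str], str],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Send only samples the cheap filter flags to the (costly) LLM."""
    results = []
    for src in samples:
        if cheap_score(src) >= threshold:
            results.append((src, llm(build_prompt(src))))
    return results
```

The design point this illustrates is cost control: the inexpensive first stage decides which samples are worth an LLM call, so the LLM is used only where an interpretable explanation is needed.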

Link: Read Paper

Labels: static analysis, bug detection
