Grammar-obeying program synthesis: A novel approach using large language models and many-objective genetic programming
Tao, Ning ; Ventresque, Anthony ; Nallur, Vivek ; Saber, Takfarinas
Tao, Ning
Ventresque, Anthony
Nallur, Vivek
Saber, Takfarinas
Loading...
Publication Date
2024-11-15
Type
journal article
Downloads
Citation
Tao, Ning, Ventresque, Anthony, Nallur, Vivek, & Saber, Takfarinas. (2025). Grammar-obeying program synthesis: A novel approach using large language models and many-objective genetic programming. Computer Standards & Interfaces, 92, 103938. https://doi.org/10.1016/j.csi.2024.103938
Abstract
Program synthesis is an important challenge that has attracted significant research interest, especially in recent years with advancements in Large Language Models (LLMs). Although LLMs have demonstrated success in program synthesis, there remains a lack of trust in the generated code due to documented risks (e.g., code with known and risky vulnerabilities). Therefore, it is important to restrict the search space and avoid bad programs. In this work, pre-defined restricted Backus–Naur Form (BNF) grammars are utilised, which are considered ‘safe’, and the focus is on identifying the most effective technique for grammar-obeying program synthesis, where the generated code must be correct and conform to the predefined grammar. It is shown that while LLMs perform well in generating correct programs, they often fail to produce code that adheres to the grammar. To address this, a novel Similarity-Based Many-Objective Grammar Guided Genetic Programming (SBMaOG3P) approach is proposed, leveraging the programs generated by LLMs in two ways: (i) as seeds following a grammar mapping process and (ii) as targets for similarity measure objectives. Experiments on a well-known and widely used program synthesis dataset indicate that the proposed approach successfully improves the rate of grammar-obeying program synthesis compared to various LLMs and the state-of-the-art Grammar-Guided Genetic Programming. Additionally, the proposed approach significantly improved the solution in terms of the best fitness value of each run for 21 out of 28 problems compared to G3P.
Funder
Science Foundation Ireland
Publisher
Elsevier
Publisher DOI
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International