VERIFAI: Traceability and Verification of Natural-Language Requirements
(2024-2026)
VERIFAI is led by Professor Rosemary Monahan and Dr. Diarmuid O'Donoghue in the Dept. of Computer Science and the Hamilton Institute at Maynooth University. The project is funded via the ADAPT Research Centre, which supports a postdoctoral researcher, Dr. Arshad Beg, from December 2024 until January 2026, who collaborates with members of the Principles of Programming Research Group.
The overall goal of the project is to support the automatic generation of formal specifications and the traceability of requirements from the initial software design stage through to system implementation and verification. In this way we can ensure that when a system is implemented, we not only build the correct system, but also build the system correctly. In addition, if the system deviates from its expected behaviour, we can trace where and when the violation originated. Approaches explored as part of this project include natural language processing, the use of ontologies to describe the software system's domain, reuse of existing software artefacts from similar systems (e.g. through similarity-based matching), large language models to identify and declare specifications, and artificial intelligence to guide the process.
The research delivered the following items (most recent first) during the course of the project:
- Learning-Infused Formal Reasoning: Contract Synthesis, Artefact Reuse and Semantic Foundations (short version of the VERIFAI 2026 paper). Accepted at the ADAPT Annual Scientific Conference (AASC-2026), to be held on 14 May 2026 at DCU, Ireland.
Abstract: Artificial intelligence systems have achieved remarkable capability in natural language processing, perception and decision-making tasks. However, their behaviour often remains opaque and difficult to verify, limiting their applicability in safety-critical systems. Formal methods provide mathematically rigorous mechanisms for specifying and verifying system behaviour, yet the creation and maintenance of formal specifications remains labour intensive and difficult to scale. This paper outlines a research vision called Learning-Infused Formal Reasoning (LIFR), which integrates machine learning techniques with formal verification workflows. The framework focuses on three complementary research directions: automated contract synthesis from natural language requirements, semantic reuse of verification artifacts using graph matching and learning-based embeddings, and mathematically grounded semantic foundations based on the Unifying Theories of Programming (UTP) and the Theory of Institutions. Together these research threads aim to transform verification from isolated correctness proofs into a cumulative knowledge-driven process where specifications, contracts and proofs can be synthesised, aligned and reused across systems.
- Evaluating LLM-Generated ACSL Annotations for Formal Verification. Formal Techniques for Judicious Programming (FTfJP 2026, at ECOOP). Status: conditionally accepted (1 April 2026).
Abstract: Formal specifications are crucial for building verifiable and dependable software systems, yet generating accurate and verifiable specifications for real-world C programs remains challenging. This paper presents an empirical evaluation of automated ACSL annotation generation strategies for C programs, comparing a rule-based Python script, Frama-C’s RTE plugin, and three large language models (DeepSeek-V3.2, GPT-5.2, and OLMo 3.1 32B Instruct). The study focuses on one-shot annotation generation, assessing how these approaches perform when directly applied to verification tasks. Using a filtered subset of the CASP benchmark, we evaluate generated annotations through Frama-C’s WP plugin with multiple SMT solvers, analyzing proof success rates, solver timeouts, and internal processing time. Our results show that rule-based approaches remain more reliable for verification success, while LLM-based methods exhibit more variable performance. These findings highlight both the current limitations and the potential of LLMs as complementary tools for automated specification generation. Code and data are available from GitHub Repo.
- Learning-Infused Formal Reasoning: From Contract Synthesis to Artifact Reuse and Formal Semantics. VERIFAI-2026: The Interplay between Artificial Intelligence and Software Verification, LASER center, Villebrumier, France, 8-11 March 2026.
The paper presents a long-term vision of the research space.
Abstract: This vision paper articulates a long-term research agenda for formal methods at the intersection with artificial intelligence, outlining multiple conceptual and technical dimensions and reporting on our ongoing work toward realising this agenda. It advances a forward-looking perspective on the next generation of formal methods based on the integration of automated contract synthesis, semantic artifact reuse, and refinement-based theory. We argue that future verification systems must move beyond isolated correctness proofs toward a cumulative, knowledge-driven paradigm in which specifications, contracts, and proofs are continuously synthesised and transferred across systems. To support this shift, we outline a hybrid framework combining large language models with graph-based representations to enable scalable semantic matching and principled reuse of verification artifacts. Learning-based components provide semantic guidance across heterogeneous notations and abstraction levels, while symbolic matching ensures formal soundness. Grounded in compositional reasoning, this vision points toward verification ecosystems that evolve systematically, leveraging past verification efforts to accelerate future assurance.
- Traceable and verifiable software requirements: A synthesis of AI-enabled formal methods. Submitted to the Journal of Software Testing, Verification and Reliability, November 2025; under review (v1).
Abstract: Drawing on an analysis of more than one hundred studies, the paper identifies state-of-the-art methods for translating natural language requirements into formal specifications across various programming languages and verification tools, including model checkers and theorem provers. Beyond technical methodologies, the work highlights key challenges in requirements traceability, tool integration, and specification reuse.
Complementing this survey, the paper reports an empirical evaluation of Frama-C plugins, including the Runtime Error (RTE) and Value Analysis (EVA) analysers, as well as the PathCrawler automatic test generation tool. These experiments assess the capability of formal analysis tools to infer assertions, generate ANSI/ISO C Specification Language (ACSL) annotations, detect runtime anomalies, and produce test cases for representative C programs. The study identifies practical barriers such as solver instability, path coverage constraints, and tool-specific restrictions, proposing strategies like dataset adaptation, selective path exploration, and local configuration to overcome them. The findings demonstrate that automated formalisation and verification workflows can enhance program correctness while emphasising the continued importance of human-guided intervention and coordinated multi-tool use.
Code and data are available from GitHub Repo.
- Leveraging LLMs for Formal Software Requirements: Challenges and Prospects. Presented at OVERLAY at ECAI 2025 (28th European Conference on Artificial Intelligence). It reported:
(1) a focused literature survey and analysis of forty (40) research articles published in 2021-2025;
(2) the outcomes of the experiments described below;
(3) challenges and prospects of the field.
The methodology of the initial experiments combines an LLM with symbolic analysis tools in the Frama-C ecosystem. The workflow integrates path-based I/O examples and verification outputs to guide the generation of context-aware ACSL specifications. As a first step, we selected 36 C programs from the iFM 2024 tutorial on the Frama-C ecosystem and applied this pipeline to them.
Conclusions drawn from the verifiers' output: (1) Of the 36 evaluated C programs, 28 were verified successfully; approximately 94% of all verification goals were proven, demonstrating the high reliability of the verification toolchain. (2) Failures were primarily due to timeouts or partial prover coverage on complex arithmetic and array-manipulation examples. Overall, the combined prover strategy offers robust verification performance, with only marginal cases requiring further proof engineering or model refinement. Supporting documents are available from GitHub Repo.
- A Short Survey on Formalising Software Requirements using Large Language Models. Submitted to the Symposium on AI Verification (SAIV 2025).
Abstract: This paper presents a focused literature survey on the use of large language models (LLMs) to assist in writing formal specifications for software. A summary of thirty-five key papers is presented, including examples of specifying programs written in Dafny, C and Java. Our methodology employed multiple academic databases to identify relevant research. The AI-assisted tool Elicit facilitated the initial paper selection; the papers were then manually screened for final selection. The survey provides valuable insights and future directions for utilising LLMs when formalising software requirements.
- Formalising Software Requirements using Large Language Models. A two-page short paper defining the scope of the project, presented as a poster at the ADAPT Annual Scientific Conference (AASC'25) at DCU, Ireland.
