This project uses SMT-solving in deductive software verification as a case-study. Through two experiments, we present an approach to quantifying the difference between general-purpose and domain-specific benchmark suites for two SMT solvers - Z3 and CVC4.
For both experiments, we profiled the execution of the .smt2 file by the prover using Callgrind (part of the Valgrind suite of tools). This produces the .out file. This file is then annotated by Callgrind to produce a call-graph (the .txt file). To make our results machine- and platform-independent, we treated the SMT-solvers as virtual machines by filtering out system calls. The proportion of calls to each method was then normalised (adding up to 1) for the purposes of our distance measure.
The domain-specific reference suite consisits of 116 example programs taken from the Why3 distribution (ver. 0.86.1 - many of the programs can be found here). The profiled data for these programs is available here.
The general purpose suite is a random sample of 5800 programs from the SMT-LIB benchmark repository. The repository is structured in directories corresponding to the logics the are needed to be supported by the prover in order to solve the benchmarks within. For example, programs in the ALIA folder can only be solved by tools with decision procedures for Arrays, Linear Integer Arithmetic. More details and the graph used to partition the benchmarks in Experiment 2 can be found here. The profiled data for these programs is available here.