From 42f9f36dcf4fc59416878dbc39e8c84f7b895cda Mon Sep 17 00:00:00 2001 From: Peter Schrammel Date: Thu, 22 Mar 2018 19:18:07 +0000 Subject: [PATCH 1/6] Improve docs content layout and front page --- doc/architectural/cbmc-guide.md | 601 ------------------ .../cprover-architecture-overview.md | 262 ++++++++ doc/architectural/front-page.md | 94 ++- doc/architectural/howto.md | 17 +- src/ansi-c/module.md | 4 +- src/doxyfile | 128 ++-- src/goto-programs/module.md | 4 +- src/goto-symex/module.md | 4 +- src/solvers/refinement/README.md | 8 +- src/util/command-line-parsing.md | 6 +- 10 files changed, 418 insertions(+), 710 deletions(-) delete mode 100644 doc/architectural/cbmc-guide.md create mode 100644 doc/architectural/cprover-architecture-overview.md diff --git a/doc/architectural/cbmc-guide.md b/doc/architectural/cbmc-guide.md deleted file mode 100644 index bd96f4dbba4..00000000000 --- a/doc/architectural/cbmc-guide.md +++ /dev/null @@ -1,601 +0,0 @@ -\ingroup module_hidden -\page cbmc-guide CBMC Guide - -\author Martin Brain - -Background Information -====================== - -First off; read the \ref cprover-manual "CProver Manual". It describes -how to get, build and use CBMC and SATABS. This document covers the -internals of the system and how to get started on development. - -Documentation -------------- - -Apart from the (user-orientated) CPROVER manual and this document, most -of the rest of the documentation is inline in the code as `doxygen` and -some comments. A man page for CBMC, goto-cc and goto-instrument is -contained in the `doc/` directory and gives some options for these -tools. All of these could be improved and patches are very welcome. In -some cases the algorithms used are described in the relevant papers. - -Architecture ------------- - -CPROVER is structured in a similar fashion to a compiler. It has -language specific front-ends which perform limited syntactic analysis -and then convert to an intermediate format. The intermediate format can -be output to files (this is what `goto-cc` does) and are (informally) -referred to as “goto binaries” or “goto programs”. The back-end are -tools process this format, either directly from the front-end or from -it’s saved output. These include a wide range of analysis and -transformation tools (see Section \[section:other-apps\]). - -Coding Standards ----------------- - -CPROVER is written in a fairly minimalist subset of C++; templates and -meta-programming are avoided except where necessary. The standard -library is used but in many cases there are alternatives provided in -`util/` (see Section \[section:util\]) which are preferred. Boost is -not used. - -Patches should be formatted so that code is indented with two space -characters, not tab and wrapped to 75 or 72 columns. Headers for doxygen -should be given (and preferably filled!) and the author will be the -person who first created the file. - -Identifiers should be lower case with underscores to separate words. -Types (classes, structures and typedefs) names must[^1] end with a `t`. -Types that model types (i.e. C types in the program that is being -interpreted) are named with `_typet`. For example `ui_message_handlert` -rather than `UI_message_handlert` or `UIMessageHandler` and -`union_typet`. - -How to Contribute ------------------ - -Fixes, changes and enhancements to the CPROVER code base should be -developed against the `trunk` version and submitted to Daniel as patches -produced by `diff -Naur` or `svn diff`. Entire applications are best -developed independently (`git svn` is a popular choice for tracking the -main trunk but also having local development) until it is clear what -their utility, future and maintenance is likely to be. - -Other Useful Code {#section:other-apps} ------------------ - -The CPROVER subversion archive contains a number of separate programs. -Others are developed separately as patches or separate -branches.Interfaces are have been and are continuing to stablise but -older code may require work to compile and function correctly. - -In the main archive: - -* `CBMC`: A bounded model checking tool for C and C++. See Section - \[section:CBMC\]. - -* `goto-cc`: A drop-in, flag compatible replacement for GCC and other - compilers that produces goto-programs rather than executable binaries. - See Section \[section:goto-cc\]. - -* `goto-instrument`: A collection of functions for instrumenting and - modifying goto-programs. See Section \[section:goto-instrument\]. - -Model checkers and similar tools: - -* `SatABS`: A CEGAR model checker using predicate abstraction. Is - roughly 10,000 lines of code (on top of the CPROVER code base) and is - developed in its own subversion archive. It uses an external model - checker to find potentially feasible paths. Key limitations are - related to code with pointers and there is scope for significant - improvement. - -* `Scratch`: Alistair Donaldson’s k-induction based tool. The - front-end is in the old project CVS and some of the functionality is - in `goto-instrument`. - -* `Wolverine`: An implementation of Ken McMillan’s IMPACT algorithm - for sequential programs. In the old project CVS. - -* `C-Impact`: An implementation of Ken McMillan’s IMPACT algorithm for - parallel programs. In the old project CVS. - -* `LoopFrog`: A loop summarisation tool. - -* `???`: Christoph’s termination analyser. - -Test case generation: - -* `cover`: A basic test-input generation tool. In the old - project CVS. - -* `FShell`: A test-input generation tool that allows the user to - specify the desired coverage using a custom language (which includes - regular expressions over paths). It uses incremental SAT and is thus - faster than the naïve “add assertions one at a time and use the - counter-examples” approach. Is developed in its own subversion. - -Alternative front-ends and input translators: - -* `Scoot`: A System-C to C translator. Probably in the old - project CVS. - -* `???`: A Simulink to C translator. In the old project CVS. - -* `???`: A Verilog front-end. In the old project CVS. - -* `???`: A converter from Codewarrior project files to Makefiles. In - the old project CVS. - -Other tools: - -* `ai`: Leo’s hybrid abstract interpretation / CEGAR tool. - -* `DeltaCheck?`: Ajitha’s slicing tool, aimed at locating changes and - differential verification. In the old project CVS. - -There are tools based on the CPROVER framework from other research -groups which are not listed here. - -Source Walkthrough -================== - -This section walks through the code bases in a rough order of interest / -comprehensibility to the new developer. - -`doc` ------ - -At the moment just contains the CBMC man page. - -`regression/` -------------- - -The regression tests are currently being moved from CVS. The -`regression/` directory contains all of those that have -been moved. They are grouped into directories for each of the tools. -Each of these contains a directory per test case, containing a C or C++ -file that triggers the bug and a `.dsc` file that describes -the tests, expected output and so on. There is a Perl script, -`test.pl` that is used to invoke the tests as: - - ../test.pl -c PATH_TO_CBMC - -The `–help` option gives instructions for use and the -format of the description files. - -`src/` ------- - -The source code is divided into a number of sub-directories, each -containing the code for a different part of the system. In the top level -files there are only a few files: - -* `config.inc`: The user-editable configuration parameters for the - build process. The main use of this file is setting the paths for the - various external SAT solvers that are used. As such, anyone building - from source will likely need to edit this. - -* `Makefile`: The main systems Make file. Parallel builds are - supported and encouraged; please don’t break them! - -* `common`: System specific magic required to get the system to build. - This should only need to be edited if porting CBMC to a new platform / - build environment. - -* `doxygen.cfg`: The config file for doxygen.cfg - -### `util/` {#section:util} - -`util/` contains the low-level data structures and -manipulation functions that are used through-out the CPROVER code-base. -For almost any low-level task, the code required is probably in -`util/`. Key files include: - -* `irep.h`: This contains the definition of `irept`, the basis of many - of the data structures in the project. They should not be used - directly; one of the derived classes should be used. For more - information see Section \[section:irept\]. - -* `expr.h`: The parent class for all of the expressions. Provides a - number of generic functions, `exprt` can be used with these but when - creating data, subclasses of `exprt` should be used. - -* `std_expr.h`: Provides subclasses of `exprt` for common kinds of - expression for example `plus_exprt`, `minus_exprt`, - `dereference_exprt`. These are the intended interface for creating - expressions. - -* `std_types.h`: Provides subclasses of `typet` (a subclass of - `irept`) to model C and C++ types. This is one of the preferred - interfaces to `irept`. The front-ends handle type promotion and most - coercision so the type system and checking goto-programs is simpler - than C. - -* `dstring.h`: The CPROVER string class. This enables sharing between - strings which significantly reduces the amount of memory required and - speeds comparison. `dstring` should not be used directly, `irep_idt` - should be used instead, which (dependent on build options) is an alias - for `dstring`. - -* `mp_arith.h`: The wrapper class for multi-precision arithmetic - within CPROVER. Also see `arith_tools.h`. - -* `ieee_float.h`: The arbitrary precision float model used within - CPROVER. Based on `mp_integer`s. - -* `context.h`: A generic container for symbol table like constructs - such as namespaces. Lookup gives type, location of declaration, name, - ‘pretty name’, whether it is static or not. - -* `namespace.h`: The preferred interface for the context class. The - key function is `lookup` which converts a string (`irep_idt`) to a - symbol which gives the scope of declaration, type and so on. This - works for functions as well as variables. - -### `langapi/` - -This contains the basic interfaces and support classes for programming -language front ends. Developers only really need look at this if they -are adding support for a new language. It’s main users are the two (in -trunk) language front-ends; `ansi-c/` and -`cpp/`. - -### `ansi-c/` - -Contains the front-end for ANSI C, plus a variety of common extensions. -This parses the file, performs some basic sanity checks (this is one -area in which the UI could be improved; patches most welcome) and then -produces a goto-program (see below). The parser is a traditional Flex / -Bison system. - -`internal_addition.c` contains the implementation of various ‘magic’ -functions that are that allow control of the analysis from the source -code level. These include assertions, assumptions, atomic blocks, memory -fences and rounding modes. - -The `library/` subdirectory contains versions of some of the C standard -header files that make use of the CPROVER built-in functions. This -allows CPROVER programs to be ‘aware’ of the functionality and model it -correctly. Examples include `stdio.c`, `string.c`, `setjmp.c` and -various threading interfaces. - -### `cpp/` - -This directory contains the C++ front-end. It supports the subset of C++ -commonly found in embedded and system applications. Consequentially it -doesn’t have full support for templates and many of the more advanced -and obscure C++ features. The subset of the language that can be handled -is being extended over time so bug reports of programs that cannot be -parsed are useful. - -The functionality is very similar to the ANSI C front end; parsing the -code and converting to goto-programs. It makes use of code from -`langapi` and `ansi-c`. - -### `goto-programs/` - -Goto programs are the intermediate representation of the CPROVER tool -chain. They are language independent and similar to many of the compiler -intermediate languages. Section \[section:goto-programs\] describes the -`goto_programt` and `goto_functionst` data structures in detail. However -it useful to understand some of the basic concepts. Each function is a -list of instructions, each of which has a type (one of 18 kinds of -instruction), a code expression, a guard expression and potentially some -targets for the next instruction. They are not natively in static -single-assign (SSA) form. Transitions are nondeterministic (although in -practise the guards on the transitions normally cover form a disjoint -cover of all possibilities). Local variables have non-deterministic -values if they are not initialised. Variables and data within the -program is commonly one of three types (parameterised by width): -`unsignedbv_typet`, `signedbv_typet` and `floatbv_typet`, see -`util/std_types.h` for more information. Goto programs can be serialised -in a binary (wrapped in ELF headers) format or in XML (see the various -`_serialization` files). - -The `cbmc` option `–show-goto-programs` is often a good starting point -as it outputs goto-programs in a human readable form. However there are -a few things to be aware of. Functions have an internal name (for -example `c::f00`) and a ‘pretty name’ (for example `f00`) and which is -used depends on whether it is internal or being presented to the user. -The `main` method is the ‘logical’ main which is not necessarily the -main method from the code. In the output `NONDET` is use to represent a -nondeterministic assignment to a variable. Likewise `IF` as a beautified -`GOTO` instruction where the guard expression is used as the condition. -`RETURN` instructions may be dropped if they precede an `END_FUNCTION` -instruction. The comment lines are generated from the `locationt` field -of the `instructiont` structure. - -`goto-programs/` is one of the few places in the CPROVER codebase that -templates are used. The intention is to allow the general architecture -of program and functions to be used for other formalisms. At the moment -most of the templates have a single instantiation; for example -`goto_functionst` and `goto_function_templatet` and `goto_programt` and -`goto_program_templatet`. - -### `goto-symex/` - -This directory contains a symbolic evaluation system for goto-programs. -This takes a goto-program and translates it to an equation system by -traversing the program, branching and merging and unwinding loops as -needed. Each reverse goto has a separate counter (the actual counting is -handled by `cbmc`, see the `–unwind` and `–unwind-set` options). When a -counter limit is reach, an assertion can be added to explicitly show -when analysis is incomplete. The symbolic execution includes constant -folding so loops that have a constant number of iterations will be -handled completely (assuming the unwinding limit is sufficient). - -The output of the symbolic execution is a system of equations; an object -containing a list of `symex_target_elements`, each of which are -equalities between `expr` expressions. See `symex_target_equation.h`. -The output is in static, single assignment (SSA) form, which is *not* -the case for goto-programs. - -### `pointer-analysis/` - -To perform symbolic execution on programs with dereferencing of -arbitrary pointers, some alias analysis is needed. `pointer-analysis` -contains the three levels of analysis; flow and context insensitive, -context sensitive and flow and context sensitive. The code needed is -subtle and sophisticated and thus there may be bugs. - -### `solvers/` - -The `solvers/` directory contains interfaces to a number of -different decision procedures, roughly one per directory. - -* prop/: The basic and common functionality. The key file is - `prop_conv.h` which defines `prop_convt`. This is the base class that - is used to interface to the decision procedures. The key functions are - `convert` which takes an `exprt` and converts it to the appropriate, - solver specific, data structures and `dec_solve` (inherited from - `decision_proceduret`) which invokes the actual decision procedures. - Individual decision procedures (named `*_dect`) objects can be created - but `prop_convt` is the preferred interface for code that uses them. - -* flattening/: A library that converts operations to bit-vectors, - including calling the conversions in `floatbv` as necessary. Is - implemented as a simple conversion (with caching) and then a - post-processing function that adds extra constraints. This is not used - by the SMT or CVC back-ends. - -* dplib/: Provides the `dplib_dect` object which used the decision - procedure library from “Decision Procedures : An Algorithmic Point of - View”. - -* cvc/: Provides the `cvc_dect` type which interfaces to the old (pre - SMTLib) input format for the CVC family of solvers. This format is - still supported by depreciated in favour of SMTLib 2. - -* smt1/: Provides the `smt1_dect` type which converts the formulae to - SMTLib version 1 and then invokes one of Boolector, CVC3, OpenSMT, - Yices, MathSAT or Z3. Again, note that this format is depreciated. - -* smt2/: Provides the `smt2_dect` type which functions in a similar - way to `smt1_dect`, calling Boolector, CVC3, MathSAT, Yices or Z3. - Note that the interaction with the solver is batched and uses - temporary files rather than using the interactive command supported by - SMTLib 2. With the `–fpa` option, this output mode will not flatten - the floating point arithmetic and instead output the proposed SMTLib - floating point standard. - -* qbf/: Back-ends for a variety of QBF solvers. Appears to be no - longer used or maintained. - -* sat/: Back-ends for a variety of SAT solvers and DIMACS output. - -### `cbmc/` {#section:CBMC} - -This contains the first full application. CBMC is a bounded model -checker that uses the front ends (`ansi-c`, `cpp`, goto-program or -others) to create a goto-program, `goto-symex` to unwind the loops the -given number of times and to produce and equation system and finally -`solvers` to find a counter-example (technically, `goto-symex` is then -used to construct the counter-example trace). - -### `goto-cc/` {#section:goto-cc} - -`goto-cc` is a compiler replacement that just performs the first step of -the process; converting C or C++ programs to goto-binaries. It is -intended to be dropped in to an existing build procedure in place of the -compiler, thus it emulates flags that would affect the semantics of the -code produced. Which set of flags are emulated depends on the naming of -the `goto-cc/` binary. If it is called `goto-cc` then it emulates GCC -flags, `goto-armcc` emulates the ARM compiler, `goto-cl` emulates VCC -and `goto-cw` emulates the Code Warrior compiler. The output of this -tool can then be used with `cbmc` or `goto-instrument`. - -### `goto-instrument/` {#section:goto-instrument} - -The `goto-instrument/` directory contains a number of tools, one per -file, that are built into the `goto-instrument` program. All of them -take in a goto-program (produced by `goto-cc`) and either modify it or -perform some analysis. Examples include `nondet_static.cpp` which -initialises static variables to a non-deterministic value, -`nondet_volatile.cpp` which assigns a non-deterministic value to any -volatile variable before it is read and `weak_memory.h` which performs -the necessary transformations to reason about weak memory models. The -exception to the “one file for each piece of functionality” rule are the -program instrumentation options (mostly those given as “Safety checks” -in the `goto-instrument` help text) which are included in the -`goto-program/` directory. An example of this is -`goto-program/stack_depth.h` and the general rule seems to be that -transformations and instrumentation that `cbmc` uses should be in -`goto-program/`, others should be in `goto-instrument`. - -`goto-instrument` is a very good template for new analysis tools. New -developers are advised to copy the directory, remove all files apart -from `main.*`, `parseoptions.*` and the `Makefile` and use these as the -skeleton of their application. The `doit()` method in `parseoptions.cpp` -is the preferred location for the top level control for the program. - -### `linking/` - -Probably the code to emulate a linker. This allows multiple ‘object -files’ (goto-programs) to be linked into one ‘executable’ (another -goto-program), thus allowing existing build systems to be used to build -complete goto-program binaries. - -### `big-int/` - -CPROVER is distributed with its own multi-precision arithmetic library; -mainly for historical and portability reasons. The library is externally -developed and thus `big-int` contains the source as it is distributed. -This should not be used directly, see `util/mp_arith.h` for the CPROVER -interface. - -### `xmllang/` - -CPROVER has optional XML output for results and there is an XML format -for goto-programs. It is used to interface to various IDEs. The -`xmllang/` directory contains the parser and helper functions for -handling this format. - -### `floatbv/` - -This library contains the code that is used to convert floating point -variables (`floatbv`) to bit vectors (`bv`). This is referred to as -‘bit-blasting’ and is called in the `solver` code during conversion to -SAT or SMT. It also contains the abstraction code described in the -FMCAD09 paper. - -Data Structures -=============== - -This section discusses some of the key data-structures used in the -CPROVER codebase. - -`irept` {#section:irept} ------------------------- - -There are a large number of kind of tree structured or tree-like data in -CPROVER. `irept` provides a single, unified representation for all of -these, allowing structure sharing and reference counting of data. As -such `irept` is the basic unit of data in CPROVER. Each `irept` -contains[^2] a basic unit of data (of type `dt`) which contains four -things: - -* `data`: A string[^3], which is returned when the `id()` function is - used. - -* `named_sub`: A map from `irep_namet` (a string) to an `irept`. This - is used for named children, i.e. subexpressions, parameters, etc. - -* `comments`: Another map from `irep_namet` to `irept` which is used - for annotations and other ‘non-semantic’ information - -* `sub`: A vector of `irept` which is used to store ordered but - unnamed children. - -The `irept::pretty` function outputs the contents of an `irept` directly -and can be used to understand an debug problems with `irept`s. - -On their own `irept`s do not “mean” anything; they are effectively -generic tree nodes. Their interpretation depends on the contents of -result of the `id` function (the `data`) field. `util/irep_ids.txt` -contains the complete list of `id` values. During the build process it -is used to generate `util/irep_ids.h` which gives constants for each id -(named `ID_`). These can then be used to identify what kind of data -`irept` stores and thus what can be done with it. - -To simplify this process, there are a variety of classes that inherit -from `irept`, roughly corresponding to the ids listed (i.e. `ID_or` -(the string `"or”`) corresponds to the class `or_exprt`). These give -semantically relevant accessor functions for the data; effectively -different APIs for the same underlying data structure. None of these -classes add fields (only methods) and so static casting can be used. The -inheritance graph of the subclasses of `irept` is a useful starting -point for working out how to manipulate data. - -There are three main groups of classes (or APIs); those derived from -`typet`, `codet` and `exprt` respectively. Although all of these inherit -from `irept`, these are the most abstract level that code should handle -data. If code is manipulating plain `irept`s then something is wrong -with the architecture of the code. - -Many of the key descendent of `exprt` are declared in `std_expr.h`. All -expressions have a named subfield / annotation which gives the type of -the expression (slightly simplified from C/C++ as `unsignedbv_typet`, -`signedbv_typet`, `floatbv_typet`, etc.). All type conversions are -explicit with an expression with `id() == ID_typecast` and an ‘interface -class’ named `typecast_exprt`. One key descendent of `exprt` is -`symbol_exprt` which creates `irept` instances with the id of “symbol”. -These are used to represent variables; the name of which can be found -using the `get_identifier` accessor function. - -`codet` inherits from `exprt` and is defined in `std_code.h`. They -represent executable code; statements in C rather than expressions. In -the front-end there are versions of these that hold whole code blocks, -but in goto-programs these have been flattened so that each `irept` -represents one sequence point (almost one line of code / one -semi-colon). The most common descendents of `codet` are `code_assignt` -so a common pattern is to cast the `codet` to an assignment and then -recurse on the expression on either side. - -`goto-programs` {#section:goto-programs} ----------------------------------------- - -The common starting point for working with goto-programs is the -`read_goto_binary` function which populates an object of -`goto_functionst` type. This is defined in `goto_functions.h` and is an -instantiation of the template `goto_functions_templatet` which is -contained in `goto_functions_template.h`. They are wrappers around a map -from strings to `goto_programt`’s and iteration macros are provided. -Note that `goto_function_templatet` (no `s`) is defined in the same -header as `goto_functions_templatet` and is gives the C type for the -function and Boolean which indicates whether the body is available -(before linking this might not always be true). Also note the slightly -counter-intuitive naming; `goto_functionst` instances are the top level -structure representing the program and contain `goto_programt` instances -which represent the individual functions. At the time of writing -`goto_functionst` is the only instantiation of the template -`goto_functions_templatet` but other could be produced if a different -data-structures / kinds of models were needed for functions. - -`goto_programt` is also an instantiation of a template. In a similar -fashion it is `goto_program_templatet` and allows the types of the guard -and expression used in instructions to be parameterised. Again, this is -currently the only use of the template. As such there are only really -helper functions in `goto_program.h` and thus `goto_program_template.h` -is probably the key file that describes the representation of (C) -functions in the goto-program format. It is reasonably stable and -reasonably documented and thus is a good place to start looking at the -code. - -An instance of `goto_program_templatet` is effectively a list of -instructions (and inner template called `instructiont`). It is important -to use the copy and insertion functions that are provided as iterators -are used to link instructions to their predecessors and targets and -careless manipulation of the list could break these. Likewise there are -helper macros for iterating over the instructions in an instance of -`goto_program_templatet` and the use of these is good style and strongly -encouraged. - -Individual instructions are instances of type `instructiont`. They -represent one step in the function. Each has a type, an instance of -`goto_program_instruction_typet` which denotes what kind of instruction -it is. They can be computational (such as `ASSIGN` or `FUNCTION_CALL`), -logical (such as `ASSUME` and `ASSERT`) or informational (such as -`LOCATION` and `DEAD`). At the time of writing there are 18 possible -values for `goto_program_instruction_typet` / kinds of instruction. -Instructions also have a guard field (the condition under which it is -executed) and a code field (what the instruction does). These may be -empty depending on the kind of instruction. In the default -instantiations these are of type `exprt` and `codet` respectively and -thus covered by the previous discussion of `irept` and its descendents. -The next instructions (remembering that transitions are guarded by -non-deterministic) are given by the list `targets` (with the -corresponding list of labels `labels`) and the corresponding set of -previous instructions is get by `incoming_edges`. Finally `instructiont` -have informational `function` and `location` fields that indicate where -they are in the code. - -[^1]: There are a couple of exceptions, including the graph classes - -[^2]: Or references, if reference counted data sharing is enabled. It is - enabled by default; see the `SHARING` macro. - -[^3]: Unless `USE_STD_STRING` is set, this is actually -a `dstring` and thus an integer which is a reference into a string table diff --git a/doc/architectural/cprover-architecture-overview.md b/doc/architectural/cprover-architecture-overview.md new file mode 100644 index 00000000000..457cba51e16 --- /dev/null +++ b/doc/architectural/cprover-architecture-overview.md @@ -0,0 +1,262 @@ +\ingroup module_hidden +\page cprover-architecture-overview CProver Architecture Overview + +\author Martin Brain, Peter Schrammel + +# Overview of CPROVER Directories + +## `src/` + +The source code is divided into a number of sub-directories, each +containing the code for a different part of the system. + +- GOTO-Programs + + * \ref goto-programs + * \ref linking + +- Symbolic Execution + * \ref goto-symex + +- Static Analyses + + * \ref analyses + * \ref pointer-analysis + +- Solvers + * \ref solvers + +- Language Front Ends + + * Language API: \ref langapi + * C: \ref ansi-c + * C++: \ref cpp + * Java: \ref java_bytecode + * JavaScript: \ref jsil + +- Tools + + * \ref cbmc + * \ref clobber + * \ref goto-analyzer + * \ref goto-instrument + * \ref goto-diff + * \ref memory-models + * \ref goto-cc + * \ref jbmc + +- Utilities + + * \ref big-int + * \ref json + * \ref xmllang + * \ref util + * \ref miniz + * \ref nonstd + +In the top level of `src` there are only a few files: + +* `config.inc`: The user-editable configuration parameters for the + build process. The main use of this file is setting the paths for the + various external SAT solvers that are used. As such, anyone building + from source will likely need to edit this. + +* `Makefile`: The main systems Make file. Parallel builds are + supported and encouraged; please don’t break them! + +* `common`: System specific magic required to get the system to build. + This should only need to be edited if porting CBMC to a new platform / + build environment. + +* `doxygen.cfg`: The config file for doxygen.cfg + +## `doc/` + +Contains the CBMC man page. Doxygen HTML pages are generated +into the `doc/html` directory when running `doxygen` from `src`. + +## `regression/` + +The `regression/` directory contains the test suites. +They are grouped into directories for each of the tools/modules. +Each of these contains a directory per test case, containing a C or C++ +file that triggers the bug and a `.desc` file that describes +the tests, expected output and so on. There is a Perl script, +`test.pl` that is used to invoke the tests as: + + ../test.pl -c PATH_TO_CBMC + +The `–help` option gives instructions for use and the +format of the description files. + + +# General Information + +First off, read the \ref cbmc-user-manual "CBMC User Manual". It describes +how to get, build and use CBMC. This document covers the +internals of the system and how to get started on development. + +## Documentation + +Apart from the (user-orientated) CBMC user manual and this document, most +of the rest of the documentation is inline in the code as `doxygen` and +some comments. A man page for CBMC, goto-cc and goto-instrument is +contained in the `doc/` directory and gives some options for these +tools. All of these could be improved and patches are very welcome. In +some cases the algorithms used are described in the relevant papers. + +## Architecture + +This section provides a graphical overview of how CBMC fits together. +CBMC takes C code or a goto-binary as input and tries to emit traces +of executions that lead to crashes or undefined behaviour. The diagram +below shows the intermediate steps in this process. + +\dot +digraph G { + + rankdir="TB"; + node [shape=box, fontcolor=blue]; + + subgraph top { + rank=same; + 1 -> 2 -> 3 -> 4; + } + + subgraph bottom { + rank=same; + 5 -> 6 -> 7 -> 8 -> 9; + } + + /* shift bottom subgraph over */ + 9 -> 1 [color=white]; + + 4 -> 5; + + 1 [label="command line\nparsing" URL="\ref cbmc_parse_optionst"]; + 2 [label="preprocessing,\nparsing" URL="\ref preprocessing"]; + 3 [label="language\ntype-checking" URL="\ref type-checking"]; + 4 [label="goto\nconversion" URL="\ref goto-conversion"]; + 5 [label="instrumentation" URL="\ref instrumentation"]; + 6 [label="symbolic\nexecution" URL="\ref symbolic-execution"]; + 7 [label="SAT/SMT\nencoding" URL="\ref sat-smt-encoding"]; + 8 [label="decision\nprocedure" URL="\ref decision-procedure"]; + 9 [label="counter example\nproduction" URL="\ref counter-example-production"]; +} +\enddot + +The \ref cbmc-user-manual "CBMC User Manual" describes CBMC from a user +perspective. Each node in the diagram above links to the appropriate +class or module documentation, describing that particular stage in the +CBMC pipeline. +CPROVER is structured in a similar fashion to a compiler. It has +language specific front-ends which perform limited syntactic analysis +and then convert to an intermediate format. The intermediate format can +be output to files (this is what `goto-cc` does) and are (informally) +referred to as “goto binaries” or “goto programs”. The back-end are +tools process this format, either directly from the front-end or from +it’s saved output. These include a wide range of analysis and +transformation tools (see \ref other-tools). + +## Coding Standards + +See `CODING_STANDARD.md` file in the root of the CBMC repository. + +CPROVER is written in a fairly minimalist subset of C++; templates and +meta-programming are avoided except where necessary. +External library dependencies are avoided (only STL and a SAT solver +are required). Boost is not used. The `util` directory contains many +utilities that are not (yet) in the standard library. + +Patches should be formatted so that code is indented with two space +characters, not tab and wrapped to 80 columns. Headers for doxygen +should be given (and preferably filled!) and the author will be the +person who first created the file. Add doxygen comments to +undocumented functions as you touch them. Coding standards +and doxygen comments are enforced by CI before a patch can be +merged by running `clang-format` and `cpplint`. + +Identifiers should be lower case with underscores to separate words. +Types (classes, structures and typedefs) names must end with a `t`. +Types that model types (i.e. C types in the program that is being +interpreted) are named with `_typet`. For example `ui_message_handlert` +rather than `UI_message_handlert` or `UIMessageHandler` and +`union_typet`. + + +\section other-tools Other Tools + +FIXME: The text in this section is a bit outdated. + +The CPROVER subversion archive contains a number of separate programs. +Others are developed separately as patches or separate +branches.Interfaces are have been and are continuing to stablise but +older code may require work to compile and function correctly. + +In the main archive: + +* `CBMC`: A bounded model checking tool for C and C++. See + \ref cbmc. + +* `goto-cc`: A drop-in, flag compatible replacement for GCC and other + compilers that produces goto-programs rather than executable binaries. + See \ref goto-cc. + +* `goto-instrument`: A collection of functions for instrumenting and + modifying goto-programs. See \ref goto-instrument. + +Model checkers and similar tools: + +* `SatABS`: A CEGAR model checker using predicate abstraction. Is + roughly 10,000 lines of code (on top of the CPROVER code base) and is + developed in its own subversion archive. It uses an external model + checker to find potentially feasible paths. Key limitations are + related to code with pointers and there is scope for significant + improvement. + +* `Scratch`: Alistair Donaldson’s k-induction based tool. The + front-end is in the old project CVS and some of the functionality is + in `goto-instrument`. + +* `Wolverine`: An implementation of Ken McMillan’s IMPACT algorithm + for sequential programs. In the old project CVS. + +* `C-Impact`: An implementation of Ken McMillan’s IMPACT algorithm for + parallel programs. In the old project CVS. + +* `LoopFrog`: A loop summarisation tool. + +* `TAN`: Christoph’s termination analyser. + +Test case generation: + +* `cover`: A basic test-input generation tool. In the old + project CVS. + +* `FShell`: A test-input generation tool that allows the user to + specify the desired coverage using a custom language (which includes + regular expressions over paths). It uses incremental SAT and is thus + faster than the naïve “add assertions one at a time and use the + counter-examples” approach. Is developed in its own subversion. + +Alternative front-ends and input translators: + +* `Scoot`: A System-C to C translator. Probably in the old + project CVS. + +* `???`: A Simulink to C translator. In the old project CVS. + +* `???`: A Verilog front-end. In the old project CVS. + +* `???`: A converter from Codewarrior project files to Makefiles. In + the old project CVS. + +Other tools: + +* `ai`: Leo’s hybrid abstract interpretation / CEGAR tool. + +* `DeltaCheck?`: Ajitha’s slicing tool, aimed at locating changes and + differential verification. In the old project CVS. + +There are tools based on the CPROVER framework from other research +groups which are not listed here. diff --git a/doc/architectural/front-page.md b/doc/architectural/front-page.md index 13e832074a8..a2a5046e01e 100644 --- a/doc/architectural/front-page.md +++ b/doc/architectural/front-page.md @@ -1,46 +1,90 @@ -CProver Documentation +CProver Developer Documentation ===================== -\author Kareem Khazem - These pages contain user tutorials, automatically-generated API documentation, and higher-level architectural overviews for the -CProver codebase. Users can download CProver tools from the -CProver website; contributors -should use the repository -hosted on GitHub. +CProver codebase. CProver is a platform for software verification. Users can +download CProver tools from the CProver +website; contributors should use the +repository hosted on GitHub. CBMC +is part of CProver. + +CBMC is a Bounded Model Checker for C and C++ programs. It supports C89, C99, +most of C11 and most compiler extensions provided by gcc and Visual Studio. It +also supports SystemC using Scoot. It allows verifying array bounds (buffer +overflows), pointer safety, arithmetic exceptions and user-specified assertions. +Furthermore, it can check C and C++ for consistency with other languages, such +as Verilog. The verification is performed by unwinding the loops in the program +and passing the resulting equation to a decision procedure. + +For further information see [cprover.org](http://www.cprover.org/cbmc). + +Versions +======== + +Get the [latest release](https://github.com/diffblue/cbmc/releases) +* Releases are tested and for production use. + +Get the current *develop* version: `git clone https://github.com/diffblue/cbmc.git` +* Develop versions are not recommended for production use. + +Report bugs +=========== + +If you encounter a problem please file a bug report: +* Create an [issue](https://github.com/diffblue/cbmc/issues) + +Contributing to the code base +============================= + +1. Fork the the CBMC repository on GitHub. +2. Clone your fork with `git clone git@github.com:YOURNAME/cbmc.git` +3. Create a branch from the `develop` branch (default branch) +4. Make your changes - follow the + +coding guidelines +5. Push your changes to your branch +6. Create a Pull Request targeting the `develop` branch + +License +======= + +4-clause BSD +license. + +Overview of Documentation +======= ### For users: -* The \ref cprover-manual "CProver Manual" details the capabilities of - CBMC and SATABS and describes how to install and use these tools. It +* The \ref cbmc-user-manual "CBMC User Manual" details the capabilities of + CBMC and describes how to install and use these tools. It also covers the underlying theory and prerequisite concepts behind how these tools work. +* There is a helpful user tutorial on the wiki with lots of linked resources, +you can access it here. + ### For contributors: +* The \subpage cprover-architecture-overview "CProver Architecture Overview" +is a single document describing the layout of the codebase and many of the +important data structures. It probably contains more information than the +module pages at the moment, but may be somewhat out-of-date. + +* For higher-level architectural information, each of the pages under + the Modules + link gives an overview of a directory in the CProver codebase. + * If you already know exactly what you're looking for, the API reference is generated from the codebase. You can search for classes and class members in the search bar at top-right or use one of the links in the sidebar. -* For higher-level architectural information, each of the pages under - the "Modules" link in the sidebar gives an overview of a directory in - the CProver codebase. - -* The \ref module_cbmc "CBMC guided tour" is a good start for new - contributors to CBMC. It describes the stages through which CBMC - transforms source files into bug reports and counterexamples, linking - to the relevant documentation for each stage. - -* The \subpage cbmc-hacking "CBMC hacking HOWTO" helps new contributors +* The \subpage tutorial "CBMC Developer Tutorial" helps new contributors to CProver to get their feet wet through a series of programming - exercises---mostly modifying goto-instrument, and thus learning to + exercises - mostly modifying goto-instrument, and thus learning to manipulate the main data structures used within CBMC. -* The \subpage cbmc-guide "CBMC guide" is a single document describing - the layout of the codebase and many of the important data structures. - It probably contains more information than the module pages at the - moment, but may be somewhat out-of-date. - \defgroup module_hidden _hidden diff --git a/doc/architectural/howto.md b/doc/architectural/howto.md index 5eee0f0058f..266cd139d5f 100644 --- a/doc/architectural/howto.md +++ b/doc/architectural/howto.md @@ -1,5 +1,9 @@ \ingroup module_hidden -\page cbmc-hacking CBMC Hacking HOWTO +\page tutorial Tutorials + +\section cbmc_tutorial CBMC Developer Tutorial + +\tableofcontents \author Kareem Khazem @@ -180,14 +184,13 @@ there any functions that you didn't expect to see? The following is quite difficult to follow from doxygen, but: the value type of `function_map` is `goto_function_templatet`. -
**Task:** Read the documentation for `goto_function_templatet` and `goto_programt`.
-Each goto_programt object contains a list of -\ref goto_program_templatet::instructiont called +Each \ref goto_programt object contains a list of +\ref goto_programt::instructiont called `instructions`. Each instruction has a field called `code`, which has type \ref codet. @@ -204,11 +207,11 @@ for a codet look like this: 0: symbol * type: array * size: nil - * type: - * #source_location: + * type: + * #source_location: * file: src/main.c * line: 18 - * function: + * function: * working_directory: /some/dir 0: unsignedbv * width: 8 diff --git a/src/ansi-c/module.md b/src/ansi-c/module.md index db72087f4d3..fb9d1191109 100644 --- a/src/ansi-c/module.md +++ b/src/ansi-c/module.md @@ -1,5 +1,5 @@ -\ingroup module_hidden -\defgroup module_ansi-c ANSI-C Language Front-end +\defgroup ansi-c ansi-c +# Folder ansi-c \author Kareem Khazem diff --git a/src/doxyfile b/src/doxyfile index 22fe04986b9..863b3e22f51 100644 --- a/src/doxyfile +++ b/src/doxyfile @@ -38,20 +38,20 @@ PROJECT_NAME = cprover # could be handy for archiving the generated documentation or if some version # control system is used. -PROJECT_NUMBER = +PROJECT_NUMBER = # Using the PROJECT_BRIEF tag one can provide an optional one line description # for a project that appears at the top of each page and should give viewer a # quick idea about the purpose of the project. Keep the description short. -PROJECT_BRIEF = +PROJECT_BRIEF = # With the PROJECT_LOGO tag one can specify a logo or an icon that is included # in the documentation. The maximum height of the logo should not exceed 55 # pixels and the maximum width should not exceed 200 pixels. Doxygen will copy # the logo to the output directory. -PROJECT_LOGO = +PROJECT_LOGO = # The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) path # into which the generated documentation will be written. If a relative path is @@ -118,7 +118,7 @@ REPEAT_BRIEF = YES # the entity):The $name class, The $name widget, The $name file, is, provides, # specifies, contains, represents, a, an and the. -ABBREVIATE_BRIEF = +ABBREVIATE_BRIEF = # If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then # doxygen will generate a detailed section even if there is only a brief @@ -152,7 +152,7 @@ FULL_PATH_NAMES = YES # will be relative from the directory where doxygen is started. # This tag requires that the tag FULL_PATH_NAMES is set to YES. -STRIP_FROM_PATH = +STRIP_FROM_PATH = # The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of the # path mentioned in the documentation of a class, which tells the reader which @@ -161,7 +161,7 @@ STRIP_FROM_PATH = # specify the list of include paths that are normally passed to the compiler # using the -I flag. -STRIP_FROM_INC_PATH = +STRIP_FROM_INC_PATH = # If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter (but # less readable) file names. This can be useful is your file systems doesn't @@ -228,13 +228,13 @@ TAB_SIZE = 8 # "Side Effects:". You can put \n's in the value part of an alias to insert # newlines. -ALIASES = +ALIASES = # This tag can be used to specify a number of word-keyword mappings (TCL only). # A mapping has the form "name=value". For example adding "class=itcl::class" # will allow you to use the command class in the itcl::class meaning. -TCL_SUBST = +TCL_SUBST = # Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources # only. Doxygen will then generate output that is more tailored for C. For @@ -281,7 +281,7 @@ OPTIMIZE_OUTPUT_VHDL = NO # Note that for custom extensions you also need to set FILE_PATTERNS otherwise # the files are not read by doxygen. -EXTENSION_MAPPING = +EXTENSION_MAPPING = # If the MARKDOWN_SUPPORT tag is enabled then doxygen pre-processes all comments # according to the Markdown format, which allows for more readable @@ -604,7 +604,7 @@ STRICT_PROTO_MATCHING = NO # list. This list is created by putting \todo commands in the documentation. # The default value is: YES. -GENERATE_TODOLIST = YES +GENERATE_TODOLIST = NO # The GENERATE_TESTLIST tag can be used to enable (YES) or disable (NO) the test # list. This list is created by putting \test commands in the documentation. @@ -629,7 +629,7 @@ GENERATE_DEPRECATEDLIST= YES # sections, marked by \if ... \endif and \cond # ... \endcond blocks. -ENABLED_SECTIONS = +ENABLED_SECTIONS = # The MAX_INITIALIZER_LINES tag determines the maximum number of lines that the # initial value of a variable or macro / define can have for it to appear in the @@ -671,7 +671,7 @@ SHOW_NAMESPACES = YES # by doxygen. Whatever the program writes to standard output is used as the file # version. For an example see the documentation. -FILE_VERSION_FILTER = +FILE_VERSION_FILTER = # The LAYOUT_FILE tag can be used to specify a layout file which will be parsed # by doxygen. The layout file controls the global structure of the generated @@ -684,7 +684,7 @@ FILE_VERSION_FILTER = # DoxygenLayout.xml, doxygen will parse it automatically even if the LAYOUT_FILE # tag is left empty. -LAYOUT_FILE = +LAYOUT_FILE = # The CITE_BIB_FILES tag can be used to specify one or more bib files containing # the reference definitions. This must be a list of .bib files. The .bib @@ -694,7 +694,7 @@ LAYOUT_FILE = # LATEX_BIB_STYLE. To use this feature you need bibtex and perl available in the # search path. See also \cite for info how to create references. -CITE_BIB_FILES = +CITE_BIB_FILES = #--------------------------------------------------------------------------- # Configuration options related to warning and progress messages @@ -759,7 +759,7 @@ WARN_FORMAT = "$file:$line: $text" # messages should be written. If left blank the output is written to standard # error (stderr). -WARN_LOGFILE = +WARN_LOGFILE = #--------------------------------------------------------------------------- # Configuration options related to the input files @@ -841,7 +841,7 @@ EXCLUDE_PATTERNS = */.svn/* \ # Note that the wildcards are matched against the file with absolute path, so to # exclude all test directories use the pattern */test/* -EXCLUDE_SYMBOLS = +EXCLUDE_SYMBOLS = # The EXAMPLE_PATH tag can be used to specify one or more files or directories # that contain example code fragments that are included (see the \include @@ -854,7 +854,7 @@ EXAMPLE_PATH = ../doc/assets # *.h) to filter out the source-files in the directories. If left blank all # files are included. -EXAMPLE_PATTERNS = +EXAMPLE_PATTERNS = # If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be # searched for input files to be used with the \include or \dontinclude commands @@ -888,7 +888,7 @@ IMAGE_PATH = ../doc/assets # need to set EXTENSION_MAPPING for the extension otherwise the files are not # properly processed by doxygen. -INPUT_FILTER = +INPUT_FILTER = # The FILTER_PATTERNS tag can be used to specify filters on a per file pattern # basis. Doxygen will compare the file name with each pattern and apply the @@ -901,7 +901,7 @@ INPUT_FILTER = # need to set EXTENSION_MAPPING for the extension otherwise the files are not # properly processed by doxygen. -FILTER_PATTERNS = +FILTER_PATTERNS = # If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using # INPUT_FILTER) will also be used to filter the input files that are used for @@ -916,7 +916,7 @@ FILTER_SOURCE_FILES = NO # *.ext= (so without naming a filter). # This tag requires that the tag FILTER_SOURCE_FILES is set to YES. -FILTER_SOURCE_PATTERNS = +FILTER_SOURCE_PATTERNS = # If the USE_MDFILE_AS_MAINPAGE tag refers to the name of a markdown file that # is part of the input, its contents will be placed on the main page @@ -1028,7 +1028,7 @@ CLANG_ASSISTED_PARSING = NO # specified with INPUT and INCLUDE_PATH. # This tag requires that the tag CLANG_ASSISTED_PARSING is set to YES. -CLANG_OPTIONS = +CLANG_OPTIONS = #--------------------------------------------------------------------------- # Configuration options related to the alphabetical class index @@ -1054,7 +1054,7 @@ COLS_IN_ALPHA_INDEX = 5 # while generating the index headers. # This tag requires that the tag ALPHABETICAL_INDEX is set to YES. -IGNORE_PREFIX = +IGNORE_PREFIX = #--------------------------------------------------------------------------- # Configuration options related to the HTML output @@ -1098,7 +1098,7 @@ HTML_FILE_EXTENSION = .html # of the possible markers and block names see the documentation. # This tag requires that the tag GENERATE_HTML is set to YES. -HTML_HEADER = +HTML_HEADER = # The HTML_FOOTER tag can be used to specify a user-defined HTML footer for each # generated HTML page. If the tag is left blank doxygen will generate a standard @@ -1108,7 +1108,7 @@ HTML_HEADER = # that doxygen normally uses. # This tag requires that the tag GENERATE_HTML is set to YES. -HTML_FOOTER = +HTML_FOOTER = # The HTML_STYLESHEET tag can be used to specify a user-defined cascading style # sheet that is used by each HTML page. It can be used to fine-tune the look of @@ -1120,7 +1120,7 @@ HTML_FOOTER = # obsolete. # This tag requires that the tag GENERATE_HTML is set to YES. -HTML_STYLESHEET = +HTML_STYLESHEET = # The HTML_EXTRA_STYLESHEET tag can be used to specify additional user-defined # cascading style sheets that are included after the standard style sheets @@ -1133,7 +1133,7 @@ HTML_STYLESHEET = # list). For an example see the documentation. # This tag requires that the tag GENERATE_HTML is set to YES. -HTML_EXTRA_STYLESHEET = +HTML_EXTRA_STYLESHEET = # The HTML_EXTRA_FILES tag can be used to specify one or more extra images or # other source files which should be copied to the HTML output directory. Note @@ -1143,7 +1143,7 @@ HTML_EXTRA_STYLESHEET = # files will be copied as-is; there are no commands or markers available. # This tag requires that the tag GENERATE_HTML is set to YES. -HTML_EXTRA_FILES = +HTML_EXTRA_FILES = # The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen # will adjust the colors in the style sheet and background images according to @@ -1272,7 +1272,7 @@ GENERATE_HTMLHELP = NO # written to the html output directory. # This tag requires that the tag GENERATE_HTMLHELP is set to YES. -CHM_FILE = +CHM_FILE = # The HHC_LOCATION tag can be used to specify the location (absolute path # including file name) of the HTML help compiler (hhc.exe). If non-empty, @@ -1280,7 +1280,7 @@ CHM_FILE = # The file has to be specified with full path. # This tag requires that the tag GENERATE_HTMLHELP is set to YES. -HHC_LOCATION = +HHC_LOCATION = # The GENERATE_CHI flag controls if a separate .chi index file is generated # (YES) or that it should be included in the master .chm file (NO). @@ -1293,7 +1293,7 @@ GENERATE_CHI = NO # and project file content. # This tag requires that the tag GENERATE_HTMLHELP is set to YES. -CHM_INDEX_ENCODING = +CHM_INDEX_ENCODING = # The BINARY_TOC flag controls whether a binary table of contents is generated # (YES) or a normal table of contents (NO) in the .chm file. Furthermore it @@ -1324,7 +1324,7 @@ GENERATE_QHP = NO # the HTML output folder. # This tag requires that the tag GENERATE_QHP is set to YES. -QCH_FILE = +QCH_FILE = # The QHP_NAMESPACE tag specifies the namespace to use when generating Qt Help # Project output. For more information please see Qt Help Project / Namespace @@ -1349,7 +1349,7 @@ QHP_VIRTUAL_FOLDER = doc # filters). # This tag requires that the tag GENERATE_QHP is set to YES. -QHP_CUST_FILTER_NAME = +QHP_CUST_FILTER_NAME = # The QHP_CUST_FILTER_ATTRS tag specifies the list of the attributes of the # custom filter to add. For more information please see Qt Help Project / Custom @@ -1357,21 +1357,21 @@ QHP_CUST_FILTER_NAME = # filters). # This tag requires that the tag GENERATE_QHP is set to YES. -QHP_CUST_FILTER_ATTRS = +QHP_CUST_FILTER_ATTRS = # The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this # project's filter section matches. Qt Help Project / Filter Attributes (see: # http://qt-project.org/doc/qt-4.8/qthelpproject.html#filter-attributes). # This tag requires that the tag GENERATE_QHP is set to YES. -QHP_SECT_FILTER_ATTRS = +QHP_SECT_FILTER_ATTRS = # The QHG_LOCATION tag can be used to specify the location of Qt's # qhelpgenerator. If non-empty doxygen will try to run qhelpgenerator on the # generated .qhp file. # This tag requires that the tag GENERATE_QHP is set to YES. -QHG_LOCATION = +QHG_LOCATION = # If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files will be # generated, together with the HTML files, they form an Eclipse help plugin. To @@ -1504,7 +1504,7 @@ MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest # MATHJAX_EXTENSIONS = TeX/AMSmath TeX/AMSsymbols # This tag requires that the tag USE_MATHJAX is set to YES. -MATHJAX_EXTENSIONS = +MATHJAX_EXTENSIONS = # The MATHJAX_CODEFILE tag can be used to specify a file with javascript pieces # of code that will be used on startup of the MathJax code. See the MathJax site @@ -1512,7 +1512,7 @@ MATHJAX_EXTENSIONS = # example see the documentation. # This tag requires that the tag USE_MATHJAX is set to YES. -MATHJAX_CODEFILE = +MATHJAX_CODEFILE = # When the SEARCHENGINE tag is enabled doxygen will generate a search box for # the HTML output. The underlying search engine uses javascript and DHTML and @@ -1572,7 +1572,7 @@ EXTERNAL_SEARCH = NO # Searching" for details. # This tag requires that the tag SEARCHENGINE is set to YES. -SEARCHENGINE_URL = +SEARCHENGINE_URL = # When SERVER_BASED_SEARCH and EXTERNAL_SEARCH are both enabled the unindexed # search data is written to a file for indexing by an external tool. With the @@ -1588,7 +1588,7 @@ SEARCHDATA_FILE = searchdata.xml # projects and redirect the results back to the right project. # This tag requires that the tag SEARCHENGINE is set to YES. -EXTERNAL_SEARCH_ID = +EXTERNAL_SEARCH_ID = # The EXTRA_SEARCH_MAPPINGS tag can be used to enable searching through doxygen # projects other than the one defined by this configuration file, but that are @@ -1598,7 +1598,7 @@ EXTERNAL_SEARCH_ID = # EXTRA_SEARCH_MAPPINGS = tagname1=loc1 tagname2=loc2 ... # This tag requires that the tag SEARCHENGINE is set to YES. -EXTRA_SEARCH_MAPPINGS = +EXTRA_SEARCH_MAPPINGS = #--------------------------------------------------------------------------- # Configuration options related to the LaTeX output @@ -1662,7 +1662,7 @@ PAPER_TYPE = a4wide # If left blank no extra packages will be included. # This tag requires that the tag GENERATE_LATEX is set to YES. -EXTRA_PACKAGES = +EXTRA_PACKAGES = # The LATEX_HEADER tag can be used to specify a personal LaTeX header for the # generated LaTeX document. The header should contain everything until the first @@ -1678,7 +1678,7 @@ EXTRA_PACKAGES = # to HTML_HEADER. # This tag requires that the tag GENERATE_LATEX is set to YES. -LATEX_HEADER = +LATEX_HEADER = # The LATEX_FOOTER tag can be used to specify a personal LaTeX footer for the # generated LaTeX document. The footer should contain everything after the last @@ -1689,7 +1689,7 @@ LATEX_HEADER = # Note: Only use a user-defined footer if you know what you are doing! # This tag requires that the tag GENERATE_LATEX is set to YES. -LATEX_FOOTER = +LATEX_FOOTER = # The LATEX_EXTRA_STYLESHEET tag can be used to specify additional user-defined # LaTeX style sheets that are included after the standard style sheets created @@ -1700,7 +1700,7 @@ LATEX_FOOTER = # list). # This tag requires that the tag GENERATE_LATEX is set to YES. -LATEX_EXTRA_STYLESHEET = +LATEX_EXTRA_STYLESHEET = # The LATEX_EXTRA_FILES tag can be used to specify one or more extra images or # other source files which should be copied to the LATEX_OUTPUT output @@ -1708,7 +1708,7 @@ LATEX_EXTRA_STYLESHEET = # markers available. # This tag requires that the tag GENERATE_LATEX is set to YES. -LATEX_EXTRA_FILES = +LATEX_EXTRA_FILES = # If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated is # prepared for conversion to PDF (using ps2pdf or pdflatex). The PDF file will @@ -1816,14 +1816,14 @@ RTF_HYPERLINKS = NO # default style sheet that doxygen normally uses. # This tag requires that the tag GENERATE_RTF is set to YES. -RTF_STYLESHEET_FILE = +RTF_STYLESHEET_FILE = # Set optional variables used in the generation of an RTF document. Syntax is # similar to doxygen's config file. A template extensions file can be generated # using doxygen -e rtf extensionFile. # This tag requires that the tag GENERATE_RTF is set to YES. -RTF_EXTENSIONS_FILE = +RTF_EXTENSIONS_FILE = # If the RTF_SOURCE_CODE tag is set to YES then doxygen will include source code # with syntax highlighting in the RTF output. @@ -1868,7 +1868,7 @@ MAN_EXTENSION = .3 # MAN_EXTENSION with the initial . removed. # This tag requires that the tag GENERATE_MAN is set to YES. -MAN_SUBDIR = +MAN_SUBDIR = # If the MAN_LINKS tag is set to YES and doxygen generates man output, then it # will generate one additional man file for each entity documented in the real @@ -1981,7 +1981,7 @@ PERLMOD_PRETTY = YES # overwrite each other's variables. # This tag requires that the tag GENERATE_PERLMOD is set to YES. -PERLMOD_MAKEVAR_PREFIX = +PERLMOD_MAKEVAR_PREFIX = #--------------------------------------------------------------------------- # Configuration options related to the preprocessor @@ -2022,7 +2022,7 @@ SEARCH_INCLUDES = YES # preprocessor. # This tag requires that the tag SEARCH_INCLUDES is set to YES. -INCLUDE_PATH = +INCLUDE_PATH = # You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard # patterns (like *.h and *.hpp) to filter out the header-files in the @@ -2030,7 +2030,7 @@ INCLUDE_PATH = # used. # This tag requires that the tag ENABLE_PREPROCESSING is set to YES. -INCLUDE_FILE_PATTERNS = +INCLUDE_FILE_PATTERNS = # The PREDEFINED tag can be used to specify one or more macro names that are # defined before the preprocessor is started (similar to the -D option of e.g. @@ -2040,7 +2040,7 @@ INCLUDE_FILE_PATTERNS = # recursively expanded use the := operator instead of the = operator. # This tag requires that the tag ENABLE_PREPROCESSING is set to YES. -PREDEFINED = +PREDEFINED = # If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then this # tag can be used to specify a list of macro names that should be expanded. The @@ -2049,7 +2049,7 @@ PREDEFINED = # definition found in the source code. # This tag requires that the tag ENABLE_PREPROCESSING is set to YES. -EXPAND_AS_DEFINED = +EXPAND_AS_DEFINED = # If the SKIP_FUNCTION_MACROS tag is set to YES then doxygen's preprocessor will # remove all references to function-like macros that are alone on a line, have @@ -2078,13 +2078,13 @@ SKIP_FUNCTION_MACROS = YES # the path). If a tag file is not located in the directory in which doxygen is # run, you must also specify the path to the tagfile here. -TAGFILES = +TAGFILES = # When a file name is specified after GENERATE_TAGFILE, doxygen will create a # tag file that is based on the input files it reads. See section "Linking to # external documentation" for more information about the usage of tag files. -GENERATE_TAGFILE = +GENERATE_TAGFILE = # If the ALLEXTERNALS tag is set to YES, all external class will be listed in # the class index. If set to NO, only the inherited external classes will be @@ -2133,14 +2133,14 @@ CLASS_DIAGRAMS = YES # the mscgen tool resides. If left empty the tool is assumed to be found in the # default search path. -MSCGEN_PATH = +MSCGEN_PATH = # You can include diagrams made with dia in doxygen documentation. Doxygen will # then run dia to produce the diagram and insert it in the documentation. The # DIA_PATH tag allows you to specify the directory where the dia binary resides. # If left empty dia is assumed to be found in the default search path. -DIA_PATH = +DIA_PATH = # If set to YES the inheritance and collaboration graphs will hide inheritance # and usage relations if the target is undocumented or is not a class. @@ -2189,7 +2189,7 @@ DOT_FONTSIZE = 10 # the path where dot can find it using this tag. # This tag requires that the tag HAVE_DOT is set to YES. -DOT_FONTPATH = +DOT_FONTPATH = # If the CLASS_GRAPH tag is set to YES then doxygen will generate a graph for # each documented class showing the direct and indirect inheritance relations. @@ -2335,26 +2335,26 @@ INTERACTIVE_SVG = NO # found. If left blank, it is assumed the dot tool can be found in the path. # This tag requires that the tag HAVE_DOT is set to YES. -DOT_PATH = +DOT_PATH = # The DOTFILE_DIRS tag can be used to specify one or more directories that # contain dot files that are included in the documentation (see the \dotfile # command). # This tag requires that the tag HAVE_DOT is set to YES. -DOTFILE_DIRS = +DOTFILE_DIRS = # The MSCFILE_DIRS tag can be used to specify one or more directories that # contain msc files that are included in the documentation (see the \mscfile # command). -MSCFILE_DIRS = +MSCFILE_DIRS = # The DIAFILE_DIRS tag can be used to specify one or more directories that # contain dia files that are included in the documentation (see the \diafile # command). -DIAFILE_DIRS = +DIAFILE_DIRS = # When using plantuml, the PLANTUML_JAR_PATH tag should be used to specify the # path where java can find the plantuml.jar file. If left blank, it is assumed @@ -2362,12 +2362,12 @@ DIAFILE_DIRS = # generate a warning when it encounters a \startuml command in this case and # will not generate output for the diagram. -PLANTUML_JAR_PATH = +PLANTUML_JAR_PATH = # When using plantuml, the specified paths are searched for files specified by # the !include statement in a plantuml block. -PLANTUML_INCLUDE_PATH = +PLANTUML_INCLUDE_PATH = # The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of nodes # that will be shown in the graph. If the number of nodes in a graph becomes diff --git a/src/goto-programs/module.md b/src/goto-programs/module.md index f1224d46492..d7696cd6ad6 100644 --- a/src/goto-programs/module.md +++ b/src/goto-programs/module.md @@ -1,5 +1,5 @@ -\ingroup module_hidden -\defgroup module_goto-programs Goto Conversion & Instrumentation +\defgroup goto-programs goto-programs +# Folder goto-programs \author Kareem Khazem diff --git a/src/goto-symex/module.md b/src/goto-symex/module.md index c062c1d8f3f..fa34628d82d 100644 --- a/src/goto-symex/module.md +++ b/src/goto-symex/module.md @@ -1,5 +1,5 @@ -\ingroup module_hidden -\defgroup module_goto-symex Symbolic Execution & Counterexample Production +\defgroup goto-symex goto-symex +# Folder goto-symex \author Kareem Khazem diff --git a/src/solvers/refinement/README.md b/src/solvers/refinement/README.md index 0743a8098df..f0ec6efde68 100644 --- a/src/solvers/refinement/README.md +++ b/src/solvers/refinement/README.md @@ -1,11 +1,9 @@ -\page string-solver String solver +\defgroup solvers solvers +# Folder solvers +\defgroup string_solver_interface String solver interface \author Romain Brenguier -\tableofcontents - -\section string_solver_interface String solver interface - The string solver is particularly aimed at string logic, but since it inherits from \ref bv_refinementt it is also capable of handling arithmetic, array logic, floating point operations etc. diff --git a/src/util/command-line-parsing.md b/src/util/command-line-parsing.md index 3e9f803ed07..611c3c4203f 100644 --- a/src/util/command-line-parsing.md +++ b/src/util/command-line-parsing.md @@ -1,5 +1,7 @@ -\ingroup module_hidden -\defgroup module_command-line-parsing Command Line Parsing +\defgroup util util +# Folder util + +`util/` is filled with useful tools. \dot digraph G { From c1f9e34f8014dc2db544b3411294cf491d3cc958 Mon Sep 17 00:00:00 2001 From: Peter Schrammel Date: Thu, 22 Mar 2018 19:22:34 +0000 Subject: [PATCH 2/6] Move SATABS references to separate from user manual --- ...{cprover-manual.md => cbmc-user-manual.md} | 741 +----------------- doc/satabs-user-manual.md | 700 +++++++++++++++++ 2 files changed, 738 insertions(+), 703 deletions(-) rename doc/{cprover-manual.md => cbmc-user-manual.md} (73%) create mode 100644 doc/satabs-user-manual.md diff --git a/doc/cprover-manual.md b/doc/cbmc-user-manual.md similarity index 73% rename from doc/cprover-manual.md rename to doc/cbmc-user-manual.md index 90d609193d9..f947705c69d 100644 --- a/doc/cprover-manual.md +++ b/doc/cbmc-user-manual.md @@ -1,10 +1,9 @@ \ingroup module_hidden -\page cprover-manual CProver User Manual +\page cbmc-user-manual CBMC User Manual \author Daniel Kroening -This tutorial is intended for users of CProver (CBMC, SatAbs, and -associated tools). +This tutorial is intended for users of CProver primarily focused on CBMC. \tableofcontents @@ -31,21 +30,6 @@ correctness" of a design, as ensuring the absence of all bugs is too expensive for most applications. In contrast, a guarantee of the absence of specific flaws is achievable, and is a good metric of quality. -We document two programs that try to achieve formal guarantees of the -absence of specific problems: CBMC and SATABS. The algorithms -implemented by CBMC and SATABS are complementary, and often, one tool is -able to solve a problem that the other cannot solve. - -Both CBMC and SATABS are verification tools for ANSI-C/C++ programs. -They verify array bounds (buffer overflows), pointer safety, exceptions -and user-specified assertions. Both tools model integer arithmetic -accurately, and are able to reason about machine-level artifacts such as -integer overflow. CBMC and SATABS are therefore able to detect a class -of bugs that has so far gone unnoticed by many other verification tools. -This manual also covers some variants of CBMC, which includes HW-CBMC -for -\ref man_hard-soft-introduction "hardware/software co-verification". - ## Bounded Model Checking with CBMC CBMC implements a technique called *Bounded Model Checking* (BMC). In @@ -67,27 +51,10 @@ this bound is established, CBMC is able to prove the absence of errors. A more detailed description of how to apply CBMC to verify programs is \ref man_cbmc-tutorial "here". -## Automatic Program Verification with SATABS - -In many cases, lightweight properties such as array bounds do not rely -on the entire program. A large fraction of the program is *irrelevant* -to the property. SATABS exploits this observation and computes an -*abstraction* of the program in order to handle large amounts of code. - -In order to use SATABS it is not necessary to understand the abstraction -refinement process. For the interested reader, a high-level introduction -to abstraction refinement is provided -\ref man_satabs-overview "here". -We also provide -\ref man_satabs-tutorials "tutorials on how to use SATABS". - -Just as CBMC, SATABS attempts to build counterexamples that refute the -property. If such a counterexample is found, it is presented to the -engineer to facilitate localization and repair of the program. ### Example: Buffer Overflows -In order to give a brief overview of the capabilities of CBMC and SATABS +In order to give a brief overview of the capabilities of CBMC we start with a small example. The issue of *[buffer overflows](http://en.wikipedia.org/wiki/Buffer_overflow)* has obtained wide public attention. A buffer is a contiguously-allocated chunk of @@ -108,11 +75,11 @@ int main() However, the write access to an address outside the allocated memory region can lead to unexpected behavior. In particular, such bugs can be exploited to overwrite the return address of a function, thus enabling -the execution of arbitrary user-induced code. CBMC and SATABS are able +the execution of arbitrary user-induced code. CBMC is able to detect this problem and reports that the "upper bound property" of -the buffer is violated. CBMC and SATABS are capable of checking these +the buffer is violated. CBMC is capable of checking these lower and upper bounds, even for arrays with dynamic size. A detailed -discussion of the properties that CBMC and SATABS can check +discussion of the properties that CBMC can check automatically is \ref man_instrumentation-properties "here". ## Hardware/Software Co-Verification @@ -129,7 +96,7 @@ descriptions of circuitry. While both describe the same functionality, the hardware implementation usually contains more detail. It is highly desirable to establish some form for equivalence between the two descriptions. Hardware/Software co-verification and equivalence checking -with CBMC and SATABS are described +with CBMC is described \ref man_hard-soft-introduction "here". @@ -183,107 +150,19 @@ SVN](http://www.cprover.org/svn/cbmc/). To compile the source code, follow [these instructions](http://www.cprover.org/svn/cbmc/trunk/COMPILING). -\subsection man_install-satabs Installing SATABS - -### Requirements - -SATABS is available for Windows, i86 Linux, and MacOS X. SATABS requires -a code pre-processing environment comprising of a suitable preprocessor -and an a set of header files. - -1. **Linux:** the preprocessor and the header files typically come with - a package called *gcc*, which must be installed prior to the - installation of SATABS. -2. **Windows:** The Windows version of SATABS requires the preprocessor - `cl.exe`, which is part of Visual Studio (including the free [Visual - Studio - Express](http://msdn.microsoft.com/vstudio/express/visualc/)). -3. **MacOS:** Install - [XCode](http://developer.apple.com/technologies/xcode.html) prior to - installing SATABS. - -Important note for Windows users: Visual Studio's `cl.exe` relies on a -complex set of environment variables to identify the target architecture -and the directories that contain the header files. You must run SATABS -from within the *Visual Studio Command Prompt*. - -Note that the distribution files for the [Eclipse -plugin](installation-plugin.shtml) include the command-line tools. -Therefore, if you intend to run SATABS exclusively within Eclipse, you -can skip the installation of the command-line tools. However, you still -have to install the compiler environment as described above. - -### Choosing and Installing a Model Checker - -You need to install a Model Checker in order to be able to run SATABS. -You can choose between following alternatives: - -- **Cadence SMV**. Available from http://www.kenmcmil.com/smv.html. - Cadence SMV is a commercial model checker. The free version that is - available on the homepage above must not be used for commercial - purposes (read the license agreement thoroughly before you download - the tool). The documentation for SMV can be found in the directory - where you unzip/untar SMV under ./smv/doc/smv/. Read the - installation instructions carefully. The Linux/MacOS versions - require setting environment variables. You must add add the - directory containing the `smv` binary (located in ./smv/bin/, - relative to the path where you unpacked it) to your `PATH` - environment variable. SATABS uses Cadence SMV by default. - -- **NuSMV**. Available from http://nusmv.irst.itc.it/. NuSMV is the - open source alternative to Cadence SMV. Installation instructions - and documentation can be found on the NuSMV homepage. The directory - containing the NuSMV binary should be added to your `PATH` - environment variable. Use the option - - --modelchecker nusmv - - to instruct SATABS to use NuSMV. - -- **BOPPO**. Available from http://www.cprover.org/boppo/. BOPPO is - a model checker that uses SAT-solving algorithms. BOPPO relies on a - built-in SAT solver and Quantor, a solver for quantified boolean - formulas that is currently bundled with BOPPO, but also available - separately from . We recommend to add - the directories containing both tools to your `PATH` environment - variable. Use the option - - --modelchecker boppo - - when you call SATABS and want it to use BOPPO instead of SMV. - -- **BOOM**. Available from http://www.cprover.org/boom/. Boom has a - number of unique features, including the verification of programs - with unbounded thread creation. - -### Installing SATABS - -1. Download SATABS for your operating system. The binaries are - available from . -2. Unzip/untar the archive into a directory of your choice. We - recommend to add this directory to your `PATH` environment variable. - -Now you can execute SATABS. Try running SATABS on the small examples -presented in the \ref man_satabs-tutorials "SATABS tutorial". If you use -the Cadence SMV model checker, the only command line arguments you have -to specify are the names of the files that contain your program. \subsection man_install-eclipse Installing the Eclipse Plugin ### Requirements -We provide a graphical user interface to CBMC and SATABS, which is +We provide a graphical user interface to CBMC which is realized as a plugin to the Eclipse framework. Eclipse is available at http://www.eclipse.org. We do not provide installation instructions for Eclipse (basically, you only have to download the current version and extract the files to your hard-disk) and assume that you have already installed the current version. -CBMC and SATABS have their own requirements. As an example, both CBMC and SATABS require a suitable preprocessor and a set of header files. As -first step, you should therefore follow the installation instructions -for \ref man_install-cbmc "CBMC" and \ref man_install-satabs "SATABS". - Important note for Windows users: Visual Studio's `cl.exe` relies on a complex set of environment variables to identify the target architecture and the directories that contain the header files. You must run Eclipse @@ -296,7 +175,6 @@ to the download site, are available [here](http://www.cprover.org/eclipse-plugin/). This includes a small tutorial on how to use the Eclipse plugin. - \section man_cbmc CBMC: Bounded Model Checking for C, C++ and Java \subsection man_cbmc-tutorial A Short Tutorial @@ -460,6 +338,7 @@ CBMC is used for bug hunting only; CBMC does not attempt to find all bugs. The following program (lock-example.c) is an example of a program with a user-specified property: +``` _Bool nondet_bool(); _Bool LOCK = 0; @@ -494,6 +373,7 @@ with a user-specified property: times--; } } +``` The `while` loop in the `main` function has no (useful) run-time bound. Thus, a bound has to be set on the amount of unwinding that CBMC @@ -505,7 +385,7 @@ performs. There are two ways to do so: of program steps to be processed. Given the option `--unwinding-assertions`, CBMC checks whether the -arugment to `--unwind` is large enough to cover all program paths. If +argument to `--unwind` is large enough to cover all program paths. If the argument is too small, CBMC will detect that not enough unwinding is done reports that an unwinding assertion has failed. @@ -589,16 +469,19 @@ a particular depth. Technically, this is achieved by a process that essentially amounts to *unwinding loops*. This concept is best illustrated with a generic example: +``` int main(int argc, char **argv) { while(cond) { BODY CODE } } +``` A BMC instance that will find bugs with up to five iterations of the loop would contain five copies of the loop body, and essentially corresponds to checking the following loop-free program: +``` int main(int argc, char **argv) { if(cond) { BODY CODE COPY 1 @@ -616,6 +499,7 @@ corresponds to checking the following loop-free program: } } } +``` Note the use of the `if` statement to prevent the execution of the loop body in the case that the loop ends before five iterations are executed. @@ -627,6 +511,7 @@ In many cases, CBMC is able to automatically determine an upper bound on the number of loop iterations. This may even work when the number of loop unwindings is not constant. Consider the following example: +``` _Bool f(); int main() { @@ -635,6 +520,7 @@ loop unwindings is not constant. Consider the following example: } assert(0); } +``` The loop in the program above has an obvious upper bound on the number of iterations, but note that the loop may abort prematurely depending on @@ -684,6 +570,7 @@ unwinding bound is actually sufficiently large. These checks are called unwinding assertion for a bound of five corresponds to checking the following loop-free program: +``` int main(int argc, char **argv) { if(cond) { BODY CODE COPY 1 @@ -702,6 +589,7 @@ following loop-free program: } } } +``` The unwinding assertions can be verified just like any other generated assertion. If all of them are proven to hold, the given loop bounds are @@ -713,12 +601,14 @@ In some cases, it is desirable to cut off very deep loops in favor of code that follows the loop. As an example, consider the following program: +``` int main() { for(int i=0; i<10000; i++) { BODY CODE } assert(0); } +``` In the example above, small values of `--unwind` will prevent that the assertion is reached. If the code in the loop is considered irrelevant @@ -781,6 +671,7 @@ the Figure below. \image html pid.png "The pid controller" +``` 01: // CONSTANTS: 02: #define MAX_CLIMB_SUM_ERR 10 03: #define MAX_CLIMB 1 @@ -858,6 +749,7 @@ the Figure below. 75: 76: return 0; 77: } +``` In order to test the PID controller, we construct a main control loop, which repeatedly invokes the function `climb_pid_run` (line 69). This @@ -963,563 +855,6 @@ mcdc |Modified Condition/Decision Coverage (MC/DC) path |Bounded path coverage cover |Generate a test for every `__CPROVER_cover` statement - -\section man_satabs SATABS---Predicate Abstraction with SAT - -\subsection man_satabs-overview Overview - -This section describes SATABS from the point of view of the user. To -learn about the technology implemented in SATABS, read -\ref man_satabs-background "this". - -We assume you have already installed SATABS and the necessary support -files on your system. If not so, please follow -\ref man_install-satabs "these instructions". - -While users of SATABS almost never have to be concerned about the -underlying refinement abstraction algorithms, understanding the classes -of properties that can be verified is crucial. Predicate abstraction is -most effective when applied to *control-flow dominated properties*. As -an example, reconsider the following program (lock-example-fixed.c): - - _Bool nondet_bool(); - _Bool LOCK = 0; - - _Bool lock() { - if(nondet_bool()) { - assert(!LOCK); - LOCK=1; - return 1; } - - return 0; - } - - void unlock() { - assert(LOCK); - LOCK=0; - } - - int main() { - unsigned got_lock = 0; - int times; - - while(times > 0) { - if(lock()) { - got_lock++; - /* critical section */ - } - - if(got_lock!=0) { - unlock(); - got_lock--; - } - - times--; - } - } - -The two assertions in the program model that the functions `lock()` and -`unlock()` are called in the right order. Note that the value of `times` -is chosen non-deterministically and is not bounded. The program has no -run-time bound, and thus, unwinding the code with CBMC will never -terminate. - -### Working with Claims - -The two assertions will give rise to two *properties*. Each property is -associated to a specific line of code, i.e., a property is violated when -some condition can become false at the corresponding program location. -SATABS will generate a list of all properties for the programs as -follows: - - satabs lock-example-fixed.c --show-properties - -SATABS will list two properties; each property corresponds to one of the -two assertions. We can use SATABS to verify both properties as follows: - - satabs lock-example-fixed.c - -SATABS will conclude the verification successfully, that is, both -assertions hold for execution traces of any length. - -By default, SATABS attempts to verify all properties at once. A single -property can be verified (or refuted) by using the `--property id` -option of SATABS, where `id` denotes the identifier of the property in -the list obtained by calling SATABS with the `--show-properties` flag. -Whenever a property is violated, SATABS reports a feasible path that -leads to a state in which the condition that corresponds to the violated -property evaluates to false. - -\subsection man_satabs-libraries Programs that use Libraries - -SATABS cannot check programs that use functions that are only available -in binary (compiled) form (this restriction is not imposed by the -verification algorithms that are used by SATABS – they also work on -assembly code). The reason is simply that so far no assembly language -frontend is available for SATABS. At the moment, (library) functions for -which no C source code is available have to be replaced by stubs. The -usage of stubs and harnesses (as known from unit testing) also allows to -check more complicated properties (like, for example, whether function -`fopen` is always called before `fclose`). This technique is explained -in detail in the \ref man_satabs-tutorials "SATABS tutorials". - -\subsection man_satabs-unit-test Unit Testing with SATABS - -The example presented \ref man_satabs-tutorial-driver "here" is -obviously a toy example and can hardly be used to convince your project -manager to use static verification in your next project. Even though we -recommend to use formal verification and specification already in the -early phases of your project, the sad truth is that in most projects -verification (of any kind) is still pushed to the very end of the -development cycle. Therefore, this section is dedicated to the -verification of legacy code. However, the techniques presented here can -also be used for *unit testing*. - -Unit testing is used in most software development projects, and static -verification with SATABS can be very well combined with this technique. -Unit testing relies on a number test cases that yield the desired code -coverage. Such test cases are implemented by a software testing engineer -in terms of a test harness (aka test driver) and a set of function -stubs. Typically, a slight modification to the test harness allows it to -be used with SATABS. Replacing the explicit input values with -non-deterministic inputs (as explained in -\ref man_satabs-tutorial-aeon "here" and -\ref man_satabs-tutorial-driver "here")) guarantees that SATABS will try -to achieve **full** path and state coverage (due to the fact that -predicate abstraction implicitly detects equivalence classes). However, -it is not guaranteed that SATABS terminates in all cases. Keep in mind -that you must not make any assumptions about the validity of the -properties if SATABS did not run to completion! - -### Further Reading - -- [Model Checking Concurrent Linux Device - Drivers](http://www.kroening.com/publications/view-publications-wbwk2007.html) -- [SATABS: SAT-based Predicate Abstraction for - ANSI-C](http://www-2.cs.cmu.edu/~svc/papers/view-publications-cksy2005.html) -- [Predicate Abstraction of ANSI-C Programs using - SAT](http://www-2.cs.cmu.edu/~svc/papers/view-publications-cksy2004.html) - -\subsection man_satabs-background Background - -### Sound Abstractions - -This section provides background information on how SATABS operates. -Even for very trivial C programs it is impossible to exhaustively -examine their state space (which is potentially unbounded). However, not -all details in a C program necessarily contribute to a bug, so it may be -sufficient to only examine those parts of the program that are somehow -related to a bug. - -In practice, many static verification tools (such as `lint`) try to -achieve this goal by applying heuristics. This approach comes at a cost: -bugs might be overlooked because the heuristics do not cover all -relevant aspects of the program. Therefore, the conclusion that a -program is correct whenever such a static verification tool is unable to -find an error is invalid. - -\image html cegar-1.png "CEGAR Loop" - -A more sophisticated approach that has been very successful recently is -to generate a *sound* abstraction of the original program. In this -context, *soundness* refers to the fact that the abstract program -contains (at least) all relevant behaviors (i.e., bugs) that are present -in the original program. In the Figure above, the first component strips -details from the original program. The number of possible behaviors -increases as the number of details in the abstract program decreases. -Intuitively, the reason is that whenever the model checking tool lacks -the information that is necessary to make an accurate decision on -whether a branch of an control flow statement can be taken or not, both -branches have to be considered. - -In the resulting *abstract program*, a set of concrete states is -subsumed by means of a single abstract state. Consider the following -figure: - -![](states.png) - -The concrete states *x*~1~ and *x*~2~ are mapped to an abstract state -*X*, and similarly *Y* subsumes *y*~1~ and *y*~2~. However, all -transitions that are possible in the concrete program are also possible -in the abstract model. The abstract transition *X* → *Y* summarizes the -concrete transitions *x*~1~ → *y*~1~ and *x*~1~ → *x*~1~, and *Y* → *X* -corresponds to *x*~1~ → *x*~2~. The behavior *X* → *Y* → *X* is feasible -in the original program, because it maps to *x*~1~ → *x*~1~ → *x*~2~. -However, *Y* → *X* → *Y* is feasible only in the abstract model. - -### Spurious Counterexamples - -The consequence is that the model checker (component number two in the -figure above) possibly reports a *spurious* counterexample. We call a -counterexample spurious whenever it is feasible in the current abstract -model but not in the original program. However, whenever the model -checker is unable to find an execution trace that violates the given -property, we can conclude that there is no such trace in the original -program, either. - -The feasibility of counterexamples is checked by *symbolic simulation* -(performed by component three in the figure above). If the -counterexample is indeed feasible, SATABS found a bug in the original -program and reports it to the user. - -### Automatic Refinement - -On the other hand, infeasible counterexamples (that originate from -abstract behaviors that result from the omission of details and are not -present in the original program) are never reported to the user. -Instead, the information is used in order to refine the abstraction such -that the spurious counterexample is not part of the refined model -anymore. For instance, the reason for the infeasibility of *Y* → *X* → -*Y* is that neither *y*~1~ nor *x*~1~ can be reached from *x*~2~. -Therefore, the abstraction can be refined by partitioning *X*. - -The refinement steps can be illustrated as follows: - -![Iterative refinement](refinement.png) - -The first step (1) is to generate a very coarse abstraction with a very -small state space. This abstraction is then successively refined (2, 3, -...) until either a feasible counterexample is found or the abstract -program is detailed enough to show that there is no path that leads to a -violation of the given property. The problem is that this point is not -necessarily reached for every input program, i.e., it is possible that -the the abstraction refinement loop never terminates. Therefore, SATABS -allows to specify an upper bound for the number of iterations. - -When this upper bound is reached and no counterexample was found, this -does not necessarily mean that there is none. In this case, you cannot -make any conclusions at all with respect to the correctness of the input -program. - -\subsection man_satabs-tutorials Tutorials - -We provide an introduction to model checking "real" C programs with -SATABS using two small examples. - -\subsubsection man_satabs-tutorial-driver Reference Counting in Linux Device Drivers - -Microsoft's [SLAM](http://research.microsoft.com/SLAM) toolkit has been -successfully used to find bugs in Windows device drivers. SLAM -automatically verifies device driver whether a device driver adheres to -a set of specifications. SLAM provides a test harness for device drivers -that calls the device driver dispatch routines in a non-deterministic -order. Therefore, the Model Checker examines all combinations of calls. -Motivated by the success this approach, we provide a toy example based -on Linux device drivers. For a more complete approach to the -verification of Linux device drivers, consider -[DDVerify](http://www.cprover.org/ddverify/). - -Dynamically loadable modules enable the Linux Kernel to load device -drivers on demand and to release them when they are not needed anymore. -When a device driver is registered, the kernel provides a major number -that is used to uniquely identify the device driver. The corresponding -device can be accessed through special files in the filesystem; by -convention, they are located in the `/dev` directory. If a process -accesses a device file the kernel calls the corresponding `open`, `read` -and `write` functions of the device driver. Since a driver must not be -released by the kernel as long as it is used by at least one process, -the device driver must maintain a usage counter (in more recent Linux -kernels, this is done automatically, however, drivers that must maintain -backward compatibility have to adjust this counter). - -We provide a skeleton of such a driver. Download the files -assets/spec.c, assets/driver.c, assets/driver.h, assets/kdev_t.h, and -assets/modules.h. - -The driver contains following functions: - -1. `register_chrdev`: (in assets/spec.c) Registers a character device. - In our implementation, the function sets the variable `usecount` to - zero and returns a major number for this device (a constant, if the - user provides 0 as argument for the major number, and the value - specified by the user otherwise). - - int usecount; - - int register_chrdev (unsigned int major, const char* name) - { - usecount = 0; - if (major == 0) - return MAJOR_NUMBER; - return major; - } - -2. `unregister_chrdev`: (in assets/spec.c) Unregisters a character - device. This function asserts that the device is not used by any - process anymore (we use the macro `MOD_IN_USE` to check this). - - int unregister_chrdev (unsigned int major, const char* name) - { - if (MOD_IN_USE) - { - ERROR: assert (0); - } - else - return 0; - } - -3. `dummy_open`: (in assets/driver.c) This function increases the - `usecount`. If the device is locked by some other process - `dummy_open` returns -1. Otherwise it locks the device for the - caller. - -4. `dummy_read`: (in assets/driver.c) This function "simulates" a read - access to the device. In fact it does nothing, since we are - currently not interested in the potential buffer overflow that may - result from a call to this function. Note the usage of the function - `nondet_int`: This is an internal SATABS-function that - non­determi­nistically returns an arbitrary integer value. The - function `__CPROVER_assume` tells SATABS to ignore all traces that - do not adhere to the given assumption. Therefore, whenever the lock - is held, `dummy_read` will return a value between 0 and `max`. If - the lock is not held, then `dummy_read` returns -1. - -5. `dummy_release`: (in assets/driver.c) If the lock is held, then - `dummy_release` decreases the `usecount`, releases the lock, and - returns 0. Otherwise, the function returns -1. - -We now want to check if any *valid* sequence of calls of the dispatch -functions (in driver.c) can lead to the violation of the assertion (in -assets/spec.c). Obviously, a call to `dummy_open` that is immediately -followed by a call to `unregister_chrdev` violates the assertion. - -The function `main` in spec.c gives an example of how these functions -are called. First, a character device "`dummy`" is registered. The major -number is stored in the `inode` structure of the device. The values for -the file structure are assigned non-deterministically. We rule out -invalid sequences of calls by ensuring that no device is unregistered -while it is still locked. We use the following model checking harness -for calling the dispatching functions: - - random = nondet_uchar (); - __CPROVER_assume (0 <= random && random <= 3); - - switch (random) - { - case 1: - rval = dummy_open (&inode, &my_file); - if (rval == 0) - lock_held = TRUE; - break; - case 2: - __CPROVER_assume (lock_held); - count = dummy_read (&my_file, buffer, BUF_SIZE); - break; - case 3: - dummy_release (&inode, &my_file); - lock_held = FALSE; - break; - default: - break; - } - -The variable `random` is assigned non-deterministically. Subsequently, -the value of `random` is restricted to be 0 &le `random` ≤ 3 by a call -to `__CPROVER_assume`. Whenever the value of `random` is not in this -interval, the corresponding execution trace is simply discarded by -SATABS. Depending on the value of `random`, the harness calls either -`dummy_open`, `dummy_read` or `dummy_close`. Therefore, if there is a -sequence of calls to these three functions that leads to a violation of -the assertion in `unregister_chrdev`, then SATABS will eventually -consider it. - -If we ask SATABS to show us the properties it verifies with - - satabs driver.c spec.c --show-properties - -for our example, we obtain - -1. Claim unregister\_chrdev.1:\ -     file spec.c line 18 function unregister\_chrdev\ -     MOD\_IN\_USE in unregister\_chrdev\ -     FALSE - -2. Claim dummy\_open.1:\ -     file driver.c line 15 function dummy\_open\ -     i\_rdev mismatch\ -     (unsigned int)inode->i\_rdev >> 8 == (unsigned - int)dummy\_major - -It seems obvious that the property dummy\_open.1 can never be violated. -SATABS confirms this assumption: We call - - satabs driver.c spec.c --property dummy_open.1 - -and SATABS reports `VERIFICATION SUCCESSFUL` after a few iterations. - -If we try to verify property unregister\_chrdev.1, SATABS reports that -the property in line 18 in file spec.c is violated (i.e., the assertion -does not hold, therefore the `VERIFICATION FAILED`). Furthermore, SATABS -provides a detailed description of the problem in the form of a -counterexample (i.e., an execution trace that violates the property). On -this trace, `dummy_open` is called **twice**, leading to a `usecount` of 2. -The second call of course fails with `rval=-1`, but the counter is -increased nevertheless: - - int dummy_open (struct inode *inode, struct file *filp) - { - __CPROVER_assert(MAJOR (inode->i_rdev) == dummy_major, - "i_rdev mismatch"); - MOD_INC_USE_COUNT; - - if (locked) - return -1; - locked = TRUE; - - return 0; /* success */ - } - -Then, `dummy_release` is called to release the lock on the device. -Finally, the loop is left and the call to `unregister_chrdev` results in -a violation of the assertion (since `usecount` is still 1, even though -`locked=0`). - -\subsubsection man_satabs-tutorial-aeon Buffer Overflow in a Mail Transfer Agent - -We explain how to model check Aeon version 0.2a, a small mail transfer -agent written by Piotr Benetkiewicz. The description advertises Aeon as -a "*good choice for **hardened** or minimalistic boxes*". The sources -are available -[here](http://www.cprover.org/satabs/examples/loop_detection/aeon-0.2a.tar.gz). - -Our first naive attempt to verify Aeon using - - satabs *.c - -produces a positive result, but also warns us that the property holds -trivially. It also reveals that a large number library functions are -missing: SATABS is unable to find the source code for library functions -like `send`, `write` and `close`. - -Now, do you have to provide a body for all missing library functions? -There is no easy answer to this question, but a viable answer would be -"most likely not". It is necessary to understand how SATABS handles -functions without bodies: It simply assumes that such a function returns -an arbitrary value, but that no other locations than the one on the left -hand side of the assignment are changed. Obviously, there are cases in -which this assumption is un­sound, since the function potentially -modifies all memory locations that it can somehow address. - -We now use static analysis to generate array bounds checks for Aeon: - - satabs *.c --pointer-check --bounds-check --show-properties - -SATABS will show about 300 properties in various functions (read -\ref man_instrumentation-properties "this" for more information on the -property instrumentation). Now consider the first few lines of the -`main` function of Aeon: - - int main(int argc, char **argv) - { - char settings[MAX_SETTINGS][MAX_LEN]; - ... - numSet = getConfig(settings); - if (numSet == -1) { - logEntry("Missing config file!"); - exit(1); - } - ... - -and the function `getConfig` in `lib_aeon.c`: - - int getConfig(char settings[MAX_SETTINGS][MAX_LEN]) - { - char home[MAX_LEN]; - FILE *fp; /* .rc file handler */ - int numSet = 0; /* number of settings */ - - strcpy(home, getenv("HOME")); /* get home path */ - strcat(home, "/.aeonrc"); /* full path to rc file */ - fp = fopen(home, "r"); - if (fp == NULL) return -1; /* no cfg - ERROR */ - while (fgets(settings[numSet], MAX_LEN-1, fp) - && (numSet < MAX_SETTINGS)) numSet++; - fclose(fp); - - return numSet; - } - -The function `getConfig` makes calls to `strcpy`, `strcat`, `getenv`, -`fopen`, `fgets`, and `fclose`. It is very easy to provide an -implementation for the functions from the string library (string.h), and -SATABS comes with meaningful definitions for most of them. The -definition of `getenv` is not so straight-forward. The man-page of -`getenv` (which we obtain by entering `man 3 getenv` in a Unix or cygwin -command prompt) tells us: - - `` `getenv' `` searches the list of en­vi­ron­ment variable names - and values (using the global pointer `char **environ`) for a - variable whose name matches the string at `NAME`. If a variable name - matches, `getenv` returns a pointer to the associated value. - -SATABS has no information whatsoever about the content of `environ`. -Even if SATABS could access the environment variables on your -computer, a successful verification of `Aeon` would then only guarantee -that the properties for this program hold on your computer with a -specific set of en­vi­ron­ment variables. We have to assume that -`environ` contains en­vi­ron­ment variables that have an arbitrary -content of arbitrary length. The content of en­vi­ron­ment variables is -not only arbitrary but could be malefic, since it can be modified by the -user. The approximation of the behavior of `getenv` that is shipped with -SATABS completely ignores the content of the string. - -Now let us have another look at the properties that SATABS generates for -the models of the the string library and for `getenv`. Most of these -properties require that we verify that the upper and lower bounds of -buffers or arrays are not violated. Let us look at one of the properties -that SATABS generates for the code in function `getConfig`: - - Claim getConfig.3:   file lib_aeon.c line 19 function getConfig   dereference failure: NULL plus offset pointer   !(SAME-OBJECT(src, NULL))` - -The model of the function `strcpy` dereferences the pointer returned by -`getenv`, which may return a NULL pointer. This possibility is detected -by the static analysis, and thus a corresponding property is generated. -Let us check this specific property: - - satabs *.c --pointer-check --bounds-check --property getConfig.3 - -SATABS immediately returns a counterexample path that demonstrates how -`getenv` returns a NULL, which is subsequently dereferenced. We have -identified the first bug in this program: it requires that the -environment variable HOME is set, and crashes otherwise. - -Let us examine one more property in the same function: - - Claim getConfig.7:   file lib_aeon.c line 19 function getConfig   dereference failure: array `home' upper bound   !(POINTER_OFFSET(dst) + (int)i >= 512) || !(SAME-OBJECT(dst, &home[0])) - -This property asserts that the upper bound of the array `home` is not -violated. The variable `home` looks familiar: We encountered it in the -function `getConfig` given above. The function `getenv` in combination -with functions `strcpy`, `strcat` or `sprintf` is indeed often the -source for buffer overflows. Therefore, we try to use SATABS to check -the upper bound of the array `home`: - - satabs *.c --pointer-check --bounds-check --property getConfig.7 - -SATABS runs for quite a while and will eventually give up, telling us -that its upper bound for abstraction refinement iterations has been -exceeded. This is not exactly the result we were hoping for, and we -could now increase the bound for iterations with help of the -`--iterations` command line switch of SATABS. - -Before we do this, let us investigate why SATABS has failed to provide a -useful result. The function `strcpy` contains a loop that counts from 1 -to the length of the input string. Predicate abstraction, the mechanism -SATABS is based on, is unable to detect such loops and will therefore -unroll the loop body as often as necessary. The array `home` has -`MAX_LEN` elements, and `MAX_LEN` is defined to be 512 in `aeon.h`. -Therefore, SATABS would have to run through at least 512 iterations, -only to verify (or reject) one of the more than 300 properties! Does -this fact defeat the purpose of static verification? - -We can make the job easier: after reducing the value of `MAX_LEN` in -`aeon.h` to a small value, say to 10, SATABS provides a counterexample -trace that demonstrates how the buffer overflow be reproduced. If you -use the Eclipse plugin (as described \ref man_install-eclipse "here"), -you can step through this counterexample. The trace contains the string -that is returned by `getenv`. - - \section man_modelling Modelling \subsection man_modelling-nondet Nondeterminism @@ -2412,7 +1747,7 @@ lines like `CC=gcc` by `CC=goto-cc`. A helpful command that accomplishes this task successfully for many projects is the following: - for i in `find . -name Makefile`; do   sed -e 's/^\(\s*CC[ \t]*=\)\(.*$\)/\1goto-cc/g' -i $i done + for i in `find . -name Makefile`; do sed -e 's/^\(\s*CC[ \t]*=\)\(.*$\)/\1goto-cc/g' -i $i done Here are additional examples on how to use goto-cc: @@ -2466,7 +1801,7 @@ Visual Studio project as follows: 5. Type - msbuild /p:CLToolExe=goto-cl.exe /p:LinkToolExe=goto-link.exe    /p:Flavor=goto-cc /p:Platform=x86 + msbuild /p:CLToolExe=goto-cl.exe /p:LinkToolExe=goto-link.exe /p:Flavor=goto-cc /p:Platform=x86 The platform can be adjusted as required; the "Flavor" given should match the configuration that was created earlier. @@ -2652,7 +1987,7 @@ The program contains an obvious NULL-pointer dereference. We first compile the example program with goto-cc and then instrument the resulting goto-binary with pointer checks. - goto-cc expr.c -o in.gb   goto-instrument in.gb out.gb --pointer-check + goto-cc expr.c -o in.gb goto-instrument in.gb out.gb --pointer-check We can now get a list of the assertions that have been generated as follows: @@ -2888,14 +2223,14 @@ techniques. In the following, we show how to extract models from Linux 3. Now do -   `bunzip2 linux-2.6.39.tar.bz2`\ -   `tar xvf linux-2.6.39.tar`\ -   `cd linux-2.6.39` + `bunzip2 linux-2.6.39.tar.bz2`\ + `tar xvf linux-2.6.39.tar`\ + `cd linux-2.6.39` 4. Now ensure that you can actually compile a kernel by doing -   `make defconfig`\ -   `make` + `make defconfig`\ + `make` These steps need to succeed before you can try to extract models from the kernel. @@ -2903,9 +2238,9 @@ techniques. In the following, we show how to extract models from Linux 5. Now compile [gcc-wrap.c](gcc-wrap.c) and put the resulting binary into a directory that is in your PATH variable: -   `lwp-download http://www.cprover.org/cprover-manual/gcc-wrap.c`\ -   `gcc gcc-wrap.c -o gcc-wrap`\ -   `cp gcc-wrap ~/bin/`\ + `lwp-download http://www.cprover.org/cprover-manual/gcc-wrap.c`\ + `gcc gcc-wrap.c -o gcc-wrap`\ + `cp gcc-wrap ~/bin/`\ This assumes that the directory `~/bin` exists and is in your PATH variable. @@ -2916,8 +2251,8 @@ techniques. In the following, we show how to extract models from Linux 7. Now do -   make clean -   make + make clean + make This will re-compile the kernel, but this time retaining the preprocessed source files. @@ -2927,7 +2262,7 @@ techniques. In the following, we show how to extract models from Linux find ./ -name .tmp_*.i > source-file-list for a in `cat source-file-list` ; do -   goto-cc -c $a -o $a.gb + goto-cc -c $a -o $a.gb done Note that it is important that the word-size of the kernel @@ -2976,7 +2311,7 @@ show how to extract models from Apache HTTPD 2.4.2. find ./ -name *.i > source-file-list for a in `cat source-file-list` ; do -   goto-cc -c $a -o $a.gb + goto-cc -c $a -o $a.gb done The resulting `.gb` files can be passed to any of the CPROVER tools. diff --git a/doc/satabs-user-manual.md b/doc/satabs-user-manual.md new file mode 100644 index 00000000000..763f52a7398 --- /dev/null +++ b/doc/satabs-user-manual.md @@ -0,0 +1,700 @@ +\page other_documentation Other Documentation + +\section satabs SATABS + +We document two programs that try to achieve formal guarantees of the +absence of specific problems: CBMC and SATABS. The algorithms +implemented by CBMC and SATABS are complementary, and often, one tool is +able to solve a problem that the other cannot solve. + +Both CBMC and SATABS are verification tools for ANSI-C/C++ programs. +They verify array bounds (buffer overflows), pointer safety, exceptions +and user-specified assertions. Both tools model integer arithmetic +accurately, and are able to reason about machine-level artifacts such as +integer overflow. CBMC and SATABS are therefore able to detect a class +of bugs that has so far gone unnoticed by many other verification tools. +This manual also covers some variants of CBMC, which includes HW-CBMC +for +\ref man_hard-soft-introduction "hardware/software co-verification". + +## Automatic Program Verification with SATABS + +In many cases, lightweight properties such as array bounds do not rely +on the entire program. A large fraction of the program is *irrelevant* +to the property. SATABS exploits this observation and computes an +*abstraction* of the program in order to handle large amounts of code. + +In order to use SATABS it is not necessary to understand the abstraction +refinement process. For the interested reader, a high-level introduction +to abstraction refinement is provided +\ref man_satabs-overview "here". +We also provide +\ref man_satabs-tutorials "tutorials on how to use SATABS". + +Just as CBMC, SATABS attempts to build counterexamples that refute the +property. If such a counterexample is found, it is presented to the +engineer to facilitate localization and repair of the program. + +\subsection man_install-satabs Installing SATABS + +### Requirements + +SATABS is available for Windows, i86 Linux, and MacOS X. SATABS requires +a code pre-processing environment including a suitable preprocessor +and a set of header files. + +1. **Linux:** the preprocessor and the header files typically come with + a package called *gcc*, which must be installed prior to the + installation of SATABS. +2. **Windows:** The Windows version of SATABS requires the preprocessor + `cl.exe`, which is part of Visual Studio (including the free [Visual + Studio + Express](http://msdn.microsoft.com/vstudio/express/visualc/)). +3. **MacOS:** Install Xcode command line tools: run + `xcode-select --install` prior to installing SATABS. + +Important note for Windows users: Visual Studio's `cl.exe` relies on a +complex set of environment variables to identify the target architecture +and the directories that contain the header files. You must run SATABS +from within the *Visual Studio Command Prompt*. + +Note that the distribution files for the [Eclipse +plugin](installation-plugin.shtml) include the command-line tools. +Therefore, if you intend to run SATABS exclusively within Eclipse, you +can skip the installation of the command-line tools. However, you still +have to install the compiler environment as described above. + +### Choosing and Installing a Model Checker + +You need to install a Model Checker in order to be able to run SATABS. +You can choose between following alternatives: + +- **Cadence SMV**. Available from http://www.kenmcmil.com/smv.html. + Cadence SMV is a commercial model checker. The free version that is + available on the homepage above must not be used for commercial + purposes (read the license agreement thoroughly before you download + the tool). The documentation for SMV can be found in the directory + where you unzip/untar SMV under ./smv/doc/smv/. Read the + installation instructions carefully. The Linux/MacOS versions + require setting environment variables. You must add add the + directory containing the `smv` binary (located in ./smv/bin/, + relative to the path where you unpacked it) to your `PATH` + environment variable. SATABS uses Cadence SMV by default. + +- **NuSMV**. Available from http://nusmv.irst.itc.it/. NuSMV is the + open source alternative to Cadence SMV. Installation instructions + and documentation can be found on the NuSMV homepage. The directory + containing the NuSMV binary should be added to your `PATH` + environment variable. Use the option + + --modelchecker nusmv + + to instruct SATABS to use NuSMV. + +- **BOPPO**. Available from http://www.cprover.org/boppo/. BOPPO is + a model checker which uses SAT-solving algorithms. BOPPO relies on a + built-in SAT solver and Quantor, a solver for quantified boolean + formulas which is currently bundled with BOPPO, but also available + separately from . We recommend to add + the directories containing both tools to your `PATH` environment + variable. Use the option + + --modelchecker boppo + + when you call SATABS and want it to use BOPPO instead of SMV. + +- **BOOM**. Available from http://www.cprover.org/boom/. Boom has a + number of unique features, including the verification of programs + with unbounded thread creation. + +### Installing SATABS + +1. Download SATABS for your operating system. The binaries are + available from . +2. Unzip/untar the archive into a directory of your choice. We + recommend to add this directory to your `PATH` environment variable. + +Now you can execute SATABS. Try running SATABS on the small examples +presented in the \ref man_satabs-tutorials "SATABS tutorial". If you use +the Cadence SMV model checker, the only command line arguments you have +to specify are the names of the files that contain your program. + +### Requirements + +We provide a graphical user interface to CBMC and SATABS, which is +realized as a plugin to the Eclipse framework. Eclipse is available at +http://www.eclipse.org. We do not provide installation instructions for +Eclipse (basically, you only have to download the current version and +extract the files to your hard-disk) and assume that you have already +installed the current version. + +CBMC and SATABS have their own requirements. As an example, both CBMC +and SATABS require a suitable preprocessor and a set of header files. +As first step, you should therefore follow the installation instructions +for \ref man_install-cbmc "CBMC" and \ref man_install-satabs "SATABS". + +Important note for Windows users: Visual Studio's `cl.exe` relies on a +complex set of environment variables to identify the target architecture +and the directories that contain the header files. You must run Eclipse +from within the *Visual Studio Command Prompt*. + +\section man_satabs SATABS---Predicate Abstraction with SAT + +\subsection man_satabs-overview Overview + +This section describes SATABS from the point of view of the user. To +learn about the technology implemented in SATABS, read +\ref man_satabs-background "this". + +We assume you have already installed SATABS and the necessary support +files on your system. If not so, please follow +\ref man_install-satabs "these instructions". + +While users of SATABS almost never have to be concerned about the +underlying refinement abstraction algorithms, understanding the classes +of properties that can be verified is crucial. Predicate abstraction is +most effective when applied to *control-flow dominated properties*. As +an example, reconsider the following program (lock-example-fixed.c): + +``` + _Bool nondet_bool(); + _Bool LOCK = 0; + + _Bool lock() { + if(nondet_bool()) { + assert(!LOCK); + LOCK=1; + return 1; } + + return 0; + } + + void unlock() { + assert(LOCK); + LOCK=0; + } + + int main() { + unsigned got_lock = 0; + int times; + + while(times > 0) { + if(lock()) { + got_lock++; + /* critical section */ + } + + if(got_lock!=0) { + unlock(); + got_lock--; + } + + times--; + } + } +``` + +The two assertions in the program model that the functions `lock()` and +`unlock()` are called in the right order. Note that the value of `times` +is chosen non-deterministically and is not bounded. The program has no +run-time bound, and thus, unwinding the code with CBMC will never +terminate. + +### Working with Claims + +The two assertions will give rise to two *properties*. Each property is +associated to a specific line of code, i.e., a property is violated when +some condition can become false at the corresponding program location. +SATABS will generate a list of all properties for the programs as +follows: + + satabs lock-example-fixed.c --show-properties + +SATABS will list two properties; each property corresponds to one of the +two assertions. We can use SATABS to verify both properties as follows: + + satabs lock-example-fixed.c + +SATABS will conclude the verification successfully, that is, both +assertions hold for execution traces of any length. + +By default, SATABS attempts to verify all properties at once. A single +property can be verified (or refuted) by using the `--property id` +option of SATABS, where `id` denotes the identifier of the property in +the list obtained by calling SATABS with the `--show-properties` flag. +Whenever a property is violated, SATABS reports a feasible path that +leads to a state in which the condition that corresponds to the violated +property evaluates to false. + +\subsection man_satabs-libraries Programs that use Libraries + +SATABS cannot check programs that use functions that are only available +in binary (compiled) form (this restriction is not imposed by the +verification algorithms that are used by SATABS – they also work on +assembly code). The reason is simply that so far no assembly language +frontend is available for SATABS. At the moment, (library) functions for +which no C source code is available have to be replaced by stubs. The +usage of stubs and harnesses (as known from unit testing) also allows to +check more complicated properties (like, for example, whether function +`fopen` is always called before `fclose`). This technique is explained +in detail in the \ref man_satabs-tutorials "SATABS tutorials". + +\subsection man_satabs-unit-test Unit Testing with SATABS + +The example presented \ref man_satabs-tutorial-driver "here" is +obviously a toy example and can hardly be used to convince your project +manager to use static verification in your next project. Even though we +recommend to use formal verification and specification already in the +early phases of your project, the sad truth is that in most projects +verification (of any kind) is still pushed to the very end of the +development cycle. Therefore, this section is dedicated to the +verification of legacy code. However, the techniques presented here can +also be used for *unit testing*. + +Unit testing is used in most software development projects, and static +verification with SATABS can be very well combined with this technique. +Unit testing relies on a number test cases that yield the desired code +coverage. Such test cases are implemented by a software testing engineer +in terms of a test harness (aka test driver) and a set of function +stubs. Typically, a slight modification to the test harness allows it to +be used with SATABS. Replacing the explicit input values with +non-deterministic inputs (as explained in +\ref man_satabs-tutorial-aeon "here" and +\ref man_satabs-tutorial-driver "here")) guarantees that SATABS will try +to achieve **full** path and state coverage (due to the fact that +predicate abstraction implicitly detects equivalence classes). However, +it is not guaranteed that SATABS terminates in all cases. Keep in mind +that you must not make any assumptions about the validity of the +properties if SATABS did not run to completion! + +### Further Reading + +- [Model Checking Concurrent Linux Device + Drivers](http://www.kroening.com/publications/view-publications-wbwk2007.html) +- [SATABS: SAT-based Predicate Abstraction for + ANSI-C](http://www-2.cs.cmu.edu/~svc/papers/view-publications-cksy2005.html) +- [Predicate Abstraction of ANSI-C Programs using + SAT](http://www-2.cs.cmu.edu/~svc/papers/view-publications-cksy2004.html) + +\subsection man_satabs-background Background + +### Sound Abstractions + +This section provides background information on how SATABS operates. +Even for very trivial C programs it is impossible to exhaustively +examine their state space (which is potentially unbounded). However, not +all details in a C program necessarily contribute to a bug, so it may be +sufficient to only examine those parts of the program that are somehow +related to a bug. + +In practice, many static verification tools (such as `lint`) try to +achieve this goal by applying heuristics. This approach comes at a cost: +bugs might be overlooked because the heuristics do not cover all +relevant aspects of the program. Therefore, the conclusion that a +program is correct whenever such a static verification tool is unable to +find an error is invalid. + +\image html cegar-1.png "CEGAR Loop" + +A more sophisticated approach that has been very successful recently is +to generate a *sound* abstraction of the original program. In this +context, *soundness* refers to the fact that the abstract program +contains (at least) all relevant behaviors (i.e., bugs) that are present +in the original program. In the Figure above, the first component strips +details from the original program. The number of possible behaviors +increases as the number of details in the abstract program decreases. +Intuitively, the reason is that whenever the model checking tool lacks +the information that is necessary to make an accurate decision on +whether a branch of an control flow statement can be taken or not, both +branches have to be considered. + +In the resulting *abstract program*, a set of concrete states is +subsumed by means of a single abstract state. Consider the following +figure: + +![](states.png) + +The concrete states *x*~1~ and *x*~2~ are mapped to an abstract state +*X*, and similarly *Y* subsumes *y*~1~ and *y*~2~. However, all +transitions that are possible in the concrete program are also possible +in the abstract model. The abstract transition *X* → *Y* summarizes the +concrete transitions *x*~1~ → *y*~1~ and *x*~1~ → *x*~1~, and *Y* → *X* +corresponds to *x*~1~ → *x*~2~. The behavior *X* → *Y* → *X* is feasible +in the original program, because it maps to *x*~1~ → *x*~1~ → *x*~2~. +However, *Y* → *X* → *Y* is feasible only in the abstract model. + +### Spurious Counterexamples + +The consequence is that the model checker (component number two in the +figure above) possibly reports a *spurious* counterexample. We call a +counterexample spurious whenever it is feasible in the current abstract +model but not in the original program. However, whenever the model +checker is unable to find an execution trace that violates the given +property, we can conclude that there is no such trace in the original +program, either. + +The feasibility of counterexamples is checked by *symbolic simulation* +(performed by component three in the figure above). If the +counterexample is indeed feasible, SATABS found a bug in the original +program and reports it to the user. + +### Automatic Refinement + +On the other hand, infeasible counterexamples (that originate from +abstract behaviors that result from the omission of details and are not +present in the original program) are never reported to the user. +Instead, the information is used in order to refine the abstraction such +that the spurious counterexample is not part of the refined model +anymore. For instance, the reason for the infeasibility of *Y* → *X* → +*Y* is that neither *y*~1~ nor *x*~1~ can be reached from *x*~2~. +Therefore, the abstraction can be refined by partitioning *X*. + +The refinement steps can be illustrated as follows: + +![Iterative refinement](refinement.png) + +The first step (1) is to generate a very coarse abstraction with a very +small state space. This abstraction is then successively refined (2, 3, +...) until either a feasible counterexample is found or the abstract +program is detailed enough to show that there is no path that leads to a +violation of the given property. The problem is that this point is not +necessarily reached for every input program, i.e., it is possible that +the the abstraction refinement loop never terminates. Therefore, SATABS +allows to specify an upper bound for the number of iterations. + +When this upper bound is reached and no counterexample was found, this +does not necessarily mean that there is none. In this case, you cannot +make any conclusions at all with respect to the correctness of the input +program. + +\subsection man_satabs-tutorials Tutorials + +We provide an introduction to model checking "real" C programs with +SATABS using two small examples. + +\subsubsection man_satabs-tutorial-driver Reference Counting in Linux Device Drivers + +Microsoft's [SLAM](http://research.microsoft.com/SLAM) toolkit has been +successfully used to find bugs in Windows device drivers. SLAM +automatically verifies device driver whether a device driver adheres to +a set of specifications. SLAM provides a test harness for device drivers +that calls the device driver dispatch routines in a non-deterministic +order. Therefore, the Model Checker examines all combinations of calls. +Motivated by the success this approach, we provide a toy example based +on Linux device drivers. For a more complete approach to the +verification of Linux device drivers, consider +[DDVerify](http://www.cprover.org/ddverify/). + +Dynamically loadable modules enable the Linux Kernel to load device +drivers on demand and to release them when they are not needed anymore. +When a device driver is registered, the kernel provides a major number +that is used to uniquely identify the device driver. The corresponding +device can be accessed through special files in the filesystem; by +convention, they are located in the `/dev` directory. If a process +accesses a device file the kernel calls the corresponding `open`, `read` +and `write` functions of the device driver. Since a driver must not be +released by the kernel as long as it is used by at least one process, +the device driver must maintain a usage counter (in more recent Linux +kernels, this is done automatically, however, drivers that must maintain +backward compatibility have to adjust this counter). + +We provide a skeleton of such a driver. Download the files +assets/spec.c, assets/driver.c, assets/driver.h, assets/kdev_t.h, and +assets/modules.h. + +The driver contains following functions: + +1. `register_chrdev`: (in assets/spec.c) Registers a character device. + In our implementation, the function sets the variable `usecount` to + zero and returns a major number for this device (a constant, if the + user provides 0 as argument for the major number, and the value + specified by the user otherwise). + + int usecount; + + int register_chrdev (unsigned int major, const char* name) + { + usecount = 0; + if (major == 0) + return MAJOR_NUMBER; + return major; + } + +2. `unregister_chrdev`: (in assets/spec.c) Unregisters a character + device. This function asserts that the device is not used by any + process anymore (we use the macro `MOD_IN_USE` to check this). + + int unregister_chrdev (unsigned int major, const char* name) + { + if (MOD_IN_USE) + { + ERROR: assert (0); + } + else + return 0; + } + +3. `dummy_open`: (in assets/driver.c) This function increases the + `usecount`. If the device is locked by some other process + `dummy_open` returns -1. Otherwise it locks the device for the + caller. + +4. `dummy_read`: (in assets/driver.c) This function "simulates" a read + access to the device. In fact it does nothing, since we are + currently not interested in the potential buffer overflow that may + result from a call to this function. Note the usage of the function + `nondet_int`: This is an internal SATABS-function that + non­determi­nistically returns an arbitrary integer value. The + function `__CPROVER_assume` tells SATABS to ignore all traces that + do not adhere to the given assumption. Therefore, whenever the lock + is held, `dummy_read` will return a value between 0 and `max`. If + the lock is not held, then `dummy_read` returns -1. + +5. `dummy_release`: (in assets/driver.c) If the lock is held, then + `dummy_release` decreases the `usecount`, releases the lock, and + returns 0. Otherwise, the function returns -1. + +We now want to check if any *valid* sequence of calls of the dispatch +functions (in driver.c) can lead to the violation of the assertion (in +assets/spec.c). Obviously, a call to `dummy_open` that is immediately +followed by a call to `unregister_chrdev` violates the assertion. + +The function `main` in spec.c gives an example of how these functions +are called. First, a character device "`dummy`" is registered. The major +number is stored in the `inode` structure of the device. The values for +the file structure are assigned non-deterministically. We rule out +invalid sequences of calls by ensuring that no device is unregistered +while it is still locked. We use the following model checking harness +for calling the dispatching functions: + +``` + random = nondet_uchar (); + __CPROVER_assume (0 <= random && random <= 3); + + switch (random) + { + case 1: + rval = dummy_open (&inode, &my_file); + if (rval == 0) + lock_held = TRUE; + break; + case 2: + __CPROVER_assume (lock_held); + count = dummy_read (&my_file, buffer, BUF_SIZE); + break; + case 3: + dummy_release (&inode, &my_file); + lock_held = FALSE; + break; + default: + break; + } +``` + +The variable `random` is assigned non-deterministically. Subsequently, +the value of `random` is restricted to be 0 ≤ `random` ≤ 3 by a call +to `__CPROVER_assume`. Whenever the value of `random` is not in this +interval, the corresponding execution trace is simply discarded by +SATABS. Depending on the value of `random`, the harness calls either +`dummy_open`, `dummy_read` or `dummy_close`. Therefore, if there is a +sequence of calls to these three functions that leads to a violation of +the assertion in `unregister_chrdev`, then SATABS will eventually +consider it. + +If we ask SATABS to show us the properties it verifies with + + satabs driver.c spec.c --show-properties + +for our example, we obtain + +1. Claim unregister\_chrdev.1:\ +     file spec.c line 18 function unregister\_chrdev\ +     MOD\_IN\_USE in unregister\_chrdev\ +     FALSE + +2. Claim dummy\_open.1:\ +     file driver.c line 15 function dummy\_open\ +     i\_rdev mismatch\ +     (unsigned int)inode->i\_rdev >> 8 == (unsigned + int)dummy\_major + +It seems obvious that the property dummy\_open.1 can never be violated. +SATABS confirms this assumption: We call + + satabs driver.c spec.c --property dummy_open.1 + +and SATABS reports `VERIFICATION SUCCESSFUL` after a few iterations. + +If we try to verify property unregister\_chrdev.1, SATABS reports that +the property in line 18 in file spec.c is violated (i.e., the assertion +does not hold, therefore the `VERIFICATION FAILED`). Furthermore, SATABS +provides a detailed description of the problem in the form of a +counterexample (i.e., an execution trace that violates the property). On +this trace, `dummy_open` is called **twice**, leading to a `usecount` of 2. +The second call of course fails with `rval=-1`, but the counter is +increased nevertheless: + +``` + int dummy_open (struct inode *inode, struct file *filp) + { + __CPROVER_assert(MAJOR (inode->i_rdev) == dummy_major, + "i_rdev mismatch"); + MOD_INC_USE_COUNT; + + if (locked) + return -1; + locked = TRUE; + + return 0; /* success */ + } + ``` + +Then, `dummy_release` is called to release the lock on the device. +Finally, the loop is left and the call to `unregister_chrdev` results in +a violation of the assertion (since `usecount` is still 1, even though +`locked=0`). + +\subsubsection man_satabs-tutorial-aeon Buffer Overflow in a Mail Transfer Agent + +We explain how to model check Aeon version 0.2a, a small mail transfer +agent written by Piotr Benetkiewicz. The description advertises Aeon as +a "*good choice for **hardened** or minimalistic boxes*". The sources +are available +[here](http://www.cprover.org/satabs/examples/loop_detection/aeon-0.2a.tar.gz). + +Our first naive attempt to verify Aeon using + + satabs *.c + +produces a positive result, but also warns us that the property holds +trivially. It also reveals that a large number library functions are +missing: SATABS is unable to find the source code for library functions +like `send`, `write` and `close`. + +Now, do you have to provide a body for all missing library functions? +There is no easy answer to this question, but a viable answer would be +"most likely not". It is necessary to understand how SATABS handles +functions without bodies: It simply assumes that such a function returns +an arbitrary value, but that no other locations than the one on the left +hand side of the assignment are changed. Obviously, there are cases in +which this assumption is un­sound, since the function potentially +modifies all memory locations that it can somehow address. + +We now use static analysis to generate array bounds checks for Aeon: + + satabs *.c --pointer-check --bounds-check --show-properties + +SATABS will show about 300 properties in various functions (read +\ref man_instrumentation-properties "this" for more information on the +property instrumentation). Now consider the first few lines of the +`main` function of Aeon: + + int main(int argc, char **argv) + { + char settings[MAX_SETTINGS][MAX_LEN]; + ... + numSet = getConfig(settings); + if (numSet == -1) { + logEntry("Missing config file!"); + exit(1); + } + ... + +and the function `getConfig` in `lib_aeon.c`: + + int getConfig(char settings[MAX_SETTINGS][MAX_LEN]) + { + char home[MAX_LEN]; + FILE *fp; /* .rc file handler */ + int numSet = 0; /* number of settings */ + + strcpy(home, getenv("HOME")); /* get home path */ + strcat(home, "/.aeonrc"); /* full path to rc file */ + fp = fopen(home, "r"); + if (fp == NULL) return -1; /* no cfg - ERROR */ + while (fgets(settings[numSet], MAX_LEN-1, fp) + && (numSet < MAX_SETTINGS)) numSet++; + fclose(fp); + + return numSet; + } + +The function `getConfig` makes calls to `strcpy`, `strcat`, `getenv`, +`fopen`, `fgets`, and `fclose`. It is very easy to provide an +implementation for the functions from the string library (string.h), and +SATABS comes with meaningful definitions for most of them. The +definition of `getenv` is not so straight-forward. The man-page of +`getenv` (which we obtain by entering `man 3 getenv` in a Unix or cygwin +command prompt) tells us: + + `` `getenv' `` searches the list of en­vi­ron­ment variable names + and values (using the global pointer `char **environ`) for a + variable whose name matches the string at `NAME`. If a variable name + matches, `getenv` returns a pointer to the associated value. + +SATABS has no information whatsoever about the content of `environ`. +Even if SATABS could access the environment variables on your +computer, a successful verification of `Aeon` would then only guarantee +that the properties for this program hold on your computer with a +specific set of en­vi­ron­ment variables. We have to assume that +`environ` contains en­vi­ron­ment variables that have an arbitrary +content of arbitrary length. The content of en­vi­ron­ment variables is +not only arbitrary but could be malefic, since it can be modified by the +user. The approximation of the behavior of `getenv` that is shipped with +SATABS completely ignores the content of the string. + +Now let us have another look at the properties that SATABS generates for +the models of the string library and for `getenv`. Most of these +properties require that we verify that the upper and lower bounds of +buffers or arrays are not violated. Let us look at one of the properties +that SATABS generates for the code in function `getConfig`: + + Claim getConfig.3:   file lib_aeon.c line 19 function getConfig   dereference failure: NULL plus offset pointer   !(SAME-OBJECT(src, NULL))` + +The model of the function `strcpy` dereferences the pointer returned by +`getenv`, which may return a NULL pointer. This possibility is detected +by the static analysis, and thus a corresponding property is generated. +Let us check this specific property: + + satabs *.c --pointer-check --bounds-check --property getConfig.3 + +SATABS immediately returns a counterexample path that demonstrates how +`getenv` returns a NULL, which is subsequently dereferenced. We have +identified the first bug in this program: it requires that the +environment variable HOME is set, and crashes otherwise. + +Let us examine one more property in the same function: + + Claim getConfig.7:   file lib_aeon.c line 19 function getConfig   dereference failure: array `home' upper bound   !(POINTER_OFFSET(dst) + (int)i >= 512) || !(SAME-OBJECT(dst, &home[0])) + +This property asserts that the upper bound of the array `home` is not +violated. The variable `home` looks familiar: We encountered it in the +function `getConfig` given above. The function `getenv` in combination +with functions `strcpy`, `strcat` or `sprintf` is indeed often the +source for buffer overflows. Therefore, we try to use SATABS to check +the upper bound of the array `home`: + + satabs *.c --pointer-check --bounds-check --property getConfig.7 + +SATABS runs for quite a while and will eventually give up, telling us +that its upper bound for abstraction refinement iterations has been +exceeded. This is not exactly the result we were hoping for, and we +could now increase the bound for iterations with help of the +`--iterations` command line switch of SATABS. + +Before we do this, let us investigate why SATABS has failed to provide a +useful result. The function `strcpy` contains a loop that counts from 1 +to the length of the input string. Predicate abstraction, the mechanism +SATABS is based on, is unable to detect such loops and will therefore +unroll the loop body as often as necessary. The array `home` has +`MAX_LEN` elements, and `MAX_LEN` is defined to be 512 in `aeon.h`. +Therefore, SATABS would have to run through at least 512 iterations, +only to verify (or reject) one of the more than 300 properties! Does +this fact defeat the purpose of static verification? + +We can make the job easier: after reducing the value of `MAX_LEN` in +`aeon.h` to a small value, say to 10, SATABS provides a counterexample +trace that demonstrates how the buffer overflow be reproduced. If you +use the Eclipse plugin (as described \ref man_install-eclipse "here"), +you can step through this counterexample. The trace contains the string +that is returned by `getenv`. From c50f40dba4f7f7d2749afcd54aeb2c07727736e6 Mon Sep 17 00:00:00 2001 From: Peter Schrammel Date: Thu, 22 Mar 2018 19:24:07 +0000 Subject: [PATCH 3/6] Remove old doxygen module directives from header files --- src/ansi-c/ansi_c_language.h | 2 -- src/cpp/cpp_language.h | 2 -- src/util/std_expr.h | 6 +----- src/util/std_types.h | 2 +- src/util/symbol.h | 3 --- src/util/symbol_table_base.h | 2 -- 6 files changed, 2 insertions(+), 15 deletions(-) diff --git a/src/ansi-c/ansi_c_language.h b/src/ansi-c/ansi_c_language.h index 949f266d2c9..b23bdad6469 100644 --- a/src/ansi-c/ansi_c_language.h +++ b/src/ansi-c/ansi_c_language.h @@ -10,8 +10,6 @@ Author: Daniel Kroening, kroening@kroening.com #ifndef CPROVER_ANSI_C_ANSI_C_LANGUAGE_H #define CPROVER_ANSI_C_ANSI_C_LANGUAGE_H -/*! \defgroup gr_ansi_c ANSI-C front-end */ - #include #include diff --git a/src/cpp/cpp_language.h b/src/cpp/cpp_language.h index 613adcbac0f..3f474005f15 100644 --- a/src/cpp/cpp_language.h +++ b/src/cpp/cpp_language.h @@ -12,8 +12,6 @@ Author: Daniel Kroening, kroening@cs.cmu.edu #ifndef CPROVER_CPP_CPP_LANGUAGE_H #define CPROVER_CPP_CPP_LANGUAGE_H -/*! \defgroup gr_cpp C++ front-end */ - #include #include // unique_ptr diff --git a/src/util/std_expr.h b/src/util/std_expr.h index 50d69b3d5fc..75d21715c96 100644 --- a/src/util/std_expr.h +++ b/src/util/std_expr.h @@ -22,10 +22,6 @@ Author: Daniel Kroening, kroening@kroening.com #include "expr_cast.h" -/*! \defgroup gr_std_expr Conversion to specific expressions - * Conversion to subclasses of @ref exprt -*/ - /*! \brief A transition system, consisting of state invariant, initial state predicate, and transition predicate @@ -56,7 +52,7 @@ class transt:public exprt * \param expr Source expression * \return Object of type \ref transt * - * \ingroup gr_std_expr + */ inline const transt &to_trans_expr(const exprt &expr) { diff --git a/src/util/std_types.h b/src/util/std_types.h index a13a334b038..ae323dff374 100644 --- a/src/util/std_types.h +++ b/src/util/std_types.h @@ -25,7 +25,7 @@ Author: Daniel Kroening, kroening@kroening.com class constant_exprt; -/*! \defgroup gr_std_types Conversion to specific types +/*! * Conversion to subclasses of @ref typet */ diff --git a/src/util/symbol.h b/src/util/symbol.h index 06945fc6c1f..ca26306d6bf 100644 --- a/src/util/symbol.h +++ b/src/util/symbol.h @@ -17,9 +17,6 @@ Author: Daniel Kroening, kroening@kroening.com * \date Sun Jul 31 21:54:44 BST 2011 */ -/*! \defgroup gr_symbol_table Symbol Table -*/ - #include #include "expr.h" diff --git a/src/util/symbol_table_base.h b/src/util/symbol_table_base.h index 76bab9a12bb..823883345d6 100644 --- a/src/util/symbol_table_base.h +++ b/src/util/symbol_table_base.h @@ -3,8 +3,6 @@ /// \file /// Symbol table base class interface -/// \defgroup gr_symbol_table Symbol Table - #ifndef CPROVER_UTIL_SYMBOL_TABLE_BASE_H #define CPROVER_UTIL_SYMBOL_TABLE_BASE_H From c22e2e9c880f6d922b31d1de3d2226f4f0786667 Mon Sep 17 00:00:00 2001 From: Peter Schrammel Date: Thu, 22 Mar 2018 19:24:58 +0000 Subject: [PATCH 4/6] Add make doc to Makefile --- src/Makefile | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/Makefile b/src/Makefile index a10d3d48160..82141a0f317 100644 --- a/src/Makefile +++ b/src/Makefile @@ -107,4 +107,7 @@ ipasir-build: ipasir-download $(MAKE) -C ../ipasir/sat/picosat961 libipasirpicosat961.a @(cd ../ipasir; ln -sf sat/picosat961/libipasirpicosat961.a libipasir.a) +doc : + doxygen + .PHONY: ipasir-build minisat2-download glucose-download cprover-jar-build From 1b7f57dc957fa8d7aa6eae942530b51417ec5870 Mon Sep 17 00:00:00 2001 From: Peter Schrammel Date: Thu, 22 Mar 2018 19:26:17 +0000 Subject: [PATCH 5/6] Use doxygen layout file --- src/DoxygenLayout.xml | 194 ++++++++++++++++++++++++++++++++++++++++++ src/doxyfile | 2 +- 2 files changed, 195 insertions(+), 1 deletion(-) create mode 100644 src/DoxygenLayout.xml diff --git a/src/DoxygenLayout.xml b/src/DoxygenLayout.xml new file mode 100644 index 00000000000..44eaf40f4b9 --- /dev/null +++ b/src/DoxygenLayout.xml @@ -0,0 +1,194 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/doxyfile b/src/doxyfile index 863b3e22f51..1c21026929f 100644 --- a/src/doxyfile +++ b/src/doxyfile @@ -684,7 +684,7 @@ FILE_VERSION_FILTER = # DoxygenLayout.xml, doxygen will parse it automatically even if the LAYOUT_FILE # tag is left empty. -LAYOUT_FILE = +LAYOUT_FILE = DoxygenLayout.xml # The CITE_BIB_FILES tag can be used to specify one or more bib files containing # the reference definitions. This must be a list of .bib files. The .bib From 9aa70a7d4dffbbd045a5a91602e1ad60dfb70384 Mon Sep 17 00:00:00 2001 From: Peter Schrammel Date: Thu, 22 Mar 2018 19:27:20 +0000 Subject: [PATCH 6/6] Add doc README.md files to each directory --- src/analyses/README.md | 10 ++ src/ansi-c/README | 22 ---- src/ansi-c/{module.md => README.md} | 49 ++++++++- src/assembler/README.md | 6 ++ src/big-int/README | 1 - src/big-int/README.md | 14 +++ src/cbmc/README.md | 13 +++ src/cbmc/module.md | 47 --------- src/clobber/README.md | 6 ++ src/cpp/README.md | 17 +++ src/goto-analyzer/README.md | 8 ++ src/goto-cc/README.md | 16 +++ src/goto-diff/README.md | 7 ++ src/goto-instrument/README.md | 28 +++++ src/goto-programs/{module.md => README.md} | 114 ++++++++++++++++++-- src/goto-symex/{module.md => README.md} | 20 +++- src/java_bytecode/README.md | 6 ++ src/jbmc/README.md | 6 ++ src/jsil/README.md | 6 ++ src/json/README.md | 6 ++ src/langapi/README.md | 12 +++ src/linking/README.md | 11 ++ src/memory-models/README.md | 6 ++ src/miniz/README.md | 6 ++ src/nonstd/README.md | 7 ++ src/pointer-analysis/README.md | 12 +++ src/solvers/{refinement => }/README.md | 115 ++++++++++++++++++++- src/solvers/module.md | 62 ----------- src/util/README.md | 95 +++++++++++++++++ src/util/command-line-parsing.md | 16 --- src/xmllang/README | 3 - src/xmllang/README.md | 15 +++ 32 files changed, 598 insertions(+), 164 deletions(-) create mode 100644 src/analyses/README.md delete mode 100644 src/ansi-c/README rename src/ansi-c/{module.md => README.md} (74%) create mode 100644 src/assembler/README.md delete mode 100644 src/big-int/README create mode 100644 src/big-int/README.md create mode 100644 src/cbmc/README.md delete mode 100644 src/cbmc/module.md create mode 100644 src/clobber/README.md create mode 100644 src/cpp/README.md create mode 100644 src/goto-analyzer/README.md create mode 100644 src/goto-cc/README.md create mode 100644 src/goto-diff/README.md create mode 100644 src/goto-instrument/README.md rename src/goto-programs/{module.md => README.md} (59%) rename src/goto-symex/{module.md => README.md} (85%) create mode 100644 src/java_bytecode/README.md create mode 100644 src/jbmc/README.md create mode 100644 src/jsil/README.md create mode 100644 src/json/README.md create mode 100644 src/langapi/README.md create mode 100644 src/linking/README.md create mode 100644 src/memory-models/README.md create mode 100644 src/miniz/README.md create mode 100644 src/nonstd/README.md create mode 100644 src/pointer-analysis/README.md rename src/solvers/{refinement => }/README.md (78%) delete mode 100644 src/solvers/module.md create mode 100644 src/util/README.md delete mode 100644 src/util/command-line-parsing.md delete mode 100644 src/xmllang/README create mode 100644 src/xmllang/README.md diff --git a/src/analyses/README.md b/src/analyses/README.md new file mode 100644 index 00000000000..7d5bb5ef268 --- /dev/null +++ b/src/analyses/README.md @@ -0,0 +1,10 @@ +\ingroup module_hidden +\defgroup analyses analyses + +# Folder analyses + +This contains the abstract interpretation framework `ai.h` and several +static analyses that instantiate it. + +FIXME: put here a good introduction describing what is contained +in this folder. diff --git a/src/ansi-c/README b/src/ansi-c/README deleted file mode 100644 index d4df3170903..00000000000 --- a/src/ansi-c/README +++ /dev/null @@ -1,22 +0,0 @@ -CodeWarrior C Compilers Reference 3.2: - -http://cache.freescale.com/files/soft_dev_tools/doc/ref_manual/CCOMPILERRM.pdf - -http://cache.freescale.com/files/soft_dev_tools/doc/ref_manual/ASMX86RM.pdf - -ARM 4.1 Compiler Reference: - -http://infocenter.arm.com/help/topic/com.arm.doc.dui0491c/DUI0491C_arm_compiler_reference.pdf - - -Parsing performance considerations: - -* Measured on trunk/regression/ansi-c/windows_h_VS_2012/main.i - -* 13%: Copying into i_preprocessed - -* 5%: ansi_c_parser.read() - -* 53%: yyansi_clex() - -* 29%: parser (without typechecking) diff --git a/src/ansi-c/module.md b/src/ansi-c/README.md similarity index 74% rename from src/ansi-c/module.md rename to src/ansi-c/README.md index fb9d1191109..a1d73685e69 100644 --- a/src/ansi-c/module.md +++ b/src/ansi-c/README.md @@ -1,7 +1,27 @@ +\ingroup module_hidden \defgroup ansi-c ansi-c # Folder ansi-c -\author Kareem Khazem +\author Kareem Khazem, Martin Brain + +\section overview Overview + +Contains the front-end for ANSI C, plus a variety of common extensions. +This parses the file, performs some basic sanity checks (this is one +area in which the UI could be improved; patches most welcome) and then +produces a goto-program (see below). The parser is a traditional Flex / +Bison system. + +`internal_addition.c` contains the implementation of various ‘magic’ +functions that are that allow control of the analysis from the source +code level. These include assertions, assumptions, atomic blocks, memory +fences and rounding modes. + +The `library/` subdirectory contains versions of some of the C standard +header files that make use of the CPROVER built-in functions. This +allows CPROVER programs to be ‘aware’ of the functionality and model it +correctly. Examples include `stdio.c`, `string.c`, `setjmp.c` and +various threading interfaces. \section preprocessing Preprocessing & Parsing @@ -24,8 +44,6 @@ digraph G { \enddot - ---- \section type-checking Type-checking In the \ref ansi-c and \ref java_bytecode directories. @@ -112,3 +130,28 @@ called symbols. Thus, for example: parameter and return types of the function. The value of the symbol is the function's body (a \ref codet), and the symbol is stored in the symbol table with `foo` as the key. + + +\section performance Parsing performance considerations + +* Measured on trunk/regression/ansi-c/windows_h_VS_2012/main.i + +* 13%: Copying into i_preprocessed + +* 5%: ansi_c_parser.read() + +* 53%: yyansi_clex() + +* 29%: parser (without typechecking) + +\section references Compiler References + +CodeWarrior C Compilers Reference 3.2: + +http://cache.freescale.com/files/soft_dev_tools/doc/ref_manual/CCOMPILERRM.pdf + +http://cache.freescale.com/files/soft_dev_tools/doc/ref_manual/ASMX86RM.pdf + +ARM 4.1 Compiler Reference: + +http://infocenter.arm.com/help/topic/com.arm.doc.dui0491c/DUI0491C_arm_compiler_reference.pdf diff --git a/src/assembler/README.md b/src/assembler/README.md new file mode 100644 index 00000000000..1f5c1415fa5 --- /dev/null +++ b/src/assembler/README.md @@ -0,0 +1,6 @@ +\ingroup module_hidden +\defgroup assembler assembler + +# Folder assembler + +`assembler/`provides front-end processing for assembler. diff --git a/src/big-int/README b/src/big-int/README deleted file mode 100644 index 2fa251b46bd..00000000000 --- a/src/big-int/README +++ /dev/null @@ -1 +0,0 @@ -http://www.dirk-zoller.de/ diff --git a/src/big-int/README.md b/src/big-int/README.md new file mode 100644 index 00000000000..0bc17517c76 --- /dev/null +++ b/src/big-int/README.md @@ -0,0 +1,14 @@ +\ingroup module_hidden +\defgroup big-int big-int + +# Folder big-int + +\author Martin Brain + +CPROVER is distributed with its own multi-precision arithmetic library; +mainly for historical and portability reasons. The library is externally +developed and thus `big-int` contains the source as it is distributed: +[http://www.dirk-zoller.de/](http://www.dirk-zoller.de/). + +This should not be used directly, see `util/mp_arith.h` for the CPROVER +interface. diff --git a/src/cbmc/README.md b/src/cbmc/README.md new file mode 100644 index 00000000000..0db55639704 --- /dev/null +++ b/src/cbmc/README.md @@ -0,0 +1,13 @@ +\ingroup module_hidden +\defgroup cbmc cbmc + +# Folder CBMC + +\author Martin Brain + +This contains the first full application. CBMC is a bounded model +checker that uses the front ends (`ansi-c`, `cpp`, goto-program or +others) to create a goto-program, `goto-symex` to unwind the loops the +given number of times and to produce and equation system and finally +`solvers` to find a counter-example (technically, `goto-symex` is then +used to construct the counter-example trace). diff --git a/src/cbmc/module.md b/src/cbmc/module.md deleted file mode 100644 index d2dcb75b4b3..00000000000 --- a/src/cbmc/module.md +++ /dev/null @@ -1,47 +0,0 @@ -\ingroup module_hidden -\defgroup module_cbmc CBMC tour - -\author Kareem Khazem - -CBMC takes C code or a goto-binary as input and tries to emit traces of -executions that lead to crashes or undefined behaviour. The diagram -below shows the intermediate steps in this process. - - -\dot -digraph G { - - rankdir="TB"; - node [shape=box, fontcolor=blue]; - - subgraph top { - rank=same; - 1 -> 2 -> 3 -> 4; - } - - subgraph bottom { - rank=same; - 5 -> 6 -> 7 -> 8 -> 9; - } - - /* shift bottom subgraph over */ - 9 -> 1 [color=white]; - - 4 -> 5; - - 1 [label="command line\nparsing" URL="\ref cbmc_parse_optionst"]; - 2 [label="preprocessing,\nparsing" URL="\ref preprocessing"]; - 3 [label="language\ntype-checking" URL="\ref type-checking"]; - 4 [label="goto\nconversion" URL="\ref goto-conversion"]; - 5 [label="instrumentation" URL="\ref instrumentation"]; - 6 [label="symbolic\nexecution" URL="\ref symbolic-execution"]; - 7 [label="SAT/SMT\nencoding" URL="\ref sat-smt-encoding"]; - 8 [label="decision\nprocedure" URL="\ref decision-procedure"]; - 9 [label="counter example\nproduction" URL="\ref counter-example-production"]; -} -\enddot - -The \ref cprover-manual "CProver Manual" describes CBMC from a user -perspective. Each node in the diagram above links to the appropriate -class or module documentation, describing that particular stage in the -CBMC pipeline. diff --git a/src/clobber/README.md b/src/clobber/README.md new file mode 100644 index 00000000000..fef0bd93928 --- /dev/null +++ b/src/clobber/README.md @@ -0,0 +1,6 @@ +\ingroup module_hidden +\defgroup clobber clobber + +# Folder clobber + +`clobber\` is a module that is a tool. diff --git a/src/cpp/README.md b/src/cpp/README.md new file mode 100644 index 00000000000..f3a4ceda724 --- /dev/null +++ b/src/cpp/README.md @@ -0,0 +1,17 @@ +\ingroup module_hidden +\defgroup cpp cpp + +# Folder cpp + +\author Martin Brain + +This directory contains the C++ front-end. It supports the subset of C++ +commonly found in embedded and system applications. Consequentially it +doesn’t have full support for templates and many of the more advanced +and obscure C++ features. The subset of the language that can be handled +is being extended over time so bug reports of programs that cannot be +parsed are useful. + +The functionality is very similar to the ANSI C front end; parsing the +code and converting to goto-programs. It makes use of code from +`langapi` and `ansi-c`. diff --git a/src/goto-analyzer/README.md b/src/goto-analyzer/README.md new file mode 100644 index 00000000000..0449a4a7b0e --- /dev/null +++ b/src/goto-analyzer/README.md @@ -0,0 +1,8 @@ +\ingroup module_hidden +\defgroup goto-analyzer goto-analyzer + +# Folder goto-analyzer + +`goto-analyzer/` is a tool performing static analyses on goto +programs. It provides the front end for many of the static analyses +in the \ref analyses directory. diff --git a/src/goto-cc/README.md b/src/goto-cc/README.md new file mode 100644 index 00000000000..98dbc67e6ad --- /dev/null +++ b/src/goto-cc/README.md @@ -0,0 +1,16 @@ +\ingroup module_hidden +\defgroup goto-cc goto-cc + +# Folder goto-cc + +\author Martin Brain + +`goto-cc` is a compiler replacement that just performs the first step of +the process; converting C or C++ programs to goto-binaries. It is +intended to be dropped in to an existing build procedure in place of the +compiler, thus it emulates flags that would affect the semantics of the +code produced. Which set of flags are emulated depends on the naming of +the `goto-cc/` binary. If it is called `goto-cc` then it emulates GCC +flags, `goto-armcc` emulates the ARM compiler, `goto-cl` emulates VCC +and `goto-cw` emulates the Code Warrior compiler. The output of this +tool can then be used with `cbmc` or `goto-instrument`. diff --git a/src/goto-diff/README.md b/src/goto-diff/README.md new file mode 100644 index 00000000000..d49350f4f6a --- /dev/null +++ b/src/goto-diff/README.md @@ -0,0 +1,7 @@ +\ingroup module_hidden +\defgroup goto-diff goto-diff + +# Folder goto-diff + +`goto-diff/` is a tool that offers functionality similar to the `diff` +tool, but for GOTO programs. diff --git a/src/goto-instrument/README.md b/src/goto-instrument/README.md new file mode 100644 index 00000000000..5be917235ba --- /dev/null +++ b/src/goto-instrument/README.md @@ -0,0 +1,28 @@ +\ingroup module_hidden +\defgroup goto-instrument goto-instrument + +# Folder goto-instrument + +\author Martin Brain + +The `goto-instrument/` directory contains a number of tools, one per +file, that are built into the `goto-instrument` program. All of them +take in a goto-program (produced by `goto-cc`) and either modify it or +perform some analysis. Examples include `nondet_static.cpp` which +initialises static variables to a non-deterministic value, +`nondet_volatile.cpp` which assigns a non-deterministic value to any +volatile variable before it is read and `weak_memory.h` which performs +the necessary transformations to reason about weak memory models. The +exception to the “one file for each piece of functionality” rule are the +program instrumentation options (mostly those given as “Safety checks” +in the `goto-instrument` help text) which are included in the +`goto-program/` directory. An example of this is +`goto-program/stack_depth.h` and the general rule seems to be that +transformations and instrumentation that `cbmc` uses should be in +`goto-program/`, others should be in `goto-instrument`. + +`goto-instrument` is a very good template for new analysis tools. New +developers are advised to copy the directory, remove all files apart +from `main.*`, `parseoptions.*` and the `Makefile` and use these as the +skeleton of their application. The `doit()` method in `parseoptions.cpp` +is the preferred location for the top level control for the program. diff --git a/src/goto-programs/module.md b/src/goto-programs/README.md similarity index 59% rename from src/goto-programs/module.md rename to src/goto-programs/README.md index d7696cd6ad6..4bccc9de726 100644 --- a/src/goto-programs/module.md +++ b/src/goto-programs/README.md @@ -1,7 +1,107 @@ +\ingroup module_hidden \defgroup goto-programs goto-programs + # Folder goto-programs -\author Kareem Khazem +\author Kareem Khazem, Martin Brain + +\section overview Overview +Goto programs are the intermediate representation of the CPROVER tool +chain. They are language independent and similar to many of the compiler +intermediate languages. Section \ref goto-programs describes the +`goto_programt` and `goto_functionst` data structures in detail. However +it useful to understand some of the basic concepts. Each function is a +list of instructions, each of which has a type (one of 18 kinds of +instruction), a code expression, a guard expression and potentially some +targets for the next instruction. They are not natively in static +single-assign (SSA) form. Transitions are nondeterministic (although in +practise the guards on the transitions normally cover form a disjoint +cover of all possibilities). Local variables have non-deterministic +values if they are not initialised. Variables and data within the +program is commonly one of three types (parameterised by width): +`unsignedbv_typet`, `signedbv_typet` and `floatbv_typet`, see +`util/std_types.h` for more information. Goto programs can be serialised +in a binary (wrapped in ELF headers) format or in XML (see the various +`_serialization` files). + +The `cbmc` option `–show-goto-programs` is often a good starting point +as it outputs goto-programs in a human readable form. However there are +a few things to be aware of. Functions have an internal name (for +example `c::f00`) and a ‘pretty name’ (for example `f00`) and which is +used depends on whether it is internal or being presented to the user. +The `main` method is the ‘logical’ main which is not necessarily the +main method from the code. In the output `NONDET` is use to represent a +nondeterministic assignment to a variable. Likewise `IF` as a beautified +`GOTO` instruction where the guard expression is used as the condition. +`RETURN` instructions may be dropped if they precede an `END_FUNCTION` +instruction. The comment lines are generated from the `locationt` field +of the `instructiont` structure. + +`goto-programs/` is one of the few places in the CPROVER codebase that +templates are used. The intention is to allow the general architecture +of program and functions to be used for other formalisms. At the moment +most of the templates have a single instantiation; for example +`goto_functionst` and `goto_function_templatet` and `goto_programt` and +`goto_program_templatet`. + +\section data_structures Data Structures + +FIXME: This text is partially outdated. + +The common starting point for working with goto-programs is the +`read_goto_binary` function which populates an object of +`goto_functionst` type. This is defined in `goto_functions.h` and is an +instantiation of the template `goto_functions_templatet` which is +contained in `goto_functions_template.h`. They are wrappers around a map +from strings to `goto_programt`’s and iteration macros are provided. +Note that `goto_function_templatet` (no `s`) is defined in the same +header as `goto_functions_templatet` and is gives the C type for the +function and Boolean which indicates whether the body is available +(before linking this might not always be true). Also note the slightly +counter-intuitive naming; `goto_functionst` instances are the top level +structure representing the program and contain `goto_programt` instances +which represent the individual functions. At the time of writing +`goto_functionst` is the only instantiation of the template +`goto_functions_templatet` but other could be produced if a different +data-structures / kinds of models were needed for functions. + +`goto_programt` is also an instantiation of a template. In a similar +fashion it is `goto_program_templatet` and allows the types of the guard +and expression used in instructions to be parameterised. Again, this is +currently the only use of the template. As such there are only really +helper functions in `goto_program.h` and thus `goto_program_template.h` +is probably the key file that describes the representation of (C) +functions in the goto-program format. It is reasonably stable and +reasonably documented and thus is a good place to start looking at the +code. + +An instance of `goto_program_templatet` is effectively a list of +instructions (and inner template called `instructiont`). It is important +to use the copy and insertion functions that are provided as iterators +are used to link instructions to their predecessors and targets and +careless manipulation of the list could break these. Likewise there are +helper macros for iterating over the instructions in an instance of +`goto_program_templatet` and the use of these is good style and strongly +encouraged. + +Individual instructions are instances of type `instructiont`. They +represent one step in the function. Each has a type, an instance of +`goto_program_instruction_typet` which denotes what kind of instruction +it is. They can be computational (such as `ASSIGN` or `FUNCTION_CALL`), +logical (such as `ASSUME` and `ASSERT`) or informational (such as +`LOCATION` and `DEAD`). At the time of writing there are 18 possible +values for `goto_program_instruction_typet` / kinds of instruction. +Instructions also have a guard field (the condition under which it is +executed) and a code field (what the instruction does). These may be +empty depending on the kind of instruction. In the default +instantiations these are of type `exprt` and `codet` respectively and +thus covered by the previous discussion of `irept` and its descendents. +The next instructions (remembering that transitions are guarded by +non-deterministic) are given by the list `targets` (with the +corresponding list of labels `labels`) and the corresponding set of +previous instructions is get by `incoming_edges`. Finally `instructiont` +have informational `function` and `location` fields that indicate where +they are in the code. \section goto-conversion Goto Conversion @@ -10,7 +110,7 @@ In the \ref goto-programs directory. **Key classes:** * goto_programt * goto_functionst -* \ref goto_program_templatet::instructiont +* \ref goto_programt::instructiont \dot digraph G { @@ -36,7 +136,7 @@ entire target program is converted to a single \ref goto_functionst object. The goto functions contains a set of \ref goto_programt objects; each of these correspond to a "function" or "method" in the target program. Each goto_program contains a list of -\ref goto_program_templatet::instructiont "instructions"; each +\ref goto_programt::instructiont "instructions"; each instruction contains an associated statement---these are subtypes of \ref codet. Each instruction also contains a "target", which will be empty for now. @@ -188,7 +288,7 @@ goto_program_templatet and goto_functions_templatet, respectively. This means that the generated Doxygen documentation can be somewhat obtuse about the actual types of things, and is unable to generate links to the correct classes. Note that the -\ref goto_program_templatet::instructiont::code "code" member of a +\ref goto_programt::instructiont::code "code" member of a goto_programt's instruction has type \ref codet (its type in the goto_program_templatet documentation is given as "codeT", as this is the name of the template's type parameter); similarly, the type of a guard @@ -202,7 +302,7 @@ In the \ref goto-programs directory. **Key classes:** * goto_programt * goto_functionst -* \ref goto_program_templatet::instructiont +* \ref goto_programt::instructiont \dot digraph G { @@ -236,9 +336,9 @@ previous stage: the conditionals are flattened into lists of instructions, inline with the rest of the instruction in the goto_programt. The guard of the conditional is placed onto the - \ref goto_program_templatet::instructiont::guard "guard" member of + \ref goto_programt::instructiont::guard "guard" member of an instruction whose code member is of type \ref code_gotot. The - \ref goto_program_templatet::instructiont::targets "targets" member + \ref goto_programt::instructiont::targets "targets" member of that instruction points to the appropriate branch of the conditional. (Note that although instructions have a list of targets, in practice an instruction should only ever have at most diff --git a/src/goto-symex/module.md b/src/goto-symex/README.md similarity index 85% rename from src/goto-symex/module.md rename to src/goto-symex/README.md index fa34628d82d..5617dc04eed 100644 --- a/src/goto-symex/module.md +++ b/src/goto-symex/README.md @@ -1,7 +1,25 @@ +\ingroup module_hidden \defgroup goto-symex goto-symex + # Folder goto-symex -\author Kareem Khazem +\author Kareem Khazem, Martin Brain + +This directory contains a symbolic evaluation system for goto-programs. +This takes a goto-program and translates it to an equation system by +traversing the program, branching and merging and unwinding loops as +needed. Each reverse goto has a separate counter (the actual counting is +handled by `cbmc`, see the `–unwind` and `–unwind-set` options). When a +counter limit is reach, an assertion can be added to explicitly show +when analysis is incomplete. The symbolic execution includes constant +folding so loops that have a constant number of iterations will be +handled completely (assuming the unwinding limit is sufficient). + +The output of the symbolic execution is a system of equations; an object +containing a list of `symex_target_elements`, each of which are +equalities between `expr` expressions. See `symex_target_equation.h`. +The output is in static, single assignment (SSA) form, which is *not* +the case for goto-programs. \section symbolic-execution Symbolic Execution diff --git a/src/java_bytecode/README.md b/src/java_bytecode/README.md new file mode 100644 index 00000000000..7b8812aa0b3 --- /dev/null +++ b/src/java_bytecode/README.md @@ -0,0 +1,6 @@ +\ingroup module_hidden +\defgroup java_bytecode java_bytecode + +# Folder java_bytecode + +This module providesy a front end for Java. diff --git a/src/jbmc/README.md b/src/jbmc/README.md new file mode 100644 index 00000000000..3fa183e7976 --- /dev/null +++ b/src/jbmc/README.md @@ -0,0 +1,6 @@ +\ingroup module_hidden +\defgroup jbmc jbmc + +# Folder jbmc + +`jbmc/` is like cbmc but especially designed for Java. diff --git a/src/jsil/README.md b/src/jsil/README.md new file mode 100644 index 00000000000..3f77753cbfc --- /dev/null +++ b/src/jsil/README.md @@ -0,0 +1,6 @@ +\ingroup module_hidden +\defgroup jsil jsil + +# Folder jsil + +`jsil/` contains a JavaScript front end. diff --git a/src/json/README.md b/src/json/README.md new file mode 100644 index 00000000000..dbd4ffa978f --- /dev/null +++ b/src/json/README.md @@ -0,0 +1,6 @@ +\ingroup module_hidden +\defgroup json json + +# Folder json + +`json/` contains a JSON parser. diff --git a/src/langapi/README.md b/src/langapi/README.md new file mode 100644 index 00000000000..b39599ea20f --- /dev/null +++ b/src/langapi/README.md @@ -0,0 +1,12 @@ +\ingroup module_hidden +\defgroup langapi langapi + +# Folder langapi + +\author Martin Brain + +`langapi/` contains the basic interfaces and support classes for programming +language front ends. Developers only really need look at this if they +are adding support for a new language. It’s main users are the +language front-ends such as `ansi-c/` and +`cpp/`. diff --git a/src/linking/README.md b/src/linking/README.md new file mode 100644 index 00000000000..28c68ae21e6 --- /dev/null +++ b/src/linking/README.md @@ -0,0 +1,11 @@ +\ingroup module_hidden +\defgroup linking linking + +# Folder linking + +\author Martin Brain + +This allows multiple ‘object +files’ (goto-programs) to be linked into one ‘executable’ (another +goto-program), thus allowing existing build systems to be used to build +complete goto-program binaries. diff --git a/src/memory-models/README.md b/src/memory-models/README.md new file mode 100644 index 00000000000..5eadb681177 --- /dev/null +++ b/src/memory-models/README.md @@ -0,0 +1,6 @@ +\ingroup module_hidden +\defgroup memory-models memory-models + +# Folder memory-models + +`memory-models` contains tools related to weak memory models. diff --git a/src/miniz/README.md b/src/miniz/README.md new file mode 100644 index 00000000000..0224ec77730 --- /dev/null +++ b/src/miniz/README.md @@ -0,0 +1,6 @@ +\ingroup module_hidden +\defgroup miniz miniz + +# Folder miniz + +`miniz/` contains a minimal ZIP compression library. diff --git a/src/nonstd/README.md b/src/nonstd/README.md new file mode 100644 index 00000000000..aebbaabb6ff --- /dev/null +++ b/src/nonstd/README.md @@ -0,0 +1,7 @@ +\ingroup module_hidden +\defgroup nonstd nonstd + +# Folder nonstd + +`nonstd` contains implementations of C++ utilities that are not yet +part of the standard library, e.g. for `optional`. diff --git a/src/pointer-analysis/README.md b/src/pointer-analysis/README.md new file mode 100644 index 00000000000..0ef8b89f44b --- /dev/null +++ b/src/pointer-analysis/README.md @@ -0,0 +1,12 @@ +\ingroup module_hidden +\defgroup pointer-analysis pointer-analysis + +# Folder pointer-analysis + +\author Martin Brain + +To perform symbolic execution on programs with dereferencing of +arbitrary pointers, some alias analysis is needed. `pointer-analysis` +contains the three levels of analysis; flow and context insensitive, +context sensitive and flow and context sensitive. The code needed is +subtle and sophisticated and thus there may be bugs. diff --git a/src/solvers/refinement/README.md b/src/solvers/README.md similarity index 78% rename from src/solvers/refinement/README.md rename to src/solvers/README.md index f0ec6efde68..7a45f246662 100644 --- a/src/solvers/refinement/README.md +++ b/src/solvers/README.md @@ -1,8 +1,111 @@ +\ingroup module_hidden \defgroup solvers solvers # Folder solvers -\defgroup string_solver_interface String solver interface -\author Romain Brenguier +\authors Romain Brenguier, Kareem Khazem, Martin Brain + +\section overview Overview + +The basic role of solvers is to answer whether the set of equations given +is satisfiable. +One example usage, is to determine whether an assertion in a +program can be violated. +We refer to \ref goto-symex for how CBMC and JBMC convert a input program +and property to a set of equations. + +The secondary role of solvers is to provide a satisfying assignment of +the variables of the equations, this can for instance be used to construct +a trace. + +The `solvers/` directory contains interfaces to a number of +different decision procedures, roughly one per directory. + +* prop/: The basic and common functionality. The key file is + `prop_conv.h` which defines `prop_convt`. This is the base class that + is used to interface to the decision procedures. The key functions are + `convert` which takes an `exprt` and converts it to the appropriate, + solver specific, data structures and `dec_solve` (inherited from + `decision_proceduret`) which invokes the actual decision procedures. + Individual decision procedures (named `*_dect`) objects can be created + but `prop_convt` is the preferred interface for code that uses them. + +* flattening/: A library that converts operations to bit-vectors, + including calling the conversions in `floatbv` as necessary. Is + implemented as a simple conversion (with caching) and then a + post-processing function that adds extra constraints. This is not used + by the SMT or CVC back-ends. + +* dplib/: Provides the `dplib_dect` object which used the decision + procedure library from “Decision Procedures : An Algorithmic Point of + View”. + +* cvc/: Provides the `cvc_dect` type which interfaces to the old (pre + SMTLib) input format for the CVC family of solvers. This format is + still supported by depreciated in favour of SMTLib 2. + +* smt1/: Provides the `smt1_dect` type which converts the formulae to + SMTLib version 1 and then invokes one of Boolector, CVC3, OpenSMT, + Yices, MathSAT or Z3. Again, note that this format is depreciated. + +* smt2/: Provides the `smt2_dect` type which functions in a similar + way to `smt1_dect`, calling Boolector, CVC3, MathSAT, Yices or Z3. + Note that the interaction with the solver is batched and uses + temporary files rather than using the interactive command supported by + SMTLib 2. With the `–fpa` option, this output mode will not flatten + the floating point arithmetic and instead output the proposed SMTLib + floating point standard. + +* qbf/: Back-ends for a variety of QBF solvers. Appears to be no + longer used or maintained. + +* sat/: Back-ends for a variety of SAT solvers and DIMACS output. + +\section sat-smt-encoding SAT/SMT Encoding + +In the \ref solvers directory. + +**Key classes:** +* \ref literalt +* \ref boolbvt +* \ref propt + +\dot +digraph G { + node [shape=box]; + rankdir="LR"; + 1 [shape=none, label=""]; + 2 [label="goto conversion"]; + 3 [shape=none, label=""]; + 1 -> 2 [label="equations"]; + 2 -> 3 [label="propositional variables as bitvectors, constraints"]; +} +\enddot + + +--- + +\section decision-procedure Decision Procedure + +In the \ref solvers directory. + +**Key classes:** +* symex_target_equationt +* \ref propt +* \ref bmct + +\dot +digraph G { + node [shape=box]; + rankdir="LR"; + 1 [shape=none, label=""]; + 2 [label="goto conversion"]; + 3 [shape=none, label=""]; + 1 -> 2 [label="propositional variables as bitvectors, constraints"]; + 2 -> 3 [label="solutions"]; +} +\enddot + +\section string-solver-interface String Solver Interface The string solver is particularly aimed at string logic, but since it inherits from \ref bv_refinementt it is also capable of handling arithmetic, array logic, @@ -241,3 +344,11 @@ This is done by generate_instantiations(messaget::mstreamt &stream, const namesp \copydetails check_axioms(const string_axiomst &axioms, string_constraint_generatort &generator, const std::function &get, messaget::mstreamt &stream, const namespacet &ns, std::size_t max_string_length, bool use_counter_example, ui_message_handlert::uit ui, const union_find_replacet &symbol_resolve) \link check_axioms(const string_axiomst &axioms, string_constraint_generatort &generator, const std::function &get, messaget::mstreamt &stream, const namespacet &ns, std::size_t max_string_length, bool use_counter_example, ui_message_handlert::uit ui, const union_find_replacet &symbol_resolve) (See function documentation...) \endlink + +\section floatbv Floatbv Directory + +This library contains the code that is used to convert floating point +variables (`floatbv`) to bit vectors (`bv`). This is referred to as +‘bit-blasting’ and is called in the `solver` code during conversion to +SAT or SMT. It also contains the abstraction code described in the +FMCAD09 paper. diff --git a/src/solvers/module.md b/src/solvers/module.md deleted file mode 100644 index 0369735e555..00000000000 --- a/src/solvers/module.md +++ /dev/null @@ -1,62 +0,0 @@ -\ingroup module_hidden -\defgroup module_solvers SAT/SMT Encoding and Decision Procedure - -\author Kareem Khazem - -The basic role of solvers is to answer whether the set of equations given -is satisfiable. -One example usage, is to determine whether an assertion in a -program can be violated. -We refer to \ref module_goto-symex for how CBMC and JBMC convert a input program -and property to a set of equations. - -The secondary role of solvers is to provide a satisfying assignment of -the variables of the equations, this can for instance be used to construct -a trace. - -The most general solver in terms of supported equations is \ref string-solver. - -\section sat-smt-encoding SAT/SMT Encoding - -In the \ref solvers directory. - -**Key classes:** -* \ref literalt -* \ref boolbvt -* \ref propt - -\dot -digraph G { - node [shape=box]; - rankdir="LR"; - 1 [shape=none, label=""]; - 2 [label="goto conversion"]; - 3 [shape=none, label=""]; - 1 -> 2 [label="equations"]; - 2 -> 3 [label="propositional variables as bitvectors, constraints"]; -} -\enddot - - ---- - -\section decision-procedure Decision Procedure - -In the \ref solvers directory. - -**Key classes:** -* symex_target_equationt -* \ref propt -* \ref bmct - -\dot -digraph G { - node [shape=box]; - rankdir="LR"; - 1 [shape=none, label=""]; - 2 [label="goto conversion"]; - 3 [shape=none, label=""]; - 1 -> 2 [label="propositional variables as bitvectors, constraints"]; - 2 -> 3 [label="solutions"]; -} -\enddot diff --git a/src/util/README.md b/src/util/README.md new file mode 100644 index 00000000000..72b0c3a93e2 --- /dev/null +++ b/src/util/README.md @@ -0,0 +1,95 @@ +\ingroup module_hidden +\defgroup util util + +# Folder util + +\author Martin Brain + +\section data_structures Data Structures + +This section discusses some of the key data-structures used in the +CPROVER codebase. + +\subsection irept Irept Data Structure + +There are a large number of kind of tree structured or tree-like data in +CPROVER. `irept` provides a single, unified representation for all of +these, allowing structure sharing and reference counting of data. As +such `irept` is the basic unit of data in CPROVER. Each `irept` +contains[^2] a basic unit of data (of type `dt`) which contains four +things: + +* `data`: A string[^3], which is returned when the `id()` function is + used. + +* `named_sub`: A map from `irep_namet` (a string) to an `irept`. This + is used for named children, i.e. subexpressions, parameters, etc. + +* `comments`: Another map from `irep_namet` to `irept` which is used + for annotations and other ‘non-semantic’ information + +* `sub`: A vector of `irept` which is used to store ordered but + unnamed children. + +The `irept::pretty` function outputs the contents of an `irept` directly +and can be used to understand an debug problems with `irept`s. + +On their own `irept`s do not “mean” anything; they are effectively +generic tree nodes. Their interpretation depends on the contents of +result of the `id` function (the `data`) field. `util/irep_ids.txt` +contains the complete list of `id` values. During the build process it +is used to generate `util/irep_ids.h` which gives constants for each id +(named `ID_`). These can then be used to identify what kind of data +`irept` stores and thus what can be done with it. + +To simplify this process, there are a variety of classes that inherit +from `irept`, roughly corresponding to the ids listed (i.e. `ID_or` +(the string `"or”`) corresponds to the class `or_exprt`). These give +semantically relevant accessor functions for the data; effectively +different APIs for the same underlying data structure. None of these +classes add fields (only methods) and so static casting can be used. The +inheritance graph of the subclasses of `irept` is a useful starting +point for working out how to manipulate data. + +There are three main groups of classes (or APIs); those derived from +`typet`, `codet` and `exprt` respectively. Although all of these inherit +from `irept`, these are the most abstract level that code should handle +data. If code is manipulating plain `irept`s then something is wrong +with the architecture of the code. + +Many of the key descendent of `exprt` are declared in `std_expr.h`. All +expressions have a named subfield / annotation which gives the type of +the expression (slightly simplified from C/C++ as `unsignedbv_typet`, +`signedbv_typet`, `floatbv_typet`, etc.). All type conversions are +explicit with an expression with `id() == ID_typecast` and an ‘interface +class’ named `typecast_exprt`. One key descendent of `exprt` is +`symbol_exprt` which creates `irept` instances with the id of “symbol”. +These are used to represent variables; the name of which can be found +using the `get_identifier` accessor function. + +`codet` inherits from `exprt` and is defined in `std_code.h`. They +represent executable code; statements in C rather than expressions. In +the front-end there are versions of these that hold whole code blocks, +but in goto-programs these have been flattened so that each `irept` +represents one sequence point (almost one line of code / one +semi-colon). The most common descendents of `codet` are `code_assignt` +so a common pattern is to cast the `codet` to an assignment and then +recurse on the expression on either side. + +[^2]: Or references, if reference counted data sharing is enabled. It is + enabled by default; see the `SHARING` macro. + +[^3]: Unless `USE_STD_STRING` is set, this is actually +a `dstring` and thus an integer which is a reference into a string table + +\dot +digraph G { + node [shape=box]; + rankdir="LR"; + 1 [shape=none, label=""]; + 2 [label="command line parsing"]; + 3 [shape=none, label=""]; + 1 -> 2 [label="C files or goto-binaries"]; + 2 -> 3 [label="Command line options, file names"]; +} +\enddot diff --git a/src/util/command-line-parsing.md b/src/util/command-line-parsing.md deleted file mode 100644 index 611c3c4203f..00000000000 --- a/src/util/command-line-parsing.md +++ /dev/null @@ -1,16 +0,0 @@ -\defgroup util util -# Folder util - -`util/` is filled with useful tools. - -\dot -digraph G { - node [shape=box]; - rankdir="LR"; - 1 [shape=none, label=""]; - 2 [label="command line parsing"]; - 3 [shape=none, label=""]; - 1 -> 2 [label="C files or goto-binaries"]; - 2 -> 3 [label="Command line options, file names"]; -} -\enddot diff --git a/src/xmllang/README b/src/xmllang/README deleted file mode 100644 index 9150bffa566..00000000000 --- a/src/xmllang/README +++ /dev/null @@ -1,3 +0,0 @@ -based on grammar/tokenizer from http://www.w3.org/XML/9707/xml-in-c.tar.gz -also see http://www.w3.org/XML/9707/XML-in-C - diff --git a/src/xmllang/README.md b/src/xmllang/README.md new file mode 100644 index 00000000000..94baab33df6 --- /dev/null +++ b/src/xmllang/README.md @@ -0,0 +1,15 @@ +\ingroup module_hidden +\defgroup xmllang xmllang + +# Folder xmllang + +\author Martin Brain + +CPROVER has optional XML output for results and there is an XML format +for goto-programs. It is used to interface to various IDEs. The +`xmllang/` directory contains the parser and helper functions for +handling this format. + +Based on grammar/tokenizer from [http://www.w3.org/XML/9707/xml-in-c.tar.gz](http://www.w3.org/XML/9707/xml-in-c.tar.gz) +also see +[http://www.w3.org/XML/9707/XML-in-C](http://www.w3.org/XML/9707/XML-in-C).