Verification
of Executables
Field: machine code analysis, concolic execution, IR recovery
Applications: COTS, mobile code, malware, legacy code
Context
Verification is usually performed on a high-level view of the
software,
either specification or program source code. However in certain
circumstances verification is more relevant when performed
at the machine code level, either the source code is not available
(COTS, mobile code, malware) or the compiler chain cannot be trusted
(e.g. strong safety/security requirements and optimising compilers).
However verification of native code programs is much more difficult
than verification of higher-level formalisms, for at least three
reasons: low-level semantics on data, low-level semantics on control
and IR recovery (see publications for more details).
The OSMOSE prototype
We develop a prototype named OSMOSE for (semi) automatic analysis of
native code programs. The tool takes as input an executable file, an
architecture/ISA id and a modelling of the external environment (for
embedded systems). Output are a high-level description of the
executable behaviour (control-flow graph, call graph, etc.) and a test
suite achieving some predefined coverage objectives.
We rely on four major technologies:
- A concolic execution engine, modified for native code
- A bit-vector constraint solver based on Constraint Programming
- A novel IR recovery technology mixing concolic execution and static analysis
- A Generic Assembly Language (GAL) parametrize by a Template
Architecture Model (TAM), to add easily new architectures/ISA in the
tool
Results
- The prototype OSMOSE as well as the underlying theory and algorithms have been presented at ICST 2008 [ICST-08]
- An
extended version of the OSMOSE presentation (including the GAL
language) has been accepted in STVR [STVR-10]
- Various heuristics to prune irrelevant paths in path-based test
generation have been presented at ICST 2009 [ICST-09]
- A paper describing our CLP solver for bit-vector theory has been accepted to TACAS 2010 (benchs)
[TACAS-10]
- A refinement-based technique for safe and precise CFG [VMCAI-11]
- The BINCOA framework for binary code analysers: a formal model (technical report), a xml DTD for that model, an example and some OCaml source code for DBA manipulation [CAV-11] (DBA Toolkit)
A few slides
Back