Research: Verification of executables

Verification of Executables

Field: machine code analysis, concolic execution, IR recovery
Applications: COTS, mobile code, malware, legacy code

Context

Verification is usually performed on a high-level view of the software, either specification or program source code. However in certain circumstances verification is more relevant when performed at the machine code level, either the source code is not available (COTS, mobile code, malware) or the compiler chain cannot be trusted (e.g. strong safety/security requirements and optimising compilers). However verification of native code programs is much more difficult than verification of higher-level formalisms, for at least three reasons: low-level semantics on data, low-level semantics on control and IR recovery (see publications for more details).

The OSMOSE prototype

We develop a prototype named OSMOSE for (semi) automatic analysis of native code programs. The tool takes as input an executable file, an architecture/ISA id and a modelling of the external environment (for embedded systems). Output are a high-level description of the executable behaviour (control-flow graph, call graph, etc.) and a test suite achieving some predefined coverage objectives.

We rely on four major technologies:

A concolic execution engine, modified for native code
A bit-vector constraint solver based on Constraint Programming
A novel IR recovery technology mixing concolic execution and static analysis
A Generic Assembly Language (GAL) parametrize by a Template Architecture Model (TAM), to add easily new architectures/ISA in the tool

Results

The prototype OSMOSE as well as the underlying theory and algorithms have been presented at ICST 2008 [ICST-08]
An extended version of the OSMOSE presentation (including the GAL language) has been accepted in STVR [STVR-10]
Various heuristics to prune irrelevant paths in path-based test generation have been presented at ICST 2009 [ICST-09]
A paper describing our CLP solver for bit-vector theory has been accepted to TACAS 2010 (benchs) [TACAS-10]
A refinement-based technique for safe and precise CFG [VMCAI-11]
The BINCOA framework for binary code analysers: a formal model (technical report), a xml DTD for that model, an example and some OCaml source code for DBA manipulation [CAV-11] (DBA Toolkit)

A few slides

OSMOSE presentation. ICST 2008. April 2008.
About OSMOSE and concolic test generation (in French). AFADL 2009. January 2009.
Pruning the search space in symbolic execution. ICST 2009. April 2009.
An Alternative to SAT-based Approaches for Bit-Vectors. TACAS 2010. March 2010.
Safe (and Precise) CFG Reconstruction from Unstructured Programs: slides of VMCAI 2011, slides of the GDR GPL meeting (French software engineering network)
The BINCOA framework. CAV 2011. July 2011.

Back