Discovering instructions for robust binary-level coverage criteria
Vaibhav Sharma, Taejoon Byun, Stephen McCamant, Sanjai Rayadurgam, Mats P. E. Heimdahl
- Read: 02 May 2025
- Published: 13 Jul 2017
TECPS 2017: Proceedings of the 1st ACM SIGSOFT International Workshop on Testing Embedded and Cyber-Physical Systems, Pages 1 - 4
https://doi.org/10.1145/3107091.3107092
See also: Toward Rigorous Object-Code Coverage Criteria
Q&A (link)
What are the motivations for this work?
- See Abstract, Introduction.
- Traditional OBC (Object-Branch Coverage) definition can be made more resilient to variations in compilers and the structure of generated code by creating more robust definitions. But the process is laborious, error-prone, and architecture-dependent.
- Want automated discovery of instructions to be included for an improved OBC definition on the X86 and ARM arch.
- An example below shows that traditional OBC may fail to catch branches due to the miss of conditional jump instructions introduced by compilation optimization (in the instructions generated by O1
cmoveis a conditional move).

What is the proposed solution?
- See Abstract, Section 2.
- Discover all possible valide instructions by symbolically executing instruction decoders for X86, ARM.
- For each discovered instruction, translate it to Vine IR, and check if the Vine IR translation satisfies the OBC definition.
- Enumeration and classification with 3 steps.
- Obtain a list of instruction byte sequences for every valid X86 and ARM instruction.
- Perform automated classification of the behavior of every instruction byte sequence as exhibiting conditional behavior or not, record the source of conditional behavior.
- Verify the classification output against the Intel arch manual, and the ARM manual, and obtain a list of instruction mnemonics along with the source of each instruction’s conditional behavior.
- For X86.
- Obtained instruction byte sequences referring to PokeEMU. PokeEMU generates high-coverage test cases for an emulator and allows those tests to be run on a different emulator or a real machine for comparison.
- Symbolically execute (using FuzzBALL) the instruction decoder of Bochs (a high fidelity emulator) with the first three bytes of the instruction byte sequence set to tbe symbolic.
- Generated 76510 candidate byte sequences which are valid instructions as per the Bochs instruction decoder. After removing instructions prefixes (
lock,csand so on), got 45311 unique byte sequences. - Classify the 45311 byte sequences independently with XED (X86 Encoder Decoder) and FuzzBALL, resulting in 104 and 178 unique instruction mnemonics, respectively.
- Take the union of the instruction mnemonics collected from XED and FuzzBALL, examine the description of each mnemonic in the Intel manual. Obtained 104 instruction mnemonics.

- For ARM.
- Similar logic as X86.
- Obtained 70 instruction mnemonics.
What is the work’s evaluation of the proposed solution?
NONE
What is your analysis of the identified problem, idea and evaluation?
NONE
What are the contributions?
- See Abstract.
- Automated the discovery of instructions to be included for an improved OBC definition on the X86 and ARM architectures.
What are future directions for this research?
- See Section 3.
- The authors state at the end of Section 3 that “Our instruction classification implementation can be extended to become a machine code analysis tool that can be retargeted for multiple platforms. Translating instructions from different architectures to a Universal Assembly Language would be a starting point for such an extension”.
What questions are you left with?
NONE
What is your take-away message from this paper?
NONE
Written on