Discovering instructions for robust binary-level coverage criteria

Vaibhav Sharma, Taejoon Byun, Stephen McCamant, Sanjai Rayadurgam, Mats P. E. Heimdahl

  • Read: 02 May 2025
  • Published: 13 Jul 2017

TECPS 2017: Proceedings of the 1st ACM SIGSOFT International Workshop on Testing Embedded and Cyber-Physical Systems, Pages 1 - 4

https://doi.org/10.1145/3107091.3107092


See also: Toward Rigorous Object-Code Coverage Criteria

Q&A (link)

What are the motivations for this work?

  • See Abstract, Introduction.
  • Traditional OBC (Object-Branch Coverage) definition can be made more resilient to variations in compilers and the structure of generated code by creating more robust definitions. But the process is laborious, error-prone, and architecture-dependent.
  • Want automated discovery of instructions to be included for an improved OBC definition on the X86 and ARM arch.
  • An example below shows that traditional OBC may fail to catch branches due to the miss of conditional jump instructions introduced by compilation optimization (in the instructions generated by O1 cmove is a conditional move). ctermid ctermid2

What is the proposed solution?

  • See Abstract, Section 2.
  • Discover all possible valide instructions by symbolically executing instruction decoders for X86, ARM.
    • For each discovered instruction, translate it to Vine IR, and check if the Vine IR translation satisfies the OBC definition.
  • Enumeration and classification with 3 steps.
    1. Obtain a list of instruction byte sequences for every valid X86 and ARM instruction.
    2. Perform automated classification of the behavior of every instruction byte sequence as exhibiting conditional behavior or not, record the source of conditional behavior.
    3. Verify the classification output against the Intel arch manual, and the ARM manual, and obtain a list of instruction mnemonics along with the source of each instruction’s conditional behavior.
  • For X86.
    • Obtained instruction byte sequences referring to PokeEMU. PokeEMU generates high-coverage test cases for an emulator and allows those tests to be run on a different emulator or a real machine for comparison.
    • Symbolically execute (using FuzzBALL) the instruction decoder of Bochs (a high fidelity emulator) with the first three bytes of the instruction byte sequence set to tbe symbolic.
    • Generated 76510 candidate byte sequences which are valid instructions as per the Bochs instruction decoder. After removing instructions prefixes (lock, cs and so on), got 45311 unique byte sequences.
    • Classify the 45311 byte sequences independently with XED (X86 Encoder Decoder) and FuzzBALL, resulting in 104 and 178 unique instruction mnemonics, respectively.
    • Take the union of the instruction mnemonics collected from XED and FuzzBALL, examine the description of each mnemonic in the Intel manual. Obtained 104 instruction mnemonics. x86
  • For ARM.
    • Similar logic as X86.
    • Obtained 70 instruction mnemonics.

What is the work’s evaluation of the proposed solution?

NONE

What is your analysis of the identified problem, idea and evaluation?

NONE

What are the contributions?

  • See Abstract.
  • Automated the discovery of instructions to be included for an improved OBC definition on the X86 and ARM architectures.

What are future directions for this research?

  • See Section 3.
  • The authors state at the end of Section 3 that “Our instruction classification implementation can be extended to become a machine code analysis tool that can be retargeted for multiple platforms. Translating instructions from different architectures to a Universal Assembly Language would be a starting point for such an extension”.

What questions are you left with?

NONE

What is your take-away message from this paper?

NONE

Written on