KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs

Cristian Cadar, Daniel Dunbar, Dawson Engler

Read: 22 Apr 2025
Published: 08 Dec 2008

OSDI’08: Proceedings of the 8th USENIX conference on Operating systems design and implementation, Pages 209 - 224

https://dl.acm.org/doi/10.5555/1855741.1855756

Q&A (link)

What are the motivations for this work?

See Abstract, Introduction.
People want an effective bug-finding tool that automatically generates inputs that crash real code.
Currently, code testing is done by code review, manual and random testing, dynamic tools, and static analysis.
Static analysis provide full path coverage, but it reasons poorly about bugs that depend on accurate value info, pointer, and heap layour, among many others.
It has been an open question whether the symbolic execution has any hope of consistently achieving high coverage on real applications. Two common concerns: path explosion, the environment problem.

What is the proposed solution?

See Introduction, Section 3, Section 4.
A symbolic execution tool named KLEE, based on EXE. A variety of optimizations are done to constraint solving, program states representation, seaerch heuristics, to achieve a better performance as well as higher code coverage.
KLEE uses the constraint solver STP, which was originally developed for EXE.
KLEE also has a better support for environment interactions. A wide range of real-world applications are tested with KLEE.
KLEE functions as a hybrid between an operating system for symbolic processes and an interpreter. Each symbolic process has a register file, stack, heap, program counter, and path condition.
KLEE directly interprets compiled programs as LLVM instructions, and maps them to constraints without approximation (bit-level accuracy).
For the environment modeling, use C code to write models that “understand” the semantics of the desired action (reading file data, network packets, etc.), aiming to conceptually return all values that the read could legally produce. Takes about 2500 lines of code to define simple models of roughly 40 system calls.

What is the work’s evaluation of the proposed solution?

See Section 5.
Applied KLEE on coreutils and busybox.
Code coverage is measured by gcov. When calculating the coverage, the size of an executable is measured by the executable lines of code (ELOC), which is about 3000 - 4000 for most of coreutils tools. With global optimization, ELOC is roughly 1/3 the size of the source code. The average coverage of KLEE on coreutils and busybox is 90.9% and 93.5% respectively, which are both significantly higher than the coverage generated by developer tests.
Besides developer tests, comparison of code coverage between KLEE and random tests is also provided.

What is your analysis of the identified problem, idea and evaluation?

The way they model the environment as simple C codes returning legal read results is pretty straightfoward and effective. Directly making use of LLVM instructions is also a very natural idea.

What are the contributions?

See Introduction, Section 3, Section 4, Section 6.
Symbolic execution tool KLEE.
As mentioned in Section 6, KLEE is the first symbolic execution system that supports environment interaction in a traditional way.
Some bugs found in coreutils, busybox.

What are future directions for this research?

See Section 6.
The authors have a long-term goal to be able to take an arbitrary program and routinely get 90%+ code coverage.
GPT suggests Coyote C++ a related tool that claims to have achieved 90%+ code coverage on arbitrary program, I read the related paper Coyote C++ and I don’t think it provides many insights.

What questions are you left with?

NONE

What is your take-away message from this paper?

NONE

GPT-generated Keynotes

Innovations in KLEE’s design:

Search Heuristics: KLEE introduces search heuristics (e.g., random path selection, coverage-optimized strategies) to guide symbolic execution towards maximizing code coverage, especially in large, complex codebases.
Environment Modeling: A major contribution is KLEE’s ability to operate directly on unmodified C code, including complex systems-level code like the coreutils suite and the uClibc library, demonstrating practical applicability.
Automatic Test Generation: It generates test cases that achieve near-complete path coverage on many real-world programs, often outperforming manually written or fuzzing-based test suites.
Error Detection: KLEE finds bugs and undefined behaviors (e.g., divide-by-zero, memory errors) by systematically exploring execution paths and checking for violations using its symbolic engine.

(This keynote generated by GPT is apparently incomplete, for example, it didn’t mention the fact that KLEE works directly on LLVM IR rather than on C source code).

Written on