Comparison of Blackbox and Whitebox
Fuzzers in Finding Software Bugs
Marjan Aslani, Nga Chung, Jason Doherty,
Nichole Stockman, and William Quach
Summer Undergraduate Program in Engineering Research at Berkeley
(SUPERB) 2008
Team for Research in Ubiquitous Secure Technology
TRUST Autumn 2008 Conference: November 11-12, 2008
Overview
• Introduction to fuzz testing
• Our research
• Results
What Is Fuzzing?
• A method of finding software holes by feeding purposely invalid data to a program as input.
– Introduced by B. Miller et al.; inspired by line noise.
– Target applications: image processors, media players, operating systems.
– Fuzz testing is generally automated.
– Finds many reliability problems, many of which are potential security holes.
Types of Fuzz Testing
• Blackbox: randomly generated data is fed to a program as input to see if it crashes.
– Does not require knowledge of the program's source code or deep code inspection.
– A quick way of finding defects without knowing details of the application.
• Whitebox: creates test cases by considering the target program's logical constraints and data structures.
– Requires knowledge of the system and how it uses the data.
– Penetrates deeper into the program.
Zzuf - Blackbox Fuzzer
• Finds bugs in applications by corrupting random bits in user-contributed data.
• To make new test cases, Zzuf uses a range of seeds and fuzzing ratios (corruption ratios).
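The core idea of seed-driven bit corruption can be sketched in a few lines of Python (a minimal illustration of the technique, not Zzuf's actual implementation; the sample "media file" bytes are a stand-in):

```python
import random

def fuzz(data: bytes, seed: int, ratio: float) -> bytes:
    """Flip a fraction `ratio` of the bits in `data`, reproducibly from `seed`."""
    rng = random.Random(seed)
    out = bytearray(data)
    n_bits = len(out) * 8
    # Pick distinct bit positions and flip each one.
    for bit in rng.sample(range(n_bits), int(n_bits * ratio)):
        out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)

seed_file = b"RIFF....WAVEfmt "  # stand-in for a real media file
# Each (seed, ratio) pair yields a distinct, reproducible test case.
case_a = fuzz(seed_file, seed=1, ratio=0.01)
case_b = fuzz(seed_file, seed=2, ratio=0.05)
```

Because the corruption is driven by a seed, any crashing test case can be regenerated exactly, which matters when filing bug reports.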
Catchconv - WhiteBox Fuzzer
• To create test cases, Catchconv (CC) starts with a valid input, observes the program's execution on that input, collects the path condition the program follows on that sample, and attempts to infer related path conditions that lead to an error; these then serve as starting points for bug-finding.
• CC has some downtime when it is only tracing a program and not generating new fuzzed files.
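The path-condition idea can be illustrated with a toy example (the target `parse` function below is entirely hypothetical; Catchconv derives constraints from binary traces, not from Python source):

```python
def parse(data: bytes) -> int:
    """Hypothetical target: only a narrow input path reaches the bug."""
    if len(data) >= 4 and data[:2] == b"MZ":
        size = data[2]
        if size > 8:
            return data[size]  # IndexError when size >= len(data): the bug
    return 0

# Whitebox style: run a valid input, read off its path condition
# (data[:2] == b"MZ" and size <= 8), then negate the inner constraint
# to steer execution toward the unexplored, buggy branch.
seed = b"MZ\x04abcd"     # follows the safe path
crafted = b"MZ\xc8abcd"  # same valid prefix, but size = 200 > len(data)

parse(seed)  # safe path, returns 0
try:
    parse(crafted)
    found = False
except IndexError:
    found = True  # the constraint-negated input triggers the bug directly
```

A blackbox fuzzer would need to stumble on the `MZ` prefix by chance; the whitebox approach reaches the guarded branch deliberately, which is the "deeper penetration" claimed on the previous slide.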
Valgrind
• A tool for detecting memory-management errors.
• Reports the line number in the code where the program error occurred.
• Helped us find and report more errors than we would have if we had focused solely on segmentation faults.
Types of errors reported by Valgrind
By tracking a program’s execution of a file, Valgrind determines the types of errors that occur, which may include:
• Invalid writes
• Invalid reads
• Double frees
• Uninitialized values
• Syscall param (invalid system-call parameters)
• Memory leaks
Program run under Valgrind
Methodology
Metafuzz
• All of the test files that triggered bugs were uploaded to Metafuzz.com.
– The webpage contained:
  • Link to the test file
  • Bug type
  • Program the bug was found in
  • Stack hash number where the bug was located
Metafuzz webpage
Target applications
• MPlayer, Antiword, ImageMagick Convert, and Adobe Flash Player.
• MPlayer, the primary target:
– Open-source software
– Preinstalled on many Linux distributions
– Updates available via Subversion
– Convenient to file a bug report
– Developers would get back to us!
• Adobe's bug-reporting protocol requires a bug to receive a number of votes from users before it will be looked at by Flash developers.
• VLC requires building from nightly Subversion snapshots.
Research Highlights
• In 6 weeks, we generated more than 1.2 million test cases.
• We used UC Berkeley's PSI cluster, which consists of 81 machines (270 processors).
– Zzuf, MPlayer, and CC were installed on them.
• Created a de-duplication script to find the unique bugs.
• Reported 89 unique bugs; developers have already eliminated 15 of them.
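A de-duplication script along these lines keys each failure on a hash of its error type and top stack frames, so repeated reports of the same bug collapse to one record. A minimal sketch in Python (the error names and frame strings below are illustrative, not from our actual data):

```python
import hashlib

def stack_hash(error_type: str, frames: list) -> str:
    """Key an error by its type plus its top stack frames, so the same
    underlying bug reported many times maps to a single identifier."""
    top = frames[:3]  # the top few frames identify the crash site
    return hashlib.sha1("|".join([error_type] + top).encode()).hexdigest()[:12]

reports = [
    ("InvalidRead", ["demux_open", "av_read", "main"]),
    ("InvalidRead", ["demux_open", "av_read", "main"]),  # duplicate report
    ("InvalidWrite", ["render_frame", "main"]),
]
# The duplicate collapses onto the same hash, leaving two unique bugs.
unique = {stack_hash(t, f): (t, f) for t, f in reports}
```

Hashing only the top frames, rather than the whole trace, tolerates differences deep in the call stack while still separating distinct crash sites.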
Result
• To assess the two fuzzers, we gathered several metrics:
– Number of test cases generated
– Number of unique test cases generated
– Total bugs and total unique bugs found by each fuzzer
Result con’t
• Generated 1.2 million test cases:
– 962,402 by Zzuf
– 279,953 by Catchconv
• From the test cases:
– Zzuf found 1,066,000 errors
– Catchconv reported 304,936
• Unique (non-duplicate) errors found:
– 456 by Zzuf
– 157 by Catchconv
Result con’t
• Zzuf reports a disproportionately larger number of errors than CC. Is Zzuf better than CC?
• No! The two fuzzers generated different numbers of test cases.
• How could we make a fair comparison of the fuzzers’ efficiency?
– Gauge the amount of duplicate work performed by each fuzzer.
– Find how many of these test cases were unique.
Average Unique Errors per 100 Unique Test Cases
• First, we compared the fuzzers’ performance by the average number of unique bugs found per 100 unique test cases:
– Zzuf: 2.69
– CC: 2.63
• Zzuf’s apparent superiority diminishes.
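The metric itself is just unique bugs divided by unique test cases, scaled to 100. A quick sketch (note: the unique-test-case totals below are hypothetical placeholders back-solved from the reported rates; only the 456 and 157 unique-bug counts come from the study):

```python
def unique_bug_rate(unique_bugs: int, unique_cases: int) -> float:
    """Average number of unique bugs found per 100 unique test cases."""
    return 100 * unique_bugs / unique_cases

# Hypothetical unique-test-case totals, chosen only to illustrate the formula.
zzuf_rate = round(unique_bug_rate(456, 16951), 2)  # about 2.69
cc_rate = round(unique_bug_rate(157, 5970), 2)     # about 2.63
```

Normalizing per unique test case, rather than per raw test case, is what removes the advantage Zzuf gains simply by generating more inputs.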
Unique Errors as % of Total Errors
• Next, we analyzed the fuzzers’ performance by the percentage of unique errors found out of the total errors:
– Zzuf: 0.05%
– CC: 0.22%
• Less than a quarter of a percentage point separates the fuzzers.
Types of Errors (as % of Total Errors)
• We also considered comparing the fuzzers by the types of bugs they found.
• Zzuf performed better at finding “invalid writes”, a more important security bug type.
• Not an accurate comparison, since we couldn’t tell which bug specifically caused a crash.
Conclusion
• We were not able to draw a solid conclusion about the superiority of either fuzzer from the metrics we gathered.
• Knowing which fuzzer finds serious errors more quickly would allow a more informed conclusion about their comparative efficiencies.
Conclusion con’t
• We would need to record the number of CPU clock cycles required to execute test cases and find errors.
• Unfortunately, we did not record this data during our research, so we are unable to make such a comparison between the fuzzers.
Guides for Future Research
To perform a precise comparison of Zzuf and CC:
1. Record the difference between the number of test cases generated by Zzuf and CC for a given seed file and a specific time frame.
2. Measure CPU time to compare the number of unique test cases generated by each fuzzer in a given time.
3. Develop a new method of identifying unique errors to avoid reporting duplicate bugs:
– Automatically generate a unique hash for each reported error that can then be used to identify duplicates.
Guides for Future Research con’t
4. Use a more robust data-collection infrastructure that can accommodate the massive amount of data collected.
– Our ISP shut Metafuzz down due to excess server load.
– Berkeley storage filled up.
5. Include an internal issue tracker that keeps track of whether or not a bug has been reported, to avoid reporting duplicate bugs.
WhiteBox or BlackBox??
• With a lower budget or less time: use blackbox.
• Once the low-hanging bugs are gone, fuzzing must become smarter: use whitebox.
• In practice, use both.
Acknowledgment
• National Science Foundation (NSF), for funding this project through the SUPERB-TRUST (Summer Undergraduate Program in Engineering Research at Berkeley - Team for Research in Ubiquitous Secure Technology) program
• Kristen Gates (Executive Director for Education for the TRUST program)
• Faculty advisor David Wagner
• Graduate mentors Li-Wen Hsu, David Molnar, Edwardo Segura, Alex Fabrikant, and Alvaro Cardenas
Questions?
Thank you