Fuzzing xlnt project with sydr-fuzz for fun and profit
"Syrio says that every hurt is a lesson, and every lesson makes you better"
― George R. R. Martin, A Song of Ice and Fire
This article is dedicated to fuzzing open source software. There are many proven fuzzers (libFuzzer, AFLplusplus, Honggfuzz, etc.) and many well-written tutorials. Here, however, I want to show how to apply hybrid fuzzing (a combination of fuzzing and symbolic execution) to test open source software. For this purpose I will use our hybrid fuzzing tool sydr-fuzz, which combines the power of Sydr, a dynamic symbolic execution tool, and libFuzzer, an in-process, coverage-guided, evolutionary fuzzing engine. We will learn not only how to prepare fuzz targets and run hybrid fuzzing, but also how to use our crash triage tool casr, how to collect code coverage reports, and how to apply Sydr to check security predicates and find interesting bugs using symbolic execution techniques.
It is hard to find a single open source project that demonstrates all of the features listed above at once. The xlnt project suits this purpose very well. So, we are ready to start our journey for the glory of bug finding!
"It is no easy thing to slay a dragon, but it can be done."
― George R.R. Martin, A Song of Ice and Fire
The first thing we want to do is find a function or a code snippet to fuzz. For complex projects (suricata, postgresql, nginx, etc.) this is a hard task: you need a good understanding of the code internals and the build system. Fortunately, our project is a library. It has a nice API and is easy to build.
So, what is xlnt? Xlnt is a modern C++ library for manipulating spreadsheets in memory and reading/writing them from/to XLSX files. Let's look at the API: maybe we can find a function that loads an .xlsx file.
The xlnt.hpp header includes a very interesting header:
#include <xlnt/workbook/workbook.hpp>
This header declares a bunch of interesting functions that work with xlsx files. I had my eye on this one:
/// <summary>
/// Interprets byte vector data as an XLSX file and sets the content of this
/// workbook to match that file.
/// </summary>
void load(const std::vector<std::uint8_t> &data);
Yes, that's exactly what we need! The function interprets byte vector data as an XLSX file. In other words, it parses an xlsx file, and its input parameter suits libFuzzer well. OK, we have a target function; now we need a plan:
- Create a fuzz target for libFuzzer. If you aren't familiar with libFuzzer, you could look at this tutorial.
- Create a fuzz target for Sydr and code coverage. For this purpose, you just need to add a main function that reads a file and passes its contents to LLVMFuzzerTestOneInput.
- Build three executable binaries for libFuzzer, Sydr and coverage.
- Prepare corpus.
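The second step of the plan can be sketched as follows. This is a minimal illustration, not the actual load_sydr.cc from oss-sydr-fuzz: the helper names `read_file` and `run_on_file` are hypothetical, and the fuzz target is passed in as a function pointer only so that the sketch compiles on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Signature shared by every libFuzzer entry point.
using fuzz_target_fn = int (*)(const std::uint8_t *data, std::size_t size);

// Hypothetical helper: read the whole file into memory.
// Returns false if the file cannot be opened.
static bool read_file(const char *path, std::vector<std::uint8_t> &out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
    {
        return false;
    }
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}

// Hypothetical helper: feed one input file to the fuzz target, the same way
// libFuzzer feeds it one generated input.
static int run_on_file(const char *path, fuzz_target_fn target)
{
    std::vector<std::uint8_t> data;
    if (!read_file(path, data))
    {
        return 1; // could not read the input file
    }
    return target(data.data(), data.size());
}
```

In a real Sydr or coverage binary, `main` would presumably just check `argc` and call `run_on_file(argv[1], LLVMFuzzerTestOneInput)`, so the same target code is exercised by libFuzzer, Sydr, and the coverage build.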
Xlnt is already added to oss-sydr-fuzz, so you can just clone this repository and build the docker container following the instructions.
Next I will describe the basic concepts of how we prepare a project for fuzzing in oss-sydr-fuzz. The fuzzing process is executed in a docker container for convenience and reproducibility. Let's take a look at the Dockerfile for xlnt:
FROM sweetvishnya/ubuntu20.04-sydr-fuzz
MAINTAINER Alexey Vishnyakov
# Clone target from GitHub.
RUN git clone https://github.com/tfussell/xlnt
WORKDIR xlnt
# Checkout specified commit. It could be updated later.
RUN git checkout d88c901faa539f9272a81ba0bab72def70ca18d7 && git submodule update --init --recursive
# Copy build script and targets.
COPY load_fuzzer.cc load_sydr.cc build.sh ./
# Build fuzz targets.
RUN ./build.sh
WORKDIR ..
# Prepare seed corpus.
RUN mkdir /corpus && find /xlnt -name "*.xlsx" | xargs -I {} cp {} /corpus
We use our base image with clang-14 installed. The Dockerfile contains commands that clone the xlnt repository, check out a fixed commit for reproducibility, run the build script, and prepare the seed corpus. Let's take a look at the build.sh script. This script builds three executables: the libFuzzer binary, the Sydr binary, and the coverage binary. I will not show the whole script here due to its size (66 lines). The main idea is to recompile the xlnt library with different flags:
# libFuzzer
cmake -D STATIC=ON -D TESTS=OFF \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_CXX_FLAGS="-g -fsanitize=fuzzer-no-link,address,bounds,integer,undefined,null,float-divide-by-zero" \
..
# Sydr
cmake -DSTATIC=ON -D TESTS=OFF \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_CXX_FLAGS=-g \
..
# Coverage
cmake -DSTATIC=ON -D TESTS=OFF \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_CXX_FLAGS="-fprofile-instr-generate -fcoverage-mapping" \
..
Now we are ready to look at the fuzz target:
#include <xlnt/xlnt.hpp>
#include <libstudxml/parser.hxx>
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    std::vector<uint8_t> v_data(data, data + size);
    xlnt::workbook excelWorkbook;
    try
    {
        excelWorkbook.load(v_data);
    }
    catch (const xlnt::exception& e)
    {
        return 0;
    }
    catch (const xml::parsing& e)
    {
        return 0;
    }
    return 0;
}
Here we perform the obvious operations: create a vector and load the workbook. One very important thing when you fuzz C++ code is that it has exceptions. Some exceptions should be treated as bugs, and some should not. Unhandled exceptions from the standard library can definitely be considered bugs. Exceptions from libraries used internally by your fuzz target can also be treated as bugs: these exceptions should be handled by the target library itself. But exceptions from our fuzz target should be handled in LLVMFuzzerTestOneInput, as you can see in the xlnt example. You can also look at the targets for Sydr and coverage.
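The exception policy described above can be illustrated with a toy example. This is only a sketch under stated assumptions: `parse_error` and `parse_record` are hypothetical stand-ins (they are not part of xlnt), with `parse_error` playing the role of the library's documented exception type and a deliberate out-of-bounds bug playing the role of a real defect.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a library's documented exception type
// (the role xlnt::exception plays in the target above).
struct parse_error : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Hypothetical toy parser: byte 0 is a length, followed by that many bytes.
static std::string parse_record(const std::vector<std::uint8_t> &input)
{
    if (input.empty())
    {
        throw parse_error("empty input"); // documented rejection of bad data
    }
    std::string payload;
    for (std::size_t i = 0; i < input[0]; ++i)
    {
        // Deliberate bug: at() throws std::out_of_range when the declared
        // length exceeds the data that is actually present.
        payload.push_back(static_cast<char>(input.at(1 + i)));
    }
    return payload;
}

extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t *data, std::size_t size)
{
    std::vector<std::uint8_t> input(data, data + size);
    try
    {
        parse_record(input);
    }
    catch (const parse_error &)
    {
        return 0; // expected: malformed input was rejected cleanly
    }
    // There is deliberately no catch (...): std::out_of_range escapes,
    // so the fuzzer reports it as a crash.
    return 0;
}
```

Catching only the documented exception type keeps the fuzzer quiet about inputs the library rejects on purpose, while a standard library exception still terminates the process and gets reported.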
Before starting fuzzing we should prepare an initial corpus. The target repository often contains some files in the needed format. We already saw the command in the Dockerfile that creates the initial corpus:
# Prepare seed corpus.
RUN mkdir /corpus && find /xlnt -name "*.xlsx" | xargs -I {} cp {} /corpus
Alright, we have prepared the docker container for fuzzing. Now let the fuzzing begin!
"The night is dark and full of terrors."
― George R.R. Martin, A Song of Ice and Fire
Before starting sydr-fuzz we have to write a simple config in TOML format. Below is the configuration file sydr-fuzz.toml for xlnt:
exit-on-time = 7200
[sydr]
args = "-s 90"
target = "/load_sydr @@"
jobs = 2
[libfuzzer]
path = "/load_fuzzer"
args = "-rss_limit_mb=8192 -timeout=10 -jobs=1000 -workers=8 /corpus"
[cov]
target = "/load_cov @@"
Let's have a brief look at this config file:
exit-on-time is an optional parameter that takes a time in seconds. If the coverage does not increase during this time (2 hours in our case), fuzzing is automatically terminated.
The [sydr] table may contain the following parameters:
args is the argument string for sydr. Options for log files and input files are set automatically. It is recommended to use the -s option for uniform input processing by Sydr.
target is the command line used to run the target program. Use @@ in place of the input file name.
jobs is the number of sydr instances to run. The default is 1.
The [libfuzzer] table contains the path to the libFuzzer binary and the arguments for libFuzzer.
The [cov] table contains the target run string for the code coverage binary.
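The exit-on-time rule can be modeled in a few lines. The sketch below is a hypothetical illustration of the behavior described above, not sydr-fuzz's implementation; the class name `stall_timer` and the choice of "at least `limit` seconds" as the threshold are assumptions.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical model of an "exit-on-time" check; not sydr-fuzz's code.
class stall_timer
{
public:
    using clock = std::chrono::steady_clock;

    stall_timer(std::chrono::seconds limit, clock::time_point start)
        : limit_(limit), last_increase_(start)
    {
    }

    // Record a coverage sample; the timer restarts only when coverage grows.
    void observe(std::uint64_t coverage, clock::time_point now)
    {
        if (coverage > best_coverage_)
        {
            best_coverage_ = coverage;
            last_increase_ = now;
        }
    }

    // True once coverage has not grown for at least `limit` seconds.
    bool should_exit(clock::time_point now) const
    {
        return now - last_increase_ >= limit_;
    }

private:
    std::chrono::seconds limit_;
    clock::time_point last_increase_;
    std::uint64_t best_coverage_ = 0;
};
```

With `exit-on-time = 7200`, such a timer keeps fuzzing alive as long as some new coverage appears at least once every two hours.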
To sum up, we will start fuzzing with 8 libFuzzer workers and 2 sydr jobs. The fuzzing process will stop if the coverage (cov:) does not increase for 2 hours or libFuzzer finds 1000 crashes/ooms/timeouts. Okay, let's run docker:
$ sudo docker run --privileged --network host -v /etc/localtime:/etc/localtime:ro --rm -it -v $PWD:/fuzz oss-sydr-fuzz-xlnt /bin/bash
Change directory to /fuzz:
# cd /fuzz
Run hybrid fuzzing:
# sydr-fuzz run
At last we have started sydr-fuzz. First, sydr-fuzz merges the initial corpus directories into its project corpus directory. I will describe the sydr-fuzz project directory later; for now let's look at the logs. After the merge step, fuzzing starts. We see pretty-colored libFuzzer logs, and it has already found a crash! A true segmentation fault, cool!!! We can also see information about Sydr. reloads{unique} shows how many inputs from sydr are useful for libFuzzer (that is, libFuzzer reloaded the input). Files from all sydr instances are copied to the project corpus directory, so one file from sydr can be reloaded by many libFuzzer workers; this is why we also count the unique files among all reloads. We update the reloads statistics on a timer to see the profit from sydr in real time, but the information about generated inputs is updated per sydr execution. Let's wait until fuzzing ends and look at the logs again.
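The difference between the two numbers in reloads{unique} can be shown with a small counting sketch. It is an illustration only, not sydr-fuzz's code: the `reload_event` record and the file names in the usage note are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One hypothetical log event: libFuzzer worker `worker` reloaded the
// Sydr-generated file `file`.
struct reload_event
{
    int worker;
    std::string file;
};

struct reload_stats
{
    std::size_t reloads; // every reload by any worker counts
    std::size_t unique;  // each file counts once
};

static reload_stats count_reloads(const std::vector<reload_event> &events)
{
    std::set<std::string> files;
    for (const auto &e : events)
    {
        files.insert(e.file);
    }
    return {events.size(), files.size()};
}
```

For example, if workers 0 and 1 both reload a file named `sydr_a` and worker 2 reloads `sydr_b`, this counting gives 3 reloads and 2 unique files.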