
Artificial Intelligence-Assisted Fuzzing: New Horizons in Software Security Testing

Tomasz Turba

Cybersecurity evolves with every new technology, and penetration testers need increasingly advanced methods and tools. One such emerging trend is Artificial Intelligence (AI)-assisted fuzzing, an approach that merges traditional fuzzing techniques with the capabilities of AI. This article delves into the concept of fuzzing, its enhancement through AI, and the potential future directions of this synergy.

What is fuzzing?

Fuzzing is a well-established software testing method, primarily focused on discovering security vulnerabilities, often ones that are hard to find during manual testing. It involves feeding invalid, unexpected, or random data to a program, with the goal of uncovering code errors and security vulnerabilities that could lead to breaches, crashes, or other unintended behavior.
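As a minimal illustration, this kind of "dumb" fuzzing can be sketched in a few lines of Python. The buggy `toy_parser` and its crash condition are invented purely for the example:

```python
import random

def random_fuzz_inputs(seed, count=500, max_len=64):
    """Generate purely random byte strings -- 'dumb' fuzzing, with no
    knowledge of the target's expected input format."""
    rng = random.Random(seed)
    return [bytes(rng.randrange(256) for _ in range(rng.randint(0, max_len)))
            for _ in range(count)]

def fuzz(target, inputs):
    """Run the target on each input and collect the inputs that crash it."""
    crashes = []
    for data in inputs:
        try:
            target(data)
        except Exception:
            crashes.append(data)
    return crashes

# A deliberately buggy parser: raises on any input starting with byte 0xFF.
def toy_parser(data):
    if data[:1] == b"\xff":
        raise ValueError("malformed header")

crashing = fuzz(toy_parser, random_fuzz_inputs(seed=1))
```

Real fuzzers run the target in a separate process and watch for crashes and hangs, but the core loop is the same: generate input, run, record what breaks.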

Is AI-assisted fuzzing possible?

AI’s role in fuzzing could be transformative, offering a more sophisticated approach than conventional tools such as AFL (described below). Through machine learning and neural networks, AI can process and analyze extensive data sets, learning from previous tests to predict and implement more effective fuzzing strategies. AI’s predictive capabilities enable it to identify patterns and anomalies that human testers might overlook, thereby increasing the depth and scope of security testing.

AI-enhanced fuzzing capabilities

Let’s take a look at possible capabilities of modern fuzzers:

  • Incorporating Machine Learning: modern fuzzers can analyze large data sets, learn from previous tests, and adapt their strategies in real time.

  • Input Generation: AI can help in generating more complex and targeted inputs, especially effective in testing software with intricate input formats (like multimedia files or complex protocols).

  • Automated Analysis: AI algorithms can automatically analyze the results of fuzzing to identify patterns or anomalies that might indicate deeper, hidden vulnerabilities.

  • Predictive Capabilities: AI-based fuzzers can predict which areas of the code are more likely to contain vulnerabilities, allowing for more focused testing.
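The input-generation and predictive points above boil down to prioritization: spend mutation time on the seeds a model considers most promising. A minimal sketch, where the scoring function is a trivial placeholder for a trained model:

```python
def prioritize_seeds(seeds, score_fn, top_k=2):
    """Rank fuzzing seeds by a model's predicted 'interestingness' and
    return the top_k candidates to mutate first.

    score_fn stands in for a trained ML model (e.g. one predicting how
    likely an input is to reach rarely executed code).
    """
    return sorted(seeds, key=score_fn, reverse=True)[:top_k]

# Placeholder heuristic: inputs with more distinct bytes score higher,
# with length as a tie-breaker. A real model would be learned from data.
def toy_score(data):
    return len(set(data)) + 0.01 * len(data)

seeds = [b"AAAA", b"GET / HTTP/1.1", b"\x00\x01", b"{json: true}"]
focus = prioritize_seeds(seeds, toy_score)
```

The interface matters more than the heuristic: any model that maps an input to a score can slot into `score_fn` without changing the fuzzing loop.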

American Fuzzy Lop (greets to lcamtuf)

AFL is a security-oriented fuzzer that relies on a form of genetic algorithm to automatically discover new inputs. It starts with a set of seed files and then continuously mutates them, monitoring the program for new and interesting behaviors (such as crashes or hangs). AFL is known for its coverage-guided approach: it monitors which parts of the code are being executed and focuses on inputs that explore new paths. As for advantages, it is highly effective at finding memory-corruption bugs and is known for its simplicity and ease of use. There are also limitations: traditional fuzzers like AFL can be less effective with complex input formats, or in situations where intelligent input generation is required.

Fig. 1. AFL screenshot, src: lcamtuf.coredump.cx
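AFL’s mutate-and-keep loop can be sketched in Python. The instrumented target and edge names below are toy stand-ins for AFL’s real compile-time instrumentation, but the control flow is the same: mutate a corpus entry, and keep the mutant only if it reaches coverage not seen before.

```python
import random

def mutate(data, rng):
    """One random mutation: flip a bit, or occasionally grow the input."""
    out = bytearray(data)
    if not out or rng.random() < 0.3:
        out.append(rng.randrange(256))          # extend the input
    else:
        pos = rng.randrange(len(out))
        out[pos] ^= 1 << rng.randrange(8)       # flip one bit
    return bytes(out)

def coverage_guided_fuzz(target, seeds, iterations=200, rng_seed=0):
    """Toy coverage-guided loop: keep mutants that reach new 'edges'.

    target(data) returns the set of edge identifiers the input exercised,
    standing in for AFL's instrumentation feedback.
    """
    rng = random.Random(rng_seed)
    corpus = list(seeds)
    seen = set()
    for data in corpus:
        seen |= target(data)
    for _ in range(iterations):
        mutant = mutate(rng.choice(corpus), rng)
        edges = target(mutant)
        if edges - seen:                        # new coverage: keep it
            seen |= edges
            corpus.append(mutant)
    return corpus, seen

# Toy instrumented target whose coverage depends on input properties.
def toy_target(data):
    edges = {"entry", f"len_{min(len(data), 10)}"}
    if data[:1] == b"A":
        edges.add("branch_A")
    if len(data) > 8:
        edges.add("deep_branch")
    return edges

corpus, seen = coverage_guided_fuzz(toy_target, [b"A", b"BBBBBBBBBB"])
```

The key design choice, which AFL shares, is that coverage feedback turns blind mutation into a search: inputs that open new paths become seeds for further mutation.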

Genetic algorithms are used in the open-source AFL tool set, which is at the heart of the cloud-based product Fuzzbuzz; AFL is also part of Google’s ClusterFuzz project.

Fig. 2. ClusterFuzz project scheme, src: lcamtuf.coredump.cx

ClusterFuzz is a scalable fuzzing infrastructure developed by Google to find security and stability issues in software. It is used by Google to fuzz all its products and serves as the fuzzing backend for OSS-Fuzz. However, the documentation for ClusterFuzz does not indicate the direct use of artificial intelligence in its process. Instead, ClusterFuzz emphasizes features like scalability, accurate crash deduplication, automatic bug filing, triage and closing, and support for multiple coverage-guided fuzzing engines (like libFuzzer, AFL++, and Honggfuzz). ClusterFuzz provides an integrated and automated environment for fuzzing, including features like test case minimization, regression finding, performance analysis, and a user-friendly web interface for managing and viewing crashes. These functionalities are highly advanced in terms of automation and efficiency but do not explicitly mention the application of AI technologies such as machine learning or neural networks.
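One of the features listed above, accurate crash deduplication, is commonly implemented by hashing the top frames of each crash’s stack trace, so that thousands of raw crashes collapse into a handful of distinct bugs. The following sketch illustrates that general idea; it is a simplification, not ClusterFuzz’s exact algorithm:

```python
import hashlib

def crash_signature(stack_frames, top_n=3):
    """Signature from the top few stack frames: a common deduplication
    heuristic (simplified; not ClusterFuzz's exact logic)."""
    key = "|".join(stack_frames[:top_n])
    return hashlib.sha1(key.encode()).hexdigest()[:12]

def dedupe_crashes(crash_reports, top_n=3):
    """Bucket raw crash reports so each distinct signature is one bug."""
    buckets = {}
    for frames in crash_reports:
        buckets.setdefault(crash_signature(frames, top_n), []).append(frames)
    return buckets

reports = [
    ["memcpy", "parse_header", "main"],
    ["memcpy", "parse_header", "main", "__libc_start_main"],  # same top 3
    ["strlen", "parse_body", "main"],
]
buckets = dedupe_crashes(reports)
```

Here the first two reports share a signature (same top three frames) and land in one bucket, while the third forms its own; automated bug filing then operates on buckets rather than raw crashes.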

A Google Security Blog post titled “AI-Powered Fuzzing: Breaking the Bug Hunting Barrier” discusses the integration of Google’s Large Language Models (LLMs) with the previously mentioned OSS-Fuzz, Google’s automated vulnerability-discovery service for open-source projects. This integration has significantly improved OSS-Fuzz’s performance: by employing LLMs, Google has been able to increase code coverage for critical projects without manual coding. This approach represents a promising new direction for scaling security improvements across numerous projects and making fuzzing more accessible and effective.

Fig. 3. Experimental scheme of the automated fuzzing process using LLM integration, source: https://google.github.io/oss-fuzz/research/llms/target_generation/

The application of LLMs to OSS-Fuzz involves an evaluation framework that connects to the LLM, generates fuzz targets using prompts, and assesses the results based on changes in code coverage. This method has shown substantial improvements in code coverage for projects such as tinyxml2, and has even rediscovered known vulnerabilities in areas previously not covered by fuzzing, such as in the OpenSSL project.

Fig. 4. Fuzz target snippets for tinyxml2, source: https://google.github.io/oss-fuzz/research/llms/target_generation/
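The evaluation framework described above can be sketched as a simple generate-build-measure loop. Everything here is an illustrative stand-in: `generate_target` and `build_and_run` are placeholders for the real LLM call and the build/coverage infrastructure, and the prompt variants and coverage numbers are invented:

```python
def evaluate_llm_targets(project, generate_target, build_and_run, baseline):
    """Sketch of the evaluation loop: request candidate fuzz targets and
    keep only those that build and raise code coverage over the baseline.

    generate_target and build_and_run are hypothetical hooks standing in
    for the real LLM query and build/coverage measurement.
    """
    accepted, best = [], baseline
    for variant in ("basic", "with_examples", "with_headers"):
        candidate = generate_target(project, variant)
        result = build_and_run(candidate)
        if result["builds"] and result["coverage"] > best:
            best = result["coverage"]
            accepted.append(candidate)
    return accepted, best

# Stubs simulating LLM output and build results for a hypothetical run.
def fake_generate(project, variant):
    return f"{project}_target_{variant}"

def fake_build(candidate):
    results = {
        "tinyxml2_target_basic":         {"builds": False, "coverage": 0.0},
        "tinyxml2_target_with_examples": {"builds": True,  "coverage": 0.42},
        "tinyxml2_target_with_headers":  {"builds": True,  "coverage": 0.40},
    }
    return results[candidate]

accepted, best = evaluate_llm_targets("tinyxml2", fake_generate, fake_build, 0.30)
```

The point of the design is that coverage change serves as an objective acceptance test for generated code: a target that fails to build, or builds but adds no coverage, is simply discarded.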

Manually achieving the same results as those shown in the blog post for tinyxml2 would have taken at least a full day’s work, which translates to several years of effort to extend this coverage across all OSS-Fuzz projects. Given the encouraging outcomes with tinyxml2, the Google security team’s goal is to deploy these techniques in a production environment and provide similar automated coverage for other OSS-Fuzz projects.

Furthermore, in the OpenSSL project, the LLM succeeded in automatically creating a functional target that re-identified CVE-2022-3602. This vulnerability was located in a code segment not previously covered by non-AI fuzzing. Although the vulnerability isn’t new, its rediscovery suggests that increasing code coverage through these methods will likely uncover vulnerabilities that current fuzzing techniques have overlooked.

Fig. 5. Rediscovery of the OpenSSL CVE, source: https://google.github.io/oss-fuzz/research/llms/target_generation/

As the post explains, the fuzz targets generated by the LLM often contain trivial defects, which can be fixed with a separate LLM query. The prompt for this code-fixing query can be structured as follows, with the code and error placeholders replaced, respectively, by the fuzz-target source code generated by the LLM and the build-error messages extracted from the build logs:

Given the following code and its build error message, fix the code without affecting its functionality. First explain the reason, then output the whole fixed code. If a function is missing, fix it by including the related libraries.

Code: <CODE_HERE>
Build error message: <ERROR_MESSAGE_HERE>
Fixed code: <FIXED_CODE_HERE>
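Mechanically, filling such a prompt is plain string substitution. In the sketch below, the fuzz-target snippet and the build-error line are invented for illustration:

```python
# The code-fixing prompt, with placeholders for the generated target
# and the build error extracted from the logs.
FIX_PROMPT_TEMPLATE = (
    "Given the following code and its build error message, fix the code "
    "without affecting its functionality. First explain the reason, then "
    "output the whole fixed code. If a function is missing, fix it by "
    "including the related libraries.\n"
    "Code:\n<CODE_HERE>\n"
    "Build error message:\n<ERROR_MESSAGE_HERE>\n"
    "Fixed code:\n"
)

def build_fix_prompt(raw_code, error_message):
    """Substitute the LLM-generated fuzz target and the extracted build
    error into the code-fixing prompt."""
    return (FIX_PROMPT_TEMPLATE
            .replace("<CODE_HERE>", raw_code)
            .replace("<ERROR_MESSAGE_HERE>", error_message))

# Hypothetical inputs: a generated fuzz target and one build-error line.
prompt = build_fix_prompt(
    "int LLVMFuzzerTestOneInput(const uint8_t *d, size_t n) "
    "{ parse(d, n); return 0; }",
    "error: unknown type name 'uint8_t'",
)
```

The filled prompt is then sent back to the model, and the returned code replaces the defective target for another build attempt.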

The example prompts and outputs generated during testing can be found in the experimental report on LLM target generation.

This innovative approach underlines the potential of AI in automating and enhancing the fuzzing process, making a significant impact on the field of cybersecurity and vulnerability detection. In the coming months, the Google team plans to open source their evaluation framework, enabling researchers to test their methods for automatic fuzz target generation. The optimization of Large Language Models (LLMs) for fuzzing target generation will continue, focusing on model fine-tuning, prompt engineering, and infrastructure improvements.

Long-term objectives of AI-enhanced fuzzing will probably include:

  • Integrating LLM fuzz target generation as a core feature in different fuzz testing engines (e.g. OSS-Fuzz), aiming for continuous generation of new targets for projects without manual intervention.

  • Expanding support beyond C/C++ projects to additional languages such as Python and Java.

  • Working toward a future in which vulnerability detection is personalized and requires minimal manual effort from developers, with LLM-generated fuzz targets helping fuzz-testing engines bolster open-source security globally.

In summary, while traditional fuzzing methods like AFL have been effective in uncovering a wide range of vulnerabilities, the integration of AI in modern fuzzing techniques offers more sophisticated, targeted, and efficient testing, especially useful in dealing with complex software systems and advanced threat landscapes.
