Chapter 7: Dynamic Testing and Fuzzing¶

7.1 Testing by Execution: The Foundation of Dynamic Analysis¶
Static analysis (Chapter 5) examines code without running it. Dynamic testing requires the program to execute — inputs flow in, outputs flow out, and behavior is observed in real time. This is the most direct way to establish what software actually does, as opposed to what it was intended to do or what a static model predicts it does. Many vulnerabilities are simply impossible to detect statically: timing-dependent race conditions, runtime configuration errors, server-side logic flaws that only manifest under specific state sequences, and memory corruption bugs that depend on heap layout.
Dynamic Application Security Testing (DAST) applies execution-based testing specifically to discover security vulnerabilities in running applications. Unlike SAST, which runs on source code before deployment, DAST operates against a deployed application — making it architecture-agnostic and language-agnostic. A DAST tool doesn't care whether your backend is Python, Java, or Go; it sends HTTP requests and analyzes responses.
Key Insight: DAST simulates what an attacker does. It probes your application from the outside with no privileged knowledge, exactly as an adversary would approach a target. The tool's advantage over a human attacker is speed and repeatability; the human attacker's advantage over the tool is creativity and context.
7.2 DAST Architecture and Workflow¶
A DAST scanner operates in three sequential phases:
Phase 1: Discovery / Spidering¶
Before attacking, the scanner must map the application's attack surface — all the URLs, forms, API endpoints, parameters, and input vectors that could accept data. This is done through:
- Traditional crawling — Following HTML links, submitting forms, discovering URLs
- AJAX/SPA crawling — Executing JavaScript to discover dynamically rendered content
- API discovery — Parsing OpenAPI/Swagger specs or discovering REST endpoints empirically
- Sitemap parsing — Using robots.txt, sitemaps, and known-path wordlists
The output is a comprehensive inventory of attack surface, organized as a site tree.
Phase 2: Active Scanning¶
For each discovered input point (URL parameter, form field, HTTP header, JSON body field), the scanner systematically sends malicious test payloads designed to trigger vulnerability-specific responses:
Parameter: ?id=1
Test payloads:
SQL Injection: ?id=1' OR '1'='1
XSS: ?id=<script>alert(1)</script>
Path Traversal: ?id=../../../etc/passwd
Command Injection: ?id=1; ls -la
XXE (sent in an XML request body): <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
The scanner analyzes responses for signatures of successful exploitation: error messages, unexpected content, timing anomalies, or out-of-band callbacks.
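The response-analysis step can be sketched in a few lines of Python. The signature list and the `probe` helper below are illustrative placeholders (real scanners ship hundreds of per-database signatures); the transport is injected as a callable so the sketch stays framework-neutral.

```python
# Hypothetical signature list -- real scanners maintain far larger,
# per-database and per-framework catalogs.
SQL_ERROR_SIGNATURES = [
    "you have an error in your sql syntax",   # MySQL
    "unclosed quotation mark",                # SQL Server
    "sqlite3.operationalerror",               # SQLite via Python
    "pg::syntaxerror",                        # PostgreSQL via Rails
]

def looks_like_sql_error(response_body: str) -> bool:
    """Flag responses whose body contains a known SQL error signature."""
    body = response_body.lower()
    return any(sig in body for sig in SQL_ERROR_SIGNATURES)

def probe(send_request, url: str, param: str, payloads: list) -> list:
    """Send each payload via the injected send_request(url, params)
    callable; return the payloads that triggered an error signature."""
    hits = []
    for payload in payloads:
        body = send_request(url, {param: payload})
        if looks_like_sql_error(body):
            hits.append(payload)
    return hits
```

Timing anomalies and out-of-band callbacks require additional machinery (response-time baselines, a callback server), which is why production scanners are considerably larger than this sketch.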
Phase 3: Analysis and Reporting¶
Findings are classified by severity (typically using CVSS) and deduplicated. A quality DAST report includes:
- Evidence (request/response demonstrating the vulnerability)
- Risk rating and CVSS score
- Reproduction steps
- Remediation guidance
- False positive status (some findings require manual validation)
7.3 OWASP ZAP: Architecture and Professional Usage¶
OWASP ZAP (Zed Attack Proxy) is the premier open-source DAST tool, maintained by the Open Web Application Security Project. Its architecture comprises several cooperative components:
| Component | Function |
|---|---|
| Proxy | Intercepts browser ↔ application traffic; records all requests |
| Spider | Crawler that follows links and submits forms to map application structure |
| Ajax Spider | Executes JavaScript to discover SPA content |
| Passive Scanner | Analyzes traffic without sending additional requests |
| Active Scanner | Sends attack payloads to all discovered inputs |
| Fuzzer | Targeted parameter fuzzing with custom wordlists |
| Scripting Engine | Automates complex test scenarios via JS/Python/Groovy |
CI/CD Integration with ZAP¶
ZAP can be fully automated in CI/CD pipelines via its Docker image and command-line API:
# Baseline scan (passive only — safe for any environment)
docker run --rm owasp/zap2docker-stable zap-baseline.py \
-t https://staging.myapp.com \
-r baseline-report.html
# Full active scan (only against non-production environments)
docker run --rm owasp/zap2docker-stable zap-full-scan.py \
-t https://staging.myapp.com \
-r full-scan-report.html \
--auto
# API scan using OpenAPI spec
docker run --rm owasp/zap2docker-stable zap-api-scan.py \
-t https://staging.myapp.com/api/v1/openapi.json \
-f openapi \
-r api-scan-report.html
Authenticated Scanning¶
Most application vulnerabilities hide behind authentication. Configuring ZAP to maintain a valid session requires:
- Recording a login sequence as a ZAP script
- Specifying the authentication method (form-based, HTTP basic, OAuth bearer token)
- Defining session validity indicators (what a logged-in response looks like)
- Configuring automatic re-authentication when session expires
Without authenticated scanning, DAST only sees the login page and public content — typically 10-20% of the actual attack surface.
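The core of the re-authentication logic can be sketched language-agnostically. Everything below is illustrative: `do_login` and `do_get` are injected callables (wrapping requests, ZAP's API, or any other transport), and `logged_in_marker` stands in for the session-validity indicator described above.

```python
class AuthenticatedSession:
    """Sketch of the re-authentication logic a DAST scanner needs.
    do_login and do_get are injected callables; logged_in_marker is a
    string that only appears in responses for a valid session."""

    def __init__(self, do_login, do_get, logged_in_marker: str):
        self.do_login = do_login
        self.do_get = do_get
        self.marker = logged_in_marker

    def get(self, url: str) -> str:
        body = self.do_get(url)
        if self.marker not in body:   # session expired (or never established)
            self.do_login()           # re-authenticate once...
            body = self.do_get(url)   # ...and retry the original request
        return body
```

ZAP implements the same pattern internally: every request's response is checked against the logged-in/logged-out indicators, and the recorded login sequence is replayed whenever the session is judged invalid.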
7.4 Burp Suite: Professional Security Testing¶
Burp Suite (PortSwigger) is the industry-standard commercial tool for manual and automated web security testing. Its workflow centers on the proxy — all browser traffic is intercepted, allowing real-time inspection and modification.
Key modules:
Burp Scanner — Automated active scanning with a sophisticated detection engine, particularly strong at injection vulnerabilities and logic flaws.
Burp Intruder — Highly configurable fuzzing tool with four attack modes:
- Sniper — One payload set, one position at a time
- Battering Ram — Same payload in all positions simultaneously
- Pitchfork — Multiple payload sets, synchronized across positions
- Cluster Bomb — Multiple payload sets, all combinations (Cartesian product)
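The difference between the four modes is just how payload sets are combined with insertion positions. A minimal sketch with toy payload sets (the usernames and passwords here are placeholders):

```python
from itertools import product

positions = ["username", "password"]
usernames = ["admin", "root"]
passwords = ["letmein", "hunter2"]

# Sniper: one payload set, varying one position at a time
sniper = [{pos: payload} for pos in positions for payload in usernames]

# Battering Ram: the same payload placed in every position at once
battering_ram = [{pos: payload for pos in positions} for payload in usernames]

# Pitchfork: payload sets advance in lockstep (zip)
pitchfork = [{"username": u, "password": p}
             for u, p in zip(usernames, passwords)]

# Cluster Bomb: Cartesian product of all payload sets
cluster_bomb = [{"username": u, "password": p}
                for u, p in product(usernames, passwords)]
```

Pitchfork fits credential-stuffing with known username/password pairs (the sets stay aligned); Cluster Bomb fits brute-forcing every combination, at the cost of multiplicative request counts.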
Burp Repeater — Manual request replay with live editing. Essential for understanding a vulnerability's behavior and crafting targeted exploits.
Burp Collaborator — Out-of-band (OOB) vulnerability detection. When a server makes an external request to the Collaborator server, it confirms vulnerabilities like SSRF, blind XSS, XXE, and blind SQL injection that don't produce visible output in the response.
# Example: Blind SSRF detection via Collaborator
# Original request: POST /webhook with url=https://partner.com/notify
# Injected payload: url=https://xyz.burpcollaborator.net/callback
# If Collaborator receives a DNS lookup or HTTP request → SSRF confirmed
7.5 Manual Dynamic Testing Techniques¶
Automated scanners miss logic flaws. The following manual techniques complement automated DAST:
Parameter Tampering — Modifying URL parameters, POST data, or cookies that encode business logic, such as prices, quantities, roles, or user IDs.
Cookie Manipulation — Decoding, modifying, and re-encoding session cookies to test for insecure deserialization, predictable tokens, or improper access controls.
HTTP Header Injection — Testing non-standard headers for unexpected processing: X-Forwarded-For, X-Original-URL, X-Rewrite-URL, Host header injection.
Forced Browsing — Directly accessing URLs that should be restricted, bypassing the intended navigation flow. Tests for broken access control (A01:2021, #1 in the OWASP Top 10).
Session Token Analysis — Capturing multiple session tokens and analyzing for patterns: sequential IDs, short entropy, predictable timestamps embedded in tokens.
Business Logic Testing — Exercising workflows in unintended sequences: placing an order with zero quantity, submitting a form twice, navigating backward in a multi-step process.
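Session token analysis in particular lends itself to simple tooling. The sketch below implements two of the checks mentioned above, per-character Shannon entropy and a crude sequential-ID detector; both functions and their thresholds are illustrative, not a substitute for a proper randomness test suite.

```python
import math
from collections import Counter

def shannon_entropy_bits(token: str) -> float:
    """Per-character Shannon entropy of a token, in bits. Tokens built
    from timestamps or counters score far below random hex (~4 bits/char)."""
    counts = Counter(token)
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_sequential(tokens: list) -> bool:
    """Crude check: do consecutive hex tokens differ by a constant step?"""
    try:
        values = [int(t, 16) for t in tokens]
    except ValueError:
        return False
    diffs = {b - a for a, b in zip(values, values[1:])}
    return len(diffs) == 1
```

Capturing a few hundred tokens (log out, log back in, repeat) and running checks like these quickly separates counter-based session IDs from properly random ones.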
7.6 Fuzzing: Automated Discovery of Unknown Unknowns¶
Fuzzing (fuzz testing) is the technique of automatically generating large volumes of inputs to a target program, monitoring for crashes, hangs, or unexpected behavior. It is uniquely powerful at discovering vulnerability classes that other techniques miss: memory corruption bugs (buffer overflows, use-after-free, integer overflows), parser bugs, and edge cases in input handling.
The technique was introduced by Barton Miller at the University of Wisconsin–Madison, whose group sent random inputs to almost 90 Unix utility programs in a study begun in 1988 and published in 1990. Between a quarter and a third of the utilities crashed or hung. The results were shocking for the time and remain instructive: even heavily used, mature software breaks on unexpected inputs.
7.6.1 Fuzzing Types¶
Dumb Fuzzing (Mutation-Based) — Start with valid seed inputs and apply random mutations: flip bits, insert/delete bytes, repeat sequences, insert special characters. Fast to implement, but inefficient — many mutations produce invalid inputs that are rejected before reaching interesting code paths.
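A complete dumb mutation fuzzer fits in a few lines, which is precisely its appeal. This is a minimal sketch: the mutation operators (bit flip, insert, delete) mirror the ones listed above, and the target is any callable that raises on failure.

```python
import random

def mutate(seed: bytes, rng: random.Random, n_mutations: int = 4) -> bytes:
    """Dumb mutation: random bit flips, byte insertions, and deletions
    applied to a valid seed input."""
    data = bytearray(seed)
    for _ in range(n_mutations):
        choice = rng.randrange(3)
        if choice == 0 and data:                      # flip one bit
            i = rng.randrange(len(data))
            data[i] ^= 1 << rng.randrange(8)
        elif choice == 1:                             # insert a random byte
            data.insert(rng.randrange(len(data) + 1), rng.randrange(256))
        elif data:                                    # delete a byte
            del data[rng.randrange(len(data))]
    return bytes(data)

def dumb_fuzz(target, seeds, iterations, rng_seed=0):
    """Run the target on mutated seeds; collect inputs that raise."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        case = mutate(rng.choice(seeds), rng)
        try:
            target(case)
        except Exception:
            crashes.append(case)
    return crashes
```

Seeding the random generator makes runs reproducible, which matters once you need to re-trigger a crash.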
Generation-Based (Smart) Fuzzing — Uses a grammar, protocol specification, or format model to generate structurally valid but semantically unusual inputs. A grammar-based fuzzer for JSON will always generate valid JSON, ensuring the parser receives well-formed inputs before encountering edge cases.
Coverage-Guided Fuzzing — The state-of-the-art technique. Instruments the target binary to measure which branches are executed for each input. Uses evolutionary algorithms to preferentially mutate inputs that discover new code coverage. When a mutation causes a new branch to execute, the mutated input is added to the corpus and used for further mutation.
Coverage-Guided Fuzzing Loop:
1. Start with seed corpus (small set of valid inputs)
2. Mutate a corpus input → new test case
3. Run target with new test case (instrumented binary)
4. If new coverage found: add to corpus, go to 2
5. If crash: save crash input, continue
6. If no new coverage: discard, go to 2
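The loop above can be sketched directly in Python. The instrumented target here is a toy stand-in: it returns the set of branch IDs it executed (real fuzzers obtain this from compile-time instrumentation) and raises when a deliberately buried "bug" is reached. All names and thresholds are illustrative; seeds are assumed non-empty.

```python
import random

def instrumented_target(data: bytes) -> set:
    """Toy instrumented binary: returns executed branch IDs,
    raises when the buried bug is reached (three high bits set)."""
    cov = {0}
    if len(data) > 0 and data[0] & 0x80:
        cov.add(1)
        if len(data) > 1 and data[1] & 0x80:
            cov.add(2)
            if len(data) > 2 and data[2] & 0x80:
                raise RuntimeError("crash: deep branch reached")
    return cov

def coverage_guided_fuzz(target, seeds, iterations=2000, rng_seed=0):
    rng = random.Random(rng_seed)
    corpus = list(seeds)
    seen = set()
    crashes = []
    for inp in corpus:                         # 1. baseline: seed coverage
        seen |= target(inp)
    for _ in range(iterations):
        data = bytearray(rng.choice(corpus))   # 2. mutate a corpus input
        i = rng.randrange(len(data))
        data[i] ^= 1 << rng.randrange(8)       #    (single bit flip)
        case = bytes(data)
        try:
            cov = target(case)                 # 3. run instrumented target
        except Exception:
            crashes.append(case)               # 5. crash: save, continue
            continue
        if cov - seen:                         # 4. new coverage: keep input
            seen |= cov
            corpus.append(case)
    return corpus, crashes
```

A dumb fuzzer would need all three high bits flipped in one lucky mutation; the coverage-guided loop reaches the bug incrementally by keeping each partially-successful mutant as a new stepping stone.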
7.7 AFL++ and libFuzzer: Production Coverage-Guided Fuzzers¶
AFL++ (American Fuzzy Lop++) is the dominant open-source fuzzer for compiled programs. Key features:
- LLVM/GCC-based compile-time instrumentation for coverage feedback
- Multiple mutation strategies (havoc, splicing, deterministic stages)
- Persistent mode for in-process fuzzing (10x-100x faster)
- Parallel fuzzing across multiple CPU cores
- Fork server to avoid repeated process initialization overhead
# Compiling a target with AFL++ instrumentation
AFL_USE_ASAN=1 afl-clang-fast -o target_fuzz target.c
# Running the fuzzer
afl-fuzz -i seeds/ -o findings/ -m none -- ./target_fuzz @@
# @@ is replaced with the path of the current input file
libFuzzer is an LLVM-integrated in-process fuzzer. Rather than launching a new process for each input, libFuzzer calls a fuzz target function directly within the process, achieving enormous throughput:
// libFuzzer target function signature
#include <cstddef>
#include <cstdint>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
// Parse, process, or feed 'data' to the code under test
ParseInput(data, size);
return 0; // Non-zero return values are reserved
}
Google OSS-Fuzz runs libFuzzer and AFL++ continuously on over 1,000 critical open-source projects. It has found over 10,000 vulnerabilities in projects including OpenSSL, Chromium, FFmpeg, and curl. Integration requires writing a fuzz target function and a simple build script — the service handles scaling, storage, and bug reporting automatically.
7.8 Sanitizers: Amplifying Fuzzing Effectiveness¶
A fuzzer can only detect bugs that produce observable failures: crashes, hangs, or assertion violations. Many serious vulnerabilities — particularly memory corruption bugs — don't immediately crash the program; they silently corrupt memory, enabling exploitation. Sanitizers are compile-time instrumentation tools that make these bugs loudly observable.
| Sanitizer | Abbreviation | Detects | Overhead |
|---|---|---|---|
| AddressSanitizer | ASan | Buffer overflows, heap use-after-free, stack smashing | 2x memory, 2x time |
| UndefinedBehaviorSanitizer | UBSan | Signed integer overflow, null pointer deref, type confusion | ~20% time |
| MemorySanitizer | MSan | Use of uninitialized memory | 3x time |
| ThreadSanitizer | TSan | Data races, deadlocks | 5-20x time |
Always compile fuzz targets with ASan and UBSan at minimum (for example, by adding -fsanitize=address,undefined to both compile and link flags).
Sanitizers convert silent memory bugs into immediate, deterministic crashes with detailed diagnostic output — stack traces, memory maps, and variable states — dramatically accelerating triage.
7.9 Crash Triage and Differential Fuzzing¶
When a fuzzer finds a crash, the raw crash input may not be the minimal reproducible case. Crash triage involves:
- Deduplication — Many crash inputs trigger the same underlying bug; deduplicate by crash stack hash
- Minimization — Reduce the crashing input to the smallest possible case (afl-tmin, creduce)
- Severity classification — Is the crash exploitable? (arbitrary write > controlled read > assertion)
- Reproduction — Ensure the minimal case reproduces reliably and is saved with the bug report
Differential fuzzing tests multiple implementations of the same specification against identical inputs, flagging discrepancies. Two XML parsers should produce identical parse trees for the same input; if they don't, one has a bug. This class of discrepancy has been found repeatedly in JSON parsers, where inconsistent handling of edge cases enabled security bypasses in systems that validated input with one parser but processed it with another.
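The harness itself is simple. As an illustrative (not security-realistic) pair of "implementations of the same specification", the sketch below compares Python's built-in int() against a stricter hand-rolled digits-only parser; any input where they disagree is a discrepancy worth investigating.

```python
def parser_a(s: str) -> int:
    # Implementation A: Python's built-in int() (tolerates surrounding
    # whitespace, "+" signs, and "_" digit separators)
    return int(s)

def parser_b(s: str) -> int:
    # Implementation B: strict hand-rolled digits-only parser
    if not s or not s.isdigit():
        raise ValueError(s)
    return int(s)

def differential_fuzz(inputs):
    """Run both parsers on each input; report inputs where the two
    implementations disagree (different result, or only one rejects)."""
    discrepancies = []
    for s in inputs:
        try:
            a = ("ok", parser_a(s))
        except ValueError:
            a = ("err", None)
        try:
            b = ("ok", parser_b(s))
        except ValueError:
            b = ("err", None)
        if a != b:
            discrepancies.append((s, a, b))
    return discrepancies
```

In a real campaign the input list would come from a fuzzer's mutation engine, and "accepted by the validator but interpreted differently by the processor" is exactly the discrepancy that becomes a security bypass.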
7.10 IAST: Interactive Application Security Testing¶
IAST instruments the application runtime — typically via a language agent injected at startup — to monitor data flow and taint propagation during normal test execution (functional testing, integration tests, load tests). Unlike DAST, IAST sees inside the application and can pinpoint the exact vulnerable line of code. Unlike SAST, it operates on real execution paths with real data.
IAST agent monitors:
→ Untrusted input enters the application (HTTP parameter, header, cookie)
→ Input flows through transformation functions
→ Input reaches a sensitive sink (database query, file path, shell command)
→ If input reaches sink without proper sanitization → vulnerability reported
IAST tools (Contrast Security, Seeker, HCL AppScan) produce extremely low false-positive rates because they report only vulnerabilities that were actually exercised, with full execution context. The downside is the requirement to instrument and deploy the application — impractical for third-party systems or compiled binaries.
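The source → propagation → sink model above can be illustrated with a toy taint tracker. This is only a sketch of the concept: real IAST agents instrument the runtime transparently, whereas here the taint marker, sanitizer, and sink are all explicit placeholders.

```python
import html

class SecurityError(Exception):
    pass

class Tainted(str):
    """Toy taint marker: a str subclass standing in for the metadata an
    IAST agent attaches to untrusted input at runtime."""

def from_request(value: str) -> str:
    return Tainted(value)           # source: HTTP input is born tainted

def sanitize(value: str) -> str:
    # str methods return plain str, so escaping drops the Tainted type,
    # modeling "sanitization clears the taint"
    return html.escape(value)

def render(template: str, value: str) -> str:
    if isinstance(value, Tainted):  # sink check: tainted data reached HTML
        raise SecurityError("untrusted input reached sink unsanitized")
    return template.format(value)
```

The same pattern generalizes to other sinks (SQL queries, file paths, shell commands): the agent reports a vulnerability only when tainted data actually reaches a sink without passing through a recognized sanitizer, which is why IAST false-positive rates are so low.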
7.11 Regression Testing for Security Patches¶
When a security vulnerability is fixed, a security regression test should be added to the test suite to prevent re-introduction:
# After fixing CVE-2023-XXXX (SQL injection in search endpoint).
# `client` is the test client fixture for the application under test
# (e.g. FastAPI's TestClient or an equivalent HTTP test client).
def test_search_rejects_sql_injection(client):
    """Regression test: CVE-2023-XXXX — SQL injection in search"""
    malicious_inputs = [
        "'; DROP TABLE users; --",
        "1' OR '1'='1",
        "admin'/*",
        "1; SELECT * FROM secrets",
    ]
    for payload in malicious_inputs:
        # params= URL-encodes the payload instead of corrupting the query string
        response = client.get("/search", params={"q": payload})
        assert response.status_code in (400, 422)
        assert "syntax error" not in response.text.lower()
        assert "sql" not in response.text.lower()
Regression tests are the institutional memory of your security history. They prevent the tragically common pattern of re-discovering and re-fixing the same vulnerability in a future development cycle.
Key Terms¶
| Term | Definition |
|---|---|
| DAST | Dynamic Application Security Testing — black-box testing of running applications |
| OWASP ZAP | Open-source DAST proxy and scanner maintained by OWASP |
| Burp Suite | Commercial web application security testing platform (PortSwigger) |
| Fuzzing | Automated testing with semi-random or mutated inputs to discover crashes |
| Coverage-Guided Fuzzing | Uses execution coverage feedback to guide mutation toward new code paths |
| AFL++ | American Fuzzy Lop++ — dominant open-source coverage-guided fuzzer |
| libFuzzer | LLVM in-process fuzzer for high-throughput target function fuzzing |
| Seed Corpus | Initial set of valid inputs used as starting point for fuzzing mutations |
| AddressSanitizer (ASan) | Compile-time tool that detects memory safety violations at runtime |
| Mutation Score | Ratio of detected-to-introduced code mutations (test suite strength metric) |
| Out-of-band Detection | Vulnerability detection via external callbacks (Burp Collaborator) |
| Crash Triage | Process of deduplicating, minimizing, and classifying fuzzer-found crashes |
| Differential Fuzzing | Comparing multiple implementations against the same inputs to find discrepancies |
| IAST | Interactive Application Security Testing — runtime instrumented hybrid approach |
| Forced Browsing | Directly accessing URLs that should be restricted without proper authorization |
| Session Token Analysis | Examining tokens for predictability, short entropy, or embedded timestamps |
| OSS-Fuzz | Google's continuous fuzzing service for open-source projects |
| Sanitizers (ASan/UBSan/MSan) | Runtime instrumentation tools that amplify bug observability during fuzzing |
| Parameter Tampering | Modifying request parameters to test business logic controls |
| Regression Test | Test added after a bug fix to prevent re-introduction of the same defect |
Review Questions¶
- Explain why DAST is described as "architecture-agnostic" compared to SAST. What types of vulnerabilities can DAST find that SAST cannot, and vice versa?
- Describe the three phases of a DAST scan (spider/crawl → active scan → report). What specific risk does an unauthenticated DAST scan miss, and how does authenticated scanning address this gap?
- Compare Burp Intruder's four attack modes (Sniper, Battering Ram, Pitchfork, Cluster Bomb). For each, describe a specific testing scenario where that mode is most appropriate.
- Barton Miller's fuzzing study found that roughly 1 in 3 Unix utilities crashed on random input. Why is this result still relevant more than three decades later, and what does it imply about modern software?
- Explain the evolutionary feedback loop in coverage-guided fuzzing. How does AFL++ "learn" to explore deeper code paths over time, and why is this fundamentally more effective than dumb (random) fuzzing?
- You are fuzzing a PDF parser compiled as a Linux binary. Describe the complete setup: compilation flags (including which sanitizers), seed corpus construction, fuzzer command, and what success looks like after 24 hours.
- What is a "silent memory corruption" bug, and why are sanitizers essential when combined with fuzzing? Give an example of a vulnerability class that would be invisible to a fuzzer without AddressSanitizer.
- Compare DAST and IAST across four dimensions: deployment requirements, false positive rate, vulnerability depth reported, and applicability to third-party software.
- Explain the concept of differential fuzzing and describe how it could be used to find security-relevant discrepancies between two JSON parsing libraries used in a security policy enforcement system.
- After patching a SQL injection vulnerability in a user search feature, what security regression tests should you add to the test suite? Write at least three test cases in pseudocode.
Further Reading¶
- Miller, B.P., Fredriksen, L., & So, B. (1990). "An Empirical Study of the Reliability of UNIX Utilities." Communications of the ACM, 33(12), 32-44. — The original fuzzing paper that started the field.
- Zalewski, M. "American Fuzzy Lop: Technical Details." lcamtuf.blogspot.com. — Deep technical explanation of AFL's coverage-guided mutation strategy from its creator.
- Serebryany, K., et al. (2012). "AddressSanitizer: A Fast Address Sanity Checker." USENIX Annual Technical Conference. — The original ASan paper describing the instrumentation approach and performance characteristics.
- OWASP. (2023). OWASP Testing Guide v4.2. owasp.org. — Comprehensive methodology for manual and automated dynamic testing of web applications.
- Klees, G., et al. (2018). "Evaluating Fuzz Testing." ACM CCS 2018. — Rigorous analysis of fuzzing evaluation methodology; essential reading for understanding what fuzzing benchmarks actually measure.