ctf-agent
https://github.com/verialabs/ctf-agent
📊 Stats
⭐ Stars: 200
📝 Language: Python
📝 Description: Autonomous CTF solver that races multiple AI models in parallel. 1st place BSidesSF 2026.
🔬 Research Notes
Topics
None
README Excerpt
```
# CTF Agent
Autonomous CTF (Capture The Flag) solver that races multiple AI models against challenges in parallel. We built it in a weekend and used it to solve all 52 challenges and win 1st place at the BSidesSF 2026 CTF.
Built by [Veria Labs](https://verialabs.com), founded by members of [.;,;.](https://ctftime.org/team/222911) (smiley), the [#1 US CTF team on CTFTime in 2024 and 2025](https://ctftime.org/stats/2024/US). We build AI agents that find and exploit real security vulnerabilities for large enterprises.
## Results
| Competition | Challenges Solved | Result |
|-------------|:-:|--------|
| BSidesSF 2026 | 52/52 (100%) | 1st place ($1,500) |
The agent solves challenges across all categories — pwn, rev, crypto, forensics, web, and misc.
## How It Works
A coordinator LLM manages the competition while solver swarms attack individual challenges. Each swarm runs multiple models simultaneously — the first to find the flag wins.
```
                  +-----------------+
                  |  CTFd Platform  |
                  +--------+--------+
                           |
                  +--------v--------+
                  |   Poller (5s)   |
                  +--------+--------+
                           |
                  +--------v--------+
                  | Coordinator LLM |
                  | (Claude/Codex)  |
                  +--------+--------+
                           |
         +-----------------+------------------+
         |                 |                  |
+--------v--------+ +------v---------+ +------v---------+
| Swarm:          | | Swarm:         | | Swarm:         |
| challenge-1     | | challenge-2    | | challenge-N    |
|                 | |                | |                |
| Opus (med)      | | Opus (med)     | |                |
| Opus (max)      | | Opus (max)     | |      ...       |
| GPT-5.4         | | GPT-5.4        | |                |
| GPT-5.4-mini    | | GPT-5.4-mini   | |                |
| GPT-5.3-codex   | | GPT-5.3-codex  | |                |
+--------+--------+ +--------+-------+ +----------------+
         |                   |
+--------v--------+ +--------v-------+
| Docker Sandbox  | | Docker Sandbox |
| (isolated)      | | (isolated)     |
|                 | |                |
| pwntools, r2,   | | pwntools, r2,  |
| gdb, python...  | | gdb, python... |
+-----------------+ +----------------+
```
Each solver runs in an isolated Docker container with CTF tools pre-installed. Solvers never give up — they keep trying different approaches until the flag is found.
## Quick Start
```bash
# Install
uv sync
# Build sandbox image
docker build -f sandbox/Dockerfile.sandbox -t ctf-sandbox .
# Configure credentials
cp .env.example .env
# Edit .env with your API keys and CTFd token
# Run against a CTFd instance
uv run ctf-solve \
  --ctfd-url https://ctf.example.com \
  --ctfd-token ctfd_your_token \
  --challenges-dir challenges \
  --max-challenges 10 \
  -v
```
## Coordinator Backends
```bash
# Claude SDK coordinator (default)
uv run ctf-solve --coordinator claude ...
# Codex coordinator (GPT-5.4 via JSON-RPC)
uv run ctf-solve --coordinator codex ...
```
## Solver Models
Default model lineup (configurable in `backend/models.py`):
| Model | Provider | Notes |
|-------|----------|-------|
| Claude Opus 4.6 (medium) | Claude SDK | Balanced speed/quality |
| Claude Opus 4.6 (max) | Claude SDK | Deep reasoning |
| GPT-5.4 | Codex | Best overall solver |
| GPT-5.4-mini | Codex | Fast, good for easy challenges |
| GPT-5.3-codex | Codex | Reasoning model (xhigh effort) |
```
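The README excerpt above describes each swarm as running several models at once, with the first to find the flag winning. The block below is a minimal sketch of that racing pattern using `asyncio`, not the project's actual implementation. The names `fake_solver` and `race`, the model labels, and the simulated delays are all hypothetical; a real solver would drive an LLM inside a sandbox rather than sleep.

```python
import asyncio


async def fake_solver(model: str, challenge: str, delay: float) -> tuple[str, str]:
    """Hypothetical stand-in for one solver: waits, then returns a made-up flag."""
    await asyncio.sleep(delay)
    return model, f"flag{{{challenge}}}"


async def race(challenge: str, delays: dict[str, float]) -> tuple[str, str]:
    """Start one solver per model and return the first (model, flag) to finish."""
    tasks = [
        asyncio.create_task(fake_solver(model, challenge, delay))
        for model, delay in delays.items()
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Stop the slower solvers once a winner exists, then wait for them to unwind.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    # If several tasks finish at the same moment, any one of them may be picked.
    return next(iter(done)).result()


if __name__ == "__main__":
    # Illustrative labels and delays only; these are not the repo's identifiers.
    simulated = {"opus-medium": 0.3, "gpt-5.4": 0.05, "gpt-5.4-mini": 0.2}
    winner, flag = asyncio.run(race("challenge-1", simulated))
    print(f"{winner} solved challenge-1 with {flag}")
```

Cancelling the pending tasks as soon as one solver returns is the design choice worth noting: in a setup like the one described, the losing models would otherwise keep spending API budget on a challenge that is already solved.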
---
*Researched: 2026-03-25*