ctf-agent

https://github.com/verialabs/ctf-agent

📊 Stats

⭐ Stars: 200

📝 Language: Python

📝 Description: Autonomous CTF solver that races multiple AI models in parallel. 1st place BSidesSF 2026.

🔬 Research Notes

Stats

🍴 Forks: 25

📅 Created: 2026-03-23

🔄 Updated: 2026-03-25

🏷️ Latest Release: No releases

Topics

None

Research Summary

Key Features

Architecture

Use Cases

Assessment

Maturity:

Documentation:

Community:

Recommendation:

README Excerpt

```

# CTF Agent

Autonomous CTF (Capture The Flag) solver that races multiple AI models against challenges in parallel. It was built in a weekend, and we used it to solve all 52 challenges and win 1st place at the BSidesSF 2026 CTF.

Built by [Veria Labs](https://verialabs.com), founded by members of [.;,;.](https://ctftime.org/team/222911) (smiley), the [#1 US CTF team on CTFTime in 2024 and 2025](https://ctftime.org/stats/2024/US). We build AI agents that find and exploit real security vulnerabilities for large enterprises.

## Results

| Competition | Challenges Solved | Result |
|-------------|:-----------------:|--------|
| BSidesSF 2026 | 52/52 (100%) | 1st place ($1,500) |

The agent solves challenges across all categories: pwn, rev, crypto, forensics, web, and misc.

## How It Works

A coordinator LLM manages the competition while solver swarms attack individual challenges. Each swarm runs multiple models simultaneously on the same challenge, and the first model to find the flag wins.

```

                   +-----------------+
                   |  CTFd Platform  |
                   +--------+--------+
                   +--------v--------+
                   |   Poller (5s)   |
                   +--------+--------+
                   +--------v--------+
                   | Coordinator LLM |
                   | (Claude/Codex)  |
                   +--------+--------+
         +------------------+------------------+
         |                  |                  |
+--------v--------+ +------v---------+ +------v---------+
|                 | |                | |                |
|  GPT-5.4        | |  GPT-5.4       | |                |
+--------+--------+ +--------+-------+ +----------------+
         |                   |
+--------v--------+ +-------v--------+
| Docker Sandbox  | | Docker Sandbox |
| (isolated)      | | (isolated)     |
|                 | |                |
| pwntools, r2,   | | pwntools, r2,  |
| gdb, python...  | | gdb, python... |
+-----------------+ +----------------+

```

Each solver runs in an isolated Docker container with CTF tools pre-installed. Solvers do not give up: they keep trying different approaches until the flag is found.
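The "first to find the flag wins" behaviour described above can be sketched with `asyncio`. This is only an illustration under stated assumptions, not the project's actual code: `fake_solver`, `race`, the `CTF{...}` flag pattern, and the model names are all hypothetical stand-ins, and real solvers would drive an LLM inside a sandbox rather than sleep.

```python
import asyncio
import re
from typing import Awaitable, Iterable, Optional, Tuple

# Hypothetical flag format; real competitions define their own.
FLAG_RE = re.compile(r"CTF\{[^}]+\}")


async def fake_solver(model: str, delay: float, output: str) -> Tuple[str, Optional[str]]:
    """Stand-in for one model working on a challenge (hypothetical)."""
    await asyncio.sleep(delay)  # simulates time spent solving
    match = FLAG_RE.search(output)
    return model, match.group(0) if match else None


async def race(solvers: Iterable[Awaitable]) -> Optional[Tuple[str, str]]:
    """Return (model, flag) from the first solver that reports a flag.

    Solvers that finish without a flag are skipped; all remaining
    solvers are cancelled once a winner is known.
    """
    tasks = [asyncio.ensure_future(s) for s in solvers]
    try:
        for finished in asyncio.as_completed(tasks):
            model, flag = await finished
            if flag is not None:
                return model, flag
        return None  # every solver finished without a flag
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already completed
        await asyncio.gather(*tasks, return_exceptions=True)


async def main() -> Optional[Tuple[str, str]]:
    return await race([
        fake_solver("model-a", 0.05, "no luck"),
        fake_solver("model-b", 0.02, "found CTF{example_flag}"),
        fake_solver("model-c", 0.50, "CTF{too_slow}"),
    ])


if __name__ == "__main__":
    print(asyncio.run(main()))  # ('model-b', 'CTF{example_flag}')
```

Cancelling the losing tasks in the `finally` block is one way to avoid paying for models that are still working after the flag is known; whether the real agent cancels, pauses, or reuses them is not stated in the README.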

## Quick Start

```bash

# Install
uv sync

# Build sandbox image
docker build -f sandbox/Dockerfile.sandbox -t ctf-sandbox .

# Configure credentials
cp .env.example .env
# Edit .env with your API keys and CTFd token

# Run against a CTFd instance
uv run ctf-solve \
  --ctfd-url https://ctf.example.com \
  --ctfd-token ctfd_your_token \
  --challenges-dir challenges \
  --max-challenges 10 \
  -v

```

## Coordinator Backends

```bash

# Claude SDK coordinator (default)
uv run ctf-solve --coordinator claude ...

# Codex coordinator (GPT-5.4 via JSON-RPC)
uv run ctf-solve --coordinator codex ...

```

## Solver Models

Default model lineup (configurable in `backend/models.py`):

| Model | Provider | Notes |
|-------|----------|-------|
| Claude Opus 4.6 (medium) | Claude SDK | Balanced speed/quality |
| Claude Opus 4.6 (max) | Claude SDK | Deep reasoning |
| GPT-5.4 | Codex | Best overall solver |
| GPT-5.4-mini | Codex | Fast, good for easy challenges |
| GPT-5.3-codex | Codex | Reasoning model (xhigh effort) |
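For readers who want to reason about the lineup programmatically, the table above could be represented as plain data. The sketch below is an assumption for illustration only: the names `SolverModel`, `DEFAULT_LINEUP`, and `by_provider` are hypothetical and do not reflect the actual contents of `backend/models.py`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SolverModel:
    """One row of the lineup table (hypothetical structure)."""
    name: str
    provider: str
    notes: str


# Values copied from the table above; the structure itself is assumed.
DEFAULT_LINEUP: List[SolverModel] = [
    SolverModel("Claude Opus 4.6 (medium)", "Claude SDK", "Balanced speed/quality"),
    SolverModel("Claude Opus 4.6 (max)", "Claude SDK", "Deep reasoning"),
    SolverModel("GPT-5.4", "Codex", "Best overall solver"),
    SolverModel("GPT-5.4-mini", "Codex", "Fast, good for easy challenges"),
    SolverModel("GPT-5.3-codex", "Codex", "Reasoning model (xhigh effort)"),
]


def by_provider(lineup: List[SolverModel], provider: str) -> List[str]:
    """Return the model names served by a given provider, in table order."""
    return [m.name for m in lineup if m.provider == provider]


if __name__ == "__main__":
    print(by_provider(DEFAULT_LINEUP, "Codex"))
    # ['GPT-5.4', 'GPT-5.4-mini', 'GPT-5.3-codex']
```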

```

---

*Researched: 2026-03-25*