ctf-agent

https://github.com/verialabs/ctf-agent

📊 Stats

⭐ Stars: 200

📝 Language: Python

📝 Description: Autonomous CTF solver that races multiple AI models in parallel. 1st place BSidesSF 2026.

🔬 Research Notes

Stats

  • ⭐ Stars: 200
  • 🍴 Forks: 25
  • 📝 Language: Python
  • 📅 Created: 2026-03-23
  • 🔄 Updated: 2026-03-25
  • 🏷️ Latest Release: No releases
  • Description

    Autonomous CTF solver that races multiple AI models in parallel. 1st place BSidesSF 2026.

    Topics

    None

  • README Excerpt

    ```

    # CTF Agent

    Autonomous CTF (Capture The Flag) solver that races multiple AI models against challenges in parallel. We built it in a weekend and used it to solve all 52 challenges and win 1st place at the BSidesSF 2026 CTF.

    Built by [Veria Labs](https://verialabs.com), founded by members of [.;,;.](https://ctftime.org/team/222911) (smiley), the [#1 US CTF team on CTFTime in 2024 and 2025](https://ctftime.org/stats/2024/US). We build AI agents that find and exploit real security vulnerabilities for large enterprises.

    ## Results

    | Competition | Challenges Solved | Result |
    |-------------|:-:|--------|
    | BSidesSF 2026 | 52/52 (100%) | 1st place ($1,500) |

    The agent solves challenges across all categories — pwn, rev, crypto, forensics, web, and misc.

    ## How It Works

    A coordinator LLM manages the competition while solver swarms attack individual challenges. Each swarm runs multiple models simultaneously — the first to find the flag wins.

    ```
                           +-----------------+
                           |  CTFd Platform  |
                           +--------+--------+
                                    |
                           +--------v--------+
                           |   Poller (5s)   |
                           +--------+--------+
                                    |
                           +--------v--------+
                           | Coordinator LLM |
                           | (Claude/Codex)  |
                           +--------+--------+
                                    |
             +------------------+------------------+
             |                  |                  |
    +--------v--------+ +------v---------+ +------v---------+
    | Swarm:          | | Swarm:         | | Swarm:         |
    | challenge-1     | | challenge-2    | | challenge-N    |
    |                 | |                | |                |
    |  Opus (med)     | |  Opus (med)    | |                |
    |  Opus (max)     | |  Opus (max)    | |      ...       |
    |  GPT-5.4        | |  GPT-5.4       | |                |
    |  GPT-5.4-mini   | |  GPT-5.4-mini  | |                |
    |  GPT-5.3-codex  | |  GPT-5.3-codex | |                |
    +--------+--------+ +--------+-------+ +----------------+
             |                   |
    +--------v--------+ +-------v--------+
    | Docker Sandbox  | | Docker Sandbox |
    | (isolated)      | | (isolated)     |
    |                 | |                |
    | pwntools, r2,   | | pwntools, r2,  |
    | gdb, python...  | | gdb, python... |
    +-----------------+ +----------------+
    ```

    Each solver runs in an isolated Docker container with CTF tools pre-installed. Solvers never give up — they keep trying different approaches until the flag is found.

    ## Quick Start

    ```bash
    # Install
    uv sync

    # Build sandbox image
    docker build -f sandbox/Dockerfile.sandbox -t ctf-sandbox .

    # Configure credentials
    cp .env.example .env
    # Edit .env with your API keys and CTFd token

    # Run against a CTFd instance
    uv run ctf-solve \
      --ctfd-url https://ctf.example.com \
      --ctfd-token ctfd_your_token \
      --challenges-dir challenges \
      --max-challenges 10 \
      -v
    ```

    ## Coordinator Backends

    ```bash
    # Claude SDK coordinator (default)
    uv run ctf-solve --coordinator claude ...

    # Codex coordinator (GPT-5.4 via JSON-RPC)
    uv run ctf-solve --coordinator codex ...
    ```

    ## Solver Models

    Default model lineup (configurable in backend/models.py):

    | Model | Provider | Notes |
    |-------|----------|-------|
    | Claude Opus 4.6 (medium) | Claude SDK | Balanced speed/quality |
    | Claude Opus 4.6 (max) | Claude SDK | Deep reasoning |
    | GPT-5.4 | Codex | Best overall solver |
    | GPT-5.4-mini | Codex | Fast, good for easy challenges |
    | GPT-5.3-codex | Codex | Reasoning model (xhigh effort) |

    ```
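
The README excerpt describes each swarm as several models attacking one challenge at once, with the first to find the flag winning. That pattern can be sketched with `asyncio`. This is a minimal illustration under stated assumptions, not the project's implementation: `run_solver`, `race_swarm`, the `SIMULATED` table, and the short model labels are hypothetical stand-ins, and the "solver" here only sleeps instead of driving an LLM inside a Docker sandbox.

```python
import asyncio
from typing import Optional

# Hypothetical stand-in data: model label -> (seconds of simulated work, flag or None).
# Labels mirror the README's lineup table; the outcomes are invented for this demo.
SIMULATED: dict[str, tuple[float, Optional[str]]] = {
    "opus-medium": (0.01, None),
    "opus-max": (0.05, None),
    "gpt-5.4": (0.02, "flag{demo}"),
    "gpt-5.4-mini": (0.03, None),
    "gpt-5.3-codex": (0.04, None),
}


async def run_solver(model: str, challenge: str) -> Optional[str]:
    """Stand-in for one solver session (assumption: a real one would run tools
    in a sandbox). The challenge name is unused by this fake solver."""
    delay, flag = SIMULATED[model]
    await asyncio.sleep(delay)
    return flag


async def race_swarm(challenge: str, models: list[str]) -> Optional[tuple[str, str]]:
    """Run every model concurrently and return (model, flag) for the first flag.

    Returns None if every solver finishes without a flag.
    """
    tasks = {asyncio.create_task(run_solver(m, challenge)): m for m in models}
    pending = set(tasks)
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                flag = task.result()
                if flag is not None:
                    return tasks[task], flag
        return None
    finally:
        # Stop the losing solvers once a winner is known (or on error).
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)


if __name__ == "__main__":
    winner = asyncio.run(race_swarm("challenge-1", list(SIMULATED)))
    print(winner)
```

Unlike the solvers described in the README, which keep trying until the flag is found, this sketch gives up once every task has finished. A fuller version would presumably restart solvers with a different approach instead of returning `None`.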

    ---

    *Researched: 2026-03-25*

    Generated: 2026-03-28