Inside the 29.5 Million DARPA AI Cyber Challenge: How Autonomous Agents Find & Patch Vulns
What does it take to build a fully autonomous AI system that can find, verify, and patch vulnerabilities in open-source software? Michael Brown, Principal Security Engineer at Trail of Bits, joins us to go behind the scenes of the 3-year DARPA AI Cyber Challenge (AICC), where his team's agent, "Buttercup," won second place.Michael, a self-proclaimed "AI skeptic," shares his surprise at how capable LLMs were at generating high-quality patches . However, he also shared the most critical lesson from the competition: "AI was actually the commodity" The real differentiator wasn't the AI model itself, but the "best of both worlds" approach, robust engineering, intelligent scaffolding, and using "AI where it's useful and conventional stuff where it's useful" .This is a great listen for any engineering or security team building AI solutions. We cover the multi-agent architecture of Buttercup, the real-world costs and the open-source future of this technology .Questions asked:(00:00) Introduction: The DARPA AI Hacking Challenge(03:00) Who is Michael Brown? (Trail of Bits AI/ML Research)(04:00) What is the DARPA AI Cyber Challenge (AICC)?(04:45) Why did the AICC take 3 years to run?(07:00) The AICC Finals: Trail of Bits takes 2nd place(07:45) The AICC Goal: Autonomously find AND patch open source(10:45) Competition Rules: No "virtual patching"(11:40) AICC Scoring: Finding vs. Patching(14:00) The competition was fully autonomous(14:40) The 3-month sprint to build Buttercup v1(15:45) The origin of the name "Buttercup" (The Princess Bride)(17:40) The original (and scrapped) concept for Buttercup(20:15) The critical difference: Finding vs. Verifying a vulnerability(26:30) LLMs were allowed, but were they the key?(28:10) Choosing LLMs: Using OpenAI for patching, Anthropic for fuzzing(30:30) What was the biggest surprise? (An AI skeptic is blown away)(32:45) Why the latest models weren't always better(35:30) The #1 lesson: The importance of high-quality engineering(39:10) Scaffolding vs. AI: What really won the competition?(40:30) Key Insight: AI was the commodity, engineering was the differentiator(41:40) The "Best of Both Worlds" approach (AI + conventional tools)(43:20) Pro Tip: Don't ask AI to "boil the ocean"(45:00) Buttercup's multi-agent architecture (Engineer, Security, QA)(47:30) Can you use Buttercup for your enterprise? (The $100k+ cost)(48:50) Buttercup is open source and runs on a laptop(51:30) The future of Buttercup: Connecting to OSS-Fuzz(52:45) How Buttercup compares to commercial tools (RunSybil, XBOW)(53:50) How the 1st place team (Team Atlanta) won(56:20) Where to find Michael Brown & ButtercupResources discussed during the interview:Trail of BitsButtercup (Open Source Project)DARPA AI Cyber Challenge (AICC)Movie: The Princess Bride
--------
58:20
--------
58:20
Anthropic's AI Threat Report: Real Attacks, Simulated Competence & The Future of Defense
Anthropic's August 2025 AI Threat Intelligence report is out, and it paints a fascinating picture of how attackers are really using large language models like Claude Code. In this episode, Ashish Rajan and Caleb Sima dive deep into the 10 case studies, revealing a landscape where AI isn't necessarily creating brand new attack vectors, but is dramatically lowering the bar and professionalizing existing ones.The discussion covers shocking examples, from "biohacking" attacks using AI for sophisticated extortion strategies , to North Korean IT workers completely dependent on AI, simulating technical competence to successfully gain and maintain employment at Fortune 500 companies . We also explore how AI enables the rapid development of ransomware-as-a-service and malware with advanced evasion, even by actors lacking deep technical skills .This episode is essential for anyone wanting to understand the practical realities of AI threats today, the gaps in defense, and why the volume might still be low but the potential impact is significant.Questions asked:(00:00) Introduction: Anthropic's AI Threat Report(02:20) Case Study 1: Biohacking & AI-Powered Extortion Strategy(08:15) Case Study 2: North Korean IT Workers Simulating Competence with AI(12:45) The Identity Verification Problem & Potential Solutions(16:20) Case Study 3: AI-Developed Ransomware-as-a-Service (RaaS)(17:35) How AI Lowers the Bar for Malware Creation(20:25) The Gray Area: AI Safety vs. Legitimate Security Research(25:10) Why Defense & Enterprise Adoption of AI Security is Lagging(30:20) Case Studies 4-10 Overview (Fraud, Scams, Malware Distribution, Credential Harvesting)(35:50) Multi-Lingual Attacks: Language No Longer a Barrier(36:45) Case Study: Russian Actor's Rapid Malware Deployment via AI(43:10) Key Takeaways: Early Days, But Professionalizing Existing Threats(45:20) Takeaway 2: The Need for Enterprises to Leverage AI Defensively(50:45) The Gap: Security for AI vs. AI for SecurityResources discussed during the interview:Anthropic - Threat Intelligence Report August 2025
--------
52:24
--------
52:24
How Microsoft Uses AI for Threat Intelligence & Malware Analysis
What if the prompts used in your AI systems were treated as a new class of threat indicator? In this episode, Thomas Roccia, Senior Security Researcher at Microsoft, introduces the concept of the IOPC (Indicator of Prompt Compromise), sharing that "when there is a threat actors using a GenAI model for malicious activities, then the prompt... is considered as an IOPC".The conversation dives deep into the practical application of AI in threat intelligence. Thomas shares details from his open-source projects, including NOVA, a tool for detecting adversarial prompts, and an AI agent he built to track the complex money laundering scheme from a $1.4 billion crypto hack . We also explore how AI is dramatically lowering the barrier to entry for complex tasks like reverse engineering, turning a once-niche skill into something accessible to a broader range of security professionals .Questions asked:(00:00) Introduction(02:20) Who is Thomas Roccia?(03:20) Using AI for Reverse Engineering & Malware Analysis(04:30) Building an AI Agent to Track Crypto Money Laundering(11:30) What is an IOPC (Indicator of Prompt Compromise)?(14:40) MITRE ATLAS: A TTP Framework for LLMs(18:20) NOVA: An Open-Source Tool for Detecting Malicious Prompts(23:15) Using RAG for Threat Intelligence on Data Leaks(31:00) Proximity: A New Scanner for Malicious MCP Servers(34:30) Why Good Ideas are Now More Valuable Than Execution(35:30) Real-World AI Threats: Stolen API Keys & Smart Malware(40:15) The Challenge of Building Reliable Multi-Agent Systems(48:20) How AI is Lowering the Barrier for Reverse Engineering(50:30) "Vibe Investigating": Assisting the SOC with AI(54:15) Caleb's Personal AI Agent for Document OrganizationResources discussed during the call:NOVA- The Prompt Pattern MatchingDEF CON 33 Talk - Where’s My Crypto, Dude? The Ultimate Guide to Crypto Money Laundering
--------
1:02:02
--------
1:02:02
The Future of AI Security is Scaffolding, Agents & The Browser
Welcome to the 2025 State of AI Security. This year, the conversation has moved beyond simple prompt injection to a far more complex threat: attacking the entire ecosystem surrounding the LLM. In this deep-dive discussion, offensive security experts Jason Haddix (Arcanum Information Security) and Daniel Miessler (Unsupervised Learning) break down the real-world attack vectors they're seeing in the wild.The conversation explores why prompt injection remains an unsolved problem and how the LLM is now being used as a delivery system to attack internal developers and connected applications. We also tackle the critical challenge of incident response, questioning how you can detect or investigate a malicious prompt when privacy regulations in some regions prevent logging and observability.This episode is a must-listen for anyone looking to understand the true offensive and defensive landscape of AI security, from the DARPA Cyber Challenge to the race for AI to control the browser.Questions asked:(00:00) Introduction(02:22) Who are Jason Haddix & Daniel Miessler?(03:40) The State of AI Security in 2025(06:20) It's All About the "Scaffolding", Not Just the Model(08:30) Why Prompt Injection is a Fundamental, Unsolved Problem(10:45) "Attacking the Ecosystem": Using the LLM as a Delivery System(12:45) The New Enterprise Protocol: Prompts in English(15:10) The Incident Response Dilemma: How Do You Detect Malicious Prompts?(16:50) The Challenge of Logging: When Privacy Laws Block Observability(21:30) Has Data Poisoning Become a Major Threat?(27:20) How Far Can Autonomous AI Go in Hacking Today?(28:30) An Inside Look at the DARPA AI Cyber Challenge (AIxCC)(40:45) Are Attackers Actually Using AI in the Wild?(47:30) The Evolution of the "Script Kitty" in the Age of AI(51:00) Would AGI Solve Security? The Problem of Politics & Context(59:15) Context is King: Why Prompt Engineering is a Critical Skill(01:03:30) What are the Best LLMs for Security & Productivity?(01:05:40) The Next Frontier: Why AI is Racing to Own the Browser(01:20:20) Does Using AI to Write Content Erode Trust?
--------
1:24:46
--------
1:24:46
A CISO's Blueprint for AI Security (From ML to GenAI)
Is the current AI hype cycle different from the ones that failed before? How do you build a security program for technology that can't give the same answer twice? This episode features a deep-dive conversation with Damian Hasse, CISO of Moveworks and a security veteran from Amazon's Alexa team, VMware, and Microsoft.Damian provides a practical blueprint for securing both traditional Machine Learning (ML) and modern Generative AI (GenAI). We discuss the common pitfalls of newly formed AI Councils, where members may lack the necessary ML background to make informed decisions. He shares his framework for assessing AI risk by focusing on the specific use case, the data involved, and building a multi-layered defense against threats like prompt injection and data leakage.This is an essential guide for any security leader or practitioner tasked with navigating the complexities of AI security, from protecting intellectual property in AI-assisted coding to implementing safeguards for enterprise chatbots.Questions asked:(00:00) Introduction(02:31) Who is Damian Hasse? CISO at Moveworks(04:00) AI Security: The Difference Between the Pre-GPT and Post-GPT Eras(06:00) The Problem with New AI Councils Lacking ML Expertise(07:50) A History of AI: The Hype Cycles and Winters Since the 1950s(16:20) Is This AI Hype Cycle Different? The Power of Accessibility(20:25) Securing AI-Assisted Coding: IP Risks, Data Leakage, and Poisoned Models(23:30) The Threat of Indirect Prompt Injection in Open Source Packages(26:20) Are You Asking Your AI the Right Questions? The Power of "What Am I Missing?"(40:20) A CISO's Framework for Securing New AI Features(44:30) Building Practical Safeguards for Enterprise Chatbots(47:25) The Biggest Challenge in Real-Time AI Security: Performance(50:00) Why Access Control in AI is a Deterministic ProblemResources spoken about during the interviewTracing the thoughts of a large language model
The #1 source for AI Security insights for CISOs and cybersecurity leaders.
Hosted by two former CISOs, the AI Security Podcast provides expert, no-fluff discussions on the security of AI systems and the use of AI in Cybersecurity. Whether you're a CISO, security architect, engineer, or cyber leader, you'll find practical strategies, emerging risk analysis, and real-world implementations without the marketing noise.
These conversations are helping cybersecurity leaders make informed decisions and lead with confidence in the age of AI.