
AI + a16z

a16z

Available Episodes

5 of 46
  • Beyond Leaderboards: LMArena’s Mission to Make AI Reliable
    LMArena cofounders Anastasios N. Angelopoulos, Wei-Lin Chiang, and Ion Stoica sit down with a16z general partner Anjney Midha to talk about the future of AI evaluation. As benchmarks struggle to keep up with the pace of real-world deployment, LMArena is reframing the problem: what if the best way to test AI models is to put them in front of millions of users and let them vote? The team discusses how Arena evolved from a research side project into a key part of the AI stack, why fresh and subjective data is crucial for reliability, and what it means to build a CI/CD pipeline for large models.
    They also explore:
      • Why expert-only benchmarks are no longer enough.
      • How user preferences reveal model capabilities, and their limits.
      • What it takes to build personalized leaderboards and evaluation SDKs.
      • Why real-time testing is foundational for mission-critical AI.
    Follow everyone on X: Anastasios N. Angelopoulos, Wei-Lin Chiang, Ion Stoica, Anjney Midha
    Timestamps:
      • 0:04 - LLM evaluation: From consumer chatbots to mission-critical systems
      • 6:04 - Style and substance: Crowdsourcing expertise
      • 18:51 - Building immunity to overfitting and gaming the system
      • 29:49 - The roots of LMArena
      • 41:29 - Proving the value of academic AI research
      • 48:28 - Scaling LMArena and starting a company
      • 59:59 - Benchmarks, evaluations, and the value of ranking LLMs
      • 1:12:13 - The challenges of measuring AI reliability
      • 1:17:57 - Expanding beyond binary rankings as models evolve
      • 1:28:07 - A leaderboard for each prompt
      • 1:31:28 - The LMArena roadmap
      • 1:34:29 - The importance of open source and openness
      • 1:43:10 - Adapting to agents (and other AI evolutions)
    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
    --------  
    1:41:43
  • Building AI Systems You Can Trust
    In this episode of AI + a16z, Distributional cofounder and CEO Scott Clark and a16z partner Matt Bornstein explore why building trust in AI systems matters more than just optimizing performance metrics. From understanding the hidden complexities of generative AI behavior to addressing the challenges of reliability and consistency, they discuss how to confidently deploy AI in production. Why is trust becoming a critical factor in enterprise AI adoption? How do traditional performance metrics fail to capture crucial behavioral nuances in generative AI systems? Scott and Matt dive into these questions, examining non-deterministic outcomes, shifting model behaviors, and the growing importance of robust testing frameworks.
    Among other topics, they cover:
      • The limitations of conventional AI evaluation methods and the need for behavioral testing.
      • How centralized AI platforms help enterprises manage complexity and ensure responsible AI use.
      • The rise of "shadow AI" and its implications for security and compliance.
      • Practical strategies for scaling AI confidently from prototypes to real-world applications.
    Follow everyone: Scott Clark, Distributional, Matt Bornstein, Derrick Harris
    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
    --------  
    47:40
  • Who's Coding Now? AI and the Future of Software Development
    In this episode of the a16z AI podcast, a16z Infra partners Guido Appenzeller, Matt Bornstein, and Yoko Li explore how generative AI is reshaping software development. From its potential as a new high-level programming abstraction to its current practical impacts, they discuss whether AI coding tools will redefine what it means to be a developer.
    Why has coding emerged as one of AI's most powerful use cases? How much can AI truly boost developer productivity, and will it fundamentally change traditional computer science education? Guido, Yoko, and Matt dive deep into these questions, addressing the dynamics of "vibe coding," the enduring role of formal programming languages, and the critical challenge of managing non-deterministic behavior in AI-driven applications.
    Among other things, they discuss:
      • The enormous market potential of AI-generated code, projected to deliver trillions in productivity gains.
      • How "prompt-based programming" is evolving from Stack Overflow replacements into sophisticated development assistants.
      • Why formal languages like Python and Java are here to stay, even as natural language interactions become common.
      • The shifting landscape of programming education, and why understanding foundational abstractions remains essential.
      • The unique complexities of integrating AI into enterprise software, from managing uncertainty to ensuring reliability.
    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
    --------  
    44:30
  • MCP Co-Creator on the Next Wave of LLM Innovation
    In this episode of AI + a16z, Anthropic's David Soria Parra, who created MCP (Model Context Protocol) along with Justin Spahr-Summers, sits down with a16z's Yoko Li to discuss the project's inception, exciting use cases for connecting LLMs to external sources, and what's coming next for the project. If you're unfamiliar with the wildly popular MCP project, this edited passage from their discussion is a great place to start:
    David: "MCP tries to enable building AI applications in such a way that they can be extended by everyone else that is not part of the original development team through these MCP servers, and really bring the workflows you care about, the things you want to do, to these AI applications. It's a protocol that just defines how whatever you are building as a developer for that integration piece, and that AI application, talk to each other.
    "It's a very boring specification, but what it enables is hopefully ... something that looks like the current API ecosystem, but for LLM interactions."
    Yoko: "I really love the analogy with the API ecosystem, because they give people a mental model of how the ecosystem evolves ... Before, you may have needed a different spec to query Salesforce versus query HubSpot. Now you can use similarly defined API schema to do that.
    "And then when I saw MCP earlier in the year, it was very interesting in that it almost felt like a standard interface for the agent to interface with LLMs. It's like, 'What are the set of things that the agent wants to execute on that it has never seen before? What kind of context does it need to make these things happen?' When I tried it out, it was just super powerful and I no longer have to build one tool per client. I now can build just one MCP server, for example, for sending emails, and I use it for everything on Cursor, on Claude Desktop, on Goose."
    Learn more:
      • A Deep Dive Into MCP and the Future of AI Tooling
      • What Is an AI Agent?
      • Benchmarking AI Agents on Full-Stack Coding
      • Agent Experience: Building an Open Web for the AI Era
    Follow everyone on X: David Soria Parra, Yoko Li
    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
    --------  
    53:39
  • What Is an AI Agent?
    In this episode of AI + a16z, a16z Infra partners Guido Appenzeller, Matt Bornstein, and Yoko Li discuss and debate one of the tech industry's buzziest words right now: AI agents. The trio digs into the topic from a number of angles, including:
      • Whether a uniform definition of agent actually exists
      • How to distinguish between agents, LLMs, and functions
      • How to think about pricing agents
      • Whether agents can actually replace humans
      • The effects of data silos on agents that can access the web
    They don't claim to have all the answers, but they raise many questions and offer insights that should interest anybody building, buying, and even marketing AI agents.
    Learn more:
      • Benchmarking AI Agents on Full-Stack Coding
      • Automating Developer Email with MCP and AI Agents
      • A Deep Dive Into MCP and the Future of AI Tooling
      • Agent Experience: Building an Open Web for the AI Era
      • DeepSeek, Reasoning Models, and the Future of LLMs
      • Agents, Lawyers, and LLMs
      • Reasoning Models Are Remaking Professional Services
      • From NLP to LLMs: The Quest for a Reliable Chatbot
      • Can AI Agents Finally Fix Customer Support?
    Follow everybody on X: Guido Appenzeller, Matt Bornstein, Yoko Li
    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
    --------  
    36:26


About AI + a16z

Artificial intelligence is changing everything from art to enterprise IT, and a16z is watching all of it with a close eye. This podcast features discussions with leading AI engineers, founders, and experts, as well as our general partners, about where the technology and industry are heading.

v7.18.3 | © 2007-2025 radio.de GmbH
Generated: 6/1/2025 - 12:04:55 AM