Building eval systems that improve your AI product
If you’re a premium subscriber, add the private feed to your podcast app at https://add.lennysreads.com
In this episode, we dive into the fast-emerging discipline of AI evaluation with Hamel Husain and Shreya Shankar, creators of AI Evals for Engineers & PMs, the #1 highest-grossing course on Maven.
After training 2,000+ PMs and engineers across 500+ companies, Hamel and Shreya reveal the complete playbook for building evaluations that actually improve your AI product: moving beyond vanity dashboards to a system that drives continuous improvement.
In this episode, you’ll learn:
• Why most AI eval dashboards fail to deliver real product improvements
• How to use error analysis to uncover your product’s most critical failure modes
• The role of a “principal domain expert” in setting a consistent quality bar
• Techniques for transforming messy error notes into a clean taxonomy of failures
• When to use code-based checks vs. LLM-as-a-judge evaluators
• How to build trust in your evals with human-labeled ground-truth datasets
• Why binary pass/fail labels outperform Likert scales in practice
• Evaluation strategies for complex systems: multi-turn conversations, RAG pipelines, and agentic workflows
• How CI safety nets and production monitoring work together to create a flywheel of continuous product improvement
References:
• Read the newsletter: https://www.lennysnewsletter.com/p/building-eval-systems-that-improve
• AI Evals for Engineers & PMs: https://maven.com/parlance-labs/evals
• A Field Guide to Rapidly Improving AI Products: https://hamel.dev/blog/posts/field-guide/
• Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences: https://arxiv.org/abs/2404.12272
• Aman Khan: https://www.linkedin.com/in/amanberkeley/
• Anthropic: https://www.anthropic.com/
• Arize Phoenix: https://phoenix.arize.com/
• Braintrust: https://www.braintrust.dev/
• Beyond vibe checks: A PM’s complete guide to evals: https://www.lennysnewsletter.com/p/beyond-vibe-checks-a-pms-complete
• Frequently Asked Questions (And Answers) About AI Evals: https://hamel.dev/blog/posts/evals-faq/
• Hamel Husain: https://www.linkedin.com/in/hamelhusain/
• LangSmith: https://smith.langchain.com/
• Not Dead Yet: On RAG: https://hamel.dev/notes/llm/rag/not_dead.html
• OpenAI: https://openai.com/
• Shreya Shankar: https://www.linkedin.com/in/shrshnk/
Listen:
• YouTube: https://www.youtube.com/@lennysreads
• Apple: https://podcasts.apple.com/us/podcast/lennys-reads/id1810314693
• Spotify: https://open.spotify.com/show/0IIunA06qMtrcQLfypTooj
• Newsletter: https://www.lennysnewsletter.com/subscribe
• Substack: https://lennysreads.com/
Follow Lenny:
• Twitter/X: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/
• Podcast: https://www.youtube.com/@lennyspodcast
About
Welcome to Lenny's Reads, where every week you’ll find a fresh audio version of my newsletter about building product, driving growth, and accelerating your career, read to you by the soothing voice of Lennybot.
To hear more, visit www.lennysnewsletter.com
--------
21:41
How to find the perfect name
In this episode, naming expert David Placek shares his foundational principles that helped him create some of the world’s most iconic brands: BlackBerry, Azure, Sonos, and Impossible Foods, to name a few.
Whether you’re launching a new company or naming a product, this episode will sharpen how you think about brand strategy and help you avoid costly mistakes.
If you’re a premium subscriber, add the private feed to your podcast app at add.lennysreads.com: https://add.lennysreads.com/
Listen now:
• YouTube: https://www.youtube.com/@lennysreads
• Apple: https://podcasts.apple.com/us/podcast/lennys-reads/id1810314693
• Spotify: https://open.spotify.com/show/0IIunA06qMtrcQLfypTooj
In this episode, you’ll learn:
• Why naming is the single highest-leverage brand decision you’ll make
• The 3 pillars of effective names
• How invented names outperform descriptive ones (and cost less to build)
• Why brainstorming rarely works - and what to do instead
• The story behind Intel’s Pentium and how one name changed the industry
• Why “Swiffer” made mopping fun
• How Vercel went from “Zeit” to a name built for momentum
• The 6 challenges every name must overcome in today’s AI-driven world
• What not to do: common naming traps that lead to forgettable brands
References:
• AMD: https://www.amd.com/
• Azure: https://azure.microsoft.com/
• BlackBerry: https://www.blackberry.com/
• Codeium: https://codeium.com/
• Dell: https://www.dell.com/
• HP: https://www.hp.com/
• Impossible Foods: https://impossiblefoods.com/
• Intel: https://www.intel.com/
• Lexicon Branding: https://www.lexiconbranding.com/
• Microsoft: https://www.microsoft.com/
• Navan: https://navan.com/
• Procter & Gamble: https://us.pg.com/
• Sonos: https://www.sonos.com/
• Subaru: https://www.subaru.com/
• Vercel: https://vercel.com/
To hear more, visit www.lennysnewsletter.com
--------
11:16
Why your AI product needs a different development lifecycle
In this episode, Aishwarya Reganti and Kiriti Badam introduce a powerful new framework for building AI products: the Continuous Calibration/Continuous Development (CC/CD) framework.
If you’ve ever shipped an AI demo that looked magical but struggled to scale, this framework will resonate. The CC/CD framework provides a practical and structured approach to navigating these realities and building AI systems that are stable, intentional, and trustworthy.
Listen:
• YouTube: https://www.youtube.com/@lennysreads
• Apple: https://podcasts.apple.com/us/podcast/lennys-reads/id1810314693
• Spotify: https://open.spotify.com/show/0IIunA06qMtrcQLfypTooj
• Newsletter: https://www.lennysnewsletter.com/subscribe
In this episode, you’ll learn:
• The two core differences every AI builder must account for: non-determinism and the agency–control tradeoff
• The six phases of the CC/CD loop
• How to scope capabilities across different versions of your product to gradually earn trust and agency
• The role of reference datasets in taming unpredictability and guiding evals
• How to design application-specific evals that act as the equivalent of tests for AI
• Why control handoffs are essential for maintaining user trust
• How to transform deployment from a finish line into the start of continuous calibration
• Why you should never “jump to full agency” before the system earns it
• How to apply the CC/CD loop with real-world examples: customer support, marketing assistants, and coding copilots
References:
• Beyond vibe checks: A PM’s complete guide to evals: https://www.lennysnewsletter.com/p/beyond-vibe-checks-a-pms-complete
• “Building agentic AI applications with a problem-first approach” [Maven course]: https://maven.com/aishwarya-kiriti/genai-system-design
• Cursor: https://cursor.com
• “Don’t build AI products like traditional software” [Free lightning talk]: https://maven.com/p/88a325/don-t-build-ai-products-like-traditional-software
• GitHub Copilot: https://github.com/features/copilot
• Why premature optimization is the root of all evil: https://stackify.com/premature-optimization-evil/
• Why your AI product needs a different development lifecycle: https://www.lennysnewsletter.com/p/48e7fb02-1fc1-4bb7-bf85-85f4165e8225
To hear more, visit www.lennysnewsletter.com
--------
13:32
Essential reading for product builders—part 2
If you’re a premium subscriber, add the private feed to your podcast app at add.lennysreads.com
In this episode, I share another 10 must-read essays that continue to shape how I think about product, startups, and career.
Whether you’re scaling a startup, leading a team, or sharpening your thinking, this episode will expand your perspective and give you practical tools you can use immediately.
Listen now:
• YouTube: https://www.youtube.com/@lennysreads
• Apple: https://podcasts.apple.com/us/podcast/lennys-reads/id1810314693
• Spotify: https://open.spotify.com/show/0IIunA06qMtrcQLfypTooj
• Substack: https://lennysreads.com/
In this episode, you’ll learn:
• Why solving the right problem for the right audience is obvious - yet essential
• Why leaders must take full responsibility for any communication discrepancies
• How to use the SCQA (Situation, Complication, Question, Answer) framework for concise, persuasive messaging
• Why sales, often overlooked, is critical to your product’s success
• A tremendously clarifying way to think about your market
• The four types of “fit” every startup needs
• What we get wrong about our definition of a startup
• How to start giving away your Legos
• How reframing what you’re selling can strengthen your positioning
• What “Schlep blindness” is - and how founders can avoid it
References:
• Building Products: https://medium.com/the-year-of-the-looking-glass/building-products-91aa93bea4bb
• Communication is the Job: https://boz.com/articles/communication-is-the-job
• Distribution: https://a16z.com/distribution/
• Eigenquestions: The Art of Framing Problems: https://coda.io/@shishir/eigenquestions-the-art-of-framing-problems
• Executive Communication: https://www.heavybit.com/library/video/executive-communication/
• Give Away Your Legos: https://review.firstround.com/give-away-your-legos-and-other-commandments-for-scaling-startups/
• How to Work with Designers: https://medium.com/the-year-of-the-looking-glass/how-to-work-with-designers-6c975dede146
• Product Management Mental Models for Everyone: https://blackboxofpm.com/product-management-mental-models-for-everyone-31e7828cb50b
• Schlep Blindness: https://www.paulgraham.com/schlep.html
• Startup = Growth: https://paulgraham.com/growth.html
• The Four Fits: https://brianbalfour.com/four-fits-growth-framework
• The Market Curve: https://medium.com/sequoia-capital/the-market-curve-44097b626f6d
• The Next Feature Fallacy: https://andrewchen.com/the-next-feature-fallacy-the-fallacy-that-the-next-new-feature-will-suddenly-make-people-use-your-product/
• We Don’t Sell Saddles Here: https://medium.com/@stewart/we-dont-sell-saddles-here-4c59524d650d
• What Makes a Strong Product Culture?: https://www.bringthedonuts.com/essays/what-makes-a-strong-product-culture/
To hear more, visit www.lennysnewsletter.com
--------
5:31
25 proven tactics to accelerate AI adoption at your company
If you're a premium subscriber, get the full episodes in your podcast feed by visiting https://add.lennysreads.com
Read the full post: https://www.lennysnewsletter.com/p/c674cb15-93c5-44f0-9a11-33ca626b8179
Peter Yang interviewed leaders at six of the most AI-forward companies (Zapier, Ramp, Duolingo, Shopify, Intercom, and Whoop) to uncover 25 real-world tactics for driving employee AI adoption. If your team is still struggling to get value from AI tools, this episode is packed with practical advice, internal playbooks, and strategies you can use right away.
In this episode, you’ll learn:
• The five steps to driving AI adoption at your company
• Why vague AI-first mandates don’t work, and what to do instead
• How to track and reward adoption, usage, and outcomes
• How to cut through red tape and unblock company-wide access
• How to turn AI power users into internal teachers
• Which high-impact tasks top companies are automating first
• What separates real AI adoption from flashy demo theater
References:
• Gallup poll: https://www.gallup.com/workplace/691643/work-nearly-doubled-two-years.aspx
• Hilary Gridley’s 30 days of GPT: https://docs.google.com/spreadsheets/d/1zJ4rbi9YcQuGqGxc6-AQD0-44oT9l4Eyono0AdpgJbA/edit?gid=0#gid=0
• How Zapier measures AI fluency: https://x.com/wadefoster/status/1930680089651425452
• Tobi Lütke’s AI memo: https://x.com/tobi/status/1909251946235437514?lang=en
• Zapier’s code red playbook: https://docs.google.com/document/d/1zbkjL0d-ev87PO0yKJBBMlliByVaUIREKNQfvtVwXbg/
To hear more, visit www.lennysnewsletter.com