Context Engineering for Agents - Lance Martin, LangChain
Lance: https://www.linkedin.com/in/lance-martin-64a33b5/
How Context Fails: https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html
How New Buzzwords Get Created: https://www.dbreunig.com/2025/07/24/why-the-term-context-engineering-matters.html
Context Engineering: https://x.com/RLanceMartin/status/1948441848978309358 https://rlancemartin.github.io/2025/06/23/context_engineering/ https://docs.google.com/presentation/d/16aaXLu40GugY-kOpqDU4e-S0hD1FmHcNyF0rRRnb1OU/edit?usp=sharing
Manus Post: https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus
Cognition Post: https://cognition.ai/blog/dont-build-multi-agents
Multi-Agent Researcher: https://www.anthropic.com/engineering/multi-agent-research-system
Human-in-the-loop + Memory: https://github.com/langchain-ai/agents-from-scratch
- Bitter Lesson in AI Engineering -
Hyung Won Chung on the Bitter Lesson in AI Research: https://www.youtube.com/watch?v=orDKvo8h71o
Bitter Lesson w/ Claude Code: https://www.youtube.com/watch?v=Lue8K2jqfKk&t=1s
Learning the Bitter Lesson in AI Engineering: https://rlancemartin.github.io/2025/07/30/bitter_lesson/
Open Deep Research: https://github.com/langchain-ai/open_deep_research https://academy.langchain.com/courses/deep-research-with-langgraph
Scaling and building things that "don't yet work": https://www.youtube.com/watch?v=p8Jx4qvDoSo
- Frameworks -
Roast framework at Shopify / standardization of orchestration tools: https://www.youtube.com/watch?v=0NHCyq8bBcM
MCP adoption within Anthropic / standardization of protocols: https://www.youtube.com/watch?v=xlEQ6Y3WNNI
How to think about frameworks: https://blog.langchain.com/how-to-think-about-agent-frameworks/
RAG benchmarking: https://rlancemartin.github.io/2025/04/03/vibe-code/
Simon's talk with memory-gone-wrong: https://simonwillison.net/2025/Jun/6/six-months-in-llms/
--------
--------
A Technical History of Generative Media
Today we are joined by Gorkem and Batuhan from Fal.ai, the fastest-growing generative media inference provider. They recently raised a $125M Series C and crossed $100M ARR. We covered how they pivoted from dbt pipelines to diffusion model inference, which models really changed the trajectory of image generation, and the future of AI video. Enjoy!
00:00 - Introductions
04:58 - History of Major AI Models and Their Impact on Fal.ai
07:06 - Pivoting to Generative Media and Strategic Business Decisions
10:46 - Technical discussion on CUDA optimization and kernel development
12:42 - Inference Engine Architecture and Kernel Reusability
14:59 - Performance Gains and Latency Trade-offs
15:50 - Discussion of model latency importance and performance optimization
17:56 - Importance of Latency and User Engagement
18:46 - Impact of Open Source Model Releases and Competitive Advantage
19:00 - Partnerships with closed source model developers
20:06 - Collaborations with Closed-Source Model Providers
21:28 - Serving Audio Models and Infrastructure Scalability
22:29 - Serverless GPU infrastructure and technical stack
23:52 - GPU Prioritization: H100s and Blackwell Optimization
25:00 - Discussion on ASICs vs. General Purpose GPUs
26:10 - Architectural Trends: MMDiTs and Model Innovation
27:35 - Rise and Decline of Distillation and Consistency Models
28:15 - Draft Mode and Streaming in Image Generation Workflows
29:46 - Generative Video Models and the Role of Latency
30:14 - Auto-Regressive Image Models and Industry Reactions
31:35 - Discussion of OpenAI's Sora and competition in video generation
34:44 - World Models and Creative Applications in Games and Movies
35:27 - Video Models’ Revenue Share and Open-Source Contributions
36:40 - Rise of Chinese Labs and Partnerships
38:03 - Top Trending Models on Hugging Face and ByteDance's Role
39:29 - Monetization Strategies for Open Models
40:48 - Usage Distribution and Model Turnover on FAL
42:11 - Revenue Share vs. Open Model Usage Optimization
42:47 - Moderation and NSFW Content on the Platform
44:03 - Advertising as a key use case for generative media
45:37 - Generative Video in Startup Marketing and Virality
46:56 - LoRA Usage and Fine-Tuning Popularity
47:17 - LoRA ecosystem and fine-tuning discussion
49:25 - Post-Training of Video Models and Future of Fine-Tuning
50:21 - ComfyUI Pipelines and Workflow Complexity
52:31 - Requests for startups and future opportunities in the space
53:33 - Data Collection and RedPajama-Style Initiatives for Media Models
53:46 - RL for Image and Video Models: Unknown Potential
55:11 - Requests for Models: Editing and Conversational Video Models
57:12 - Veo 3 Capabilities: Lip Sync, TTS, and Timing
58:23 - Bitter Lesson and the Future of Model Workflows
58:44 - FAL's hiring approach and team structure
59:29 - Team Structure and Scaling Applied ML and Performance Teams
1:01:41 - Developer Experience Tools and Low-Code/No-Code Integration
1:03:04 - Improving Hiring Process with Public Challenges and Benchmarks
1:04:02 - Closing Remarks and Culture at FAL
--------
--------
Better Data is All You Need — Ari Morcos, Datology
Our chat with Ari makes the case that data curation is the most impactful and underinvested area in AI. He argues that the prevailing focus on model architecture and compute scaling overlooks the "bitter lesson" that "models are what they eat." Effective data curation, a sophisticated process involving filtering, rebalancing, sequencing (curriculum), and synthetic data generation, allows for training models that are simultaneously faster, better, and smaller. Morcos recounts his personal journey from focusing on model-centric inductive biases to realizing that data quality is the primary lever for breaking the diminishing returns of naive scaling laws. Datology's mission is to automate this complex curation process, making state-of-the-art data accessible to any organization and enabling a new paradigm of AI development where data efficiency, not just raw scale, drives progress.
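To make the curation steps discussed in the episode concrete, here is a minimal, hypothetical Python sketch of a pipeline that filters, deduplicates, rebalances, and orders a corpus. The heuristics (a toy quality score, exact-hash dedup, a length-based "curriculum") and all names (Doc, quality_score, rebalance, curate) are illustrative assumptions, not Datology's actual system.

```python
# Illustrative data-curation sketch: filter low-quality docs, deduplicate,
# rebalance across sources, and order the result by a simple difficulty proxy.
# Toy heuristics only; real pipelines use far more sophisticated signals.

import hashlib
import random
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    source: str  # e.g. "web", "code", "books"


def quality_score(doc: Doc) -> float:
    # Toy heuristic: penalize empty/very short docs and low alphabetic ratio.
    if not doc.text:
        return 0.0
    alpha = sum(c.isalpha() for c in doc.text) / len(doc.text)
    length_bonus = min(len(doc.text) / 500, 1.0)
    return alpha * length_bonus


def deduplicate(docs: list[Doc]) -> list[Doc]:
    # Exact-hash dedup; production systems typically add fuzzy/semantic dedup.
    seen, out = set(), []
    for d in docs:
        h = hashlib.sha256(d.text.encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            out.append(d)
    return out


def rebalance(docs: list[Doc], target_mix: dict[str, float], n: int, seed: int = 0) -> list[Doc]:
    # Sample per source so the final mix roughly matches the target proportions.
    rng = random.Random(seed)
    by_source: dict[str, list[Doc]] = {}
    for d in docs:
        by_source.setdefault(d.source, []).append(d)
    out: list[Doc] = []
    for source, frac in target_mix.items():
        pool = by_source.get(source, [])
        k = min(len(pool), int(n * frac))
        out.extend(rng.sample(pool, k))
    return out


def curate(docs: list[Doc], target_mix: dict[str, float], n: int) -> list[Doc]:
    kept = [d for d in deduplicate(docs) if quality_score(d) > 0.3]
    mixed = rebalance(kept, target_mix, n)
    # "Curriculum": order from shorter/simpler to longer documents.
    return sorted(mixed, key=lambda d: len(d.text))


if __name__ == "__main__":
    corpus = [
        Doc("def add(a, b):\n    return a + b\n" * 20, "code"),
        Doc("The cat sat on the mat. " * 40, "web"),
        Doc("The cat sat on the mat. " * 40, "web"),  # exact duplicate, dropped
        Doc("!!!", "web"),                            # low quality, dropped
    ]
    for d in curate(corpus, {"web": 0.6, "code": 0.4}, n=5):
        print(d.source, len(d.text))
```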
Timestamps
00:00 Introduction
00:46 What is Datology? The mission to train models faster, better, and smaller through data curation.
01:59 Ari's background: From neuroscience to realizing the "Bitter Lesson" of AI.
05:30 Key Insight: Inductive biases from architecture become less important and even harmful as data scale increases.
08:08 Thesis: Data is the most underinvested area of AI research relative to its impact.
10:15 Why data work is culturally undervalued in research and industry.
12:19 How self-supervised learning changed everything, moving from a data-scarce to a data-abundant regime.
17:05 Why automated curation is superior to human-in-the-loop, citing the DCLM study.
19:22 The "Elephants vs. Dogs" analogy for managing data redundancy and complexity.
22:46 A brief history and commentary on key datasets (Common Crawl, GitHub, Books3).
26:24 Breaking naive scaling laws by improving data quality to maintain high marginal information gain.
29:07 Datology's demonstrated impact: Achieving baseline performance 12x faster.
34:19 The business of data: Datology's moat and its relationship with open-source datasets.
39:12 Synthetic Data Explained: The difference between risky "net-new" creation and powerful "rephrasing."
49:02 The Resurgence of Curriculum Learning: Why ordering data matters in the underfitting regime.
52:55 The Future of Training: Optimizing pre-training data to make post-training more effective.
54:49 Who is training their own models and why (Sovereign AI, large enterprises).
57:24 "Train Smaller": Why inference cost makes smaller, specialized models the ultimate goal for enterprises.
01:00:19 The problem with model pruning and why data-side solutions are complementary.
01:03:03 On finding the smallest possible model for a given capability.
01:06:49 Key learnings from the RC foundation model collaboration, proving that data curation "stacks."
01:09:46 Lightning Round: What data everyone wants & who should work at Datology.
01:14:24 Commentary on Meta's superintelligence efforts and Yann LeCun's role.
--------
--------
Long Live Context Engineering - with Jeff Huber of Chroma
Jeff Huber of Chroma joins us to talk about what actually matters in vector databases in 2025, why “modern search for AI” is different, and how to ship systems that don’t rot as context grows.
Full show notes: https://www.latent.space/p/chroma
00:00 Introductions
00:48 Why Build Chroma
02:55 Information Retrieval vs. Search
04:29 Staying Focused in a Competitive AI Market
08:08 Building Chroma Cloud
12:15 Context Engineering and the Problems with RAG
16:11 Context Rot
21:49 Prioritizing Context Quality
27:02 Code Indexing and Retrieval Strategies
32:04 Chunk Rewriting and Query Optimization for Code
34:07 Transformer Architecture Evolution and Retrieval Systems
38:06 Memory as a Benefit of Context Engineering
40:13 Structuring AI Memory and Offline Compaction
45:46 Lessons from Previous Startups and Building with Purpose
47:32 Religion and Values in Silicon Valley
50:18 Company Culture, Design, and Brand Consistency
52:36 Hiring at Chroma: Designers, Researchers, and Engineers
--------
--------
Greg Brockman on OpenAI's Road to AGI
Greg Brockman, co-founder and president of OpenAI, joins us to talk about GPT-5 and GPT-OSS, the future of software engineering, why reinforcement learning is still scaling, and how OpenAI is planning to get to AGI.
00:00 Introductions
01:04 The Evolution of Reasoning at OpenAI
04:01 Online vs Offline Learning in Language Models
06:44 Sample Efficiency and Human Curation in Reinforcement Learning
08:16 Scaling Compute and Supercritical Learning
13:21 Wall clock time limitations in RL and real-world interactions
16:34 Experience with ARC Institute and DNA neural networks
19:33 Defining the GPT-5 Era
22:46 Evaluating Model Intelligence and Task Difficulty
25:06 Practical Advice for Developers Using GPT-5
31:48 Model Specs
37:21 Challenges in RL Preferences (e.g., try/catch)
39:13 Model Routing and Hybrid Architectures in GPT-5
43:58 GPT-5 pricing and compute efficiency improvements
46:04 Self-Improving Coding Agents and Tool Usage
49:11 On-Device Models and Local vs Remote Agent Systems
51:34 Engineering at OpenAI and Leveraging LLMs
54:16 Structuring Codebases and Teams for AI Optimization
55:27 The Value of Engineers in the Age of AGI
58:42 Current state of AI research and lab diversity
01:01:11 OpenAI’s Prioritization and Focus Areas
01:03:05 Advice for Founders: It's Not Too Late
01:04:20 Future outlook and closing thoughts
01:04:33 Time Capsule to 2045: Future of Compute and Abundance
01:07:07 Time Capsule to 2005: More Problems Will Emerge
The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0.
We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to the first introduction to the tech you'll be using in the next 3 months! We break news and share exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al.
Full show notes always on https://latent.space