High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753
In this episode, Hung Bui, Technology Vice President at Qualcomm, joins us to explore the latest high-efficiency techniques for running generative AI, particularly diffusion models, on-device. We dive deep into the technical challenges of deploying these models, which are powerful but computationally expensive due to their iterative sampling process. Hung details his team's work on SwiftBrush and SwiftEdit, which enable high-quality text-to-image generation and editing in a single inference step. He explains their novel distillation framework, where a multi-step teacher model guides the training of an efficient, single-step student model. We explore the architecture and training, including the use of a secondary 'coach' network that aligns the student's denoising function with the teacher's, allowing the model to bypass the iterative process entirely. Finally, we discuss how these efficiency breakthroughs pave the way for personalized on-device agents and the challenges of running reasoning models with techniques like inference-time scaling under a fixed compute budget.
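To give a concrete feel for this style of teacher/student distillation, here is a minimal PyTorch sketch of a one-step generator trained against a frozen multi-step teacher, with an auxiliary "coach" network that tracks the student's outputs and supplies a score-difference training signal. The network definitions, shapes, and loss form are illustrative assumptions only, not the actual SwiftBrush/SwiftEdit architectures or training recipe.

```python
# Conceptual sketch: single-step distillation with a teacher and a "coach".
# All networks and losses here are simplified placeholders (assumptions).
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Placeholder denoiser: predicts noise from a noisy image and a timestep."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x, t):
        # Broadcast the timestep as an extra input channel (simplification).
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, t_map], dim=1))

teacher = TinyDenoiser().eval().requires_grad_(False)  # frozen multi-step teacher
student = TinyDenoiser()                               # single-step generator
coach   = TinyDenoiser()                               # tracks the student's outputs

opt_student = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_coach   = torch.optim.Adam(coach.parameters(), lr=1e-4)

for step in range(100):
    z = torch.randn(4, 3, 32, 32)               # latent noise
    x_fake = z - student(z, torch.ones(4))      # one-step "generation", no sampler loop

    # Re-noise the student's output at a random timestep.
    t = torch.rand(4)
    noise = torch.randn_like(x_fake)
    x_noisy = x_fake + t.view(-1, 1, 1, 1) * noise

    # Student update: the gradient of this loss w.r.t. the student equals the
    # teacher-minus-coach score difference, nudging the one-step outputs toward
    # images the teacher's denoiser prefers (aligning the denoising functions).
    with torch.no_grad():
        eps_teacher = teacher(x_noisy, t)
        eps_coach = coach(x_noisy, t)
    loss_student = ((eps_teacher - eps_coach) * x_fake).mean()
    opt_student.zero_grad(); loss_student.backward(); opt_student.step()

    # Coach update: keep the coach matched to the current student distribution
    # by training it as a denoiser on the student's own (detached) samples.
    loss_coach = ((coach(x_noisy.detach(), t) - noise) ** 2).mean()
    opt_coach.zero_grad(); loss_coach.backward(); opt_coach.step()
```

The point mirrored here is that the student never runs an iterative sampler: it maps noise to an image in a single forward pass, and the teacher/coach pair only shapes its training signal.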
The complete show notes for this episode can be found at https://twimlai.com/go/753.
52:23
--------
Vibe Coding's Uncanny Valley with Alexandre Pesant - #752
Today, we're joined by Alexandre Pesant, AI lead at Lovable, to discuss the evolution and practice of vibe coding. Alex shares his take on how AI is enabling a shift in software development from typing characters to expressing intent, creating a new layer of abstraction similar to how high-level code compiles to machine code. We explore the current capabilities and limitations of coding agents, the importance of context engineering, and the practices that separate successful vibe coders from frustrated ones. Alex also shares Lovable’s technical journey, from an early, complex agent architecture that failed, to a simpler workflow-based system, and back again to an agentic approach as foundation models improved. He also details the company's massive scaling challenges—like accidentally taking down GitHub—and makes the case for why robust evaluations and more expressive user interfaces are the most critical components for AI-native development tools to succeed in the near future.
The complete show notes for this episode can be found at https://twimlai.com/go/752.
1:12:36
--------
Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750
Today, we're joined by Jacob Buckman, co-founder and CEO of Manifest AI, to discuss achieving long context in transformers. We discuss the bottlenecks of scaling context length and recent techniques to overcome them, including windowed attention, grouped query attention, and latent space attention. We explore the idea of weight-state balance and the weight-state FLOP ratio as a way of reasoning about the optimality of compute architectures, and we dig into the Power Retention architecture, which blends the parallelization of attention with the linear scaling of recurrence and promises speedups of >10x during training and >100x during inference. We review Manifest AI’s recent open source projects as well: Vidrial—a custom CUDA framework for building highly optimized GPU kernels in Python, and PowerCoder—a 3B-parameter coding model fine-tuned from StarCoder to use power retention. Our chat also covers the use of metrics like in-context learning curves and negative log likelihood to measure context utility, the implications of scaling laws, and the future of long context lengths in AI applications.
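To make the attention-versus-recurrence trade-off concrete, the sketch below implements plain linear (kernelized) attention two equivalent ways: a parallel, attention-style form and a recurrent form that carries a fixed-size state. This is a generic textbook construction under an assumed ELU+1 feature map, not Manifest AI's Power Retention architecture; it only illustrates why a recurrent formulation avoids a growing KV cache at inference time.

```python
# Minimal sketch of linear attention in two equivalent forms. The feature map
# and shapes are assumptions for illustration, not the Power Retention design.
import torch

def feature_map(x):
    # Simple positive feature map (assumption); real systems use other kernels.
    return torch.nn.functional.elu(x) + 1

def linear_attention_parallel(Q, K, V):
    # (T, d) inputs; causal masking via cumulative sums, attention-style.
    Qf, Kf = feature_map(Q), feature_map(K)
    # S_t = sum_{s<=t} k_s v_s^T,  z_t = sum_{s<=t} k_s
    S = torch.cumsum(Kf.unsqueeze(-1) * V.unsqueeze(1), dim=0)  # (T, d, d_v)
    z = torch.cumsum(Kf, dim=0)                                 # (T, d)
    num = torch.einsum("td,tdv->tv", Qf, S)
    den = torch.einsum("td,td->t", Qf, z).clamp_min(1e-6)
    return num / den.unsqueeze(-1)

def linear_attention_recurrent(Q, K, V):
    # Same computation as a recurrence: a fixed-size state per step instead of
    # a growing KV cache -- this is what gives linear scaling at inference.
    T, d = Q.shape
    S = torch.zeros(d, V.shape[-1])
    z = torch.zeros(d)
    outs = []
    for t in range(T):
        q, k, v = feature_map(Q[t]), feature_map(K[t]), V[t]
        S = S + torch.outer(k, v)
        z = z + k
        outs.append((q @ S) / (q @ z).clamp_min(1e-6))
    return torch.stack(outs)

Q, K, V = (torch.randn(16, 8) for _ in range(3))
assert torch.allclose(linear_attention_parallel(Q, K, V),
                      linear_attention_recurrent(Q, K, V), atol=1e-4)
```

The parallel form is what makes training fast on accelerators, while the recurrent form bounds inference memory regardless of context length; blending the two regimes is the general idea behind the episode's discussion of attention plus recurrence.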
The complete show notes for this episode can be found at https://twimlai.com/go/750.
57:23
--------
The Decentralized Future of Private AI with Illia Polosukhin - #749
In this episode, Illia Polosukhin, a co-author of the seminal "Attention Is All You Need" paper and co-founder of Near AI, joins us to discuss his vision for building private, decentralized, and user-owned AI. Illia shares his unique journey from developing the Transformer architecture at Google to building the NEAR Protocol blockchain to solve global payment challenges, and now applying those decentralized principles back to AI. We explore how Near AI is creating a decentralized cloud that leverages confidential computing, secure enclaves, and the blockchain to protect both user data and proprietary model weights. Illia also shares his three-part approach to fostering trust: open model training to eliminate hidden biases and "sleeper agents," verifiability of inference to ensure the model runs as intended, and formal verification at the invocation layer to enforce composable guarantees on AI agent actions. Finally, Illia shares his perspective on the future of open research, the role of tokenized incentive models, and the need for formal verification in building compliance and user trust.
The complete show notes for this episode can be found at https://twimlai.com/go/749.
1:05:03
--------
Inside Nano Banana 🍌 and the Future of Vision-Language Models with Oliver Wang - #748
Today, we’re joined by Oliver Wang, principal scientist at Google DeepMind and tech lead for Gemini 2.5 Flash Image—better known by its code name, “Nano Banana.” We dive into the development and capabilities of this newly released frontier vision-language model, beginning with the broader shift from specialized image generators to general-purpose multimodal agents that can use both visual and textual data for a variety of tasks. Oliver explains how Nano Banana can generate and iteratively edit images while maintaining consistency, and how its integration with Gemini’s world knowledge expands creative and practical use cases. We discuss the tension between aesthetics and accuracy, the relative maturity of image models compared to text-based LLMs, and scaling as a driver of progress. Oliver also shares surprising emergent behaviors, the challenges of evaluating vision-language models, and the risks of training on AI-generated data. Finally, we look ahead to interactive world models and VLMs that may one day “think” and “reason” in images.
The complete show notes for this episode can be found at https://twimlai.com/go/748.
About The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Machine learning and artificial intelligence are dramatically changing the way businesses operate and people live. The TWIML AI Podcast brings the top minds and ideas from the world of ML and AI to a broad and influential community of ML/AI researchers, data scientists, engineers, and tech-savvy business and IT leaders. The podcast is hosted by Sam Charrington, a sought-after industry analyst, speaker, commentator, and thought leader. Technologies covered include machine learning, artificial intelligence, deep learning, natural language processing, neural networks, analytics, computer science, data science, and more.