The AI Concepts Podcast
The AI Concepts Podcast is my attempt to turn the complex world of artificial intelligence into bite-sized, easy-to-digest episodes. Imagine a space where you can pick any AI topic and immediately grasp it, like flipping through an Audio Lexicon - but even better! Using vivid analogies and storytelling, I guide you through intricate ideas, helping you create mental images that stick. Whether you’re a tech enthusiast, business leader, technologist or just curious, my episodes bridge the gap between cutting-edge AI and everyday understanding. Dive in and let your imagination bring these concepts to life!
Episodes
Monday Jan 05, 2026
Module 2: The Encoder (BERT) vs. The Decoder (GPT)
Shay breaks down the encoder vs decoder split in transformers: encoders (BERT) read the full text with bidirectional attention to understand meaning, while decoders (GPT) generate text one token at a time using causal attention.
The episode ties the architecture to training (masked-word prediction vs next-token prediction), explains why decoder-only models dominate today (they can both interpret prompts and generate efficiently with KV caching), and previews the next episode on the MLP layer, where most learned knowledge lives.
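For listeners who want to see the encoder vs decoder difference concretely, here is a minimal sketch of the two attention patterns as boolean masks. The function name `attention_mask` and the 4-token example are illustrative assumptions, not something taken from the episode.

```python
def attention_mask(seq_len, causal):
    """Build a seq_len x seq_len grid of booleans.

    mask[i][j] is True when token i is allowed to attend to token j.
    """
    return [
        # Causal (decoder-style): token i sees itself and earlier tokens only.
        # Bidirectional (encoder-style): token i sees every token.
        [(j <= i) if causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Hypothetical 4-token sequence, used only for illustration.
encoder_mask = attention_mask(4, causal=False)
decoder_mask = attention_mask(4, causal=True)

for row in decoder_mask:
    print(["x" if allowed else "." for allowed in row])
```

In the causal mask each row is one step longer than the previous one, which is the "one token at a time" behavior described above; the bidirectional mask is fully filled in.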
Monday Jan 05, 2026
Module 2: Multi Head Attention & Positional Encodings
Shay explains multi-head attention and positional encodings: how transformers run multiple parallel attention 'heads' that specialize, why we concatenate their outputs, and how positional encodings reintroduce word order into parallel processing.
The episode uses clear analogies (lawyer, engineer, accountant), highlights GPU efficiency, and previews the next episode on encoder vs decoder architectures.
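As a rough illustration of how order can be reintroduced, the sketch below assumes the sinusoidal scheme from the original transformer paper; the summary above does not say which positional encoding scheme the episode covers, so treat that choice and the function name as assumptions.

```python
import math


def sinusoidal_positional_encoding(position, d_model):
    """Return a d_model-long vector that encodes one token position.

    Even dimensions use sine and odd dimensions use cosine, each pair
    sharing a frequency that shrinks as the dimension index grows.
    """
    encoding = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        encoding.append(math.sin(angle))
        if i + 1 < d_model:
            encoding.append(math.cos(angle))
    return encoding


# Each position gets a distinct vector that is added to the token embedding.
for pos in range(3):
    print(pos, [round(x, 3) for x in sinusoidal_positional_encoding(pos, 4)])
```

Because the vector depends only on the position, two identical words at different places in a sentence end up with different inputs, even though all tokens are processed in parallel.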
Saturday Jan 03, 2026
Module 2: Inside the Transformer - The Math That Makes Attention Work
In this episode, Shay walks through the transformer's attention mechanism in plain terms: how token embeddings are projected into queries, keys, and values; how dot products measure similarity; why scaling and softmax produce stable weights; and how weighted sums create context-enriched token vectors.
The episode previews multi-head attention (multiple perspectives in parallel) and ends with a short encouragement to take a small step toward your goals.
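The steps in this summary (dot products, scaling, softmax, weighted sum) can be sketched in a few lines of plain Python. This is a minimal sketch that assumes the queries, keys, and values have already been projected from the token embeddings; the function names are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return one context-enriched vector per query.

    queries, keys, values are lists of equal-length float lists.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Dot product measures how similar this query is to each key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs
```

When every key matches a query equally well, the weights are uniform and the output is simply the average of the value vectors; as one key becomes a better match, the output shifts toward that key's value.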
Saturday Jan 03, 2026
Module 2: Attention Is All You Need (The Concept)
Shay breaks down the 2017 paper "Attention Is All You Need" and introduces the transformer: a non-recurrent architecture that uses self-attention to process entire sequences in parallel.
The episode explains positional encoding, how self-attention creates context-aware token representations, the three key advantages over RNNs (parallelization, global receptive field, and precise signal mixing), the quadratic computational trade-off, and teases a follow-up episode that will dive into the math behind attention.
Saturday Jan 03, 2026
Shay breaks down why recurrent neural networks (RNNs) struggled with long-range dependencies in language: fixed-size hidden states and the vanishing gradient caused models to forget early context in long texts.
Shay then explains how LSTMs added gates (forget, input, output) to manage memory and improve short-term performance but remained serial, creating a training and scaling bottleneck that prevented using massive parallel compute.
The episode frames this fundamental bottleneck in NLP and sets up the next episode on attention, ending with a brief reflection on persistence and steady effort.
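To get a feel for the vanishing gradient, here is a toy sketch that assumes a one-unit recurrent network where the gradient is multiplied by the same factor at every time step. Real RNNs multiply by matrices and activation derivatives, so the numbers are purely illustrative, and `gradient_scale` is a hypothetical helper.

```python
def gradient_scale(recurrent_weight, steps, activation_slope=1.0):
    """Return how much a gradient is scaled after flowing back `steps` steps.

    Each step multiplies by the recurrent weight and the activation's slope,
    so the loop itself is serial: step t cannot start before step t - 1.
    """
    scale = 1.0
    for _ in range(steps):
        scale *= recurrent_weight * activation_slope
    return scale


for steps in (1, 10, 100):
    print(steps, gradient_scale(0.9, steps))
```

With a per-step factor of 0.9, the signal from 100 steps back is scaled down to a few hundred-thousandths of its original size, which is why early context is effectively forgotten in long texts.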
Friday Dec 12, 2025
Module 1: Tokens - How Models Really Read
This episode dives into the hidden layer where language stops being words and becomes numbers. We explore what tokens actually are, how tokenization breaks text into meaningful fragments, and why this design choice quietly shapes a model’s strengths, limits, and quirks. Once you understand tokens, you start seeing why language models sometimes feel brilliant and sometimes strangely blind.
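As a small illustration of text being broken into fragments, the sketch below uses a toy greedy longest-match tokenizer with a made-up vocabulary. Production tokenizers typically learn their vocabularies from data (for example with byte-pair encoding), so this is an assumption-laden stand-in rather than how any particular model tokenizes.

```python
def tokenize(text, vocab):
    """Split text into the longest vocabulary pieces, left to right.

    Characters not covered by the vocabulary fall back to single-character
    tokens, so the function always consumes the whole input.
    """
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):  # try the longest piece first
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary, chosen only for this example.
toy_vocab = {"token", "ization", "un", "believ", "able"}

print(tokenize("tokenization", toy_vocab))
print(tokenize("unbelievable", toy_vocab))
```

The model never sees the word "unbelievable" as a unit, only the fragments, which hints at why character-level tasks such as counting letters can trip a model up.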
Friday Dec 12, 2025
Module 1: The Autoregressive Assumption | How Language Emerges in AI
This episode explores the hidden engine behind how language models move from knowing to creating. It reveals why generation happens step by step, why speed has hard limits, and why training and usage behave so differently. Once you see this mechanism, the way models write, reason, and sometimes stall will make immediate sense.
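The step-by-step nature of generation can be sketched as a simple loop. This is a minimal sketch that assumes a stand-in `next_token` function; here it is a hypothetical lookup table, whereas a real model would run a full forward pass to pick each token.

```python
def generate(next_token, prompt, max_new_tokens, stop_token="<eos>"):
    """Extend the prompt one token at a time.

    Each new token depends on everything generated so far, so the steps
    cannot run in parallel the way training over a known text can.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token == stop_token:
            break
        tokens.append(token)
    return tokens


# Hypothetical "model": a lookup from the last token to the next one.
toy_table = {"the": "cat", "cat": "sat", "sat": "<eos>"}


def toy_next_token(tokens):
    return toy_table.get(tokens[-1], "<eos>")


print(generate(toy_next_token, ["the"], max_new_tokens=10))
```

The loop body is where the hard speed limit comes from: producing N new tokens takes N sequential calls to the model, no matter how much parallel hardware is available.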
Friday Dec 12, 2025
Module 1: The Latent Space & Manifolds | How Models Encode Meaning
This episode is about the hidden space where generative models organize meaning. We move from raw data into a compressed representation that captures concepts rather than pixels or tokens, and we explore how models learn to navigate that space to create realistic outputs. Understanding this idea explains both the power of generative AI and why it sometimes fails in surprising ways.
Friday Dec 12, 2025
Module 1: The Generative Turn (Discriminative vs. Generative)
Welcome to Episode One of The Generative Shift. This episode introduces the core change behind modern AI: the move from discriminative models that draw decision boundaries to generative models that learn the full structure of data. Instead of predicting labels using conditional probability, generative systems model the joint distribution itself, which allows them to create rather than classify. This shift reshapes the math, the architecture, and the compute requirements, moving from compression-focused networks to expansion-driven systems that grow structure from noise. It is harder and more expensive, but it is the foundation of everything that follows. In the next episode, we will explore where this expansion lives by stepping into latent space and understanding how models represent meaning itself.
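In symbols, the contrast between the two approaches can be written as follows. This is a standard textbook formulation, with $x$ standing for the data and $y$ for the label; the notation is an assumption and is not taken from the episode itself.

```latex
% Discriminative: learn only the conditional distribution of the label.
p(y \mid x)

% Generative: learn the joint distribution, which factorizes as
p(x, y) = p(x \mid y)\, p(y)

% The conditional can still be recovered from the joint when needed:
p(y \mid x) = \frac{p(x, y)}{\sum_{y'} p(x, y')}
```

The joint distribution contains strictly more information: it can be sampled to produce new $x$, which is what makes creation possible, and it can still be reduced to a classifier through the last line.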
Friday Dec 12, 2025
Intro to The Generative AI Series
Hello everyone, and welcome to The Generative AI Series. I’m Shay, and this introductory episode is about why this series exists and who it is for. Generative AI has exploded, but real understanding is still scattered. Between hype, shortcuts, and surface-level strategy talk, it is hard to find a clear path from fundamentals to building systems that actually work. This series is for practitioners, builders, architects, and technical leaders who want to understand how these models work under the hood, why they succeed, and why they fail. We will go deep but stay accessible, moving step by step from the shift from classification to generation, through transformers, training, RAG, evaluation, and production realities. The goal is simple: build intuition, recognize failure modes early, and design solutions and strategies that work beyond demos, in the real world. Let’s get started. I’ll see you in Module One.




