The AI Concepts Podcast

The AI Concepts Podcast is my attempt to turn the complex world of artificial intelligence into bite-sized, easy-to-digest episodes. Imagine a space where you can pick any AI topic and immediately grasp it, like flipping through an audio lexicon, but even better! Using vivid analogies and storytelling, I guide you through intricate ideas, helping you create mental images that stick. Whether you’re a tech enthusiast, a business leader, a technologist, or just curious, my episodes bridge the gap between cutting-edge AI and everyday understanding. Dive in and let your imagination bring these concepts to life!

Episodes

Jun 11, 2026

Module 6: RAG | Long Context vs RAG - Do You Still Need Retrieval at All

8 min

This episode closes out Module 6 by tackling the question that has been getting louder since large context windows arrived. If a model can hold hundreds of thousands or even millions of tokens at once, do we still need all the architecture we just spent this module building? We explore why RAG was never just about fitting text into a small prompt, what retrieval is actually doing that a large context window cannot, and how the shift from compression to curation changes what good RAG looks like today. We cover when long context is genuinely the better tool, when retrieval still matters deeply, and why in most real enterprise systems the best answer is both working together. The episode closes with the argument that RAG is not disappearing. It is maturing. And everything we built in this module is part of that stronger foundation. By the end you will have a clear and honest picture of where these two approaches fit, and why understanding both puts you well ahead of most people working in this space.

Jun 9, 2026

Module 6: RAG | GraphRAG - When Relationships Matter More Than Text

8 min

This episode addresses the category of questions that vector search fundamentally cannot answer: questions about relationships between things. We explore what a knowledge graph is and why traversing connections between entities requires a completely different data structure than semantic similarity search. We break down Microsoft's GraphRAG approach: how it extracts entities and relationships from documents during indexing, uses community detection to identify clusters of related knowledge, and generates summaries that enable global queries across an entire corpus rather than just local document retrieval. We cover the cost improvements brought by LazyGraphRAG, the hybrid vector-plus-graph pattern most production teams are moving toward, Neo4j as the go-to graph database, and a lighter-weight entity extraction approach for teams not ready for a full knowledge graph. By the end you will understand when relationships matter more than text and how to build systems that can answer both kinds of questions.

Jun 9, 2026

Module 6: RAG | Query Transformation - When the Question Is the Bottleneck

7 min

This episode addresses a retrieval failure that has nothing to do with your index and everything to do with the query itself. We explore the vocabulary gap between how people ask questions and how documents are written, and why even strong embedding models cannot always bridge it. We break down three techniques that fix the query before the search runs: query rewriting, which reformulates casual language into formal search terms; HyDE, which generates a hypothetical answer and uses that as the search query instead of the question; and multi-query expansion, which generates multiple phrasings to cast a wider retrieval net. We also cover step-back prompting for queries that need broader conceptual grounding before searching. By the end you will understand why the question itself is often the highest-leverage thing to improve in a retrieval pipeline.

Jun 9, 2026

Module 6: RAG | Parent-Child Indexing - Search Small, Retrieve Big

7 min

This episode addresses the fundamental tension between retrieval precision and generation context. We explore why small chunks produce tight embeddings that retrieve well but leave the model without enough surrounding information, and why large chunks give the model context but dilute the embedding and hurt search quality. We break down parent-child indexing as the solution that decouples these two problems entirely, how child chunks handle the search and parent chunks handle the generation, and how to structure the hierarchy for documents of different complexity. We cover practical implementations in LlamaIndex and LangChain and close with guidance on when this pattern earns its place in a pipeline. By the end you will understand how to stop choosing between finding the right thing and giving the model enough to work with.
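
To make the "search small, retrieve big" idea concrete, here is a minimal sketch in plain Python. It is not the LlamaIndex or LangChain implementation mentioned above. The helper names (`split_words`, `build_index`, `overlap`, `retrieve_parent`) and the sample texts are hypothetical, and a toy word-overlap score stands in for embedding similarity so the example runs without any model.

```python
def split_words(text, size):
    """Split text into child chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(parents, child_size=8):
    """Return (child_text, parent_id) pairs: children are what gets searched."""
    index = []
    for parent_id, parent_text in enumerate(parents):
        for child in split_words(parent_text, child_size):
            index.append((child, parent_id))
    return index

def overlap(query, text):
    # Toy relevance score: count of shared lowercase words.
    # Assumption: a real system would compare embeddings here instead.
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve_parent(query, index, parents):
    """Search over small children, then hand back the full parent chunk."""
    _best_child, parent_id = max(index, key=lambda pair: overlap(query, pair[0]))
    return parents[parent_id]

# Hypothetical parent chunks, invented for illustration
parents = [
    "Refunds are issued within ten business days. Contact support with "
    "your order number to start a refund request.",
    "Shipping takes three to five days. Express shipping is available "
    "for an extra fee at checkout.",
]
index = build_index(parents)
context = retrieve_parent("how long does express shipping take", index, parents)
print(context)
```

The match is found on a short child chunk, but the model receives the whole parent, which is the decoupling the episode describes.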

Jun 9, 2026

Module 6: RAG | Reranking - The Second Stage That Gets Retrieval Right

9 min

This episode addresses the gap between finding candidate chunks and finding the right ones. We explore the bi-encoder bottleneck, why compressing text into a single vector for comparison loses critical nuance, and how cross-encoders fix this by reading the query and document together in a single forward pass. We introduce ColBERT as a powerful middle ground between speed and accuracy through token-level late interaction, walk through the production tooling landscape including Cohere Rerank, BGE models, and RAGatouille, and close by stitching hybrid search and reranking into a complete three-stage retrieval funnel. By the end you will understand why two-stage retrieval is now the standard architecture for any serious RAG pipeline.

Jun 9, 2026

Module 6: RAG | Dense and Sparse Search - Why Vector Search Alone Is Not Enough

11 min

This episode addresses one of the most common gaps in RAG pipelines: relying solely on semantic search. We explore how dense retrieval works and where it excels, then introduce sparse retrieval with BM25 and why it catches what vector search misses entirely, particularly exact identifiers like part numbers, codes, and proper nouns. We break down how hybrid search combines both approaches using Reciprocal Rank Fusion, why it consistently outperforms either method alone, and how modern vector databases like Weaviate, Pinecone, and Qdrant support this natively. By the end you will understand why the best retrieval systems are not choosing between semantic and keyword search but running both.
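
For readers who want to see Reciprocal Rank Fusion on the page, here is a minimal sketch. Each document scores 1 / (k + rank) in every ranked list it appears in, and the scores are summed. The function name, the document IDs, and the two sample rankings are hypothetical; k = 60 is a commonly used default, not something the episode specifies.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more the higher it sits in each list
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a dense (vector) and a sparse (BM25) retriever
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense, sparse])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that both retrievers agree on rise to the top, and only ranks are used, so the dense and sparse scores never need to be put on a common scale.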

Apr 29, 2026

Module 6: RAG | Chunking - Where You Cut Decides What Gets Found

10 min

This episode is about chunking, the quiet step in a RAG pipeline that decides whether your system retrieves the right answer or a confidently wrong one. It covers why the chunk is the real unit of retrieval, the tradeoff between context and precision, the main strategies teams use to split documents, and why testing your chunks against real questions matters more than picking the perfect size.

Apr 27, 2026

Module 6: RAG | Data Ingestion - Before Your Documents Can Be Found

11 min

This episode is about the step that every RAG system depends on. Before meaning can be stored or retrieved, your raw documents have to become clean text. What goes wrong here breaks the entire pipeline in ways that are surprisingly hard to catch.

Apr 27, 2026

Module 6: RAG | Vector Databases - Where That Meaning Gets Stored

10 min

This episode is about the infrastructure underneath every RAG system. It covers the purpose-built engine that stores all that meaning and searches millions of vectors in milliseconds, in a way no traditional database can. This is what makes retrieval fast enough to actually work in production.

Apr 27, 2026

Module 6: RAG | Embeddings - Teaching Machines to Understand Meaning

8 min

This episode is about the layer of RAG that makes semantic search possible. It covers how machines turn language into math that clusters similar ideas together, so a question and its answer can find each other even when they share no words in common. Without this, RAG is just keyword search with extra steps.
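
As a rough illustration of a question and its answer finding each other without shared words, the sketch below compares vectors with cosine similarity. The three-dimensional vectors are hand-made and invented purely for illustration. A real embedding model would produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings (assumption: values chosen by hand, not by a model)
question = [0.9, 0.1, 0.2]   # "How do I get my money back?"
answer = [0.8, 0.2, 0.1]     # "Refunds are issued within ten days."
unrelated = [0.1, 0.9, 0.3]  # "The office closes at 5 pm."

sim_answer = cosine_similarity(question, answer)
sim_unrelated = cosine_similarity(question, unrelated)
print(sim_answer > sim_unrelated)  # True
```

The question and the refund sentence share no words, yet their vectors point in nearly the same direction, which is the property semantic search relies on.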