Tuesday Jan 06, 2026
Module 2: The MLP Layer - Where Transformers Store Knowledge
Shay explains where a transformer actually stores knowledge: not in attention, but in the MLP (feed-forward) layer. The episode frames the transformer block as a two-step loop: attention moves information between tokens, then the MLP transforms each token’s representation independently to inject learned knowledge.
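To make the two-step loop concrete, here is a minimal sketch of a transformer block in PyTorch, not taken from the episode; the module names, dimensions, and pre-norm layout are illustrative assumptions. Attention mixes information across token positions, and the MLP then applies the same learned weights to each token's representation independently.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        # Step 1: attention moves information *between* token positions.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Step 2: the MLP (feed-forward) acts on *each token independently*,
        # reusing the same learned weights at every position.
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out                  # tokens exchange information
        x = x + self.mlp(self.norm2(x))   # per-token transform injects learned knowledge
        return x

x = torch.randn(1, 16, 512)              # one sequence of 16 tokens
print(TransformerBlock()(x).shape)       # torch.Size([1, 16, 512])
```

Note that the MLP has no access to other positions in the sequence, which is why the episode locates stored knowledge in its weights rather than in the attention mechanism.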