Cool Stuff to Work Through
- GPU Puzzles
- LLM Puzzles
- Handcrafted Transformers
- Tensor Puzzles
- Transformer Puzzles
- Autodiff Puzzles
- Prompting Puzzle
- Thinking Like a Transformer
- Putting the You in CPU
- Typing Practice
- Regex Crossword
- ML Interviews
- TinyVector
- Getting Started with Generative AI
- Numbers Every Engineer Should Know
- Numbers Every LLM Developer Should Know
- Memorising Numbers Matters
- Looking Inside LLaMA
- Top K Via a Loop (minimal sketch after this list)
- ARENA Prerequisites
- 100 Numpy Exercises
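For a flavour of these exercises, here is a minimal sketch of the "Top K Via a Loop" item, assuming the usual formulation (return the k largest values in one pass, without sorting the whole input); the function name and test values are mine, not the exercise's.

```python
def top_k(values, k):
    # Keep the current k largest values in ascending order,
    # so best[0] is always the smallest value worth retaining.
    best = []
    for v in values:
        if len(best) < k:
            best.append(v)
            best.sort()
        elif v > best[0]:
            best[0] = v
            best.sort()
    return list(reversed(best))  # largest first

print(top_k([3, 1, 4, 1, 5, 9, 2, 6], 3))  # [9, 6, 5]
```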
AI Big Picture
- Language Models Rely On Abstractions
- Loss Landscapes are All You Need
- DPO Instead of RLHF
- Training in 1L Transformers
- Effects of AI on Financial Markets
- Neural Networks 33 Years From Now
- Diffusion Language Models
- Training LLMs on 100 Million Words
- Training Smaller LLMs on More Tokens
- We Aren’t Close to Self-Improving AI
- Why Do Tree-Based Models Outperform DL on Tabular Data
- Reinforcement Learning From Human Feedback
- Learning Agile Soccer Skills
- RoboCat
- Reinforced Self-Training for Language Modeling
- Artificial Visual Cortex
- InstructGPT
- AI Market Structure
- An Observation on Generalization
- Any Deep ReLU Network is Shallow
- Inference-Time Intervention
- MLPs Are All You Need
- How Does Safety Fail?
- Teaching Arithmetic to Transformers
- How To Make LLMs Say True Things
- Discovering Latent Knowledge
- The Shape of AGI
- Why TAI is Hard to Achieve
- GPT is Good at Emotion
- Instructions as Backdoors
- Let’s Verify Step By Step
- Inspecting and Editing Knowledge Representations
- Small In-Context Learning
- Exploring KQV with Intention
- Meta RL for Language
- DCF of Semicaps
AI Explainers
- Visualising Transformers
- Exploring Autonomous Agents
- Transformer Illustration
- AI Canon
- We Have No Moat
- OpenAI and Google Have Moats
- What Are Embeddings?
- LLMs Explained
- Understanding Self-Attention
- Taxonomy of Transformers
- A Catalogue of Transformers
- A Survey of LLMs
- The GPT Series of LLMs
- Optimal Compression in Self-Supervised Learning
- Three Mechanisms of Weight Decay Regularisation
- Prompt Engineering
- Attention Approximates Sparse Distributed Memory
- RLHF as Divergence
- In-Context Learning
- Tensors for NNs
- Reinforcement Learning as Fine-Tuning
- Understanding Tokenisers
- Explaining Copilot
AI Interpretability
- Backpack Models
- Bilinear Layers
- What Is Linearity
- Automated Circuit Discovery
- Textbooks Are All You Need
- Locating and Editing Factual Associations in GPT
- Mass Editing Memory in a Transformer
- Guide to Mechanistic Interpretability
- Research Walkthrough of GPT-J
- Evolution of Representations in Transformers
- Transformer Feedforward Layers
- Transformers in Embedding Space
- Towards Transparent AI
- Inductive Biases With Transfer Learning
- A Toy Model of Universality
- SolidGoldMagikarp
- Intermediate Activations in LLaMA
- Finding Neurons in a Haystack
- Steering GPT-2-XL
AI Scaling
- Transformers Learn Shortcuts to Automata
- Zero-Shot, Few-Shot, Retrieval, Finetuning
- The Case Against the Singularity
- Optimizing Data Mixtures
- Replacing LLM-DBs with VectorDBs
- TinyStories
- Diminishing Returns in ML
- How Fast Can We Do a Forward Pass?
- Limits on Capabilities
- Compute Based Framework
- In-Context Learning in Large Models
- Compute At Scale
- Model Collapse with Generated Data
- Compute and AI
- The Cost of Training SOTA Models
- It’s Not Going to Stop
- Memorise or Generalise
- Transformer Maths 101
- Compute Trends
- Parameter Counting for Transformers (rough estimate sketched after this list)
- Cost to Train the Next Claude
- Improving the Performance of DL Models
- Making PyTorch Faster
- Scaling, Emergence and Reasoning
- Scaling Laws in LLMs
- The Scaling Hypothesis
- Double Descent in Humans
- The Semi Supply Chain
- Recurrent Memory Transformer
- Local Inference is Expensive
- How Good Are H100s?
- Navigating the High Cost of Compute
- Inverse Scaling Can Be U-Shaped
- PAC-Bayes Compression Bounds Explain Generalisation
- Inductive Biases in ML
- RWKV
- From Deep to Long Learning
- Examples of AI Improving AI
- Limits on Transformers for Composability
- Adversarial Robustness of Foundation Models
- Adversarial Examples in CLIP
- Inverse Scaling
- How Quickly is AI Advancing?
- Chris Miller on Semis
- BERT on 1 GPU in 1 Day
- Toolformer
- Broken Neural Scaling Laws
- Adversarial Image Manipulations Influence Humans Too
- Why LLaMA is Possible
- Reducing Sycophancy in LLMs
- Multi-Head vs Multi-Query Attention
- Limits to Flash Attention
- Supply and Demand of H100s
- Language Model Behavior at Reduced Scale
- How Much Time and Money to Train LLMs
- Scaling Laws with Board Games
- Unlimiformer
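As a rough companion to the "Parameter Counting for Transformers" item above, a back-of-the-envelope estimate (my own approximation, not the linked article's exact formula): a decoder-only transformer has about 12 · n_layers · d_model² parameters in its attention and MLP blocks, plus the token embeddings.

```python
def approx_params(n_layers, d_model, vocab_size):
    attention = 4 * d_model * d_model  # W_Q, W_K, W_V, W_O per layer
    mlp = 8 * d_model * d_model        # two projections with a 4x hidden width
    embeddings = vocab_size * d_model  # token embeddings (often tied with the unembedding)
    return n_layers * (attention + mlp) + embeddings

# GPT-2-small-shaped model: 12 layers, d_model=768, 50257-token vocab -> ~124M
print(f"{approx_params(12, 768, 50257):,}")
```

This ignores biases, layer norms, and positional embeddings, which is why it is an estimate rather than an exact count.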
AI in Use
- GPT and LangChain Chatbot For Reading PDFs
- Text to SQL
- Training Stable Diffusion from Scratch
- How to Train Your Own LLM
- Finetuning with LoRA
- Deep Learning Tuning Playbook
- Deploying nanoGPT
- GPT Trained on PaulG
- French-Learning App with GPT
- More Apps with GPT
- Movie Recommendation with Embeddings (toy sketch after this list)
- Starting an AI Business
- AI Wearable
- AutoAnki
- YouTube TLDR Analyser
- Wikipedia vec2text
- Quiz Me GPT
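To make the "Movie Recommendation with Embeddings" idea above concrete, a toy sketch: embed every title, embed the query, and rank by cosine similarity. Real embeddings would come from a sentence-embedding model; the random vectors here are stand-ins so the snippet runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
titles = ["Alien", "Heat", "Amelie", "Blade Runner"]
movie_vecs = {t: rng.normal(size=64) for t in titles}  # stand-in embeddings

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(query_vec, n=2):
    # Rank titles by similarity between the query vector and each movie vector.
    ranked = sorted(titles, key=lambda t: cosine(query_vec, movie_vecs[t]), reverse=True)
    return ranked[:n]

print(recommend(rng.normal(size=64)))
```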
Random Not AI
- Not Just the Poker Table
- How I Think When I Think About Programming
- Streamlit and Plotly Express
- What is Information?
- How Technology Grows
- The Weapons That Win Wars
- Breakthroughs in Deep Tech
- Chip War, The Prize, The World For Sale
- Thinking Physics but for Chemistry/Cooking
- Why is Nuclear Power Expensive?
- Longevity Biotech