- A friend’s list of high signal-to-noise pages
- Reproducibility crisis as fraud
- Married, conservative college graduates who trust both the system and other people are happy
- Mesa-optimisation in transformers
- Reducing PPO memory in RLHF
- Visualising with TorchLens
- Why LLMs work so well
- Prediction is insufficient for making money
- A 7-point recipe for single managers
- Transformer model inference optimisation
- Training large models on many GPUs
- Training stability of transformers
- Drone racing with deep RL
- Speculative execution for LLMs
- Transformers learning to find GCDs
- Inverting embeddings
- Transformers as SVMs
- Transformers being bottlenecked by memory bandwidth, not quadratic complexity
- The real Jensen’s law