Lectures
You can download the lectures here.
-
2. Introduction to Language Models
Lecture date: August 04, 2025
tl;dr: Introduction to language modelling, RNNs, backpropagation through time, LSTMs, GRUs.
[ slides ] [ recording ]
Suggested Readings:
- Chapter 3, Speech and Language Processing
- Backpropagation Through Time: What It Does and How to Do It
- The Unreasonable Effectiveness of Recurrent Neural Networks
- Learning long-term dependencies with gradient descent is difficult
- On the difficulty of training Recurrent Neural Networks
- Understanding LSTM Networks
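As a toy illustration of the recurrence these readings cover, here is a minimal Elman-style RNN language model forward pass in numpy. The dimensions, initialization, and vocabulary are arbitrary assumptions, not code from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_emb, d_hid = 10, 4, 8

E = rng.normal(size=(vocab, d_emb))           # embedding table (toy values)
W_xh = rng.normal(size=(d_emb, d_hid)) * 0.1  # input-to-hidden weights
W_hh = rng.normal(size=(d_hid, d_hid)) * 0.1  # hidden-to-hidden (recurrent) weights
W_hy = rng.normal(size=(d_hid, vocab)) * 0.1  # hidden-to-output weights

def rnn_forward(tokens):
    """Return next-token logits at each position of a token sequence."""
    h = np.zeros(d_hid)
    logits = []
    for t in tokens:
        # Recurrence: h_t = tanh(x_t W_xh + h_{t-1} W_hh)
        h = np.tanh(E[t] @ W_xh + h @ W_hh)
        logits.append(h @ W_hy)
    return np.stack(logits)

out = rnn_forward([1, 5, 3])
print(out.shape)  # (3, 10): one logit vector over the vocabulary per time step
```

Backpropagation through time differentiates this loop by unrolling it, which is where the repeated multiplication by `W_hh` causes the vanishing/exploding-gradient issues discussed in the readings.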
-
5. Pre-training and Instruction Tuning
Lecture date: August 11, 2025
tl;dr: Pre-training strategies for Encoder-only, Encoder-decoder, and Decoder-only models; Instruction Tuning and Weighted Instruction Tuning
[ slides ] [ recordingA ] [ recordingB ]
Suggested Readings:
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
- On the Effect of Instruction Tuning Loss on Generalization
- Instruction Tuning for Large Language Models: A Survey
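The instruction-tuning losses surveyed above can be sketched as a per-token weighted cross-entropy, where prompt tokens are masked (weight 0) or down-weighted relative to response tokens. The weight scheme below is illustrative, not the one from any specific paper:

```python
import numpy as np

def weighted_ce(logits, targets, weights):
    """Weighted mean of per-token cross-entropy.
    logits: (T, V), targets: (T,), weights: (T,)."""
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]    # per-token NLL
    return (weights * nll).sum() / weights.sum()          # weighted mean

T, V = 6, 10
rng = np.random.default_rng(0)
logits = rng.normal(size=(T, V))
targets = rng.integers(0, V, size=T)
weights = np.array([0., 0., 0., 1., 1., 1.])  # mask the 3 prompt tokens
print(round(weighted_ce(logits, targets, weights), 3))
```

Setting all weights to 1 over response tokens recovers the standard completion-only loss; non-binary weights give one form of weighted instruction tuning.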
-
9. RLHF: Part 03
Lecture date: August 21, 2025
tl;dr: GRPO, PPO, TRPO
[ slides ] [ recording ]
Suggested Readings:
- OpenAI Spinning Up (documentation and introductory blogs on Deep RL)
- Trust Region Policy Optimization
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Proximal Policy Optimization Algorithms
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (for GRPO)
- Training language models to follow instructions with human feedback
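For orientation before the readings, PPO's clipped surrogate objective (Schulman et al., 2017) fits in a few lines of numpy. The log-probabilities and advantages below are toy values, not outputs of any real policy:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Negative clipped surrogate: -E[min(r_t A_t, clip(r_t, 1-eps, 1+eps) A_t)]."""
    ratio = np.exp(logp_new - logp_old)            # r_t = pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return -np.minimum(unclipped, clipped).mean()  # negate: we minimize

logp_old = np.log(np.array([0.2, 0.5, 0.1]))
logp_new = np.log(np.array([0.3, 0.4, 0.2]))
adv = np.array([1.0, -0.5, 2.0])
print(round(ppo_clip_loss(logp_new, logp_old, adv), 3))
```

GRPO keeps the same clipped ratio but replaces the learned value baseline with group-relative advantages (normalizing rewards within a group of sampled completions), as described in the DeepSeekMath paper.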
-
12. Efficient LLMs: Part 02
Lecture date: August 28, 2025
tl;dr: Efficient distributed training - data parallelism, ZeRO (1,2,3), FSDP
[ slides ] [ recordings ]
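The core ZeRO-1 idea, sharding optimizer state across data-parallel workers instead of replicating it, can be shown in a single-process toy simulation. This is purely illustrative numpy, not distributed code; the worker count, momentum update, and learning rate are made-up values:

```python
import numpy as np

n_workers, n_params = 4, 12
params = np.zeros(n_params)
grads = np.ones(n_params)  # stand-in for the all-reduced gradient

# Each worker owns optimizer state for only 1/n_workers of the parameters.
shards = np.array_split(np.arange(n_params), n_workers)
momentum = {w: np.zeros(len(idx)) for w, idx in enumerate(shards)}

lr, beta = 0.1, 0.9
for w, idx in enumerate(shards):
    momentum[w] = beta * momentum[w] + grads[idx]  # worker updates its own shard
    params[idx] -= lr * momentum[w]                # updated params are then all-gathered

print(params[:3])
```

ZeRO-2 additionally shards gradients and ZeRO-3 (like FSDP) shards the parameters themselves, gathering them on demand during forward/backward.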
-
17. Parameter-Efficient Fine-Tuning (PEFT)
Lecture date: September 10, 2025
tl;dr: Additive, Selective and Re-parameterization PEFT Techniques
[ slides ] [ video ]
Suggested Readings:
- Parameter-Efficient Transfer Learning for NLP
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- The Power of Scale for Parameter-Efficient Prompt Tuning
- BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
- Training Neural Networks with Fixed Sparse Masks
- Parameter-Efficient Fine-Tuning without Introducing New Latency
- LoRA: Low-Rank Adaptation of Large Language Models
- AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
- DoRA: Weight-Decomposed Low-Rank Adaptation
- QLoRA: Efficient Finetuning of Quantized LLMs
- Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation
- Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models
-
19. Knowledge Distillation
Lecture date: September 18, 2025
tl;dr: Different techniques for knowledge distillation in LLMs
[ slides ] [ video ]
Suggested Readings:
- Distilling the Knowledge in a Neural Network
- Sequence-Level Knowledge Distillation
- MiniLLM: Knowledge Distillation of Large Language Models
- On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
- BERT Learns to Teach: Knowledge Distillation with Meta Learning
- A Good Learner can Teach Better: Teacher-Student Collaborative Knowledge Distillation
- On the Generalization vs Fidelity Paradox in Knowledge Distillation
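The classic distillation objective from the first reading (Hinton et al., 2015) is a temperature-softened KL divergence between teacher and student, conventionally scaled by T². The logits below are toy values:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """T^2 * KL(teacher_T || student_T) with temperature-softened distributions."""
    p = softmax(teacher_logits / T)              # softened teacher distribution
    log_q = np.log(softmax(student_logits / T))  # softened student log-probs
    return (T ** 2) * (p * (np.log(p) - log_q)).sum(axis=-1).mean()

teacher = np.array([[4.0, 1.0, 0.5]])
student = np.array([[2.0, 1.5, 1.0]])
print(round(distill_loss(student, teacher), 4))
```

Sequence-level and on-policy distillation (MiniLLM and later readings) change *where* this divergence is measured, e.g. on student-generated sequences rather than teacher-forced tokens, rather than the form of the loss itself.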
-
23. LLMs and Tools: Teaching LLMs to Use External APIs
Lecture date: October 6, 2025
tl;dr: Teaching LLMs to use external APIs - APIBench, ToolAlpaca, ToolBench, APIGen, Granite-Function Calling Model
[ slides ] [ video ]
Suggested Readings:
- Gorilla: Large Language Model Connected with Massive APIs
- ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
- APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
- Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
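Underlying all of these systems is the same dispatch pattern: the model emits a structured tool call (name plus arguments), and a runtime routes it to a registered function. The tool, registry, and model output below are hypothetical stand-ins, not any benchmark's actual format:

```python
import json

TOOLS = {}

def tool(fn):
    """Register a Python function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

def dispatch(model_output: str):
    # model_output stands in for an LLM response containing a JSON tool call
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])

print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # 5
```

Datasets like ToolBench and APIGen essentially scale up the generation and verification of such (instruction, tool call) pairs for training.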
-
27. LLM Reasoning: Part 03 (Test-Time Scaling)
Lecture date: October 15, 2025
tl;dr: Scaling test-time compute with reasoning models - parallel scaling, sequential scaling, hybrid scaling, internal scaling
[ slides ] [ video ]
Suggested Readings:
- A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?
- Self-Refine: Iterative Refinement with Self-Feedback
- s1: Simple test-time scaling
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- OpenAI o1 System Card
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
- First Finish Search: Efficient Test-Time Scaling in Large Language Models
- Position: Enough of Scaling LLMs! Lets Focus on Downscaling
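Parallel scaling in its simplest form is self-consistency: sample N candidate answers and take a majority vote. The "sampler" below is a hypothetical stand-in for repeated LLM calls:

```python
from collections import Counter

def self_consistency(sampler, n=8):
    """Majority vote over n sampled answers (parallel test-time scaling)."""
    answers = [sampler(i) for i in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical flaky model that answers "42" most of the time.
fake_samples = ["42", "41", "42", "42", "17", "42", "41", "42"]
print(self_consistency(lambda i: fake_samples[i]))  # 42
```

Sequential scaling (e.g. Self-Refine) instead spends the extra compute on iteratively revising a single candidate, and hybrid schemes like Tree of Thoughts combine both.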
-
29. LLMs and Tools: Agentic Workflow: Part 02
Lecture date: October 22, 2025
tl;dr: Memory management in AI agents; enabling small LLMs to approach the performance of larger proprietary models
[ slides ]
-
30. LLMs and Tools: Agentic Workflow: Part 03
Lecture date: October 23, 2025
tl;dr: Agentic Protocols - Model Context Protocol (MCP), Agent2Agent (A2A) Protocol, agents.json
[ slides ]
-
31. Alternative Models: RWKV
Lecture date: October 25, 2025
tl;dr: Receptance Weighted Key Value (RWKV)
[ slides ]
-
32. Alternative Models: SSMs
Lecture date: October 27, 2025
tl;dr: State Space Models (SSMs)
[ slides ]
-
33. Multimodal Encoder Models
Lecture date: October 29, 2025
tl;dr: Multimodal encoder models - ViT, VisualBERT, ViLBERT, CLIP, LayoutLMv2, ViT with registers, DINOv3, VideoCLIP, ImageBind
[ slides ]
Suggested Readings:
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- VisualBERT: A Simple and Performant Baseline for Vision and Language
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
- Learning Transferable Visual Models From Natural Language Supervision
- LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding
- Vision Transformers Need Registers
- DINOv3
- VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
- IMAGEBIND: One Embedding Space To Bind Them All
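CLIP's symmetric contrastive objective, which several of these models build on, can be sketched directly: cosine-similarity logits between image and text embeddings, with cross-entropy in both directions and the matching pairs on the diagonal. The embeddings and temperature here are toy values:

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text contrastive loss over a batch of matched pairs."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) cosine similarities
    labels = np.arange(len(logits))     # i-th image matches i-th text

    def ce(l):  # cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (ce(logits) + ce(logits.T))  # image->text and text->image

rng = np.random.default_rng(0)
print(round(clip_loss(rng.normal(size=(4, 8)), rng.normal(size=(4, 8))), 3))
```

ImageBind generalizes this same objective by contrasting several modalities (audio, depth, thermal, IMU) against image embeddings to bind them into one space.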
-
34. Text Generation with Multimodal Inputs
Lecture date: October 30, 2025
tl;dr: Multimodal text generation architectures - Frozen, Flamingo, BLIP, BLIP-2, mPLUG, LLaVA, Video-LLaMA, MiniGPT-4, MiniCPM-V, UI-TARS
[ slides ]
Suggested Readings:
- Visual Instruction Tuning
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- MiniCPM-V: A GPT-4V Level MLLM on Your Phone
- Flamingo: a Visual Language Model for Few-Shot Learning
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents
-
36. Physics of LLMs: Knowledge Storage and Extraction
Lecture date: November 6, 2025
tl;dr: Understanding why augmented pre-training data is essential for LLM knowledge extraction
[ slides ]
-
37. Physics of LLMs: Knowledge Manipulation, Knowledge Capacity Scaling Laws and Reasoning
Lecture date: November 10, 2025
tl;dr: Understanding LLM reasoning, knowledge manipulation, and capacity scaling laws
[ slides ]
-
38. Interpretability of LLMs
Lecture date: November 12, 2025
tl;dr: Local and Global Explanation-based Analysis, Sparse Autoencoders, Activation patching and steering
[ slides ]
