Lectures
You can download the lectures here.
-
2. Introduction to Language Models
Lecture date: August 04, 2025
tl;dr: Introduction to language modelling, RNNs, backpropagation through time, LSTMs, GRUs (a toy BPTT sketch follows the readings).
[ slides ] [ recording ]
Suggested Readings:
- Chapter 3, Speech and Language Processing
- Backpropagation Through Time: What It Does and How to Do It
- The Unreasonable Effectiveness of Recurrent Neural Networks
- Learning long-term dependencies with gradient descent is difficult
- On the difficulty of training Recurrent Neural Networks
- Understanding LSTM Networks
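To make the recurrence and its failure mode concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass and backpropagation through time; the dimensions, the random data, and the stand-in loss are illustrative, not taken from the lecture.

```python
import numpy as np

# Toy vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b).
rng = np.random.default_rng(0)
D, H, T = 4, 8, 5                      # input dim, hidden dim, sequence length
W_xh = rng.normal(0, 0.1, (H, D))
W_hh = rng.normal(0, 0.1, (H, H))
b = np.zeros(H)

xs = rng.normal(size=(T, D))           # a random input sequence
hs = [np.zeros(H)]                     # h_0 = 0
for t in range(T):                     # forward pass, keeping all states for BPTT
    hs.append(np.tanh(W_xh @ xs[t] + W_hh @ hs[-1] + b))

# BPTT for the stand-in loss L = sum_t ||h_t||^2 / 2.
dW_hh = np.zeros_like(W_hh)
dh_next = np.zeros(H)                  # gradient flowing in from step t+1
for t in reversed(range(T)):
    dh = hs[t + 1] + dh_next           # dL/dh_t: local term + carried term
    dz = dh * (1 - hs[t + 1] ** 2)     # through tanh
    dW_hh += np.outer(dz, hs[t])
    dh_next = W_hh.T @ dz              # repeated multiplication by W_hh^T is
                                       # why gradients vanish or explode over long T
print(dW_hh.shape)                     # (8, 8)
```

The last line of the backward loop is the crux of the Bengio et al. and Pascanu et al. readings: the carried gradient is multiplied by W_hh^T once per timestep and so shrinks or grows geometrically, which is what the gating in LSTMs and GRUs mitigates.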
-
5. Pre-training and Instruction Tuning
Lecture date: August 11, 2025
tl;dr: Pre-training strategies for encoder-only, encoder-decoder, and decoder-only models; instruction tuning and weighted instruction tuning (a toy loss sketch follows the readings)
[ slides ] [ recordingA ] [ recordingB ]
Suggested Readings:
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
- On the Effect of Instruction Tuning Loss on Generalization
- Instruction Tuning for Large Language Models: A Survey
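As a toy illustration of the loss design behind instruction tuning and its weighted variant (in the spirit of the loss-generalization reading above): compute the next-token cross-entropy only on response tokens, or include prompt tokens with a smaller weight. The tensors and the weight value below are random stand-ins for a real model and dataset.

```python
import torch
import torch.nn.functional as F

vocab, seq_len = 100, 8
logits = torch.randn(seq_len, vocab)               # model outputs per position
targets = torch.randint(0, vocab, (seq_len,))      # gold next-token ids
is_response = torch.tensor([0., 0., 0., 1., 1., 1., 1., 1.])  # 0 = prompt token

per_token = F.cross_entropy(logits, targets, reduction="none")

# Standard instruction tuning: average the loss over response tokens only.
it_loss = (per_token * is_response).sum() / is_response.sum()

# Weighted instruction tuning: prompt tokens also contribute, down-weighted.
w = 0.1                                            # illustrative prompt weight
weights = is_response + (1 - is_response) * w
wit_loss = (per_token * weights).sum() / weights.sum()
print(it_loss.item(), wit_loss.item())
```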
-
9. RLHF: Part 03
Lecture date: August 21, 2025
tl;dr: GRPO, PPO, TRPO (a toy objective sketch follows the readings)
[ slides ] [ recording ]
Suggested Readings:
- OpenAI Spinning Up (documentation and introductory blogs on Deep RL)
- Trust Region Policy Optimization
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Proximal Policy Optimization Algorithms
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (for GRPO)
- Training language models to follow instructions with human feedback
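A minimal sketch tying the readings together: GRPO-style advantages (rewards normalized within a group of samples for the same prompt, with no learned value function) plugged into the PPO clipped objective. The log-probabilities and rewards are random placeholders; real implementations work per token over batched rollouts.

```python
import torch

eps = 0.2                                 # PPO clipping range
G = 6                                     # group size: samples per prompt
logp_new = torch.randn(G)                 # log pi_theta(y|x), current policy
logp_old = logp_new.detach() + 0.1 * torch.randn(G)   # from the rollout policy

# GRPO: advantage = group-normalized reward (mean 0, unit variance).
rewards = torch.randn(G)                  # one scalar reward per sampled response
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

ratio = torch.exp(logp_new - logp_old)    # importance ratio pi_new / pi_old
clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
ppo_loss = -torch.minimum(ratio * adv, clipped * adv).mean()
print(ppo_loss.item())
```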
-
12. Efficient LLMs: Part 02
Lecture date: August 28, 2025
tl;dr: Efficient distributed training: data parallelism, ZeRO (stages 1, 2, and 3), FSDP
[ slides ] [ recordings ]
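A rough memory-accounting sketch of why ZeRO helps, using the ZeRO paper's rule of thumb of 16 bytes per parameter for mixed-precision Adam (2P fp16 weights + 2P fp16 gradients + 12P fp32 master weights, momentum, and variance). The model size and GPU count are illustrative.

```python
def per_gpu_memory_gb(P, N):
    """P = parameter count, N = data-parallel degree; returns GB per GPU."""
    GB = 1024 ** 3
    weights, grads, optim = 2 * P, 2 * P, 12 * P      # bytes, mixed-precision Adam
    return {
        "plain DP": (weights + grads + optim) / GB,        # everything replicated
        "ZeRO-1":   (weights + grads + optim / N) / GB,    # shard optimizer state
        "ZeRO-2":   (weights + (grads + optim) / N) / GB,  # ...and gradients
        "ZeRO-3":   ((weights + grads + optim) / N) / GB,  # ...and weights (FSDP-style)
    }

print(per_gpu_memory_gb(P=7e9, N=8))      # a 7B model on 8 GPUs: ~104 GB -> ~13 GB
```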
-
14. Efficient LLMs: Part 04
Lecture date: September 3, 2025
tl;dr: Pipeline parallelism (AFAB, 1F1B), GPU basics, FlashAttention
[ slides ]
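A small sketch of the idea at the heart of FlashAttention: an online (streaming) softmax that consumes key/value blocks one at a time with a running max and normalizer, so the full T x T score matrix is never materialized. Single query, toy sizes, NumPy only; the real kernel also tiles over queries and fuses everything in on-chip SRAM.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, block = 64, 16, 8
q = rng.normal(size=d)
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))

m = -np.inf          # running max of scores (numerical stability)
l = 0.0              # running softmax normalizer
acc = np.zeros(d)    # running weighted sum of values
for s in range(0, T, block):
    scores = K[s:s + block] @ q / np.sqrt(d)
    m_new = max(m, scores.max())
    scale = np.exp(m - m_new)            # rescale old stats to the new max
    p = np.exp(scores - m_new)
    l = l * scale + p.sum()
    acc = acc * scale + p @ V[s:s + block]
    m = m_new
out = acc / l

# Matches the naive full-matrix computation.
full = K @ q / np.sqrt(d)
ref = np.exp(full - full.max())
assert np.allclose(out, (ref / ref.sum()) @ V)
```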
-
15. Efficient LLMs: Part 05
Lecture date: September 4, 2025
tl;dr: Training vs. inference: forward pass, inference, KV cache usage and management (vLLM, KV blocks, paged attention)
[ slides ]
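A minimal sketch of why the KV cache pays off at decode time: past tokens' keys and values never change, so each step appends one row instead of recomputing all of them. The weights and sizes are illustrative; paged attention (vLLM) additionally stores these rows in fixed-size KV blocks so the cache can grow without fragmentation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
Wq, Wk, Wv = (rng.normal(0, 0.1, (d, d)) for _ in range(3))
K_cache = np.empty((0, d))     # grows by one row per generated token
V_cache = np.empty((0, d))

def decode_step(x):
    """One decoding step for the current token embedding x, shape (d,)."""
    global K_cache, V_cache
    K_cache = np.vstack([K_cache, x @ Wk])   # append: O(1) new K/V rows per step
    V_cache = np.vstack([V_cache, x @ Wv])
    q = x @ Wq
    scores = K_cache @ q / np.sqrt(d)        # attend over all cached tokens
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs @ V_cache                   # attention output for this step

for _ in range(5):                           # "generate" five tokens
    out = decode_step(rng.normal(size=d))
print(out.shape, K_cache.shape)              # (16,) (5, 16)
```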
-
16. Efficient LLMs: Part 06
Lecture date: September 8, 2025
tl;dr: Training vs. inference (in code), Mixture-of-Experts architecture, recap of efficient LLMs
[ slides ]
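A toy top-k Mixture-of-Experts layer under the usual assumptions (softmax router, renormalized top-k gates, MLP experts); the sizes and the per-expert Python loop are illustrative, and real systems dispatch tokens with batched scatter/gather plus load-balancing losses.

```python
import torch
import torch.nn.functional as F

class MoE(torch.nn.Module):
    def __init__(self, d=32, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.router = torch.nn.Linear(d, n_experts)
        self.experts = torch.nn.ModuleList(
            torch.nn.Sequential(torch.nn.Linear(d, 4 * d), torch.nn.GELU(),
                                torch.nn.Linear(4 * d, d))
            for _ in range(n_experts))

    def forward(self, x):                    # x: (tokens, d)
        gate = F.softmax(self.router(x), dim=-1)
        w, idx = gate.topk(self.k, dim=-1)   # pick top-k experts per token
        w = w / w.sum(dim=-1, keepdim=True)  # renormalize the kept gate weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                  # (tokens, k): where expert e is chosen
            rows = mask.any(dim=-1)
            if rows.any():                   # only chosen tokens visit expert e
                out[rows] += expert(x[rows]) * w[mask][:, None]
        return out

x = torch.randn(10, 32)
print(MoE()(x).shape)                        # torch.Size([10, 32])
```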
-
17. Parameter-Efficient Fine-Tuning (PEFT)
Lecture date: September 10, 2025
tl;dr: Additive, selective, and re-parameterization PEFT techniques (a minimal LoRA sketch follows the readings)
[ slides ]
Suggested Readings:
- Parameter-Efficient Transfer Learning for NLP
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- The Power of Scale for Parameter-Efficient Prompt Tuning
- BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
- Training Neural Networks with Fixed Sparse Masks
- Parameter-Efficient Fine-Tuning without Introducing New Latency
- LoRA: Low-Rank Adaptation of Large Language Models
- AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
- DoRA: Weight-Decomposed Low-Rank Adaptation
- QLoRA: Efficient Finetuning of Quantized LLMs
- Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation
- Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models
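Among the methods listed above, LoRA is compact enough to sketch in full: freeze the pretrained weight W and learn a low-rank update BA scaled by alpha/r, so the effective weight is W + (alpha/r)BA. The wrapped layer, rank, and alpha below are illustrative.

```python
import torch

class LoRALinear(torch.nn.Module):
    def __init__(self, base: torch.nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():     # pretrained weights stay frozen
            p.requires_grad = False
        # Gaussian A, zero B (as in the paper), so BA = 0 at initialization.
        self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(torch.nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)     # 1024 trainable params vs. 4160 in the frozen base layer
```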
-
18. Model Compression
Lecture date: September 11, 2025
tl;dr: Different model pruning techniques
[ slides ]
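As a minimal example of the simplest member of this family, unstructured magnitude pruning zeroes the weights with the smallest absolute values; the sparsity level and the toy weight matrix below are illustrative.

```python
import torch

def magnitude_prune(W: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    k = max(1, int(W.numel() * sparsity))          # number of weights to drop
    threshold = W.abs().flatten().kthvalue(k).values
    return W * (W.abs() > threshold)               # binary mask keeps the rest

W = torch.randn(8, 8)
W_pruned = magnitude_prune(W, sparsity=0.5)
print((W_pruned == 0).float().mean().item())       # ~0.5
```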
-
19. Knowledge Distillation
Lecture date: September 18, 2025
tl;dr: Different techniques for knowledge distillation in LLMs (a minimal loss sketch follows the readings)
[ slides ]
Suggested Readings:
- Distilling the Knowledge in a Neural Network
- Sequence-Level Knowledge Distillation
- MiniLLM: Knowledge Distillation of Large Language Models
- On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
- BERT Learns to Teach: Knowledge Distillation with Meta Learning
- A Good Learner can Teach Better: Teacher-Student Collaborative Knowledge Distillation
- On the Generalization vs Fidelity Paradox in Knowledge Distillation
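The classic formulation from the first reading fits in a few lines: match the student's temperature-softened distribution to the teacher's with a KL term (scaled by T^2 to keep gradient magnitudes comparable) and mix in the usual cross-entropy on gold labels. The logits and labels below are random stand-ins.

```python
import torch
import torch.nn.functional as F

T, alpha = 2.0, 0.5                               # temperature and mixing weight
teacher_logits = torch.randn(4, 100)              # (batch, vocab); stand-ins
student_logits = torch.randn(4, 100, requires_grad=True)
labels = torch.randint(0, 100, (4,))

kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
              F.log_softmax(teacher_logits / T, dim=-1),
              reduction="batchmean", log_target=True) * T * T
ce = F.cross_entropy(student_logits, labels)
loss = alpha * kd + (1 - alpha) * ce              # distillation + supervised term
print(loss.item())
```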
-
20. Retrieval-based LMs: Part 01
Lecture date: September 22, 2025
tl;dr: Motivation behind retrieval-augmented LMs, Retriever pipeline, different retrieval methods (sparse and dense)
[ slides ]
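A toy contrast of the two retrieval families: a sparse TF-IDF-style bag-of-words scorer versus a dense inner-product scorer. The dense "embeddings" here are random stand-ins for a trained bi-encoder, so only the sparse ranking is meaningful in this example.

```python
import numpy as np

docs = ["the cat sat on the mat", "dogs chase cats", "stock markets fell today"]
query = "cat on a mat"

# Sparse retrieval: term-frequency vectors weighted by inverse document frequency.
vocab = sorted({w for text in docs + [query] for w in text.split()})
def bow(text):
    v = np.zeros(len(vocab))
    for w in text.split():
        v[vocab.index(w)] += 1
    return v

df = sum((bow(d) > 0).astype(float) for d in docs)     # document frequency
idf = np.log(len(docs) / np.maximum(df, 1))            # rare terms score higher
sparse_scores = [bow(d) @ (bow(query) * idf) for d in docs]
print("sparse best:", int(np.argmax(sparse_scores)))   # doc 0: exact word overlap

# Dense retrieval: embed query and docs, score by inner product.
rng = np.random.default_rng(0)
embed = {t: rng.normal(size=16) for t in docs + [query]}  # stand-in encoder
dense_scores = [embed[d] @ embed[query] for d in docs]
print("dense best:", int(np.argmax(dense_scores)))     # arbitrary with random vectors
```

A trained dense encoder would place "cats" near "cat", which is exactly the lexical-mismatch problem sparse methods cannot solve.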
-
21. Retrieval-based LMs: Part 02
Lecture date: September 24, 2025
tl;dr: Cross-encoder reranking, token-level dense retrieval (ColBERT), GraphRAG, HippoRAG
[ slides ]
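ColBERT's late-interaction score is simple to sketch: embed query and document at the token level, take each query token's maximum similarity over document tokens, and sum over query tokens. The random unit vectors below stand in for the trained encoder; a cross-encoder reranker would instead feed the concatenated query-document pair through the full model.

```python
import torch
import torch.nn.functional as F

def maxsim(Q, D):
    """ColBERT score. Q: (q_tokens, dim), D: (d_tokens, dim), L2-normalized."""
    return (Q @ D.T).max(dim=1).values.sum()   # max over doc tokens, sum over query

Q = F.normalize(torch.randn(5, 32), dim=-1)                    # stand-in query
docs = [F.normalize(torch.randn(n, 32), dim=-1) for n in (7, 12, 9)]
scores = torch.stack([maxsim(Q, D) for D in docs])
print(scores.argsort(descending=True))         # rerank candidates by these scores
```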
-
22. LLM Agents
Lecture date: TBA
tl;dr: Discussion on how we can teach LLMs to use APIs and call appropriate functions when required, and on design decisions and protocols (e.g., MCP) for developing LLM-based agents.
-
23. Large Reasoning Models
Lecture date: TBA
tl;dr: Discussion on post-training techniques for enhancing the reasoning capabilities of LLMs and developing reasoning models, and on the paradigm of test-time scaling.
-
24. Multimodal Models
Lecture date: TBA
tl;dr: Discussion on the architecture and pre-training strategies of multimodal models mainly involving text and image modalities.
-
25. Alternative LLM Architectures
Lecture date: TBA
tl;dr: Discussion on the non-transformer-based LLM architectures: state space models, diffusion-based models, etc.
-
26. Physics of Language Models
Lecture date: TBA
tl;dr: Discussion on how LLMs store, extract, and manipulate knowledge, how this scales, why diverse pre-training makes facts extractable, and cases where models excel and struggle.
-
27. Interpreting the Inner Workings of LLMs
Lecture date: TBA
tl;dr: Discussion on various interpretability techniques to decipher the inner workings of LLMs.
-
28. Conclusion
Lecture date: TBA
tl;dr: Summary and discussion of the current state of LLM research.