Lectures
You can download the lectures here. Subscribe to our newsletter for the latest updates on LLMs!
-
6.2. Introduction to Transformer: Positional Encoding and Layer Normalization
Lecture date: August 7, 2024 (Wednesday)
tl;dr: Discussion on various positional encoding methods (Absolute Positional Encoding, Relative Positional Encoding, Rotary Positional Encoding). Understanding Layer Normalization.
[slides] [scribe] [video]
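To make the rotary encoding idea from this lecture concrete, here is a toy pure-Python sketch (not from the lecture materials): RoPE rotates each consecutive pair of dimensions by a position-dependent angle, so the dot product between a rotated query and key depends only on their relative offset.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotary positional encoding: rotate each consecutive pair of
    dimensions by an angle proportional to the token position."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)   # per-pair rotation frequency
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

# Key property: dot products of rotated vectors depend only on the
# relative distance between positions, not the absolute positions.
q, k = [1.0, 0.5, -0.3, 0.8], [0.2, -0.1, 0.7, 0.4]
dot = lambda a, b: sum(x * y for x, y in zip(a, b))
d1 = dot(rope(q, 3), rope(k, 1))    # positions 3 and 1 (offset 2)
d2 = dot(rope(q, 10), rope(k, 8))   # positions 10 and 8 (same offset)
print(abs(d1 - d2) < 1e-9)
```

Because each 2-D rotation is orthogonal, applying it to both query and key leaves their inner product a function of the angle difference alone, which is what makes RoPE a relative scheme despite being applied per absolute position.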
-
11. Scaling Laws
Lecture date: August 31, 2024 (Saturday)
tl;dr: Discussion on emergent abilities of LLMs. Understanding the empirical scaling laws for neural language model performance on the cross-entropy loss – Kaplan laws, Chinchilla scaling laws. Discussion on an alternative perspective on emergent abilities – are these abilities really 'emergent'?
[slides] [scribe] [video]
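A small numerical sketch of the Chinchilla-style parametric scaling law discussed here; the constants are the approximate fitted values reported by Hoffmann et al., and the power-of-two model-size sweep is purely illustrative.

```python
# Chinchilla parametric fit: L(N, D) = E + A/N^alpha + B/D^beta,
# with the approximate constants reported in Hoffmann et al.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted cross-entropy loss for N parameters and D training tokens."""
    return E + A / N**alpha + B / D**beta

# Under a fixed compute budget C ~ 6*N*D FLOPs, sweep model sizes and
# keep the one with the lowest predicted loss.
C = 6 * 70e9 * 1.4e12                      # roughly Chinchilla's budget
candidates = [1e9 * 2**k for k in range(12)]
best_loss, best_N = min((loss(N, C / (6 * N)), N) for N in candidates)
print(f"compute-optimal N in this sweep: {best_N:.1e}")
```

The sweep illustrates the core trade-off: for a fixed budget, a model that is too large starves for tokens (large `B/D^beta` term), while one that is too small saturates (large `A/N^alpha` term).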
-
12.1. Pre-training of Causal LMs and In-context Learning
Lecture date: September 2, 2024 (Monday)
tl;dr: Looking into the procedure for pre-training of causal/auto-regressive language models. Discussion on the in-context learning ability of LLMs.
[slides] [scribe] [video]
Suggested Readings:
- Improving Language Understanding by Generative Pre-Training
- Language Models are Unsupervised Multitask Learners
- Language Models are Few-Shot Learners
- Learning To Retrieve Prompts for In-Context Learning
- Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
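The in-context learning setup above can be sketched as simple prompt construction: demonstrations and a query are concatenated into one context, and the pre-trained causal LM continues the pattern with no weight updates. The formatting below is a hypothetical template, not one from the lecture.

```python
def few_shot_prompt(demos, query, instruction=""):
    """Format (input, label) demonstrations plus a query into a single
    prompt; a causal LM then completes the pattern in context."""
    parts = [instruction] if instruction else []
    for x, y in demos:
        parts.append(f"Input: {x}\nLabel: {y}")
    parts.append(f"Input: {query}\nLabel:")   # model fills in the label
    return "\n\n".join(parts)

demos = [("the movie was wonderful", "positive"),
         ("a tedious, overlong mess", "negative")]
prompt = few_shot_prompt(demos, "an instant classic")
print(prompt)
```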
-
12.2. Instruction Tuning
Lecture date: September 4, 2024 (Wednesday)
tl;dr: Discussion on instruction tuning of LLMs – data collection, loss function, and properties of tuned models.
[slides] [scribe] [video]
Suggested Readings:
- Instruction Tuning for Large Language Models: A Survey
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- WizardLM: Empowering Large Language Models to Follow Complex Instructions
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
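One detail from the loss-function discussion can be shown in a few lines: during instruction tuning, the next-token loss is typically averaged only over response tokens, with instruction/prompt positions masked out. A toy sketch with made-up log-probabilities:

```python
def masked_nll(token_logprobs, loss_mask):
    """Average negative log-likelihood over response tokens only;
    instruction positions (mask = 0) contribute no loss."""
    total = sum(-lp for lp, m in zip(token_logprobs, loss_mask) if m)
    count = sum(loss_mask)
    return total / max(count, 1)

# toy sequence: 3 instruction tokens (masked out), 2 response tokens
logprobs  = [-0.1, -2.3, -0.7, -0.5, -0.2]  # log p(token_t | tokens_<t)
loss_mask = [0,     0,    0,    1,    1]    # 1 = response token
print(masked_nll(logprobs, loss_mask))      # ≈ 0.35
```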
-
13.1. Alignment of Language Models: Reward Maximization-I
Lecture date: September 5, 2024 (Thursday)
tl;dr: Looking into the reward model for alignment – modeling the alignment procedure as reinforcement learning, the architecture of reward model, training the reward model, gathering preference data (RLHF vs RLAIF), reward maximization objective.
[slides] [scribe] [video]
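The reward-model training objective covered here is usually the Bradley-Terry pairwise loss on preference data: minimize `-log sigmoid(r_chosen - r_rejected)` so the preferred response gets the higher score. A minimal sketch:

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).
    Small when the reward model scores the preferred response higher."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the margin grows, and blows up when the model
# ranks the rejected response above the chosen one.
print(round(preference_loss(2.0, 0.0), 4))   # correct ordering, wide margin
print(round(preference_loss(0.0, 2.0), 4))   # wrong ordering: large loss
```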
-
14.2. Quantization, Pruning & Distillation
Lecture date: September 23, 2024 (Monday)
tl;dr: Discussion on various model compression techniques – post-training quantization, QLoRA, magnitude and structured pruning, knowledge distillation.
[slides] [video]
Suggested Readings:
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- QLoRA: Efficient Finetuning of Quantized LLMs
- Structured Pruning Learns Compact and Accurate Models
- A Simple and Effective Pruning Approach for Large Language Models
- Distilling the Knowledge in a Neural Network
- Sequence-Level Knowledge Distillation
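The post-training quantization idea in this lecture can be illustrated with plain absmax (symmetric) int8 quantization: scale weights into `[-127, 127]` and round. This is only the baseline scheme; LLM.int8() additionally handles outlier features with mixed precision, which this sketch omits.

```python
def quantize_int8(weights):
    """Absmax (symmetric) quantization: map floats into [-127, 127]
    with a single scale, then round to the nearest integer."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.3, -1.2, 0.05, 0.9]
q, s = quantize_int8(w)
err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
print(q, err)   # round-trip error is bounded by half the scale
```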
-
15.2. Efficient LLM Decoding-II
Lecture date: September 26, 2024 (Thursday)
tl;dr: Discussion on various efficient decoding techniques – flash decoding, speculative decoding, Medusa and tree attention, prompt-lookup decoding, lookahead decoding.
[slides] [scribe] [video]
Suggested Readings:
- Flash-Decoding for long-context inference
- Fast Inference from Transformers via Speculative Decoding
- Accelerating Large Language Model Decoding with Speculative Sampling
- MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
- Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
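The speculative-decoding idea above, in its simplest greedy form, is easy to sketch: a cheap draft model proposes a run of tokens and the expensive target model verifies them, keeping the longest agreeing prefix plus one corrected token. The toy "models" below are hypothetical lookup tables, and the verification loop is written sequentially for clarity (a real implementation scores all proposed positions in one batched target forward pass).

```python
def greedy_speculative(target, draft, prefix, k=4):
    """One speculative step: draft proposes k tokens, target verifies."""
    proposal, ctx = [], list(prefix)
    for _ in range(k):                  # draft proposes autoregressively
        t = draft(ctx)
        proposal.append(t)
        ctx.append(t)
    accepted, ctx = [], list(prefix)
    for t in proposal:                  # target checks each proposed token
        expected = target(ctx)
        if t == expected:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)   # first mismatch: take target's token
            break
    else:
        accepted.append(target(ctx))    # all accepted: one bonus token
    return accepted

# Hypothetical toy models: next token depends only on the last token.
target = lambda ctx: {"a": "b", "b": "c", "c": "d", "d": "e"}.get(ctx[-1], "a")
draft  = lambda ctx: {"a": "b", "b": "c", "c": "x"}.get(ctx[-1], "a")
out = greedy_speculative(target, draft, ["a"])
print(out)   # several tokens emitted per target verification step
```

The payoff is that when the draft agrees with the target, multiple tokens are committed per expensive verification step instead of one.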
-
16.1. Retrieval-based Language Models-I
Lecture date: September 30, 2024 (Monday)
tl;dr: Discussion on the motivation behind retrieval-based LMs and various retrieval methods – sparse and dense retrieval, cross-encoder reranking, differentiable search index, table-of-contents aware search.
[slides] [video]
Suggested Readings:
- Chapter 6, Introduction to Information Retrieval
- Reading Wikipedia to Answer Open-Domain Questions
- Dense Passage Retrieval for Open-Domain Question Answering
- Unsupervised Dense Information Retrieval with Contrastive Learning
- Passage Re-ranking with BERT
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
- Precise Zero-Shot Dense Retrieval without Relevance Labels
- Transformer Memory as a Differentiable Search Index
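The sparse-retrieval portion of this lecture can be grounded with a minimal BM25 scorer over tokenized documents (a toy corpus, using the standard k1/b parameterization):

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.5, b=0.75):
    """BM25: sum over query terms of an IDF weight times a saturated,
    length-normalized term frequency."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(term in d for d in docs)            # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["neural", "retrieval", "models"],
        ["sparse", "retrieval", "with", "bm25"],
        ["cooking", "pasta", "at", "home"]]
query = ["sparse", "retrieval"]
ranked = sorted(docs, key=lambda d: bm25_score(query, d, docs), reverse=True)
print(ranked[0])   # the document matching both query terms ranks first
```

Dense retrieval replaces these exact-match term statistics with learned embedding similarity, which is the contrast the lecture draws.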
-
16.2. Retrieval-based Language Models-II
Lecture date: October 3, 2024 (Thursday)
tl;dr: Looking into various retrieval-based LMs – kNN LM, RETRO, REALM, RAG. Discussion on different training methods for retrieval-augmented LMs and their limitations.
[slides] [scribe] [video]
Suggested Readings:
- Generalization through Memorization: Nearest Neighbor Language Models
- Improving language models by retrieving from trillions of tokens
- REALM: Retrieval-Augmented Language Model Pre-Training
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- In-Context Retrieval-Augmented Language Models
- REPLUG: Retrieval-Augmented Black-Box Language Models
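The kNN-LM covered above interpolates the parametric LM's next-token distribution with a distribution built from retrieved nearest-neighbor targets: `p = lam * p_knn + (1 - lam) * p_lm`. A toy sketch with made-up probabilities and distances (and assuming, for simplicity, that all neighbor targets are among the LM's candidate tokens):

```python
import math
from collections import Counter

def knn_lm_prob(lm_probs, neighbors, distances, lam=0.25, temp=1.0):
    """Interpolate the LM distribution with a kNN distribution formed
    by a softmax over negative retrieval distances."""
    weights = [math.exp(-d / temp) for d in distances]
    z = sum(weights)
    p_knn = Counter()
    for tok, w in zip(neighbors, weights):
        p_knn[tok] += w / z
    return {tok: lam * p_knn.get(tok, 0.0) + (1 - lam) * p
            for tok, p in lm_probs.items()}

# toy: the LM is unsure, but retrieved contexts mostly continue "Paris"
lm = {"Paris": 0.4, "Lyon": 0.35, "Nice": 0.25}
mixed = knn_lm_prob(lm, neighbors=["Paris", "Paris", "Lyon"],
                    distances=[0.1, 0.2, 0.9])
print(max(mixed, key=mixed.get))   # retrieval shifts mass toward "Paris"
```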
-
17.1. Multimodal Models-I
Lecture date: October 7, 2024 (Monday)
tl;dr: Understanding the architecture and pre-training strategies of multimodal models – the focus of this lecture is on multimodal understanding involving two modalities (image and text).
[slides] [video]
Suggested Readings:
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- VisualBERT: A Simple and Performant Baseline for Vision and Language
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
- Learning Transferable Visual Models From Natural Language Supervision
- LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding
- VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
- IMAGEBIND: One Embedding Space To Bind Them All
-
17.2. Multimodal Models-II
Lecture date: October 14, 2024 (Monday)
tl;dr: Discussion on text generation with multimodal inputs.
[slides] [video]
Suggested Readings:
- Multimodal Few-Shot Learning with Frozen Language Models
- Flamingo: a Visual Language Model for Few-Shot Learning
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connection
- Visual Instruction Tuning
- Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
-
18.2. LLMs and Tools: Function Calling
Lecture date: October 17, 2024 (Thursday)
tl;dr: Discussion on how we can teach LLMs to use APIs and call appropriate functions when required.
[slides] [scribe] [video]
Suggested Readings:
- Gorilla: Large Language Model Connected with Massive APIs
- ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
- APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
- Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
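The function-calling pattern in this lecture boils down to: expose tool specs to the model, have it emit a structured call, then parse and dispatch. The registry, tool, and model output below are hypothetical stand-ins for illustration.

```python
import json

# Hypothetical tool: each entry pairs a JSON-schema-style spec (shown
# to the model) with a Python implementation.
def get_weather(city: str) -> str:
    return f"22C and sunny in {city}"   # stub; a real tool would call an API

TOOLS = {
    "get_weather": {
        "fn": get_weather,
        "spec": {"name": "get_weather",
                 "description": "Current weather for a city",
                 "parameters": {"city": {"type": "string"}}},
    }
}

def dispatch(model_output: str) -> str:
    """Parse a model-emitted JSON function call and execute the tool."""
    call = json.loads(model_output)
    tool = TOOLS[call["name"]]
    return tool["fn"](**call["arguments"])

# e.g. the fine-tuned model, shown the specs, emits:
result = dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}')
print(result)
```

Training (as in Gorilla or ToolLLM) is about making the model reliably produce the structured call on the left; the dispatch side stays this simple.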
-
18.3. LLMs and Tools: Agentic Workflow
Lecture date: October 19, 2024 (Saturday)
tl;dr: Discussion on how we can automate complex, multi-step tasks – developing LLM-based agents.
[slides] [scribe] [video]
Suggested Readings:
- ReAct: Synergizing Reasoning and Acting in Language Models
- Self-Refine: Iterative Refinement with Self-Feedback
- Reflexion: Language Agents with Verbal Reinforcement Learning
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
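The agentic loop discussed here (ReAct-style) interleaves model reasoning, tool actions, and observations until a final answer is produced. In this sketch the "LLM" is a scripted transcript and the calculator tool is a toy, so only the control flow is real.

```python
def calculator(expr):
    """Toy tool: evaluate an arithmetic expression."""
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

# Scripted stand-in for the model: (thought, (action, argument)) pairs.
SCRIPT = iter([
    ("Thought: I need to compute the total.", ("calculator", "12 * 7")),
    ("Thought: I have the result.", ("finish", "84")),
])

def llm(history):
    return next(SCRIPT)   # a real agent would prompt an LLM with history

def react(question, max_steps=5):
    """ReAct loop: think, act, observe, repeat until 'finish'."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, (action, arg) = llm(history)
        history.append(thought)
        if action == "finish":
            return arg
        obs = TOOLS[action](arg)          # execute the chosen tool
        history.append(f"Observation: {obs}")
    return None

answer = react("What is 12 * 7?")
print(answer)
```

Reflexion and Self-Refine extend this same loop with feedback on failed trajectories rather than changing its basic shape.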
-
19. Reasoning in LLMs
Lecture date: October 21, 2024 (Monday)
tl;dr: Looking into different types of reasoning tasks and various techniques (e.g., Chain-of-Thought prompting, backward chaining) that help LLMs solve them. Overview of various reasoning benchmarks and discussion on whether LLMs can truly reason and plan, highlighting both current capabilities and limitations.
[slides] [video]
Suggested Readings:
- Natural Language Reasoning, A Survey
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning
- The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
- Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
- On the Planning Abilities of Large Language Models: A Critical Investigation
- Can Large Language Models Reason and Plan?
- LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
-
22. Self-evolving LLMs
Lecture date: November 7, 2024 (Thursday)
tl;dr: Discussion on self-evolution approaches that enable LLMs to autonomously acquire, refine, and learn from experiences generated by the model itself.
[slides] [scribe] [video]
Suggested Readings:
- A Survey on Self-Evolution of Large Language Models
- STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- Self-Refine: Iterative Refinement with Self-Feedback
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
-
24. Interpretability: Demystifying the Black-Box LMs
Lecture date: November 13, 2024 (Wednesday)
tl;dr: Discussion on various interpretability techniques to decipher the inner workings of LLMs.
[slides] [scribe]
Suggested Readings:
- Probing Classifiers: Promises, Shortcomings, and Advances
- Mechanistic?
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small
- Towards Automated Circuit Discovery for Mechanistic Interpretability
- Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
- Attribution Patching: Activation Patching At Industrial Scale
- In-context Learning and Induction Heads
- How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning
- Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
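The activation-patching technique that several of these readings build on can be shown on a toy "model": run a clean input, replace an intermediate activation with one cached from a corrupted input, and measure the effect on the output. In real mechanistic interpretability work the same intervention is applied to transformer components to localize which ones carry a behavior.

```python
# Toy two-layer computation standing in for a network.
def layer1(x):
    return (x[0] + x[1], x[0] - x[1])

def layer2(h):
    return 2 * h[0] + h[1]

def run(x, patch=None):
    """Forward pass, optionally overwriting the hidden activation."""
    h = layer1(x)
    if patch is not None:
        h = patch                     # the patching intervention
    return layer2(h)

clean, corrupt = (1.0, 2.0), (0.0, 0.0)
corrupt_h = layer1(corrupt)           # cache activation from corrupted run
effect = run(clean) - run(clean, patch=corrupt_h)
print(effect)   # a large effect means this activation mediates the output
```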