Lectures
You can download the lecture slides here. We will try to upload each lecture before its corresponding class.
-
Course Introduction
tl;dr: An introduction to the course syllabus, timeline and background.
[slides]
-
Regular expressions & morphology
tl;dr: Basic terminology, regular expressions, morphology, the Porter stemmer
[slides]
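For a flavour of the rule-based morphology covered here, the sketch below strips a few suffixes with regular expressions. The rules are a tiny, illustrative subset in the spirit of the Porter stemmer, not the full algorithm (which also checks stem-measure conditions).

```python
import re

# A few illustrative suffix-stripping rules in the spirit of the Porter stemmer.
# These cover only a tiny subset of the real rule set and ignore its measure conditions.
RULES = [
    (re.compile(r"sses$"), "ss"),   # caresses  -> caress
    (re.compile(r"ies$"), "i"),     # ponies    -> poni
    (re.compile(r"ing$"), ""),      # walking   -> walk
    (re.compile(r"ed$"), ""),       # plastered -> plaster
    (re.compile(r"s$"), ""),        # cats      -> cat
]

def toy_stem(word: str) -> str:
    """Apply the first matching suffix rule, or return the word unchanged."""
    for pattern, repl in RULES:
        if pattern.search(word):
            return pattern.sub(repl, word)
    return word

if __name__ == "__main__":
    for w in ["caresses", "ponies", "walking", "plastered", "cats"]:
        print(w, "->", toy_stem(w))
```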
-
Minimum Edit Distance & Statistical Language Modeling
tl;dr: Edit distance table, backtracking, probabilistic language modeling, n-grams, smoothing
[slides (Edit Distance)] [slides (N-grams)]
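A minimal sketch of the edit-distance table with unit insert/delete/substitute costs and a backtrace to recover one alignment (the lecture's formulation may use different costs, e.g. substitution cost 2):

```python
def min_edit_distance(src: str, tgt: str):
    """Fill the edit-distance table (unit costs) and recover one alignment by backtracing."""
    n, m = len(src), len(tgt)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i                               # deletions
    for j in range(1, m + 1):
        D[0][j] = j                               # insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # delete
                          D[i][j - 1] + 1,        # insert
                          D[i - 1][j - 1] + sub)  # substitute / copy
    # Backtrace one optimal path.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        sub = 0 if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] else 1
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub:
            ops.append("copy" if sub == 0 else f"sub {src[i-1]}->{tgt[j-1]}")
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append(f"del {src[i-1]}")
            i -= 1
        else:
            ops.append(f"ins {tgt[j-1]}")
            j -= 1
    return D[n][m], list(reversed(ops))

print(min_edit_distance("intention", "execution"))  # distance 5 with unit substitution cost
```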
-
Advanced Smoothing Techniques and Evaluation of Language Models
tl;dr: Backoff and interpolation; Good-Turing; Kneser-Ney; the Shannon game; perplexity and entropy
[slides]
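As a concrete illustration of interpolation and perplexity, the sketch below interpolates bigram and add-one unigram estimates on a toy corpus and computes perplexity as the exponentiated average negative log-probability; the corpus and the interpolation weight are made up:

```python
import math
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the log .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)

def p_interp(w_prev, w, lam=0.7):
    """Linear interpolation of bigram and add-one unigram estimates (lambda is illustrative)."""
    p_bi = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    p_uni = (unigrams[w] + 1) / (len(corpus) + V)
    return lam * p_bi + (1 - lam) * p_uni

def perplexity(tokens):
    """Perplexity = exp of the average negative log-probability over the bigram transitions."""
    log_prob = sum(math.log(p_interp(prev, w)) for prev, w in zip(tokens, tokens[1:]))
    return math.exp(-log_prob / (len(tokens) - 1))

print(perplexity("the cat sat on the log .".split()))
```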
-
POS Tagging and Hidden Markov Model
tl;dr: Intro to POS tagging, rule-based methods, Markov chains, intro to HMMs, forward algorithm, Viterbi algorithm
[slides (POS Tagging)] [slides (HMM)]
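A minimal Viterbi decoder for an HMM tagger; the two-tag state space and all probabilities below are invented purely for illustration:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for an HMM, via the Viterbi dynamic program."""
    # V[t][s] = (best probability of any path ending in state s at time t, best previous state)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            best_prev = max(states, key=lambda ps: V[t - 1][ps][0] * trans_p[ps][s])
            V[t][s] = (V[t - 1][best_prev][0] * trans_p[best_prev][s] * emit_p[s][obs[t]], best_prev)
    # Backtrace from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Toy POS example; all probabilities are made up for illustration.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"fish": 0.6, "swim": 0.2, "fast": 0.2},
          "VERB": {"fish": 0.3, "swim": 0.5, "fast": 0.2}}
print(viterbi(["fish", "swim", "fast"], states, start_p, trans_p, emit_p))
```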
-
-
Parsing
tl;dr: Introduction to syntactic parsing, constituency vs dependency parsing, CFG
[slides (Statistical Parsing)] [slides (CFG)]
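For constituency parsing with a CFG, the sketch below is a plain CKY recognizer over a toy grammar in Chomsky normal form; the grammar is made up, and the code only recognizes a sentence rather than building parse trees or probabilities:

```python
# Toy grammar in Chomsky normal form (made up for illustration).
lexical = [("Det", "the"), ("N", "flight"), ("V", "book"), ("N", "book")]
binary = [("NP", "Det", "N"), ("VP", "V", "NP"), ("S", "V", "NP")]

def cky_recognize(words, start="S"):
    """CKY recognizer: table[i][j] holds every nonterminal that derives words[i:j]."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = {A for A, word in lexical if word == w}
    for span in range(2, n + 1):                 # span length
        for i in range(n - span + 1):            # span start
            j = i + span
            for k in range(i + 1, j):            # split point
                for A, B, C in binary:
                    if B in table[i][k] and C in table[k][j]:
                        table[i][j].add(A)
    return start in table[0][n]

print(cky_recognize("book the flight".split()))  # True
```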
-
-
-
Word Representation
tl;dr: Various methods for representing words as vectors – count-based methods, prediction-based methods (Word2vec, fastText)
[slides]
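On the count-based side, the sketch below builds word vectors from a ±1-word co-occurrence window and compares them with cosine similarity; the window size and toy sentences are illustrative (prediction-based methods such as Word2vec and fastText are covered in the slides):

```python
import math
from collections import Counter, defaultdict

sentences = [
    "I like deep learning".split(),
    "I like NLP".split(),
    "I enjoy flying".split(),
]

# Count-based word vectors: co-occurrence counts within a +/-1 word window.
cooc = defaultdict(Counter)
for sent in sentences:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                cooc[w][sent[j]] += 1

vocab = sorted({w for s in sentences for w in s})
vectors = {w: [cooc[w][c] for c in vocab] for w in vocab}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(cosine(vectors["like"], vectors["enjoy"]))  # both co-occur with "I", so similarity > 0
```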
-
Word Representation-II and Neural Language Models
tl;dr: Co-occurrence matrices and GloVe; word vector properties; historical word embeddings; bias detection; fixed-window neural language models; transition to recurrent architectures (RNNs, LSTMs)
[slides (Word Representation)] [slides (Neural Language Models)]
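A minimal fixed-window neural language model in the spirit of Bengio et al. (2003), written here as a PyTorch sketch with made-up sizes:

```python
import torch
import torch.nn as nn

class FixedWindowLM(nn.Module):
    """Fixed-window neural language model: concatenate the embeddings of the previous
    `window` words, pass them through one hidden layer, and predict the next word.
    All sizes below are illustrative."""
    def __init__(self, vocab_size, emb_dim=32, window=3, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Linear(window * emb_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids):                       # (batch, window)
        e = self.embed(context_ids).flatten(start_dim=1)  # (batch, window * emb_dim)
        h = torch.tanh(self.hidden(e))
        return self.out(h)                                # logits over the next word

model = FixedWindowLM(vocab_size=100)
logits = model(torch.randint(0, 100, (4, 3)))             # batch of 4 three-word contexts
print(logits.shape)                                       # torch.Size([4, 100])
```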
-
-
RNNs and Seq-to-Seq Attention
tl;dr: vanishing gradient problem; LSTMs and GRUs; bidirectional and multi-layer RNNs; neural machine translation; seq2seq models; beam search decoding; top-k sampling
[slides (RNNs)] [slides (Seq-to-Seq Attention)]
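A generic beam search decoder, shown here over a hand-written next-token table standing in for a seq2seq decoder step; real decoders typically also length-normalize hypothesis scores, which this sketch omits:

```python
import math

def beam_search(next_log_probs, start, eos, beam_size=3, max_len=10):
    """Keep the `beam_size` highest-scoring partial sequences, expand each with every
    candidate next token, and move hypotheses that emit `eos` to the finished set.
    `next_log_probs(seq)` must return {token: log_prob}; here it stands in for a decoder step."""
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_log_probs(seq).items():
                new = (seq + [tok], score + lp)
                (finished if tok == eos else candidates).append(new)
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return max(finished + beams, key=lambda c: c[1])

# Toy "decoder": after <s> prefer "a", after "a" prefer "b", then end the sequence.
def toy_step(seq):
    table = {"<s>": {"a": math.log(0.7), "b": math.log(0.3)},
             "a": {"b": math.log(0.6), "</s>": math.log(0.4)},
             "b": {"</s>": math.log(0.9), "a": math.log(0.1)}}
    return table[seq[-1]]

print(beam_search(toy_step, "<s>", "</s>"))
```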
-
Seq-to-Seq Attention and Transformers
tl;dr: seq2seq attention; variants of attention; introduction to positional encoding
[slides (Seq-to-Seq Attention)] [slides (Transformers)]
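The core computation behind these attention variants is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V; a small NumPy sketch with random inputs:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.sum(axis=-1))                  # (2, 4), each attention row sums to 1
```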
-
Transformers - Positional Encodings
tl;dr: sinusoidal positional encoding, rotary positional encoding, batch and layer normalization
[slides]
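A short sketch of the sinusoidal positional encodings from the original Transformer paper (sines on even dimensions, cosines on odd ones); the array sizes are illustrative:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    positions = np.arange(max_len)[:, None]            # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]            # (1, d_model / 2)
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=16)
print(pe.shape)   # (50, 16) -- added to the token embeddings before the first layer
```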
-
Pretraining Strategies
tl;dr: ELMo architecture, pretraining of BERT, masked language modeling and next sentence prediction
[slides]
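To make masked language modeling concrete, the sketch below applies BERT-style corruption: roughly 15% of positions are selected, and of those 80% become [MASK], 10% a random token, and 10% stay unchanged; the tokenizer-free setup here is purely illustrative:

```python
import random

def mask_for_mlm(tokens, mask_token="[MASK]", vocab=None, mask_prob=0.15, seed=0):
    """BERT-style MLM corruption. The model is trained to predict the original
    token at every selected position, whatever replacement it received."""
    rng = random.Random(seed)
    vocab = vocab or tokens
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok
            r = rng.random()
            if r < 0.8:
                corrupted[i] = mask_token          # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)   # 10%: replace with a random token
            # else: 10%: keep the original token
    return corrupted, targets

print(mask_for_mlm("the quick brown fox jumps over the lazy dog".split()))
```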
-
Pretraining and Tokenization Strategies
tl;dr: Pretraining of BART, T5, GPT, and the LLaMA family; causal masking; subword tokenization: Byte-Pair Encoding, WordPiece, and unigram language model tokenization
[slides (Pretraining Strategies)] [slides (Tokenization Strategies)]
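A toy byte-pair-encoding learner in the spirit of Sennrich et al. (2016, linked below): repeatedly merge the most frequent adjacent symbol pair. Real tokenizers keep word frequencies and operate over far larger corpora; this sketch treats each word once:

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe(words, num_merges=10):
    """Minimal BPE: greedily merge the most frequent adjacent symbol pair `num_merges` times."""
    vocab = [list(w) + ["</w>"] for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in vocab:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = [merge_pair(symbols, best) for symbols in vocab]
    return merges, vocab

merges, segmented = learn_bpe(["low", "lower", "lowest", "newer", "wider"], num_merges=5)
print(merges)
print(segmented)
```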
Supplementary Material:
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Neural Machine Translation of Rare Words with Subword Units
- Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
-
Prompt-Based Learning
tl;dr: zero-shot, few-shot, and in-context learning; prompt sensitivity; prefix tuning; Chain-of-Thought, Tree-of-Thought, and Graph-of-Thought prompting; POSIX; instruction tuning
[slides]
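A minimal sketch of how a few-shot (in-context learning) prompt might be assembled; the template and the sentiment task are illustrative, and prompt sensitivity means small formatting changes can shift model behaviour:

```python
def build_few_shot_prompt(task_instruction, examples, query):
    """Assemble a few-shot prompt: an instruction, a handful of demonstrations,
    then the new input the model should complete."""
    lines = [task_instruction, ""]
    for x, y in examples:
        lines += [f"Input: {x}", f"Output: {y}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of the review as positive or negative.",
    [("Great acting and a moving story.", "positive"),
     ("Two hours I will never get back.", "negative")],
    "The plot was thin but the visuals were stunning.",
)
print(prompt)
```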
-
Instruction Tuning
tl;dr: Discussion on instruction tuning of LLMs – data collection, loss function, and properties of tuned models.
[slides]
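One common instruction-tuning recipe is to compute the loss only on response tokens by masking the prompt positions with the cross-entropy ignore index; a sketch under that assumption (the one-position label shift for next-token prediction is omitted for brevity):

```python
import torch
import torch.nn.functional as F

def instruction_tuning_labels(prompt_ids, response_ids, ignore_index=-100):
    """Concatenate prompt and response tokens, but mark prompt positions with the
    ignore index so only response tokens contribute to the loss. Whether to mask
    the prompt at all is a design choice; some recipes train on the full sequence."""
    input_ids = torch.cat([prompt_ids, response_ids])
    labels = torch.cat([torch.full_like(prompt_ids, ignore_index), response_ids])
    return input_ids, labels

prompt_ids = torch.tensor([5, 17, 42])     # e.g. an instruction plus its input
response_ids = torch.tensor([8, 9, 2])     # e.g. the reference response
input_ids, labels = instruction_tuning_labels(prompt_ids, response_ids)

# With made-up logits standing in for a model, only response positions are scored.
logits = torch.randn(len(input_ids), 50)   # (sequence length, vocab size)
loss = F.cross_entropy(logits, labels, ignore_index=-100)
print(input_ids, labels, loss)
```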
-
Alignment of Language Models
tl;dr: LLM training stages and alignment; limits of instruction tuning; RLHF with human or AI feedback; reward model using Bradley-Terry preferences; REINFORCE and gradient tricks; Q-function and advantage estimation; PPO for stable policy optimization.
[slides]
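The reward-model objective under the Bradley-Terry preference model reduces to a pairwise logistic loss, -log sigmoid(r_chosen - r_rejected); a small PyTorch sketch with made-up reward values:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(reward_chosen, reward_rejected):
    """Pairwise reward-model loss: under Bradley-Terry,
    P(chosen > rejected) = sigmoid(r_chosen - r_rejected), so the negative
    log-likelihood of the human preference is -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Illustrative scalar rewards for a batch of preference pairs (made-up numbers).
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.4, 0.9, -0.5])
print(bradley_terry_loss(r_chosen, r_rejected))   # small when chosen responses score higher
```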
-
Guest Lecture - Retrieval-Augmented Generation (Dinesh Raghu)
tl;dr: Intro to RAG; Closed-book vs open-book LLMs; Hallucinations & retriever failure; PEFT & LoRA; RAFT and domain adaptation issues; Context & paraphrase augmentation; Tool-calling with LLMs
[slides]
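A minimal open-book sketch of the retrieve-then-read idea: score documents against the question and prepend the best match to the prompt. The bag-of-words "retriever" here is only a stand-in for a trained dense retriever, and the documents are made up:

```python
import math
from collections import Counter

documents = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Python is a programming language created by Guido van Rossum.",
    "The Great Wall of China is visible across northern China.",
]

def embed(text):
    """Stand-in retriever representation: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    return dot / (math.sqrt(sum(x * x for x in u.values())) *
                  math.sqrt(sum(x * x for x in v.values())) + 1e-9)

def retrieve_and_prompt(question, k=1):
    scored = sorted(documents, key=lambda d: cosine(embed(question), embed(d)), reverse=True)
    context = "\n".join(scored[:k])
    # The retrieved passages are prepended so the (frozen) LLM can answer "open book".
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(retrieve_and_prompt("When was the Eiffel Tower completed?"))
```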
Supplementary Material:
- RAGAS: Automated Evaluation of Retrieval Augmented Generation
- Atlas: Few-shot Learning with Retrieval Augmented Language Models
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Nearest Neighbor Machine Translation
- A Survey on Retrieval-Augmented Text Generation
- Augmented Language Models: a Survey
- REALM: Retrieval-Augmented Language Model Pre-Training
-
Tool Augmentation with LLMs
tl;dr: Intro to tool-augmented LLMs; Toolformer for API usage; Limits of current tool-calling; SyReLM for symbolic solver coordination; Adapter and LoRA finetuning; DaSLaM decomposer-solver model; Reward functions for decomposition; Future of modular and tool-augmented LLMs
[slides]
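A toy illustration of tool calling in the spirit of Toolformer-style API calls (not its actual training procedure): detect a calculator call in the model's draft output, execute it, and splice the result back in. The [Calculator(...)] syntax and the calculator tool are assumptions made for this example:

```python
import re

def calculator(expression: str) -> str:
    """A toy 'tool': evaluate a simple arithmetic expression (illustration only)."""
    return str(eval(expression, {"__builtins__": {}}))

def run_with_tools(model_output: str) -> str:
    """Find [Calculator(...)] calls in the model's draft text, execute each one,
    and substitute the tool's result back into the text."""
    def substitute(match):
        return calculator(match.group(1))
    return re.sub(r"\[Calculator\((.+?)\)\]", substitute, model_output)

# Pretend the LLM emitted this draft containing an API call.
draft = "The invoice total is [Calculator(3 * 129 + 45)] euros."
print(run_with_tools(draft))   # -> "The invoice total is 432 euros."
```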
-
Knowledge Editing
tl;dr: Intro to knowledge editing; Problems with full finetuning; Knowledge triplet updates; Conditions for reliable, generalized, localized edits; KE method; GRACE cache-based updates; ROME mid-layer memory editing; Evaluation on factual correction and retention
[slides]
-
Responsible LLMs & Conclusion
tl;dr: Responsible LLMs – Explainability, fairness, robustness, safety; Bias in LLMs – visibility, sources, impact; Bias mitigation via adversarial triggers and in-context learning; Final course summary; Future directions; NLP applications
[slides]