Lectures
You can download the lecture slides here. We will try to upload each lecture before its corresponding class.
-
Course Introduction
tl;dr: An introduction to the course syllabus, timeline and background.
[slides]
-
Regular expressions & morphology
tl;dr: Basic terminology, regular expressions, morphology, the Porter stemmer
[slides]
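For a flavour of the rule-based morphology covered here, the sketch below strips a few suffixes with regular expressions. The rules are a tiny, illustrative subset in the spirit of the Porter stemmer, not the full algorithm (which also checks stem-measure conditions).

```python
import re

# A few illustrative suffix-stripping rules in the spirit of the Porter stemmer.
# These cover only a tiny subset of the real rule set and ignore its measure conditions.
RULES = [
    (re.compile(r"sses$"), "ss"),   # caresses  -> caress
    (re.compile(r"ies$"), "i"),     # ponies    -> poni
    (re.compile(r"ing$"), ""),      # walking   -> walk
    (re.compile(r"ed$"), ""),       # plastered -> plaster
    (re.compile(r"s$"), ""),        # cats      -> cat
]

def toy_stem(word: str) -> str:
    """Apply the first matching suffix rule, or return the word unchanged."""
    for pattern, repl in RULES:
        if pattern.search(word):
            return pattern.sub(repl, word)
    return word

if __name__ == "__main__":
    for w in ["caresses", "ponies", "walking", "plastered", "cats"]:
        print(w, "->", toy_stem(w))
```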
-
Minimum Edit Distance & Statistical Language Modeling
tl;dr: Edit distance table, backtracking, probabilistic language modeling, n-grams, smoothing
[slides (Edit Distance)] [slides (N-grams)]
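A minimal sketch of the edit-distance table with unit insert/delete/substitute costs and a backtrace to recover one alignment (the lecture's formulation may use different costs, e.g. substitution cost 2):

```python
def min_edit_distance(src: str, tgt: str):
    """Fill the edit-distance table (unit costs) and recover one alignment by backtracing."""
    n, m = len(src), len(tgt)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i                               # deletions
    for j in range(1, m + 1):
        D[0][j] = j                               # insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # delete
                          D[i][j - 1] + 1,        # insert
                          D[i - 1][j - 1] + sub)  # substitute / copy
    # Backtrace one optimal path.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        sub = 0 if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] else 1
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub:
            ops.append("copy" if sub == 0 else f"sub {src[i-1]}->{tgt[j-1]}")
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append(f"del {src[i-1]}")
            i -= 1
        else:
            ops.append(f"ins {tgt[j-1]}")
            j -= 1
    return D[n][m], list(reversed(ops))

print(min_edit_distance("intention", "execution"))  # distance 5 with unit substitution cost
```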
-
Advanced Smoothing Techniques and Evaluation of Language Models
tl;dr: Backoff and interpolation; Good-Turing; Kneser-Ney; the Shannon game; perplexity and entropy
[slides]
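As a concrete illustration of interpolation and perplexity, the sketch below interpolates bigram and add-one unigram estimates on a toy corpus and computes perplexity as the exponentiated average negative log-probability; the corpus and the interpolation weight are made up:

```python
import math
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the log .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)

def p_interp(w_prev, w, lam=0.7):
    """Linear interpolation of bigram and add-one unigram estimates (lambda is illustrative)."""
    p_bi = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    p_uni = (unigrams[w] + 1) / (len(corpus) + V)
    return lam * p_bi + (1 - lam) * p_uni

def perplexity(tokens):
    """Perplexity = exp of the average negative log-probability over the bigram transitions."""
    log_prob = sum(math.log(p_interp(prev, w)) for prev, w in zip(tokens, tokens[1:]))
    return math.exp(-log_prob / (len(tokens) - 1))

print(perplexity("the cat sat on the log .".split()))
```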
-
POS Tagging and Hidden Markov Model
tl;dr: Intro to POS tagging, rule-based methods, Markov chains, intro to HMMs, forward algorithm, Viterbi algorithm
[slides (POS Tagging)] [slides (HMM)]
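A minimal Viterbi decoder for an HMM tagger; the two-tag state space and all probabilities below are invented purely for illustration:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for an HMM, via the Viterbi dynamic program."""
    # V[t][s] = (best probability of any path ending in state s at time t, best previous state)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            best_prev = max(states, key=lambda ps: V[t - 1][ps][0] * trans_p[ps][s])
            V[t][s] = (V[t - 1][best_prev][0] * trans_p[best_prev][s] * emit_p[s][obs[t]], best_prev)
    # Backtrace from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Toy POS example; all probabilities are made up for illustration.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"fish": 0.6, "swim": 0.2, "fast": 0.2},
          "VERB": {"fish": 0.3, "swim": 0.5, "fast": 0.2}}
print(viterbi(["fish", "swim", "fast"], states, start_p, trans_p, emit_p))
```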
-
-
Parsing
tl;dr: Introduction to syntactic parsing, constituency vs dependency parsing, CFG
[slides (Statistical Parsing)] [slides (CFG)]
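For constituency parsing with a CFG, the sketch below is a plain CKY recognizer over a toy grammar in Chomsky normal form; the grammar is made up, and the code only recognizes a sentence rather than building parse trees or probabilities:

```python
# Toy grammar in Chomsky normal form (made up for illustration).
lexical = [("Det", "the"), ("N", "flight"), ("V", "book"), ("N", "book")]
binary = [("NP", "Det", "N"), ("VP", "V", "NP"), ("S", "V", "NP")]

def cky_recognize(words, start="S"):
    """CKY recognizer: table[i][j] holds every nonterminal that derives words[i:j]."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = {A for A, word in lexical if word == w}
    for span in range(2, n + 1):                 # span length
        for i in range(n - span + 1):            # span start
            j = i + span
            for k in range(i + 1, j):            # split point
                for A, B, C in binary:
                    if B in table[i][k] and C in table[k][j]:
                        table[i][j].add(A)
    return start in table[0][n]

print(cky_recognize("book the flight".split()))  # True
```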
-
-
-
Word Representation
tl;dr: Various methods for representing words as vectors – count-based methods, prediction-based methods (Word2vec, fastText)
[slides]
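On the count-based side, the sketch below builds word vectors from a ±1-word co-occurrence window and compares them with cosine similarity; the window size and toy sentences are illustrative (prediction-based methods such as Word2vec and fastText are covered in the slides):

```python
import math
from collections import Counter, defaultdict

sentences = [
    "I like deep learning".split(),
    "I like NLP".split(),
    "I enjoy flying".split(),
]

# Count-based word vectors: co-occurrence counts within a +/-1 word window.
cooc = defaultdict(Counter)
for sent in sentences:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                cooc[w][sent[j]] += 1

vocab = sorted({w for s in sentences for w in s})
vectors = {w: [cooc[w][c] for c in vocab] for w in vocab}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(cosine(vectors["like"], vectors["enjoy"]))  # both co-occur with "I", so similarity > 0
```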
-
Word Representation-II and Neural Language Models
tl;dr: Co-occurrence matrices and GloVe; word vector properties; historical word embeddings; bias detection; fixed-window neural language models; transition to recurrent architectures (RNNs, LSTMs)
[slides (Word Representation)] [slides (Neural Language Models)]
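A minimal fixed-window neural language model in the spirit of Bengio et al. (2003), written here as a PyTorch sketch with made-up sizes:

```python
import torch
import torch.nn as nn

class FixedWindowLM(nn.Module):
    """Fixed-window neural language model: concatenate the embeddings of the previous
    `window` words, pass them through one hidden layer, and predict the next word.
    All sizes below are illustrative."""
    def __init__(self, vocab_size, emb_dim=32, window=3, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Linear(window * emb_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids):                       # (batch, window)
        e = self.embed(context_ids).flatten(start_dim=1)  # (batch, window * emb_dim)
        h = torch.tanh(self.hidden(e))
        return self.out(h)                                # logits over the next word

model = FixedWindowLM(vocab_size=100)
logits = model(torch.randint(0, 100, (4, 3)))             # batch of 4 three-word contexts
print(logits.shape)                                       # torch.Size([4, 100])
```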
-
-
RNNs and Seq-to-Seq Attention
tl;dr: vanishing gradient problem; LSTMs and GRUs; bidirectional and multi-layer RNNs; neural machine translation; seq2seq models; beam search decoding; top-k sampling
[slides (RNNs)] [slides (Seq-to-Seq Attention)]
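A generic beam search decoder, shown here over a hand-written next-token table standing in for a seq2seq decoder step; real decoders typically also length-normalize hypothesis scores, which this sketch omits:

```python
import math

def beam_search(next_log_probs, start, eos, beam_size=3, max_len=10):
    """Keep the `beam_size` highest-scoring partial sequences, expand each with every
    candidate next token, and move hypotheses that emit `eos` to the finished set.
    `next_log_probs(seq)` must return {token: log_prob}; here it stands in for a decoder step."""
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_log_probs(seq).items():
                new = (seq + [tok], score + lp)
                (finished if tok == eos else candidates).append(new)
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return max(finished + beams, key=lambda c: c[1])

# Toy "decoder": after <s> prefer "a", after "a" prefer "b", then end the sequence.
def toy_step(seq):
    table = {"<s>": {"a": math.log(0.7), "b": math.log(0.3)},
             "a": {"b": math.log(0.6), "</s>": math.log(0.4)},
             "b": {"</s>": math.log(0.9), "a": math.log(0.1)}}
    return table[seq[-1]]

print(beam_search(toy_step, "<s>", "</s>"))
```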
-
Seq-to-Seq Attention and Transformers
tl;dr: seq2seq attention; variants of attention; introduction to positional encoding
[slides (Seq-to-Seq Attention)] [slides (Transformers)]
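The core computation behind these attention variants is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V; a small NumPy sketch with random inputs:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.sum(axis=-1))                  # (2, 4), each attention row sums to 1
```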
-
Transformers - Positional Encodings
tl;dr: sinusoidal positional encoding, rotary positional encoding, batch and layer normalization
[slides]
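A short sketch of the sinusoidal positional encodings from the original Transformer paper (sines on even dimensions, cosines on odd ones); the array sizes are illustrative:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    positions = np.arange(max_len)[:, None]            # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]            # (1, d_model / 2)
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=16)
print(pe.shape)   # (50, 16) -- added to the token embeddings before the first layer
```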
-
Pretraining Strategies
tl;dr: ELMo architecture, pretraining of BERT, masked language modeling and next sentence prediction
[slides]
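To make masked language modeling concrete, the sketch below applies BERT-style corruption: roughly 15% of positions are selected, and of those 80% become [MASK], 10% a random token, and 10% stay unchanged; the tokenizer-free setup here is purely illustrative:

```python
import random

def mask_for_mlm(tokens, mask_token="[MASK]", vocab=None, mask_prob=0.15, seed=0):
    """BERT-style MLM corruption. The model is trained to predict the original
    token at every selected position, whatever replacement it received."""
    rng = random.Random(seed)
    vocab = vocab or tokens
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok
            r = rng.random()
            if r < 0.8:
                corrupted[i] = mask_token          # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)   # 10%: replace with a random token
            # else: 10%: keep the original token
    return corrupted, targets

print(mask_for_mlm("the quick brown fox jumps over the lazy dog".split()))
```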
-
Pretraining and Tokenization Strategies
tl;dr: Pretraining of BART, T5, GPT, and the LLaMA family; causal masking; subword tokenization: Byte-Pair Encoding, WordPiece, and unigram language model tokenization
[slides (Pretraining Strategies)] [slides (Tokenization Strategies)]
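A toy byte-pair-encoding learner in the spirit of Sennrich et al. (2016, linked below): repeatedly merge the most frequent adjacent symbol pair. Real tokenizers keep word frequencies and operate over far larger corpora; this sketch treats each word once:

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe(words, num_merges=10):
    """Minimal BPE: greedily merge the most frequent adjacent symbol pair `num_merges` times."""
    vocab = [list(w) + ["</w>"] for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in vocab:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = [merge_pair(symbols, best) for symbols in vocab]
    return merges, vocab

merges, segmented = learn_bpe(["low", "lower", "lowest", "newer", "wider"], num_merges=5)
print(merges)
print(segmented)
```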
Supplementary Material:
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Neural Machine Translation of Rare Words with Subword Units
- Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
-
Prompt-Based Learning
tl;dr: zero-shot, few-shot, and in-context learning; prompt sensitivity; prefix tuning; Chain-of-Thought, Tree-of-Thought, and Graph-of-Thought prompting; POSIX; instruction tuning
[slides]
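A minimal sketch of how a few-shot (in-context learning) prompt might be assembled; the template and the sentiment task are illustrative, and prompt sensitivity means small formatting changes can shift model behaviour:

```python
def build_few_shot_prompt(task_instruction, examples, query):
    """Assemble a few-shot prompt: an instruction, a handful of demonstrations,
    then the new input the model should complete."""
    lines = [task_instruction, ""]
    for x, y in examples:
        lines += [f"Input: {x}", f"Output: {y}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of the review as positive or negative.",
    [("Great acting and a moving story.", "positive"),
     ("Two hours I will never get back.", "negative")],
    "The plot was thin but the visuals were stunning.",
)
print(prompt)
```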
-
Instruction Tuning
tl;dr: Discussion on instruction tuning of LLMs – data collection, loss function, and properties of tuned models.
[slides]
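One common instruction-tuning recipe is to compute the loss only on response tokens by masking the prompt positions with the cross-entropy ignore index; a sketch under that assumption (the one-position label shift for next-token prediction is omitted for brevity):

```python
import torch
import torch.nn.functional as F

def instruction_tuning_labels(prompt_ids, response_ids, ignore_index=-100):
    """Concatenate prompt and response tokens, but mark prompt positions with the
    ignore index so only response tokens contribute to the loss. Whether to mask
    the prompt at all is a design choice; some recipes train on the full sequence."""
    input_ids = torch.cat([prompt_ids, response_ids])
    labels = torch.cat([torch.full_like(prompt_ids, ignore_index), response_ids])
    return input_ids, labels

prompt_ids = torch.tensor([5, 17, 42])     # e.g. an instruction plus its input
response_ids = torch.tensor([8, 9, 2])     # e.g. the reference response
input_ids, labels = instruction_tuning_labels(prompt_ids, response_ids)

# With made-up logits standing in for a model, only response positions are scored.
logits = torch.randn(len(input_ids), 50)   # (sequence length, vocab size)
loss = F.cross_entropy(logits, labels, ignore_index=-100)
print(input_ids, labels, loss)
```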
-
Alignment of Language Models
tl;dr: LLM training stages and alignment; limits of instruction tuning; RLHF with human or AI feedback; reward model using Bradley-Terry preferences; REINFORCE and gradient tricks; Q-function and advantage estimation; PPO for stable policy optimization.
[slides]
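The reward-model objective under the Bradley-Terry preference model reduces to a pairwise logistic loss, -log sigmoid(r_chosen - r_rejected); a small PyTorch sketch with made-up reward values:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(reward_chosen, reward_rejected):
    """Pairwise reward-model loss: under Bradley-Terry,
    P(chosen > rejected) = sigmoid(r_chosen - r_rejected), so the negative
    log-likelihood of the human preference is -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Illustrative scalar rewards for a batch of preference pairs (made-up numbers).
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.4, 0.9, -0.5])
print(bradley_terry_loss(r_chosen, r_rejected))   # small when chosen responses score higher
```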
-
Guest Lecture - Retrieval-Augmented Generation (Dinesh Raghu)
tl;dr: Intro to RAG; Closed-book vs open-book LLMs; Hallucinations & retriever failure; PEFT & LoRA; RAFT and domain adaptation issues; Context & paraphrase augmentation; Tool-calling with LLMs
[slides]
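A minimal open-book sketch of the retrieve-then-read idea: score documents against the question and prepend the best match to the prompt. The bag-of-words "retriever" here is only a stand-in for a trained dense retriever, and the documents are made up:

```python
import math
from collections import Counter

documents = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Python is a programming language created by Guido van Rossum.",
    "The Great Wall of China is visible across northern China.",
]

def embed(text):
    """Stand-in retriever representation: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    return dot / (math.sqrt(sum(x * x for x in u.values())) *
                  math.sqrt(sum(x * x for x in v.values())) + 1e-9)

def retrieve_and_prompt(question, k=1):
    scored = sorted(documents, key=lambda d: cosine(embed(question), embed(d)), reverse=True)
    context = "\n".join(scored[:k])
    # The retrieved passages are prepended so the (frozen) LLM can answer "open book".
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(retrieve_and_prompt("When was the Eiffel Tower completed?"))
```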
Supplementary Material:
- RAGAS: Automated Evaluation of Retrieval Augmented Generation
- Atlas: Few-shot Learning with Retrieval Augmented Language Models
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Nearest Neighbor Machine Translation
- A Survey on Retrieval-Augmented Text Generation
- Augmented Language Models: a Survey
- REALM: Retrieval-Augmented Language Model Pre-Training
-
Tool Augmentation with LLMs
tl;dr: Intro to tool-augmented LLMs; Toolformer for API usage; Limits of current tool-calling; SyReLM for symbolic solver coordination; Adapter and LoRA finetuning; DaSLaM decomposer-solver model; Reward functions for decomposition; Future of modular and tool-augmented LLMs
[slides]
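A toy illustration of tool calling in the spirit of Toolformer-style API calls (not its actual training procedure): detect a calculator call in the model's draft output, execute it, and splice the result back in. The [Calculator(...)] syntax and the calculator tool are assumptions made for this example:

```python
import re

def calculator(expression: str) -> str:
    """A toy 'tool': evaluate a simple arithmetic expression (illustration only)."""
    return str(eval(expression, {"__builtins__": {}}))

def run_with_tools(model_output: str) -> str:
    """Find [Calculator(...)] calls in the model's draft text, execute each one,
    and substitute the tool's result back into the text."""
    def substitute(match):
        return calculator(match.group(1))
    return re.sub(r"\[Calculator\((.+?)\)\]", substitute, model_output)

# Pretend the LLM emitted this draft containing an API call.
draft = "The invoice total is [Calculator(3 * 129 + 45)] euros."
print(run_with_tools(draft))   # -> "The invoice total is 432 euros."
```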
-
Knowledge Editing
tl;dr: Intro to knowledge editing; Problems with full finetuning; Knowledge triplet updates; Conditions for reliable, generalized, localized edits; KE method; GRACE cache-based updates; ROME mid-layer memory editing; Evaluation on factual correction and retention
[slides]
-
Responsible LLMs & Conclusion
tl;dr: Responsible LLMs – Explainability, fairness, robustness, safety; Bias in LLMs – visibility, sources, impact; Bias mitigation via adversarial triggers and in-context learning; Final course summary; Future directions; NLP applications
[slides]