Lectures
You can download the lectures here. Subscribe to our newsletter for the latest updates on LLMs!
-
6.2. Introduction to Transformer: Positional Encoding and Layer Normalization
Lecture date: August 7, 2024 (Wednesday)
tl;dr: Discussion on various positional encoding methods (Absolute Positional Encoding, Relative Positional Encoding, Rotary Positional Encoding). Understanding Layer Normalization.
[slides] [scribe] [video]
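To make the rotary encoding idea from this lecture concrete, here is a toy pure-Python sketch (not from the lecture materials): RoPE rotates each consecutive pair of dimensions by a position-dependent angle, so the dot product between a rotated query and key depends only on their relative offset.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotary positional encoding: rotate each consecutive pair of
    dimensions by an angle proportional to the token position."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)   # per-pair rotation frequency
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

# Key property: dot products of rotated vectors depend only on the
# relative distance between positions, not the absolute positions.
q, k = [1.0, 0.5, -0.3, 0.8], [0.2, -0.1, 0.7, 0.4]
dot = lambda a, b: sum(x * y for x, y in zip(a, b))
d1 = dot(rope(q, 3), rope(k, 1))    # positions 3 and 1 (offset 2)
d2 = dot(rope(q, 10), rope(k, 8))   # positions 10 and 8 (same offset)
print(abs(d1 - d2) < 1e-9)
```

Because each 2-D rotation is orthogonal, applying it to both query and key leaves their inner product a function of the angle difference alone, which is what makes RoPE a relative scheme despite being applied per absolute position.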
-
11. Scaling Laws
Lecture date: August 31, 2024 (Saturday)
tl;dr: Discussion on emergent abilities of LLMs. Understanding the empirical scaling laws for neural language model performance on the cross-entropy loss – Kaplan laws, Chinchilla scaling laws. Discussion on an alternative perspective on emergent abilities – are these abilities really 'emergent'?
[slides] [scribe] [video]
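A small numerical sketch of the Chinchilla-style parametric scaling law discussed here; the constants are the approximate fitted values reported by Hoffmann et al., and the power-of-two model-size sweep is purely illustrative.

```python
# Chinchilla parametric fit: L(N, D) = E + A/N^alpha + B/D^beta,
# with the approximate constants reported in Hoffmann et al.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted cross-entropy loss for N parameters and D training tokens."""
    return E + A / N**alpha + B / D**beta

# Under a fixed compute budget C ~ 6*N*D FLOPs, sweep model sizes and
# keep the one with the lowest predicted loss.
C = 6 * 70e9 * 1.4e12                      # roughly Chinchilla's budget
candidates = [1e9 * 2**k for k in range(12)]
best_loss, best_N = min((loss(N, C / (6 * N)), N) for N in candidates)
print(f"compute-optimal N in this sweep: {best_N:.1e}")
```

The sweep illustrates the core trade-off: for a fixed budget, a model that is too large starves for tokens (large `B/D^beta` term), while one that is too small saturates (large `A/N^alpha` term).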
-
12.1. Pre-training of Causal LMs and In-context Learning
Lecture date: September 2, 2024 (Monday)
tl;dr: Looking into the procedure for pre-training of causal/auto-regressive language models. Discussion on the in-context learning ability of LLMs.
[slides] [scribe] [video]
Suggested Readings:
- Improving Language Understanding by Generative Pre-Training
- Language Models are Unsupervised Multitask Learners
- Language Models are Few-Shot Learners
- Learning To Retrieve Prompts for In-Context Learning
- Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
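The in-context learning setup above can be sketched as simple prompt construction: demonstrations and a query are concatenated into one context, and the pre-trained causal LM continues the pattern with no weight updates. The formatting below is a hypothetical template, not one from the lecture.

```python
def few_shot_prompt(demos, query, instruction=""):
    """Format (input, label) demonstrations plus a query into a single
    prompt; a causal LM then completes the pattern in context."""
    parts = [instruction] if instruction else []
    for x, y in demos:
        parts.append(f"Input: {x}\nLabel: {y}")
    parts.append(f"Input: {query}\nLabel:")   # model fills in the label
    return "\n\n".join(parts)

demos = [("the movie was wonderful", "positive"),
         ("a tedious, overlong mess", "negative")]
prompt = few_shot_prompt(demos, "an instant classic")
print(prompt)
```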
-
12.2. Instruction Tuning
Lecture date: September 4, 2024 (Wednesday)
tl;dr: Discussion on instruction tuning of LLMs – data collection, loss function, and properties of tuned models.
[slides] [scribe] [video]
Suggested Readings:
- Instruction Tuning for Large Language Models: A Survey
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- WizardLM: Empowering Large Language Models to Follow Complex Instructions
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
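One detail from the loss-function discussion can be shown in a few lines: during instruction tuning, the next-token loss is typically averaged only over response tokens, with instruction/prompt positions masked out. A toy sketch with made-up log-probabilities:

```python
def masked_nll(token_logprobs, loss_mask):
    """Average negative log-likelihood over response tokens only;
    instruction positions (mask = 0) contribute no loss."""
    total = sum(-lp for lp, m in zip(token_logprobs, loss_mask) if m)
    count = sum(loss_mask)
    return total / max(count, 1)

# toy sequence: 3 instruction tokens (masked out), 2 response tokens
logprobs  = [-0.1, -2.3, -0.7, -0.5, -0.2]  # log p(token_t | tokens_<t)
loss_mask = [0,     0,    0,    1,    1]    # 1 = response token
print(masked_nll(logprobs, loss_mask))      # ≈ 0.35
```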
-
13.1. Alignment of Language Models: Reward Maximization-I
Lecture date: September 5, 2024 (Thursday)
tl;dr: Looking into the reward model for alignment – modeling the alignment procedure as reinforcement learning, the architecture of reward model, training the reward model, gathering preference data (RLHF vs RLAIF), reward maximization objective.
[slides] [scribe] [video]
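The reward-model training objective covered here is usually the Bradley-Terry pairwise loss on preference data: minimize `-log sigmoid(r_chosen - r_rejected)` so the preferred response gets the higher score. A minimal sketch:

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).
    Small when the reward model scores the preferred response higher."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the margin grows, and blows up when the model
# ranks the rejected response above the chosen one.
print(round(preference_loss(2.0, 0.0), 4))   # correct ordering, wide margin
print(round(preference_loss(0.0, 2.0), 4))   # wrong ordering: large loss
```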
-
14.2. Quantization, Pruning & Distillation
Lecture date: September 23, 2024 (Monday)
tl;dr: Discussion on various model compression techniques – post-training quantization, QLoRA, magnitude and structured pruning, knowledge distillation.
[slides] [video]
Suggested Readings:
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- QLoRA: Efficient Finetuning of Quantized LLMs
- Structured Pruning Learns Compact and Accurate Models
- A Simple and Effective Pruning Approach for Large Language Models
- Distilling the Knowledge in a Neural Network
- Sequence-Level Knowledge Distillation
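The post-training quantization idea in this lecture can be illustrated with plain absmax (symmetric) int8 quantization: scale weights into `[-127, 127]` and round. This is only the baseline scheme; LLM.int8() additionally handles outlier features with mixed precision, which this sketch omits.

```python
def quantize_int8(weights):
    """Absmax (symmetric) quantization: map floats into [-127, 127]
    with a single scale, then round to the nearest integer."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.3, -1.2, 0.05, 0.9]
q, s = quantize_int8(w)
err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
print(q, err)   # round-trip error is bounded by half the scale
```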
-
15.2. Efficient LLM Decoding-II
Lecture date: September 26, 2024 (Thursday)
tl;dr: Discussion on various efficient decoding techniques – flash decoding, speculative decoding, Medusa and tree attention, prompt-lookup decoding, lookahead decoding.
[slides] [scribe] [video]
Suggested Readings:
- Flash-Decoding for long-context inference
- Fast Inference from Transformers via Speculative Decoding
- Accelerating Large Language Model Decoding with Speculative Sampling
- MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
- Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
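The speculative-decoding idea above, in its simplest greedy form, is easy to sketch: a cheap draft model proposes a run of tokens and the expensive target model verifies them, keeping the longest agreeing prefix plus one corrected token. The toy "models" below are hypothetical lookup tables, and the verification loop is written sequentially for clarity (a real implementation scores all proposed positions in one batched target forward pass).

```python
def greedy_speculative(target, draft, prefix, k=4):
    """One speculative step: draft proposes k tokens, target verifies."""
    proposal, ctx = [], list(prefix)
    for _ in range(k):                  # draft proposes autoregressively
        t = draft(ctx)
        proposal.append(t)
        ctx.append(t)
    accepted, ctx = [], list(prefix)
    for t in proposal:                  # target checks each proposed token
        expected = target(ctx)
        if t == expected:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)   # first mismatch: take target's token
            break
    else:
        accepted.append(target(ctx))    # all accepted: one bonus token
    return accepted

# Hypothetical toy models: next token depends only on the last token.
target = lambda ctx: {"a": "b", "b": "c", "c": "d", "d": "e"}.get(ctx[-1], "a")
draft  = lambda ctx: {"a": "b", "b": "c", "c": "x"}.get(ctx[-1], "a")
out = greedy_speculative(target, draft, ["a"])
print(out)   # several tokens emitted per target verification step
```

The payoff is that when the draft agrees with the target, multiple tokens are committed per expensive verification step instead of one.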
-
16.1. Retrieval-based Language Models-I
Lecture date: September 30, 2024 (Monday)
tl;dr: Discussion on the motivation behind retrieval-based LMs and various retrieval methods – sparse and dense retrieval, cross-encoder reranking, differentiable search index, table-of-contents aware search.
[slides] [video]
Suggested Readings:
- Chapter 6, Introduction to Information Retrieval
- Reading Wikipedia to Answer Open-Domain Questions
- Dense Passage Retrieval for Open-Domain Question Answering
- Unsupervised Dense Information Retrieval with Contrastive Learning
- Passage Re-ranking with BERT
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
- Precise Zero-Shot Dense Retrieval without Relevance Labels
- Transformer Memory as a Differentiable Search Index
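The sparse-retrieval portion of this lecture can be grounded with a minimal BM25 scorer over tokenized documents (a toy corpus, using the standard k1/b parameterization):

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.5, b=0.75):
    """BM25: sum over query terms of an IDF weight times a saturated,
    length-normalized term frequency."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(term in d for d in docs)            # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["neural", "retrieval", "models"],
        ["sparse", "retrieval", "with", "bm25"],
        ["cooking", "pasta", "at", "home"]]
query = ["sparse", "retrieval"]
ranked = sorted(docs, key=lambda d: bm25_score(query, d, docs), reverse=True)
print(ranked[0])   # the document matching both query terms ranks first
```

Dense retrieval replaces these exact-match term statistics with learned embedding similarity, which is the contrast the lecture draws.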
-
16.2. Retrieval-based Language Models-II
Lecture date: October 3, 2024 (Thursday)
tl;dr: Looking into various retrieval-based LMs – kNN LM, RETRO, REALM, RAG. Discussion on different training methods for retrieval-augmented LMs and their limitations.
[slides] [scribe] [video]
Suggested Readings:
- Generalization through Memorization: Nearest Neighbor Language Models
- Improving language models by retrieving from trillions of tokens
- REALM: Retrieval-Augmented Language Model Pre-Training
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- In-Context Retrieval-Augmented Language Models
- REPLUG: Retrieval-Augmented Black-Box Language Models
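The kNN-LM covered above interpolates the parametric LM's next-token distribution with a distribution built from retrieved nearest-neighbor targets: `p = lam * p_knn + (1 - lam) * p_lm`. A toy sketch with made-up probabilities and distances (and assuming, for simplicity, that all neighbor targets are among the LM's candidate tokens):

```python
import math
from collections import Counter

def knn_lm_prob(lm_probs, neighbors, distances, lam=0.25, temp=1.0):
    """Interpolate the LM distribution with a kNN distribution formed
    by a softmax over negative retrieval distances."""
    weights = [math.exp(-d / temp) for d in distances]
    z = sum(weights)
    p_knn = Counter()
    for tok, w in zip(neighbors, weights):
        p_knn[tok] += w / z
    return {tok: lam * p_knn.get(tok, 0.0) + (1 - lam) * p
            for tok, p in lm_probs.items()}

# toy: the LM is unsure, but retrieved contexts mostly continue "Paris"
lm = {"Paris": 0.4, "Lyon": 0.35, "Nice": 0.25}
mixed = knn_lm_prob(lm, neighbors=["Paris", "Paris", "Lyon"],
                    distances=[0.1, 0.2, 0.9])
print(max(mixed, key=mixed.get))   # retrieval shifts mass toward "Paris"
```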
-
17.1. Multimodal Models-I
Lecture date: October 7, 2024 (Monday)
tl;dr: Understanding the architecture and pre-training strategies of multimodal models – the focus of this lecture is on multimodal understanding involving two modalities (image and text).
[slides] [video]
Suggested Readings:
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- VisualBERT: A Simple and Performant Baseline for Vision and Language
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
- Learning Transferable Visual Models From Natural Language Supervision
- LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding
- VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
- IMAGEBIND: One Embedding Space To Bind Them All
-
17.2. Multimodal Models-II
Lecture date: October 14, 2024 (Monday)
tl;dr: Discussion on text generation with multimodal inputs.
[slides] [video]
Suggested Readings:
- Multimodal Few-Shot Learning with Frozen Language Models
- Flamingo: a Visual Language Model for Few-Shot Learning
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connection
- Visual Instruction Tuning
- Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
-
18.2. LLMs and Tools: Function Calling
Lecture date: October 17, 2024 (Thursday)
tl;dr: Discussion on how we can teach LLMs to use APIs and call appropriate functions when required.
[slides] [scribe] [video]
Suggested Readings:
- Gorilla: Large Language Model Connected with Massive APIs
- ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
- APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
- Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
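The function-calling pattern in this lecture boils down to: expose tool specs to the model, have it emit a structured call, then parse and dispatch. The registry, tool, and model output below are hypothetical stand-ins for illustration.

```python
import json

# Hypothetical tool: each entry pairs a JSON-schema-style spec (shown
# to the model) with a Python implementation.
def get_weather(city: str) -> str:
    return f"22C and sunny in {city}"   # stub; a real tool would call an API

TOOLS = {
    "get_weather": {
        "fn": get_weather,
        "spec": {"name": "get_weather",
                 "description": "Current weather for a city",
                 "parameters": {"city": {"type": "string"}}},
    }
}

def dispatch(model_output: str) -> str:
    """Parse a model-emitted JSON function call and execute the tool."""
    call = json.loads(model_output)
    tool = TOOLS[call["name"]]
    return tool["fn"](**call["arguments"])

# e.g. the fine-tuned model, shown the specs, emits:
result = dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}')
print(result)
```

Training (as in Gorilla or ToolLLM) is about making the model reliably produce the structured call on the left; the dispatch side stays this simple.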
-
18.3. LLMs and Tools: Agentic Workflow
Lecture date: October 19, 2024 (Saturday)
tl;dr: Discussion on how we can automate complex, multi-step tasks – developing LLM-based agents.
[slides] [scribe] [video]
Suggested Readings:
- ReAct: Synergizing Reasoning and Acting in Language Models
- Self-Refine: Iterative Refinement with Self-Feedback
- Reflexion: Language Agents with Verbal Reinforcement Learning
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
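The agentic loop discussed here (ReAct-style) interleaves model reasoning, tool actions, and observations until a final answer is produced. In this sketch the "LLM" is a scripted transcript and the calculator tool is a toy, so only the control flow is real.

```python
def calculator(expr):
    """Toy tool: evaluate an arithmetic expression."""
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

# Scripted stand-in for the model: (thought, (action, argument)) pairs.
SCRIPT = iter([
    ("Thought: I need to compute the total.", ("calculator", "12 * 7")),
    ("Thought: I have the result.", ("finish", "84")),
])

def llm(history):
    return next(SCRIPT)   # a real agent would prompt an LLM with history

def react(question, max_steps=5):
    """ReAct loop: think, act, observe, repeat until 'finish'."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, (action, arg) = llm(history)
        history.append(thought)
        if action == "finish":
            return arg
        obs = TOOLS[action](arg)          # execute the chosen tool
        history.append(f"Observation: {obs}")
    return None

answer = react("What is 12 * 7?")
print(answer)
```

Reflexion and Self-Refine extend this same loop with feedback on failed trajectories rather than changing its basic shape.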
-
19. Reasoning in LLMs
Lecture date: October 21, 2024 (Monday)
tl;dr: Looking into different types of reasoning tasks and various techniques (e.g., Chain-of-Thought prompting, backward chaining) that help LLMs solve them. Overview of various reasoning benchmarks and discussion on whether LLMs can truly reason and plan, highlighting both current capabilities and limitations.
[slides] [video]
Suggested Readings:
- Natural Language Reasoning, A Survey
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning
- The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
- Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
- On the Planning Abilities of Large Language Models: A Critical Investigation
- Can Large Language Models Reason and Plan?
- LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
-
22. Self-evolving LLMs
Lecture date: November 7, 2024 (Thursday)
tl;dr: Discussion on self-evolution approaches that enable LLMs to autonomously acquire, refine, and learn from experiences generated by the model itself.
[slides] [scribe] [video]
Suggested Readings:
- A Survey on Self-Evolution of Large Language Models
- STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- Self-Refine: Iterative Refinement with Self-Feedback
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
-
24. Interpretability: Demystifying the Black-Box LMs
Lecture date: November 13, 2024 (Wednesday)
tl;dr: Discussion on various interpretability techniques to decipher the inner workings of LLMs.
[slides] [scribe]
Suggested Readings:
- Probing Classifiers: Promises, Shortcomings, and Advances
- Mechanistic?
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small
- Towards Automated Circuit Discovery for Mechanistic Interpretability
- Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
- Attribution Patching: Activation Patching At Industrial Scale
- In-context Learning and Induction Heads
- How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning
- Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
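The activation-patching technique that several of these readings build on can be shown on a toy "model": run a clean input, replace an intermediate activation with one cached from a corrupted input, and measure the effect on the output. In real mechanistic interpretability work the same intervention is applied to transformer components to localize which ones carry a behavior.

```python
# Toy two-layer computation standing in for a network.
def layer1(x):
    return (x[0] + x[1], x[0] - x[1])

def layer2(h):
    return 2 * h[0] + h[1]

def run(x, patch=None):
    """Forward pass, optionally overwriting the hidden activation."""
    h = layer1(x)
    if patch is not None:
        h = patch                     # the patching intervention
    return layer2(h)

clean, corrupt = (1.0, 2.0), (0.0, 0.0)
corrupt_h = layer1(corrupt)           # cache activation from corrupted run
effect = run(clean) - run(clean, patch=corrupt_h)
print(effect)   # a large effect means this activation mediates the output
```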