Project
Project Team Formation and Project Selection
Kindly fill out the form (right-click to open it in a new tab if it does not open) before 30/01/2025, 17:00 hours.
Note: Only one member per team should fill out the form. You need to fill in your top-5 project preferences.
Note: If you want to do a project based on a self-proposed problem statement, please fill out this form and discuss it with Prof. Tanmoy after class or meet him in his office; it is mandatory to get the professor’s approval to pursue a self-proposed project!
List of Projects
Project 1: Developing a System for Table QA
Problem Statement
The aim of this project is to develop a system capable of answering complex, free-form questions based on tabular data. Unlike simple factual QA tasks, this project requires the system to understand the question, retrieve relevant information from the table, reason about the retrieved data, and generate a coherent and contextually accurate answer in natural language.
Task Description
You are tasked with building a Table Question Answering (Table QA) system. The system must process a given table and a corresponding question to produce a free-form answer. The system should be able to:
- Comprehend the Question: Analyze and understand the intent of the question.
- Retrieve Relevant Information: Identify the necessary rows, columns, or cells in the table to answer the question.
- Integrate and Infer: Reason about the retrieved information, integrating multiple pieces of data when required.
- Generate an Answer: Produce a fluent and coherent free-form answer in natural language.
Dataset
You will be using the FeTaQA dataset for this project. The provided HuggingFace version already has train, validation, and test splits. Do not use the test split for training, so that the evaluation remains unbiased. Final rankings will be based on performance on a private test dataset.
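As a starting point, here is a minimal sketch of loading the splits with the HuggingFace datasets library; the dataset ID and field names are assumptions, so match them to the version linked for the course.

```python
# Minimal loading sketch -- "fetaqa" is a placeholder dataset ID; substitute
# the HuggingFace path given in the course materials.
from datasets import load_dataset

ds = load_dataset("fetaqa")
train, val = ds["train"], ds["validation"]
# Keep ds["test"] out of training entirely; it is for final evaluation only.
sample = train[0]
print(sample["question"])  # field names may differ slightly across releases
```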
Evaluation Metrics
The generated answers will be evaluated using:
- SacreBLEU (S-BLEU)
- ROUGE
- BERTScore
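For local validation, all three metrics can be computed with HuggingFace's evaluate library, as in the sketch below; the prediction and reference strings are illustrative.

```python
# Minimal sketch of the three evaluation metrics via the evaluate library.
import evaluate

preds = ["The team won 12 of its 15 matches that season."]
refs = ["They won 12 out of 15 matches in the season."]

sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

print(sacrebleu.compute(predictions=preds, references=[[r] for r in refs])["score"])
print(rouge.compute(predictions=preds, references=refs)["rougeL"])
print(bertscore.compute(predictions=preds, references=refs, lang="en")["f1"])
```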
Suggestions
- Develop a modular system with separate components for different sub-tasks instead of relying solely on a large language model.
- Explore techniques to optimize performance while minimizing trainable parameters.
- Innovative and efficient approaches will be rewarded during the presentation.
Guidelines
- Kaggle Competition: The project will be hosted as a Kaggle competition. Details and submission requirements will be shared soon.
- Model Constraints: Use pre-trained language models with a maximum of 8 billion parameters. Avoid publicly available models fine-tuned for table-based tasks.
- Plagiarism Policy: Adaptations from published papers are allowed but must be implemented independently, with proper citation. Plagiarism will result in zero marks.
Project 2: Building a Text-to-SQL System
Problem Statement
The goal of this project is to design a system that translates natural language questions into SQL queries executable on a relational database. The system must handle complex queries involving joins, nested subqueries, and aggregation while generalizing to unseen databases.
Task Description
You are required to develop a Text-to-SQL system capable of:
- Comprehending Query Intent: Parse natural language questions to determine intent and desired data output.
- Understanding Database Schema: Analyze the schema to identify relevant tables, columns, and relationships.
- Generating SQL Queries: Produce syntactically and semantically correct SQL queries.
- Handling Complexity: Address challenges involving joins, nested subqueries, and aggregation.
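One common starting point for schema understanding is to serialize the schema into the model's prompt. The sketch below shows a minimal version with an illustrative schema; Spider supplies table and column metadata you can map into this form.

```python
# Minimal sketch: serialize a schema into a text prompt for an LLM SQL generator.
# The schema dict and question are illustrative.
def schema_to_text(schema: dict) -> str:
    return "\n".join(
        f"Table {table}({', '.join(cols)})" for table, cols in schema.items()
    )

schema = {
    "singer": ["singer_id", "name", "country"],
    "concert": ["concert_id", "singer_id", "year"],
}
question = "How many concerts were held after 2014?"
prompt = (
    f"Given the database schema:\n{schema_to_text(schema)}\n"
    f"Write a SQL query that answers: {question}\nSQL:"
)
print(prompt)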
Dataset
You will use the Spider dataset. Treat the provided val split as the test split and generate a separate val split from the train data. Final rankings will be based on a private test dataset.
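The required re-split can be done directly with the datasets library, as sketched below; the "spider" dataset ID is an assumption, so adjust it if the course pins a different mirror.

```python
# Minimal re-splitting sketch for Spider.
from datasets import load_dataset

ds = load_dataset("spider")
test = ds["validation"]  # treat the provided val split as the test split
resplit = ds["train"].train_test_split(test_size=0.1, seed=42)
train, val = resplit["train"], resplit["test"]  # fresh val carved from train
```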
Evaluation Metrics
The system will be evaluated on:
- Component Matching
- Exact Matching
Suggestions
- Create a modular system with components for question parsing, schema understanding, and SQL generation.
- Consider using pre-trained language models while addressing their limitations in SQL generation.
- Innovative and efficient designs will be rewarded during the presentation.
Guidelines
- Kaggle Competition: The project will be hosted on Kaggle, with details and submission requirements shared soon.
- Model Constraints: Use pre-trained models with a maximum of 8 billion parameters. Avoid models fine-tuned on text-to-SQL datasets.
- Plagiarism Policy: Adapt methods from published papers, but ensure independent implementation and citation. Plagiarism will result in zero marks.
Project 3: DialoCONAN Counterspeech Generation Challenge
Problem Statement
Your task is to develop an advanced counterspeech generation model using the DialoCONAN dataset. The DialoCONAN dataset comprises over 3000 multi-turn fictitious dialogues between a hater and an NGO operator, covering six targets of hate. Your goal is to generate high-quality, contextually appropriate counterspeech responses given a hate speech input and dialogue history.
Dataset
The DialoCONAN dataset containing multi-turn dialogues will be provided. You can split the dataset into train, validation, and test sets for your experiments. The final test data released during the competition will be disjoint from the provided dataset.
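When splitting, keep all turns of a dialogue in the same partition so dialogue context never leaks across splits. A minimal sketch follows; the filename and the "dialogue_id" column are assumptions to be matched against the released CSV.

```python
# Minimal leakage-free split: partition by dialogue, not by individual turn.
import numpy as np
import pandas as pd

df = pd.read_csv("dialoconan.csv")  # assumed filename
ids = df["dialogue_id"].unique()    # assumed column name
rng = np.random.default_rng(42)
rng.shuffle(ids)

n = len(ids)
train_ids = set(ids[: int(0.8 * n)])
val_ids = set(ids[int(0.8 * n) : int(0.9 * n)])

train = df[df["dialogue_id"].isin(train_ids)]
val = df[df["dialogue_id"].isin(val_ids)]
test = df[~df["dialogue_id"].isin(train_ids | val_ids)]
```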
Evaluation Metrics
The generated counterspeech will be evaluated on:
- BLEU score
- ROUGE score
- BERTScore
Kaggle Competition Guidelines
- All experiments must be conducted within the Kaggle environment.
- Participants are allowed to use pre-trained language models with up to 8 billion parameters.
Project 4: IntentCONANv2 Intent-Specific Counterspeech Generation
Problem Statement
Your challenge is to create an intent-specific counterspeech generation model using the IntentCONANv2 dataset. The IntentCONANv2 dataset contains around 13K counterspeeches conditioned on four intents (csType): informative, denouncing, question, and positive. Your objective is to generate high-quality, intent-specific counterspeech responses to given hate speech and csType as inputs.
Dataset
The IntentCONANv2 dataset will be provided, containing hate speech-counterspeech pairs with associated intents. You can split the dataset into train, validation, and test sets for your experiments. The final test data released during the competition will be disjoint from the provided dataset.
Evaluation Metrics
The generated counterspeech will be evaluated on:
- BLEU score
- ROUGE score
- BERTScore
Kaggle Competition Guidelines
- All experiments must be conducted within the Kaggle environment.
- Participants are allowed to use pre-trained language models with up to 8 billion parameters.
Project 5: Multi-Task Knowledge Distillation Framework for Natural Language Generation
Problem Statement
The aim of this project is to develop a multi-task knowledge distillation system for Natural Language Generation (NLG). Unlike systems designed for a single task, this project requires the system to excel across multiple NLG tasks, such as summarization, question answering, and paraphrase generation. The focus is on distilling the knowledge of a large teacher model (LLaMA-3.1-8B) into a smaller, efficient system (≤1.5B parameters) that generalizes well across diverse tasks while maintaining high performance.
Framework Design
Teacher Model
- Model: LLaMA-3.1-8B (pre-trained).
- Role: Acts as the oracle, generating logits, embeddings, or task-specific outputs for training the student models.
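A minimal sketch of the classic temperature-scaled logit distillation loss (Hinton et al., 2015) is shown below; it assumes teacher and student share a vocabulary, otherwise you would distill on generated sequences instead.

```python
# Minimal sketch of temperature-scaled logit distillation.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature**2
```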
Student System
- Constraints: Combined size ≤ 1.5B parameters.
- Design Choices:
  - Single Multi-Task Model: a unified student model trained for all tasks.
  - Task-Specific Models: separate smaller models specialized for each task, or a shared encoder with task-specific decoders.
  - Hybrid Approach: a shared backbone (e.g., Llama-3.2-1B, ~1B parameters) with task-specific adapters or lightweight modules, using techniques such as LoRA or prompt tuning (a minimal LoRA sketch is given at the end of this section).
- Additional Guidelines:
- The student models should not be pre-fine-tuned for any specific task. You may fine-tune them yourself using PEFT or full fine-tuning (FFT).
- The student system must intelligently analyze input prompts and determine task-specific processing if using task-specific models.
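For the adapter-based hybrid approach, here is a minimal LoRA sketch using the peft library; the backbone ID and target module names are assumptions, so inspect your chosen model (e.g., print(model)) to confirm them.

```python
# Minimal sketch of attaching LoRA adapters to a ~1B backbone with peft.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # assumed ID
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical for LLaMA-style attention
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only adapter weights remain trainable
```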
Tasks and Datasets
Tasks:
- Summarization:
- Dataset: CNN/DailyMail (news articles → abstractive summaries).
- Question Answering:
- Dataset: SQuAD 2.0 (context + question → answer or “no answer”).
- Paraphrase Generation:
- Dataset: Quora Question Pairs (questions → paraphrases).
Dataset Usage:
- Use only the train split for training.
- The test split will be used for leaderboard evaluation.
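A minimal loading sketch for the three train splits is shown below; the dataset IDs and config names are assumptions to be verified against the competition's pinned versions.

```python
# Minimal sketch of pulling the train splits for the three tasks.
from datasets import load_dataset

cnn = load_dataset("cnn_dailymail", "3.0.0", split="train")
squad = load_dataset("squad_v2", split="train")  # SQuAD 2.0, incl. "no answer"
quora = load_dataset("quora", split="train")     # may require trust_remote_code
paraphrases = quora.filter(lambda ex: ex["is_duplicate"])  # keep true paraphrase pairs
```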
Evaluation Metrics
The quality of the generated outputs will be evaluated using the following metrics:
- Summarization:
- ROUGE-L.
- Question Answering:
- Combination of ROUGE-L and BERTScore.
- Paraphrase Generation:
- Combination of SacreBLEU and METEOR.
- Efficiency:
- Processing time per query (the standard hardware will be announced later).
Final Leaderboard Score: A weighted combination of all the above metrics on the test datasets. Exceeding the 1.5B parameter constraint will result in exponential penalties.
Guidelines
- Teacher Model Constraints: Use the base pre-trained LLaMA-3.1-8B.
- Student Model Constraints: Use open-source pre-trained LLMs that have not been fine-tuned for any specific task.
- Plagiarism Policy:
- Methods from published papers may be adapted, but the implementation must be original.
- Submissions will be checked for plagiarism against web resources and team submissions. Any detected cases will result in zero marks for the project component.
- Kaggle Competition:
- The project will be hosted as a Kaggle competition.
- Experimentation is key: try as many techniques as you can. You may also need to resort to quantization and PEFT techniques for fine-tuning.
- Ensure that your code runs smoothly in the Kaggle environment and generates output files that meet the competition specifications. Submission requirements may be subject to change.
Relevant Papers
Knowledge Distillation
- Distilling the Knowledge in a Neural Network
- Authors: Geoffrey Hinton, Oriol Vinyals, Jeff Dean
- Link: https://arxiv.org/abs/1503.02531
- A Survey on Knowledge Distillation of Large Language Models
- MiniLLM: Efficient Knowledge Distillation for Large Language Models
Multi-Task Learning
- An Overview of Multi-Task Learning in Deep Neural Networks
- Authors: Sebastian Ruder
- Link: https://arxiv.org/abs/1706.05098
Parameter-Efficient Fine-Tuning
- LoRA: Low-Rank Adaptation of Large Language Models
- AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
Project 6: Building a Multi-Model System for Optimized Natural Language Generation
Problem Statement
The goal of this project is to develop a multi-model system that leverages the strengths of different pre-trained models (Qwen2.5-1.5B, OPT-1.3B, and LLaMA-3.2-1B) to optimize performance across multiple tasks in Natural Language Generation (NLG). Unlike traditional single-model systems, this project focuses on combining multiple models in an intelligent and efficient way to balance accuracy, resource usage, and task-specific optimization.
Students are encouraged to design systems that use innovative techniques, including but not limited to:
- Dynamic Decision Layers: decide which model(s) to query based on the input query or task type.
- Pipeline Architectures: use one model’s output as the input to another, creating a chain of processing for improved results.
- Ensemble Techniques: combine predictions from multiple models to produce a superior final output.
The challenge lies in creating an efficient system that achieves high performance across tasks while minimizing redundancy and computational cost.
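As one illustration of a dynamic decision layer, the sketch below routes each prompt to one of three generators with a keyword heuristic; in practice you might train a small classifier instead, and the generator functions here are stand-ins for real calls to the three allowed models.

```python
# Minimal routing sketch: a heuristic decision layer over three generators.
def generate_qwen(prompt: str) -> str:   # stand-in for Qwen2.5-1.5B
    return f"[qwen: {prompt[:30]}...]"

def generate_opt(prompt: str) -> str:    # stand-in for OPT-1.3B
    return f"[opt: {prompt[:30]}...]"

def generate_llama(prompt: str) -> str:  # stand-in for LLaMA-3.2-1B
    return f"[llama: {prompt[:30]}...]"

def route(prompt: str) -> str:
    text = prompt.lower()
    if "summarize" in text or "summary" in text:
        return "summarization"
    if "paraphrase" in text or "rephrase" in text:
        return "paraphrase"
    return "qa"

MODEL_FOR_TASK = {
    "summarization": generate_qwen,
    "paraphrase": generate_opt,
    "qa": generate_llama,
}

def answer(prompt: str) -> str:
    return MODEL_FOR_TASK[route(prompt)](prompt)

print(answer("Summarize: The match ended in a draw after extra time."))
```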
Tasks and Datasets
The system will be evaluated on the following tasks and datasets:
- Summarization:
- Dataset: CNN/DailyMail (news articles → abstractive summaries).
- Task: Generate concise and informative summaries of news articles.
- Question Answering:
- Dataset: SQuAD 2.0 (context + question → answer or “no answer”).
- Task: Produce free-form answers based on a given context and question.
- Paraphrase Generation:
- Dataset: Quora Question Pairs (questions → paraphrases).
- Task: Generate semantically equivalent paraphrases for input sentences.
You are only allowed to use the train split of these datasets for training purposes. The test split will be used for leaderboard evaluation.
- https://huggingface.co/datasets/cnn_dailymail
- https://huggingface.co/datasets/squad
- https://huggingface.co/datasets/quora
Evaluation Metrics
The quality of the generated outputs will be assessed using the following metrics:
- Summarization: ROUGE-L.
- Question Answering: Combination of ROUGE-L and BERTScore.
- Paraphrase Generation: Combination of SacreBLEU and METEOR.
- Efficiency: Inference time per query will be measured, and a standard hardware specification will be announced later.
The final leaderboard score will combine all these metrics, evaluated on the test splits of the specified datasets.
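Since per-query latency feeds into the leaderboard score, it is worth instrumenting your pipeline early. A minimal sketch follows, assuming answer is your system's end-to-end entry point (as in the routing sketch above).

```python
# Minimal per-query latency measurement.
import time

def timed(fn, prompt: str):
    start = time.perf_counter()
    output = fn(prompt)
    return output, time.perf_counter() - start

output, seconds = timed(answer, "Summarize: ...")  # `answer` from the routing sketch
print(f"{seconds:.3f}s per query")
```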
Guidelines
- Model Constraints:
  - You are only allowed to use the following pre-trained language models:
    - Qwen2.5-1.5B
    - OPT-1.3B
    - LLaMA-3.2-1B
  - Fine-tuning on the train splits of the specified datasets is allowed.
- Prohibited Models:
- Publicly available models explicitly fine-tuned for these tasks are not allowed.
- Plagiarism Policy:
- Methods from published papers may be adapted, but the implementation must be original, with proper citations provided.
- Submissions will be checked for plagiarism against web resources and team submissions. Any detected cases of plagiarism will result in zero marks for the project component.
- Kaggle Competition:
- The project will be hosted as a Kaggle competition.
- Details regarding the competition, including submission requirements, will be shared soon.
- Ensure that your code runs smoothly in the Kaggle environment and generates output files that meet the competition specifications. Submission requirements may be subject to change.
- Experimentation is key: try as many techniques as you can. You may also need to resort to quantization and PEFT techniques for fine-tuning.
Relevant Papers
To assist in designing your system, here are some relevant papers that provide insights into multi-model systems, ensemble techniques, and decision layers:
Parameter Efficient Fine Tuning
- LoRA: Low-Rank Adaptation of Large Language Models
- Authors: Edward J. Hu, Yelong Shen, Phillip Wallis, et al.
- Link: https://arxiv.org/abs/2106.09685
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- Authors: Xiang Lisa Li, Percy Liang
- Link: https://arxiv.org/abs/2101.00190
Dynamic Decision Layers and Model Routing
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- Authors: Noam Shazeer et al.
- Link: https://arxiv.org/abs/1701.06538
- AdaBERT: Task-Adaptive BERT Compression with Mixture-of-Adapters
- Authors: Changlan Li et al.
- Link: https://arxiv.org/abs/2005.04861
Ensemble and Modular Techniques
- Ensemble Methods in Machine Learning
- Authors: Thomas G. Dietterich
- Link: https://link.springer.com/chapter/10.1007/3-540-45014-9_1
- RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Authors: Patrick Lewis et al.
- Link: https://arxiv.org/abs/2005.11401
Deliverables
- Code:
- A modular and scalable system integrating multiple models.
- Report:
- Description of the system architecture, design decisions, and task-specific performance.
- Leaderboard Submission:
- Outputs for the test splits, formatted as per the Kaggle competition requirements.
Project 7: Taxonomy Expansion Using Prompt-Based Reasoning on Graphs
Problem Statement
The goal of this project is to expand an existing taxonomy by accurately finding the parent node for a new concept. Instead of relying on discriminative methods, this project uses advanced prompt engineering and large language models (LLMs) to “think on graphs.” The system will integrate semantic and structural reasoning over the taxonomy to identify the most suitable parent node.
Key Objectives
- Graph Representation:
- Represent the taxonomy as a structured text-based graph, where nodes and edges are described using natural language.
- Include metadata such as node definitions, hierarchy levels, and relationships.
- Prompt-Based Parent Identification:
- Use structured prompts to reason about the graph and identify the best parent node.
- Incorporate contextual information about the graph’s structure and semantics into the prompts.
- Interactive Refinement:
- Iteratively refine the parent prediction by querying the model with progressively detailed prompts incorporating feedback and additional context.
- You must predict the parent node for each query term; to do this, you may include a set of candidate parent nodes in the prompt and ask the model to select among them (a prompt-building sketch follows the example below).
Key Components
Graph Representation
- Input Format:
- Convert the taxonomy graph into a textual representation (e.g., JSON, plain text).
- Include:
- Node names and descriptions.
- Parent-child relationships.
- Depth and sibling information (if available).
- Example:
{ "node": "Mammal", "description": "Warm-blooded vertebrates with hair or fur.", "children": ["Dog", "Cat", "Whale"] }
Output
- The selected parent node for the new concept.
Dataset
- SemEval Datasets: Science and Food
- WordNet Dataset
Evaluation Metrics
- Accuracy
- Wu & Palmer Metric
Guidelines
- Use advanced LLMs (e.g., GPT-4, LLaMA) for prompt-based reasoning.
- Experiment with different prompt templates to improve accuracy and reasoning.
- Integrate pre-trained embedding models for semantic similarity computation.
- Ensure that the taxonomy remains a valid DAG after expansion.
Relevant Papers and Resources