Abhilash Nandy

abhi1nandy2

AI & ML interests

Natural Language Processing, Computer Vision

Recent Activity

posted an update about 20 hours ago
PAPER - REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback. Huggingface Paper Link - https://huggingface.co/papers/2505.06548 (full authors and abstract in the post below)

Organizations

None yet

Posts (2)
PAPER - REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback

Huggingface Paper Link - https://huggingface.co/papers/2505.06548

AUTHORS - Aniruddha Roy, Pretam Ray, Abhilash Nandy, Somak Aditya, Pawan Goyal

ABSTRACT -

Instruction-based Large Language Models (LLMs) have proven effective in numerous few-shot and zero-shot Natural Language Processing (NLP) tasks. However, creating human-annotated instruction data is time-consuming, expensive, and often limited in quantity and task diversity. Previous research has attempted to address this challenge by proposing frameworks that generate instructions in a semi-automated, task-agnostic manner directly from the model itself. Many of these efforts have relied on large API-only models such as GPT-3.5 (175B), which are expensive and subject to query limits. This paper explores the performance of three small open-source LLMs, LLaMA 2-7B, LLaMA 2-13B, and Mistral 7B, within a semi-automated framework, thereby reducing the human intervention, effort, and cost required to generate an instruction dataset for fine-tuning LLMs. Furthermore, we demonstrate that incorporating a Reinforcement Learning (RL)-based training algorithm into this framework leads to further improvements. Our evaluation of the dataset reveals that these RL-based frameworks achieve substantial improvements in 63-66% of tasks compared to previous approaches.
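The abstract describes pairing self-generated instructions with automated feedback. As an illustration only (the function names and the toy length-based reward are my own assumptions, not the paper's implementation), a single generate-score-filter step of such a loop might look like:

```python
import random

def generate_candidates(seed_instructions, n_candidates=8, rng=None):
    """Toy stand-in for LLM sampling: derive new instruction candidates
    from a seed pool (the paper uses open-source LLMs for this step)."""
    rng = rng or random.Random(0)
    return [rng.choice(seed_instructions) + " Explain your reasoning step by step."
            for _ in range(n_candidates)]

def automated_feedback(instruction):
    """Hypothetical automated reward: simply prefer longer, more specific
    instructions (a real framework would use a learned scorer)."""
    return len(instruction.split())

def refine_step(seed_instructions, keep=4):
    """One generate -> score -> filter iteration; the kept instructions
    would feed an RL update and the next round of generation."""
    candidates = generate_candidates(seed_instructions)
    ranked = sorted(candidates, key=automated_feedback, reverse=True)
    return ranked[:keep]

seeds = ["Summarize the article.", "Translate the sentence to French."]
selected = refine_step(seeds)
```

This is only a sketch of the generate/score/filter shape; the RL update itself (e.g., policy optimization against the automated reward) is omitted.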
A little late to the party 🥳, but I’m thrilled to finally share our AAAI 2025–accepted work! Check out the project homepage here: https://midas-pro-mds.github.io/

🚀 What’s MiDAS-PRo all about?
We tackle the challenge of coherent, non-redundant multi-document summarization with source attribution via a three-stage LLM-based pipeline:

1. Plan a hierarchical document organization
2. Reason by generating entities/topics
3. Summarize the collection into a cohesive narrative

All “planning” and “reasoning” steps are framed as code-completion tasks, guided by graph attention network-based in-context example selection, boosting both automated and human evaluation scores!
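A minimal sketch of the three-stage Plan/Reason/Summarize flow, with the caveat that every heuristic here is an illustrative assumption: the actual system frames planning and reasoning as LLM code-completion tasks and selects in-context examples with a graph attention network, neither of which is reproduced below.

```python
from collections import defaultdict

def plan(documents):
    """Stage 1 (Plan): toy hierarchical organization, grouping
    documents by their first word as a stand-in topic label."""
    groups = defaultdict(list)
    for i, doc in enumerate(documents):
        groups[doc.split()[0].lower()].append(i)
    return dict(groups)

def reason(documents, organization):
    """Stage 2 (Reason): toy entity/topic notes per group, here just
    the capitalized words found in each group's documents."""
    notes = {}
    for topic, doc_ids in organization.items():
        entities = set()
        for i in doc_ids:
            entities.update(w for w in documents[i].split() if w[0].isupper())
        notes[topic] = sorted(entities)
    return notes

def summarize(documents, organization, notes):
    """Stage 3 (Summarize): compose one narrative with source
    attribution markers of the form [doc i]."""
    lines = []
    for topic, doc_ids in organization.items():
        sources = ", ".join(f"[doc {i}]" for i in doc_ids)
        lines.append(f"{topic}: {', '.join(notes[topic])} ({sources})")
    return " ".join(lines)

docs = ["Apple unveiled Vision Pro.", "Apple reported record Revenue."]
summary = summarize(docs, plan(docs), reason(docs, plan(docs)))
```

The structural point the sketch keeps is that each stage consumes the previous stage's output, so the final summary carries per-topic notes plus explicit source attribution.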



🔗 Resources

- Paper: https://ojs.aaai.org/index.php/AAAI/article/view/34676

- Slides: https://drive.google.com/file/d/1lWqQtHRnpn-g2IQ3guloj_2j8sDm4z0V/view?usp=drivesdk

- Poster: https://drive.google.com/file/d/1EQqgwbcS7xkVx38y0qPvdRh0gQyeOH5u/view?usp=drivesdk

- Video: https://youtube.com/shorts/6ecxLLUpWJE?si=BAluAeP4-_eCmfu7


Big thanks to my mentor and co-author Sambaran Bandyopadhyay at Adobe Research for the guidance (Summer 2024 internship days FTW!) 🙏.

#AAAI2025 #MultiDocumentSummarization #LLM #Research #NLP

Datasets (0)

None public yet