McGill NLP Group

university

https://mcgill-nlp.github.io/

McGill_NLP

McGill-NLP

AI & ML interests

computational linguistics, natural language processing

Recent Activity

Jinny1208 updated a dataset about 2 hours ago

McGill-NLP/speech-translation-and-summarization

ilboglions updated a collection 2 days ago

LACUNA

Papers

Forecasting Downstream Performance of LLMs With Proxy Metrics

Structured Distillation of Web Agent Capabilities Enables Generalization

McGill-NLP's collections (22)

LACUNA

McGill-NLP/LACUNA-OLMo2-1B-seed42

Updated 2 days ago
McGill-NLP/LACUNA-OLMo3-7B-seed42

Updated 2 days ago
McGill-NLP/LACUNA-data-OLMo2-1B-seed42

Viewer • Updated 2 days ago • 12.6k • 83
McGill-NLP/LACUNA-data-OLMo3-7B-seed42

Viewer • Updated 2 days ago • 89.3k • 84

AfriqueLLM

Best open African LLM

AfriqueLLM: How Data Mixing and Model Architecture Impact Continued Pre-training for African Languages

Paper • 2601.06395 • Published Jan 10 • 5
McGill-NLP/AfriqueQwen-14B

Text Generation • 15B • Updated 13 days ago • 2.32k • 4
McGill-NLP/AfriqueQwen-8B

Text Generation • 8B • Updated 13 days ago • 1.58k • 2
McGill-NLP/AfriqueQwen3.5-4B-50Langs

Text Generation • 5B • Updated 13 days ago • 399 • 6

LLM2Vec-Gen

Generative Embeddings from Large Language Models

McGill-NLP/llm2vec-gen-tulu

Viewer • Updated Mar 3 • 10.5M • 793 • 1
McGill-NLP/llm2vec-gen-tulu-w-hard-negative

Viewer • Updated Mar 2 • 3.22M • 292
McGill-NLP/llm2vec-gen-echo-rewritten-w-hard-negative

Viewer • Updated Mar 8 • 7.17M • 78
McGill-NLP/LLM2Vec-Gen-Llama32-1B

Sentence Similarity • Updated Apr 4 • 10 • 2

CRAG-MM-Diagnostics

McGill-NLP/crag-mm-diagnostics

Viewer • Updated 7 days ago • 1.15k • 51
McGill-NLP/crag-mm-qwen3_vl_embedding_2b_image-index

Updated Feb 16 • 7
McGill-NLP/crag-mm-vlm2vecv2.0_image-index

Updated Feb 16 • 7

INJONGO

INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages

McGill-NLP/AfroXLMR-large-76L-Injongo-intent

Text Classification • 0.6B • Updated May 25, 2025 • 3
McGill-NLP/AfroXLMR-large-76L-Injongo-slot

Token Classification • 0.6B • Updated May 25, 2025 • 4
McGill-NLP/gemma-2-9b-it-Injongo-intent

Text Generation • 9B • Updated May 26, 2025 • 3
McGill-NLP/gemma-2-9b-it-Injongo-slot

Text Generation • 9B • Updated May 26, 2025 • 2

Unequal unlearning

Datasets used for the OLMo experiments in the "Not All Data are Unlearned Equally" paper https://arxiv.org/abs/2504.05058

McGill-NLP/country_capital_qa

Viewer • Updated Apr 16, 2025 • 1.39k • 25
McGill-NLP/book_author_qa

Viewer • Updated Apr 16, 2025 • 1.48k • 15
McGill-NLP/zsre_qa

Viewer • Updated Apr 16, 2025 • 1.52k • 55

Malicious-IR

McGill-NLP/AdvBench-IR

Viewer • Updated Mar 12, 2025 • 520 • 11 • 4
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval

Paper • 2503.08644 • Published Mar 11, 2025 • 16
McGill-NLP/AdvBench-IR-Small-Wiki-100

Viewer • Updated May 29, 2025 • 50.9k • 13

CHASE

Generate challenging synthetic data to evaluate LLMs

McGill-NLP/CHASE-QA

Viewer • Updated Feb 21, 2025 • 671 • 45
McGill-NLP/CHASE-Code

Viewer • Updated Feb 21, 2025 • 500 • 46
McGill-NLP/CHASE-Math

Viewer • Updated Feb 21, 2025 • 500 • 33
How to Get Your LLM to Generate Challenging Problems for Evaluation

Paper • 2502.14678 • Published Feb 20, 2025 • 18

WebLINX

https://mcgill-nlp.github.io/weblinx

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Paper • 2402.05930 • Published Feb 8, 2024 • 39
McGill-NLP/WebLINX

Viewer • Updated Dec 7, 2024 • 79.8k • 2.76k • 65
McGill-NLP/WebLINX-full

Updated Sep 21, 2025 • 22k • 8
McGill-NLP/weblinx-browsergym

Updated Dec 7, 2024 • 28.1k • 4

WebLINX Models

https://mcgill-nlp.github.io/weblinx

McGill-NLP/Llama-3-8B-Web

Text Generation • 8B • Updated Apr 26, 2024 • 336 • 215
McGill-NLP/MiniLM-L6-dmr

Sentence Similarity • Updated Feb 9, 2024 • 7 • 5
McGill-NLP/bge-small-dmr

Sentence Similarity • Updated Feb 9, 2024 • 5 • 1
McGill-NLP/gte-base-dmr

Sentence Similarity • Updated Feb 9, 2024 • 12 • 2

FaithDial

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

Paper • 2204.10757 • Published Apr 22, 2022 • 2
McGill-NLP/FaithDial

Viewer • Updated Feb 5, 2023 • 32.3k • 200 • 18
McGill-NLP/roberta-large-faithcritic

Text Classification • Updated Jul 31, 2022 • 49 • 1

Tiered Language Models

McGill-NLP/TLM-180M

0.2B • Updated 15 days ago • 41
McGill-NLP/TLM-650M

0.6B • Updated 15 days ago • 14

A3: Agent-as-Annotators

Models and data from "Structured Distillation of Web Agent Capabilities Enables Generalization" (arXiv:2604.07776)

Structured Distillation of Web Agent Capabilities Enables Generalization

Paper • 2604.07776 • Published Apr 9 • 23
McGill-NLP/A3-Qwen3.5-9B

Image-Text-to-Text • 9B • Updated Apr 16 • 283 • 6
McGill-NLP/A3-Qwen3.5-4B

Image-Text-to-Text • 5B • Updated Apr 16 • 91 • 2
McGill-NLP/A3-Qwen3.5-2B

Image-Text-to-Text • 3B • Updated Apr 16 • 31 • 2

LatentLens Contextual Embeddings

Pre-computed contextual text embeddings for interpreting LLM/VLM hidden states. Use with: pip install latentlens

McGill-NLP/contextual_embeddings-llama3.1-8b

Updated Feb 19
McGill-NLP/contextual_embeddings-gemma2-9b

Updated Feb 19
McGill-NLP/contextual_embeddings-qwen2.5-7b

Updated Feb 19
McGill-NLP/latentlens-qwen2vl-embeddings

Updated Feb 7

The Markovian Thinker

Reformulating the RL of reasoning LLMs through the Markovian Thinking paradigm.

McGill-NLP/delethink-24k-1.5b

2B • Updated Oct 9, 2025 • 93 • 5
McGill-NLP/longcot-24k-1.5b

2B • Updated Oct 9, 2025 • 5 • 2
McGill-NLP/longcot-8k-1.5b

2B • Updated Oct 9, 2025 • 6 • 1
McGill-NLP/delethink-96k-base-1.5b

2B • Updated Oct 3, 2025 • 5 • 1

SSA-COMET

McGill-NLP/ssa-comet-qe

Translation • Updated May 23, 2025 • 1
McGill-NLP/ssa-comet-mtl

Translation • Updated Mar 11 • 3
McGill-NLP/ssa-comet-stl

Translation • Updated May 23, 2025 • 1
McGill-NLP/ssa-comet-qe-final

Translation • Updated Apr 27

AgentRewardBench

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Paper • 2504.08942 • Published Apr 11, 2025 • 29
McGill-NLP/agent-reward-bench

Viewer • Updated Apr 21, 2025 • 1.41k • 7.35k • 4
Agent Reward Bench Demo

Space • Running • 5 • Explore agent trajectories and judgments in web benchmarks
Agent Reward Bench Leaderboard

Space • Running • 3 • Leaderboard for AgentRewardBench

SafeArena

McGill-NLP/safearena

Updated Apr 23, 2025 • 152 • 5
SafeArena: Evaluating the Safety of Autonomous Web Agents

Paper • 2503.04957 • Published Mar 6, 2025 • 21
SafeArena Leaderboard

Space • Running • 3

LLM2Vec

McGill-NLP/LLM2Vec-Meta-Llama-32-3B-Instruct-mntp-supervised

Updated Nov 15, 2025
McGill-NLP/LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-supervised

Sentence Similarity • Updated Oct 8, 2024 • 317 • 5
McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised

Sentence Similarity • Updated Apr 30, 2024 • 112k • 52
McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised

Sentence Similarity • Updated Apr 11, 2024 • 270 • 13

AURORA

Repository: https://github.com/McGill-NLP/AURORA

McGill-NLP/AURORA

Viewer • Updated Jul 25, 2024 • 169k • 215 • 7
McGill-NLP/AURORA

Image-to-Image • Updated Dec 21, 2024 • 1 • 4
McGill-NLP/aurora-bench

Viewer • Updated Jul 9, 2024 • 400 • 11 • 2
AURORA

Space • Runtime error • 5

Statcan Dialogue Dataset & Models

mcgill-nlp.github.io/statcan-dialogue-dataset

The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents

Paper • 2304.01412 • Published Apr 3, 2023 • 2
McGill-NLP/statcan-dialogue-dataset

Preview • Updated May 24, 2024 • 4 • 7
McGill-NLP/dpr-statcan-conversation_encoder-title

Feature Extraction • 0.1B • Updated Jul 17, 2023 • 7
McGill-NLP/tapas-statcan-large-conversation_encoder-cell_tokens

Feature Extraction • Updated Apr 5, 2023 • 4

MLQuestions

Back-Training excels Self-Training at Unsupervised Domain Adaptation of Question Generation and Passage Retrieval

Paper • 2104.08801 • Published Apr 18, 2021 • 1
McGill-NLP/mlquestions

Updated Nov 11, 2021 • 194 • 3
McGill-NLP/bart-qg-mlquestions-backtraining

Updated Apr 8, 2022 • 8
McGill-NLP/bart-qg-mlquestions-selftraining

Updated Apr 12, 2022 • 5

Team members 52
