---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Coder-7B-Instruct
---

## Overview
Arctic-Text2SQL-R1-7B is a 7-billion-parameter Text-to-SQL model fine-tuned using Group Relative Policy Optimization (GRPO) with a simple execution-based reward signal. It converts natural language questions into executable SQL queries.

Read more in our paper: [Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL](https://arxiv.org/abs/2505.20315).

## Key Features

- **Lightweight RL formulation**: Uses only execution correctness and syntax validity as rewards.
- **State-of-the-art performance**: Achieves 68.9% execution accuracy on BIRD-dev and 68.5% on BIRD-test, with an average of 57.2% across six benchmarks (BIRD, Spider, Spider2.0, Spider-DK, EHRSQL, ScienceBenchmark).
- **Efficiency**: Outperforms many 70B+ models with only 7B parameters.
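
To make the reward idea above concrete, here is a minimal sketch of an execution-based reward using Python's built-in `sqlite3` module. The function name `execution_reward` and the three reward values are illustrative assumptions, not the values used in training; see the paper for the actual formulation.

```python
import sqlite3
from collections import Counter


def execution_reward(conn, predicted_sql, gold_sql,
                     correct=1.0, executable=0.1, failed=0.0):
    """Score a predicted query against a gold query by executing both.

    The reward values are placeholders chosen for illustration only.
    """
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        # The query did not parse or did not run: lowest reward.
        return failed

    gold_rows = conn.execute(gold_sql).fetchall()

    # Compare results as multisets so that row order does not matter.
    if Counter(predicted_rows) == Counter(gold_rows):
        return correct

    # The query ran but returned the wrong rows.
    return executable
```

A query that returns the same rows as the gold query gets the full reward, a query that runs but returns different rows gets the small partial reward, and a query that fails to execute gets nothing.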

## Intended Use

This model is designed for:

- Interactive natural language interfaces to relational databases.
- Data analytics tools enabling non-technical users to query databases.
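
A natural language interface typically has to hand the model both the database schema and the user's question. The sketch below shows one way to assemble such an input from a SQLite database. The helper name `build_prompt` and the prompt layout are hypothetical; this is not the model's documented prompt template, so check the paper or repository for the format the model was trained on.

```python
import sqlite3


def build_prompt(conn, question):
    """Combine the database's CREATE TABLE statements with a question.

    The layout below is an assumed, illustrative format.
    """
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    # Each row holds the original DDL text for one table.
    schema = "\n\n".join(ddl.strip() + ";" for (ddl,) in rows)
    return f"Database schema:\n{schema}\n\nQuestion: {question}\nSQL:"
```

The returned string would then be passed to whatever inference stack serves the model.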

### Not intended for:
- Generation of non-SQL text or free-form natural language tasks.
- Production systems without validation, especially in safety-critical domains.

## Benchmark Results

| Benchmark                    | Accuracy  |
| ---------------------------- | --------- |
| BIRD-dev                     | 68.9%     |
| BIRD-test                    | 68.5%     |
| Spider-test                  | 88.8%     |
| Spider2.0-DK                 | 15.6%     |
| EHRSQL                       | 36.7%     |
| ScienceBenchmark             | 51.8%     |
| **Average (six benchmarks)** | **57.2%** |


## Ethical Considerations
- Avoid using the model on private or sensitive data without proper oversight.
- Validate generated SQL before running it to prevent data leakage or unauthorized access.
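
As one example of such validation, the sketch below runs a generated query on a SQLite connection that has been switched to read-only mode with `PRAGMA query_only`, and caps the number of rows returned. The helper name `run_generated_sql` and the `max_rows` default are assumptions for illustration. This is a minimal guard, not a complete security control: it does not restrict which tables or columns can be read.

```python
import sqlite3


def run_generated_sql(conn, sql, max_rows=100):
    """Execute model-generated SQL with writes disabled and a row cap."""
    # While query_only is ON, SQLite rejects statements that modify the database.
    conn.execute("PRAGMA query_only = ON")
    try:
        return conn.execute(sql).fetchmany(max_rows)
    except (sqlite3.Error, sqlite3.Warning) as exc:
        # Covers syntax errors, write attempts, and multi-statement input.
        conn.rollback()
        raise ValueError(f"Rejected generated SQL: {exc}") from exc
    finally:
        conn.execute("PRAGMA query_only = OFF")
```

In a real deployment this would usually be combined with database-level permissions, such as a dedicated read-only account with access limited to the intended tables.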