---
library_name: transformers
tags: [text2sql, sql-generation, t5, natural-language-processing]
---

# Model Card for ThotaBhanu/t5_sql_askdb

## Model Details

### Model Description

This model is a **T5-based Natural Language to SQL** converter, fine-tuned on the **WikiSQL dataset**. It is designed to convert **English natural language queries** into **SQL queries** that can be executed on relational databases.

- **Developed by:** Bhanu Prasad Thota  
- **Shared by:** Bhanu Prasad Thota  
- **Model type:** T5-based Sequence-to-Sequence Model  
- **Language(s):** English  
- **License:** MIT  
- **Finetuned from model:** `t5-large`  

This model is particularly useful for **text-to-SQL applications**, allowing users to **query databases using plain English** instead of writing SQL.

---

## Model Sources

- **Repository:** [https://huggingface.co/ThotaBhanu/t5_sql_askdb](https://huggingface.co/ThotaBhanu/t5_sql_askdb)  
- **Paper [optional]:** N/A  
- **Demo [optional]:** Coming soon  

---

## Uses

### Direct Use

- Convert **natural language questions** into **SQL queries**  
- Assist in **database query automation**  
- Can be used in **chatbots, data analytics tools, and enterprise database search systems**  

### Downstream Use

- Can be **fine-tuned** further on **custom datasets** to improve domain-specific SQL generation  
- Can be integrated into **business intelligence tools** for better user interaction  

### Out-of-Scope Use

- The model does **not infer database schema** automatically  
- May generate incorrect SQL for **complex nested queries or multi-table joins**  
- Not suitable for **non-relational (NoSQL) databases**  
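The out-of-scope cases above mirror the training data: WikiSQL questions map to single-table queries of the form `SELECT [aggregate](column) FROM table [WHERE ...]`, with no joins, subqueries, `GROUP BY` or `ORDER BY`. A minimal sketch of a pre-execution check, assuming you only want to flag generated SQL that falls outside that shape, scans for those constructs. `out_of_scope_features` and its pattern list are hypothetical. A regex scan is a rough heuristic (keywords inside string literals can fool it), not a SQL parser.

```python
import re

# Hypothetical helper: constructs that never appear in WikiSQL targets,
# so generated SQL containing them is more likely to be wrong.
UNSUPPORTED_PATTERNS = {
    "join": r"\bJOIN\b",
    "subquery": r"\(\s*SELECT\b",
    "group_by": r"\bGROUP\s+BY\b",
    "order_by": r"\bORDER\s+BY\b",
    "set_operation": r"\b(UNION|INTERSECT|EXCEPT)\b",
}


def out_of_scope_features(sql: str) -> list[str]:
    """Return the names of constructs in `sql` that fall outside WikiSQL's query shape."""
    return [
        name
        for name, pattern in UNSUPPORTED_PATTERNS.items()
        if re.search(pattern, sql, flags=re.IGNORECASE)
    ]


print(out_of_scope_features("SELECT COUNT(name) FROM table WHERE year = 2020"))
# → []
print(out_of_scope_features(
    "SELECT name FROM employees e JOIN depts d ON e.dept_id = d.id ORDER BY name"
))
# → ['join', 'order_by']
```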

---

## Bias, Risks, and Limitations

- The model may not **always generate valid SQL** for **custom database schemas**  
- Assumes **consistent column naming**, which may not always be the case in enterprise databases  
- Performance depends on **how well the input query aligns** with the training data format  
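Because the model is not given the schema, the column names it emits may follow the question's wording rather than your actual columns. One way to bridge that gap, sketched here under the assumption that the predicted column names have already been extracted from the generated SQL, is fuzzy matching with Python's standard `difflib`. `map_columns` and the 0.6 cutoff are illustrative choices, not part of this model.

```python
import difflib


def map_columns(predicted, actual, cutoff=0.6):
    """Map each predicted column name to the closest real column, or None.

    Names are compared case-insensitively with underscores treated as spaces,
    so "Join Year" can match a real column called "join_year".
    """
    # normalized form -> original column name
    normalized = {col.lower().replace("_", " "): col for col in actual}
    mapping = {}
    for col in predicted:
        key = col.lower().replace("_", " ")
        match = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
        mapping[col] = normalized[match[0]] if match else None
    return mapping


print(map_columns(["Join Year", "Salary"], ["id", "name", "join_year"]))
# → {'Join Year': 'join_year', 'Salary': None}
```

A `None` entry means no real column is similar enough, which is a good signal to reject the query rather than guess.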

### Recommendations

- Always **validate generated SQL** before executing on a live database  
- Use **schema-aware** validation methods for production environments  
- Consider **fine-tuning the model** on domain-specific SQL queries  
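A minimal sketch of schema-aware validation, assuming the target tables can be described as SQLite DDL: the generated query is compiled with `EXPLAIN` inside an empty in-memory database, so unknown tables or columns and syntax errors surface before anything touches live data. `validate_sql` is a hypothetical helper. SQLite's dialect differs from other engines, so a query that passes here may still need checking against your production database.

```python
import sqlite3


def validate_sql(sql: str, schema_ddl: str) -> tuple[bool, str]:
    """Check that `sql` is a SELECT that compiles against `schema_ddl`.

    The query is only compiled (via EXPLAIN), never run, inside a throwaway
    in-memory SQLite database that contains the schema but no data.
    """
    if not sql.lstrip().upper().startswith("SELECT"):
        return False, "only SELECT statements are allowed"
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)   # create the (empty) tables
        conn.execute("EXPLAIN " + sql)   # compile the query without executing it
        return True, "ok"
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, str(exc)
    finally:
        conn.close()


schema = "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, join_year INTEGER);"

print(validate_sql("SELECT name FROM employees WHERE join_year = 2020", schema))
# → (True, 'ok')
print(validate_sql("SELECT name FROM employees WHERE joined = 2020", schema))
# returns False together with SQLite's "no such column" error message
```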

---

## How to Get Started with the Model

Use the code below to generate SQL queries from natural language:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration  # T5Tokenizer also needs `sentencepiece`

# Load model and tokenizer
model_name = "ThotaBhanu/t5_sql_askdb"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Convert a natural language question to SQL
def generate_sql(query, max_new_tokens=128):
    input_text = f"Convert to SQL: {query}"
    # Match the 128-token input length used during training
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=128)
    # Set an explicit output budget; the library's default generation length can cut SQL off early
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
query = "Find all employees who joined in 2020"
sql_query = generate_sql(query)

print(f"📝 Query: {query}")
print(f"🛠 Generated SQL: {sql_query}")
```

## Training Details

### Training Data

- **Dataset:** WikiSQL  
- **Size:** 80,654 pairs of natural language questions and SQL queries  
- **Preprocessing:** Tokenization using `T5Tokenizer`, max length 128  
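WikiSQL stores each query as a selected column index, an aggregation index and a list of `(column, operator, value)` conditions, resolved through the operator tables in the WikiSQL repository. The card does not say how these annotations were turned into target strings, so the sketch below assumes they were flattened into plain SQL text and that inputs used the same `Convert to SQL: ` prefix as the usage example above; both are assumptions, and `wikisql_to_pair` is a hypothetical helper. The resulting strings would then be tokenized with `T5Tokenizer` at a maximum length of 128.

```python
# Operator tables as defined in the WikiSQL repository (lib/query.py).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def wikisql_to_pair(question, header, sel, agg, conds, prefix="Convert to SQL: "):
    """Build a hypothetical (input text, target SQL) pair from one WikiSQL record."""
    col = header[sel]
    select = f"{AGG_OPS[agg]}({col})" if agg else col  # agg index 0 means no aggregate
    sql = f"SELECT {select} FROM table"
    if conds:
        clauses = [f"{header[c]} {COND_OPS[op]} {value}" for c, op, value in conds]
        sql += " WHERE " + " AND ".join(clauses)
    return prefix + question, sql


pair = wikisql_to_pair(
    question="How many players were drafted in round 3?",
    header=["Player", "Round", "Team"],
    sel=0,
    agg=3,
    conds=[(1, 0, "3")],
)
print(pair)
# → ('Convert to SQL: How many players were drafted in round 3?', 'SELECT COUNT(Player) FROM table WHERE Round = 3')
```

Note that WikiSQL uses the generic table name `table` and leaves values unquoted, which is one reason outputs in this style may need adapting before they run on a real database.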


### Training Procedure

- **Training framework:** Hugging Face Transformers + PyTorch  
- **Hardware used:** NVIDIA V100 GPU  
- **Optimizer:** AdamW  
- **Learning rate:** 5e-5  
- **Batch size:** 8  
- **Epochs:** 5  

#### Training Hyperparameters

- **Training precision:** Mixed precision (fp16)  
- **Gradient accumulation:** Yes (used to reach a larger effective batch size than the per-step batch of 8)  
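Gradient accumulation sums gradients over several batches before each AdamW update, so the effective batch size is the per-device batch (8) times the accumulation steps times the number of GPUs. The number of accumulation steps is not given, so the value of 4 below is hypothetical; the arithmetic shows how it changes the number of optimizer updates over 5 epochs of 80,654 examples. Trainers may round the final partial step of an epoch differently, so treat the step counts as approximate.

```python
import math


def training_schedule(n_examples, per_device_batch, accum_steps, n_devices, epochs):
    """Return (effective batch size, optimizer steps per epoch, total optimizer steps)."""
    effective_batch = per_device_batch * accum_steps * n_devices
    # One optimizer update per effective batch; the last, smaller batch still counts
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


# accum_steps=4 is hypothetical; only the use of accumulation is documented.
print(training_schedule(80_654, per_device_batch=8, accum_steps=4, n_devices=1, epochs=5))
# → (32, 2521, 12605)
```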

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA V100 GPU
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
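Once the missing fields are known, a back-of-the-envelope estimate multiplies GPU power draw by training time and by the carbon intensity of the local grid. The sketch below counts GPU power only (no CPU, memory or data-center overhead), and every input in the example call is a hypothetical placeholder rather than a measurement for this model; 300 W corresponds to the rated power of an SXM2 V100. `estimate_emissions_kg` is a hypothetical helper.

```python
def estimate_emissions_kg(gpu_watts, n_gpus, hours, kg_co2_per_kwh):
    """Rough CO2eq estimate in kg from GPU power draw only (ignores CPU, cooling, PUE)."""
    energy_kwh = gpu_watts * n_gpus * hours / 1000  # watts x hours -> kWh
    return energy_kwh * kg_co2_per_kwh


# All inputs below are hypothetical placeholders, not measurements for this model.
print(round(estimate_emissions_kg(gpu_watts=300, n_gpus=1, hours=10, kg_co2_per_kwh=0.4), 3))
# → 1.2
```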

## Technical Specifications [optional]

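### Model Architecture and Objective

T5 encoder-decoder Transformer initialized from `t5-large` and fine-tuned as a sequence-to-sequence model: the encoder reads the English question and the decoder generates the corresponding SQL query.

### Compute Infrastructure

[More Information Needed]

#### Hardware

NVIDIA V100 GPU

#### Software

Hugging Face Transformers, PyTorch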

## Citation [optional]

**BibTeX:**

```bibtex
@misc{t5_sql_askdb,
  author = {Bhanu Prasad Thota},
  title = {T5-SQL AskDB Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ThotaBhanu/t5_sql_askdb}}
}
```

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]