# Code Explanation: Text Language Detector
This document explains the **Text Language Detector** app, detailing each part of the provided code and intended use cases.
---
## Overview
**Purpose**
Detect the language of input text and return the full language name, ISO 639-1 code, and confidence score.
**Tech Stack**
- **Model**: `papluca/xlm-roberta-base-language-detection` (Hugging Face Transformers)
- **Model Precision**: `torch_dtype=torch.bfloat16` for reduced memory usage
- **Language Mapping**: `pycountry` to convert ISO codes to full language names
- **Interface**: Gradio Blocks + Buttons
---
## Setup & Dependencies
Install required libraries:
```bash
pip install transformers gradio torch pycountry
```
---
## Detailed Block-by-Block Code Explanation
```python
import torch
import gradio as gr
from transformers import pipeline
import pycountry

# Load the language-detection pipeline with bfloat16 precision
language_detector = pipeline(
    "text-classification",
    model="papluca/xlm-roberta-base-language-detection",
    torch_dtype=torch.bfloat16
)


def detect_language(text: str) -> str:
    # The pipeline returns a list of predictions; take the top one
    result = language_detector(text)[0]
    code = result["label"]  # e.g. "en", "fr", "hi"
    score = result["score"]
    # Map ISO code to full language name using pycountry
    try:
        lang = pycountry.languages.get(alpha_2=code).name
    except (AttributeError, LookupError):
        # No pycountry entry for this code: fall back to the upper-cased code
        lang = code.upper()
    return f"{lang} ({code}): {score:.2f}"


# Build the Gradio interface
with gr.Blocks(theme=gr.themes.Default()) as demo:
    gr.Markdown("## Text Language Detector")
    gr.Markdown("Type or paste text below to detect its language (name + code + confidence).")
    with gr.Row():
        text_input = gr.Textbox(label="Input Text", placeholder="Type or paste text here...", lines=4)
        lang_output = gr.Textbox(label="Detected Language", placeholder="Language & confidence", lines=1, interactive=False)
    detect_btn = gr.Button("Detect Language")
    # Run detect_language on the input text when the button is clicked
    detect_btn.click(fn=detect_language, inputs=text_input, outputs=lang_output)
    gr.Markdown("---")
    gr.Markdown("Built with 🤗 Transformers (`papluca/xlm-roberta-base-language-detection`), `pycountry`, and Gradio")

demo.launch()
```
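The `detect_language` function passes whatever the user typed straight to the pipeline, including an empty string. One possible guard is sketched below. `safe_detect`, `min_score`, and `stub_detector` are hypothetical names introduced only for illustration, and the 0.5 threshold is an arbitrary assumption, not a tuned value. The stub stands in for the real pipeline so the sketch runs without downloading the model.

```python
from typing import Callable, Dict, List

# Hypothetical alias: any callable that behaves like the pipeline, i.e. takes
# a string and returns a list of {"label": ..., "score": ...} dicts.
Detector = Callable[[str], List[Dict[str, object]]]


def safe_detect(text: str, detector: Detector, min_score: float = 0.5) -> str:
    """Guard against empty input and flag low-confidence predictions."""
    if not text or not text.strip():
        return "No text provided"
    result = detector(text)[0]
    code, score = str(result["label"]), float(result["score"])
    if score < min_score:
        return f"Uncertain (best guess: {code}, {score:.2f})"
    return f"{code}: {score:.2f}"


def stub_detector(text: str) -> List[Dict[str, object]]:
    # Stand-in for the real pipeline: always answers "fr", with a lower
    # score for very short inputs.
    return [{"label": "fr", "score": 0.98 if len(text) > 10 else 0.30}]


print(safe_detect("", stub_detector))
print(safe_detect("Bonjour tout le monde", stub_detector))
print(safe_detect("Oui", stub_detector))
```

In the app itself, `language_detector` would be passed as the `detector` argument, and the returned string could be combined with the `pycountry` name lookup shown above.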
---
## Core Concepts
| Concept | Why It Matters |
|---------------------------|-------------------------------------------------------|
| Hugging Face Pipeline | One-line model loading & inference |
| bfloat16 Precision        | Lower memory usage, faster inference on supported hardware |
| pycountry Mapping | Converts ISO codes to human-readable language names |
| Gradio Blocks | Builds interactive web apps with pure Python |
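
To make the bfloat16 row concrete, here is a rough back-of-the-envelope estimate of weight memory at 4 bytes per parameter (float32) versus 2 bytes (bfloat16). The parameter count of roughly 278 million is an assumption about the XLM-RoBERTa base checkpoint, not a figure from this document, and the estimate ignores activations and framework overhead. `weight_memory_mib` is a hypothetical helper.

```python
# Rough weight-memory estimate. PARAM_COUNT is an assumed, approximate figure.
PARAM_COUNT = 278_000_000
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in MiB for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_mib(PARAM_COUNT, dtype):.0f} MiB")
```

Whatever the exact parameter count, halving the bytes per parameter halves the memory needed for the weights.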
---
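## Intended Uses & Limitations
You can use this model directly as a language detector, i.e. for sequence classification tasks. Because the classifier always picks one of its known labels, text in any other language is still mapped to one of them. It currently supports the following 20 languages: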
- Arabic (ar)
- Bulgarian (bg)
- German (de)
- Modern Greek (el)
- English (en)
- Spanish (es)
- French (fr)
- Hindi (hi)
- Italian (it)
- Japanese (ja)
- Dutch (nl)
- Polish (pl)
- Portuguese (pt)
- Russian (ru)
- Swahili (sw)
- Thai (th)
- Turkish (tr)
- Urdu (ur)
- Vietnamese (vi)
- Chinese (zh)
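
Since the label set is fixed, the list above can also be written out as a small lookup table, for example to check a code before displaying it or to show the short names used here. This is only a sketch: `SUPPORTED_LANGUAGES`, `language_name`, and `is_supported` are hypothetical helpers that are not part of the app, and the names are copied from the list above.

```python
# Codes and names copied from the supported-language list above.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "bg": "Bulgarian", "de": "German", "el": "Modern Greek",
    "en": "English", "es": "Spanish", "fr": "French", "hi": "Hindi",
    "it": "Italian", "ja": "Japanese", "nl": "Dutch", "pl": "Polish",
    "pt": "Portuguese", "ru": "Russian", "sw": "Swahili", "th": "Thai",
    "tr": "Turkish", "ur": "Urdu", "vi": "Vietnamese", "zh": "Chinese",
}


def is_supported(code: str) -> bool:
    """Return True if the code is one of the model's labels."""
    return code.lower() in SUPPORTED_LANGUAGES


def language_name(code: str) -> str:
    """Return the listed name for a supported code, else the upper-cased code."""
    return SUPPORTED_LANGUAGES.get(code.lower(), code.upper())


print(len(SUPPORTED_LANGUAGES))
print(language_name("sw"))
print(language_name("ta"))
```

The fallback to the upper-cased code mirrors the behaviour of `detect_language` when no name is found.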
---