# 📖 Code Explanation: Text Language Detector

This document explains the **Text Language Detector** app, detailing each part of the provided code and intended use cases.

---

## πŸ“ Overview

**Purpose**  
Detect the language of any input text and return the full language name, two-letter ISO 639-1 code, and confidence score.

**Tech Stack**  
- **Model**: `papluca/xlm-roberta-base-language-detection` (Hugging Face Transformers)  
- **Model Precision**: `torch_dtype=torch.bfloat16` for reduced memory usage  
- **Language Mapping**: `pycountry` to convert ISO codes to full language names  
- **Interface**: Gradio Blocks + Buttons  

---

## βš™οΈ Setup & Dependencies

Install required libraries:

```bash
pip install transformers gradio torch pycountry
```
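
Before launching the app, it can help to confirm that the four packages are actually importable in the current environment. The following is a minimal sketch using only the standard library; `missing_packages` and `REQUIRED` are hypothetical names introduced for illustration, not part of any of these libraries.

```python
import importlib.util

# Top-level module names for the packages installed above.
REQUIRED = ["transformers", "gradio", "torch", "pycountry"]

def missing_packages(names):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

The check only looks up each module without importing it, so it stays fast even for heavy packages such as `torch`.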

---

## πŸ” Detailed Block-by-Block Code Explanation

```python
import torch
import gradio as gr
from transformers import pipeline
import pycountry

# Load the language-detection pipeline with bfloat16 precision
language_detector = pipeline(
    "text-classification",
    model="papluca/xlm-roberta-base-language-detection",
    torch_dtype=torch.bfloat16
)

def detect_language(text: str) -> str:
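    # truncation=True keeps long pasted text within the model's maximum input length
    result = language_detector(text, truncation=True)[0]
    code = result["label"]      # e.g. "en", "fr", "hi"
    score = result["score"]

    # Map ISO code to full language name using pycountry
    try:
        lang = pycountry.languages.get(alpha_2=code).name
    except (AttributeError, KeyError):
        # No pycountry entry for this code: fall back to the upper-cased code
        lang = code.upper()

    return f"{lang} ({code}) - {score:.2f}"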

# Build the Gradio interface
with gr.Blocks(theme=gr.themes.Default()) as demo:
    gr.Markdown("## 🌐 Text Language Detector")
    gr.Markdown("Type or paste text below to detect its language (name + code + confidence).")

    with gr.Row():
        text_input = gr.Textbox(label="📝 Input Text", placeholder="Type or paste text here...", lines=4)
        lang_output = gr.Textbox(label="✅ Detected Language", placeholder="Language & confidence", lines=1, interactive=False)

    detect_btn = gr.Button("🔍 Detect Language")
    detect_btn.click(fn=detect_language, inputs=text_input, outputs=lang_output)

    gr.Markdown("---")
    gr.Markdown("Built with 🤗 Transformers (`papluca/xlm-roberta-base-language-detection`), `pycountry`, and 🚀 Gradio")

demo.launch()
```
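
The pipeline returns a list of dictionaries, each with a `label` and a `score`, and `detect_language` reads the first entry. As an illustration of how that result could be guarded against empty input and low-confidence predictions, here is a minimal sketch. The helper `describe_prediction`, the `min_score` threshold of 0.5, and the stub classifier are assumptions made for the example, not part of the app; in practice `classifier` would be the `language_detector` pipeline.

```python
from typing import Any, Callable, Dict, List

Prediction = Dict[str, Any]

def describe_prediction(
    text: str,
    classifier: Callable[[str], List[Prediction]],
    min_score: float = 0.5,  # hypothetical threshold, tune for your data
) -> str:
    """Format the top prediction, or explain why none is shown."""
    if not text or not text.strip():
        return "No text provided"
    top = classifier(text)[0]
    code, score = str(top["label"]), float(top["score"])
    if score < min_score:
        return f"Uncertain (best guess: {code}, {score:.2f})"
    return f"{code} ({score:.2f})"

# Stub standing in for the real pipeline so the sketch runs without a model download.
def fake_classifier(text: str) -> List[Prediction]:
    return [{"label": "fr", "score": 0.97}]

print(describe_prediction("Bonjour tout le monde", fake_classifier))
```

Passing the classifier in as an argument keeps the formatting logic testable without loading the model.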

---

## 🚀 Core Concepts

| Concept                   | Why It Matters                                        |
|---------------------------|-------------------------------------------------------|
| Hugging Face Pipeline     | One-line model loading & inference                    |
| bfloat16 Precision        | Lower memory usage, faster inference on supported HW  |
| pycountry Mapping         | Converts ISO codes to human-readable language names   |
| Gradio Blocks             | Builds interactive web apps with pure Python          |
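
To make the bfloat16 row concrete: float32 stores each weight in 4 bytes and bfloat16 in 2, so the memory taken by the weights is roughly halved. The sketch below works through that arithmetic. The figure of about 278 million parameters is an approximate count assumed here for an XLM-RoBERTa base model, and the estimate ignores activations and framework overhead.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Estimate the memory needed for the weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)

# Assumption: approximate parameter count, used only for illustration.
APPROX_PARAMS = 278_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_mib(APPROX_PARAMS, dtype):.0f} MiB")
```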

---

## 🔄 Intended Uses & Limitations

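You can use this model directly as a language detector, that is, as a sequence classifier whose labels are language codes. It currently supports the 20 languages listed here; text in any other language is still assigned one of these labels, so results for unsupported languages are not meaningful: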

- Arabic (ar)  
- Bulgarian (bg)  
- German (de)  
- Modern Greek (el)  
- English (en)  
- Spanish (es)  
- French (fr)  
- Hindi (hi)  
- Italian (it)  
- Japanese (ja)  
- Dutch (nl)  
- Polish (pl)  
- Portuguese (pt)  
- Russian (ru)  
- Swahili (sw)  
- Thai (th)  
- Turkish (tr)  
- Urdu (ur)  
- Vietnamese (vi)  
- Chinese (zh)  
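
Since the model is trained on these 20 labels, the list above can double as a lookup table. The following sketch uses a plain dictionary rather than `pycountry` to map a code to its name and to fall back gracefully for anything outside the supported set; `SUPPORTED_LANGUAGES` and `language_name` are hypothetical names introduced for illustration.

```python
# Codes and names copied from the supported-language list above.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "bg": "Bulgarian", "de": "German", "el": "Modern Greek",
    "en": "English", "es": "Spanish", "fr": "French", "hi": "Hindi",
    "it": "Italian", "ja": "Japanese", "nl": "Dutch", "pl": "Polish",
    "pt": "Portuguese", "ru": "Russian", "sw": "Swahili", "th": "Thai",
    "tr": "Turkish", "ur": "Urdu", "vi": "Vietnamese", "zh": "Chinese",
}

def language_name(code: str) -> str:
    """Return the full name for a supported code, or the upper-cased code otherwise."""
    return SUPPORTED_LANGUAGES.get(code.lower(), code.upper())

print(language_name("sw"))  # a supported code
print(language_name("ta"))  # not in the supported set, falls back to the code
```

A fixed table like this keeps the displayed names identical to the list in this document, at the cost of needing a manual update if the model's label set changes.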

---