balaji4991512 committed 4096757 (parent 9185bc5): Create code_explanation.md

# 📖 Code Explanation: Text Language Detector

This document explains the **Text Language Detector** app, walking through each part of the code and its intended use cases.

---

## 📝 Overview

**Purpose**
Detect the language of input text (one of 20 supported languages) and return the full language name, ISO 639-1 code, and confidence score.

**Tech Stack**
- **Model**: `papluca/xlm-roberta-base-language-detection` (Hugging Face Transformers)
- **Model Precision**: `torch_dtype=torch.bfloat16` to reduce memory usage
- **Language Mapping**: `pycountry` to convert ISO codes to full language names
- **Interface**: Gradio Blocks + Buttons

---

## ⚙️ Setup & Dependencies

Install the required libraries:

```bash
pip install transformers gradio torch pycountry
```
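
A quick pre-flight check can confirm that the four packages are importable before launching the app. This is a minimal sketch, not part of the original app: `missing_modules` is a hypothetical helper built only on the standard library's `importlib.util.find_spec`, and it assumes each pip package installs a top-level module of the same name, which holds for these four.

```python
from importlib.util import find_spec

# Import names of the packages installed above (same as their pip names)
REQUIRED = ["transformers", "gradio", "torch", "pycountry"]


def missing_modules(names: list[str]) -> list[str]:
    """Hypothetical helper: return the modules that cannot be imported."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```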

---

## 🔍 Detailed Block-by-Block Code Explanation

The complete app is below; comments mark what each block does.

```python
import torch
import gradio as gr
from transformers import pipeline
import pycountry

# Load the language-detection pipeline once at startup, with bfloat16 weights
language_detector = pipeline(
    "text-classification",
    model="papluca/xlm-roberta-base-language-detection",
    torch_dtype=torch.bfloat16,
)


def detect_language(text: str) -> str:
    # Don't run the model on empty or whitespace-only input
    if not text or not text.strip():
        return "Please enter some text."

    # truncation=True cuts long inputs to the model's maximum sequence length
    result = language_detector(text, truncation=True)[0]
    code = result["label"]   # ISO 639-1 code, e.g. "en", "ja", "fr"
    score = result["score"]  # model confidence for the top label (0 to 1)

    # Map the ISO code to a full language name using pycountry.
    # Depending on the pycountry version, an unknown code either raises
    # KeyError or returns None (so accessing .name raises AttributeError).
    try:
        lang = pycountry.languages.get(alpha_2=code).name
    except (KeyError, AttributeError):
        lang = code.upper()

    return f"{lang} ({code}) — {score:.2f}"


# Build the Gradio interface
with gr.Blocks(theme=gr.themes.Default()) as demo:
    gr.Markdown("## 🌐 Text Language Detector")
    gr.Markdown("Type or paste text below to detect its language (name + code + confidence).")

    with gr.Row():
        text_input = gr.Textbox(
            label="📝 Input Text", placeholder="Type or paste text here...", lines=4
        )
        lang_output = gr.Textbox(
            label="✅ Detected Language",
            placeholder="Language & confidence",
            lines=1,
            interactive=False,
        )

    detect_btn = gr.Button("🔍 Detect Language")
    # Run detect_language on the textbox contents when the button is clicked
    detect_btn.click(fn=detect_language, inputs=text_input, outputs=lang_output)

    gr.Markdown("---")
    gr.Markdown(
        "Built with 🤗 Transformers (`papluca/xlm-roberta-base-language-detection`), "
        "`pycountry`, and 🚀 Gradio"
    )

demo.launch()
```
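
Because `detect_language` calls the pipeline directly, testing it requires downloading the model. One option, sketched here as an assumption rather than taken from the app, is to pass the detector and the name lookup in as arguments. `format_detection`, `fake_detector`, and `NAMES` are hypothetical: `fake_detector` mimics the pipeline's output shape (a list of `{"label", "score"}` dicts), `NAMES` is a tiny stand-in for `pycountry`, and the sketch separates fields with a plain hyphen.

```python
from typing import Callable

# Hypothetical stand-in for pycountry: ISO 639-1 code -> language name
NAMES = {"en": "English", "fr": "French", "ja": "Japanese"}

# Any callable that behaves like the text-classification pipeline
Detector = Callable[[str], list[dict]]


def format_detection(text: str, detector: Detector, names: dict[str, str]) -> str:
    """Format the top prediction as 'Name (code) - score'."""
    if not text.strip():
        return "Please enter some text."
    top = detector(text)[0]  # the pipeline returns the top prediction first
    code, score = top["label"], top["score"]
    name = names.get(code, code.upper())  # fall back to the upper-cased code
    return f"{name} ({code}) - {score:.2f}"


def fake_detector(text: str) -> list[dict]:
    # Mimics the pipeline's output shape without loading a model
    return [{"label": "fr", "score": 0.987}]


print(format_detection("Bonjour tout le monde", fake_detector, NAMES))
```

With this split, the Gradio callback becomes a thin wrapper around the real pipeline, and the formatting rules can be checked without any model download.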

---

## 🚀 Core Concepts

| Concept               | Why It Matters                                                                         |
|-----------------------|----------------------------------------------------------------------------------------|
| Hugging Face Pipeline | Loads a model with its tokenizer and runs inference in one call                        |
| bfloat16 Precision    | Halves weight memory relative to float32; faster on hardware with native bfloat16 support |
| pycountry Mapping     | Converts ISO 639-1 codes to human-readable language names                              |
| Gradio Blocks         | Builds interactive web apps in pure Python                                             |
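
The bfloat16 row can be made concrete with the bit layout: bfloat16 keeps float32's sign bit and 8 exponent bits but only the top 7 of its 23 mantissa bits, so each value takes 2 bytes instead of 4 while keeping float32's numeric range. The pure-Python sketch below illustrates this by truncating a float32 to its upper 16 bits. The helper names are hypothetical, real conversions (including PyTorch's) typically round to nearest rather than truncate, and the 270M parameter count is an approximation of XLM-R Base's size, not a figure from this app.

```python
import struct


def float32_bits(x: float) -> int:
    # Reinterpret x (rounded to float32) as an unsigned 32-bit integer
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


def to_bfloat16(x: float) -> float:
    """Truncate a float32 to bfloat16 precision by keeping its upper 16 bits."""
    upper = float32_bits(x) >> 16  # sign (1) + exponent (8) + mantissa (7)
    (y,) = struct.unpack(">f", struct.pack(">I", upper << 16))
    return y


# Precision drops, but large magnitudes stay finite (float16 would overflow at 1e38)
for value in [0.1, 3.14159, 1e38]:
    print(f"{value!r:>10} -> {to_bfloat16(value)!r}")

# Weight memory: 4 bytes per float32 value vs 2 bytes per bfloat16 value
params = 270_000_000  # approximate XLM-R Base size (assumption)
print(f"float32: {params * 4 / 1e9:.2f} GB, bfloat16: {params * 2 / 1e9:.2f} GB")
```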

---

## 🔄 Intended Uses & Limitations

The model can be used directly as a language detector; it treats detection as a sequence classification task. It currently supports the following 20 languages:

- Arabic (ar)
- Bulgarian (bg)
- German (de)
- Modern Greek (el)
- English (en)
- Spanish (es)
- French (fr)
- Hindi (hi)
- Italian (it)
- Japanese (ja)
- Dutch (nl)
- Polish (pl)
- Portuguese (pt)
- Russian (ru)
- Swahili (sw)
- Thai (th)
- Turkish (tr)
- Urdu (ur)
- Vietnamese (vi)
- Chinese (zh)

Because the model is a 20-way classifier, text in any other language is still assigned one of these 20 labels, so the confidence score alone cannot confirm that the input is in a supported language.
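
A caller can use the supported-language list to flag predictions that should not be trusted as-is. This is a minimal sketch, not part of the original app: `best_guess` is a hypothetical helper, `CONFIDENCE_THRESHOLD = 0.5` is an arbitrary cut-off that would need tuning on real data, and the input is assumed to be the list of `{"label", "score"}` dicts the text-classification pipeline returns (for example, when called with `top_k=3` to get several candidates).

```python
# Codes from the supported-language list above
SUPPORTED = {
    "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja",
    "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh",
}
CONFIDENCE_THRESHOLD = 0.5  # hypothetical cut-off; tune on real data


def best_guess(predictions: list[dict]) -> tuple[str, float, bool]:
    """Return (code, score, reliable) for the top prediction.

    `predictions` must be a non-empty list of {"label": str, "score": float}.
    The SUPPORTED check only matters if a model with a different label set
    is swapped in; the threshold catches uncertain predictions.
    """
    top = max(predictions, key=lambda p: p["score"])
    reliable = top["label"] in SUPPORTED and top["score"] >= CONFIDENCE_THRESHOLD
    return top["label"], top["score"], reliable


# Made-up scores, shaped like a top_k=3 pipeline result
print(best_guess([
    {"label": "es", "score": 0.41},
    {"label": "pt", "score": 0.38},
    {"label": "it", "score": 0.21},
]))
```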

---