Hugging Face Space: balaji4991512/Text_Language_Detector

balaji4991512 committed 16 days ago: "Create code_explanation.md" (commit 4096757, parent 9185bc5; code_explanation.md +116 -0)


# 📖 Code Explanation: Text Language Detector

This document explains the **Text Language Detector** app. It covers what each part of the code does and the model's intended uses and limitations.

---
## 📝 Overview

**Purpose**

Detect the language of an input text and return the language's full name, its ISO 639-1 code, and the model's confidence score. The underlying model recognizes 20 languages, listed under *Intended Uses & Limitations*.

**Tech Stack**

- **Model**: `papluca/xlm-roberta-base-language-detection` (Hugging Face Transformers)
- **Model Precision**: `torch_dtype=torch.bfloat16`, which stores each weight in 2 bytes instead of float32's 4 to reduce memory usage
- **Language Mapping**: `pycountry`, which converts ISO codes to full language names
- **Interface**: Gradio `Blocks` with a `Button` that triggers detection

---
## ⚙️ Setup & Dependencies

Install required libraries:

```bash
pip install transformers gradio torch pycountry
```

---
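## 🔍 Detailed Block-by-Block Code Explanation

The app has three parts. First, it loads the language-detection pipeline once at startup. Second, `detect_language` runs the model and turns the top label into a readable string. Third, a Gradio `Blocks` layout connects a button to that function. Comments in the code mark each part.

```python
import torch
import gradio as gr
import pycountry
from transformers import pipeline

# --- 1. Model: load the language-detection pipeline once, at startup ---
# bfloat16 stores each weight in 2 bytes (float32 uses 4), reducing memory use.
language_detector = pipeline(
    "text-classification",
    model="papluca/xlm-roberta-base-language-detection",
    torch_dtype=torch.bfloat16,
)


# --- 2. Inference: predict a label and format it for display ---
def detect_language(text: str) -> str:
    # Skip the model call for empty or whitespace-only input.
    if not text or not text.strip():
        return "Please enter some text."

    # The pipeline returns one dict per input, e.g. [{"label": "en", "score": 0.99}].
    # truncation=True clips inputs longer than the model's 512-token limit
    # instead of raising an error.
    result = language_detector(text, truncation=True)[0]
    code = result["label"]      # ISO 639-1 code, e.g. "en", "fr", "zh"
    score = result["score"]     # probability of the top label

    # Map the ISO code to a full language name using pycountry.
    # Depending on the pycountry version, an unknown code either makes get()
    # return None (AttributeError on .name) or raises a LookupError subclass.
    try:
        lang = pycountry.languages.get(alpha_2=code).name
    except (AttributeError, LookupError):
        lang = code.upper()
    return f"{lang} ({code}) — {score:.2f}"


# --- 3. UI: build the Gradio interface ---
with gr.Blocks(theme=gr.themes.Default()) as demo:
    gr.Markdown("## 🌐 Text Language Detector")
    gr.Markdown("Type or paste text below to detect its language (name + code + confidence).")
    with gr.Row():
        text_input = gr.Textbox(label="📝 Input Text", placeholder="Type or paste text here...", lines=4)
        lang_output = gr.Textbox(label="✅ Detected Language", placeholder="Language & confidence", lines=1, interactive=False)
    detect_btn = gr.Button("🔍 Detect Language")
    # Run detect_language on the textbox contents when the button is clicked.
    detect_btn.click(fn=detect_language, inputs=text_input, outputs=lang_output)
    gr.Markdown("---")
    gr.Markdown("Built with 🤗 Transformers (`papluca/xlm-roberta-base-language-detection`), `pycountry`, and 🚀 Gradio")

demo.launch()
```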
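
`detect_language` uses pycountry only to turn a label into a name. The model emits just the 20 codes listed under *Intended Uses & Limitations*, so a small static dictionary is one possible alternative. This can also help with display names, because pycountry returns ISO 639-3 reference names, which can be longer than everyday names. The sketch below assumes this swap. `SUPPORTED_LANGUAGES` and `language_name` are hypothetical names and are not part of the app.

```python
# Hypothetical alternative to pycountry: a static table covering the model's
# 20 labels (names taken from this document's list of supported languages).
SUPPORTED_LANGUAGES: dict[str, str] = {
    "ar": "Arabic", "bg": "Bulgarian", "de": "German", "el": "Modern Greek",
    "en": "English", "es": "Spanish", "fr": "French", "hi": "Hindi",
    "it": "Italian", "ja": "Japanese", "nl": "Dutch", "pl": "Polish",
    "pt": "Portuguese", "ru": "Russian", "sw": "Swahili", "th": "Thai",
    "tr": "Turkish", "ur": "Urdu", "vi": "Vietnamese", "zh": "Chinese",
}


def language_name(code: str) -> str:
    """Return the display name for an ISO 639-1 label.

    Unknown codes fall back to the upper-cased code, mirroring the
    except-branch of detect_language.
    """
    return SUPPORTED_LANGUAGES.get(code, code.upper())


if __name__ == "__main__":
    print(language_name("fr"))  # French
    print(language_name("ta"))  # TA (Tamil is not one of the 20 labels)
```

With this helper, the `try`/`except` block in `detect_language` would reduce to `lang = language_name(code)`, and the `pycountry` dependency could be dropped.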

---
## 🚀 Core Concepts

| Concept               | Why It Matters                                                                                      |
|-----------------------|-----------------------------------------------------------------------------------------------------|
| Hugging Face Pipeline | Loads the model and tokenizer and runs inference with a single call                                 |
| bfloat16 Precision    | Halves weight memory versus float32; may speed up inference on hardware with native bfloat16 support |
| pycountry Mapping     | Converts ISO codes to human-readable language names                                                 |
| Gradio Blocks         | Builds an interactive web UI in pure Python                                                         |
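
The bfloat16 row can be made concrete. bfloat16 keeps float32's sign bit and 8 exponent bits but only 7 of its 23 mantissa bits. Each weight therefore takes 2 bytes instead of 4, and the representable range stays the same while precision drops. The sketch below emulates float32-to-bfloat16 rounding in pure Python and estimates weight memory. It uses round-to-nearest-even and does not special-case NaN. The helper names are illustrative, and the ~278M parameter count for the XLM-RoBERTa base model is an approximate assumption, not a figure taken from this app.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round a float to the nearest bfloat16 value (round-to-nearest-even).

    Simplified emulation: NaN inputs are not special-cased.
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 low bits that bfloat16 drops.
    bits += 0x7FFF + ((bits >> 16) & 1)
    upper = (bits >> 16) & 0xFFFF
    # Put the kept 16 bits back into the high half of a float32.
    return struct.unpack("<f", struct.pack("<I", upper << 16))[0]


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory needed just to store the model weights."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Same range as float32, far less precision.
    print(to_bfloat16(3.14159))              # 3.140625
    print(math.isfinite(to_bfloat16(1e38)))  # True
    params = 278_000_000  # assumption: approximate size of xlm-roberta-base
    print(f"float32 weights:  {weight_memory_gb(params, 4):.2f} GB")  # 1.11 GB
    print(f"bfloat16 weights: {weight_memory_gb(params, 2):.2f} GB")  # 0.56 GB
```

The estimate covers weights only. Activations and framework overhead add to the real footprint.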

---
## 🔄 Intended Uses & Limitations

The model can be used directly as a language detector, that is, as a sequence classifier over the following 20 languages:

- Arabic (ar)
- Bulgarian (bg)
- German (de)
- Modern Greek (el)
- English (en)
- Spanish (es)
- French (fr)
- Hindi (hi)
- Italian (it)
- Japanese (ja)
- Dutch (nl)
- Polish (pl)
- Portuguese (pt)
- Russian (ru)
- Swahili (sw)
- Thai (th)
- Turkish (tr)
- Urdu (ur)
- Vietnamese (vi)
- Chinese (zh)

The classifier always predicts one of these 20 labels. Text in any other language (Tamil, for example) is therefore still assigned one of them, so results for unsupported languages should not be trusted.
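
Because every input receives one of the 20 labels, an app can inspect the confidence score before trusting the label. This is only a heuristic: a low score may indicate short, mixed, or unsupported-language text, but the model can also be confidently wrong. The sketch below is hypothetical. The 0.5 threshold, `MIN_CONFIDENCE`, and `interpret_prediction` are illustrative assumptions, not part of the app or the model card, and the prediction dicts only mimic the pipeline's `{"label", "score"}` output shape.

```python
from typing import Optional

# The 20 labels listed above (ISO 639-1 codes).
SUPPORTED_CODES = frozenset({
    "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja",
    "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh",
})

# Hypothetical cut-off; tune it on your own data.
MIN_CONFIDENCE = 0.5


def interpret_prediction(prediction: dict, min_confidence: float = MIN_CONFIDENCE) -> Optional[str]:
    """Return the predicted code, or None if the prediction looks unreliable."""
    code = prediction["label"]
    score = prediction["score"]
    # A label outside the supported set would point to an unexpected model or config.
    if code not in SUPPORTED_CODES:
        return None
    # A low top score may mean the model was unsure which of its 20 classes fits.
    if score < min_confidence:
        return None
    return code


if __name__ == "__main__":
    # Made-up example predictions shaped like the pipeline's output.
    print(interpret_prediction({"label": "de", "score": 0.97}))  # de
    print(interpret_prediction({"label": "de", "score": 0.31}))  # None
```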

---