# 📖 Code Explanation: Text Language Detector

This document explains the **Text Language Detector** app: what each part of the code does and what the app is intended for.

---

## 📝 Overview

**Purpose**
Detect the language of input text and return the full language name, the ISO 639-1 code, and a confidence score. Detection is limited to the 20 languages the model supports (listed at the end of this document).

**Tech Stack**

- **Model**: `papluca/xlm-roberta-base-language-detection` (Hugging Face Transformers)
- **Model Precision**: `torch_dtype=torch.bfloat16` for reduced memory usage
- **Language Mapping**: `pycountry` to convert ISO codes to full language names
- **Interface**: Gradio Blocks + Buttons

---

## ⚙️ Setup & Dependencies

Install the required libraries:

```bash
pip install transformers gradio torch pycountry
```

---

## 🔍 Detailed Block-by-Block Code Explanation

```python
import torch
import gradio as gr
from transformers import pipeline
import pycountry

# Load the language-detection pipeline with bfloat16 precision
language_detector = pipeline(
    "text-classification",
    model="papluca/xlm-roberta-base-language-detection",
    torch_dtype=torch.bfloat16
)

def detect_language(text: str) -> str:
    # truncation=True keeps long pasted text within the model's input limit
    result = language_detector(text, truncation=True)[0]
    code = result["label"]   # e.g. "en", "hi", "fr"
    score = result["score"]  # confidence of the top prediction, 0.0 to 1.0

    # Map ISO code to full language name using pycountry.
    # Depending on the pycountry version, an unknown code either returns
    # None (AttributeError on .name) or raises KeyError.
    try:
        lang = pycountry.languages.get(alpha_2=code).name
    except (AttributeError, KeyError):
        lang = code.upper()
    return f"{lang} ({code}) — {score:.2f}"

# Build the Gradio interface
with gr.Blocks(theme=gr.themes.Default()) as demo:
    gr.Markdown("## 🌐 Text Language Detector")
    gr.Markdown("Type or paste text below to detect its language (name + code + confidence).")

    with gr.Row():
        text_input = gr.Textbox(label="📝 Input Text", placeholder="Type or paste text here...", lines=4)
        lang_output = gr.Textbox(label="✅ Detected Language", placeholder="Language & confidence", lines=1, interactive=False)

    detect_btn = gr.Button("🔍 Detect Language")
    detect_btn.click(fn=detect_language, inputs=text_input, outputs=lang_output)

    gr.Markdown("---")
    gr.Markdown("Built with 🤗 Transformers (`papluca/xlm-roberta-base-language-detection`), `pycountry`, and 🚀 Gradio")

demo.launch()
```

---

## 🚀 Core Concepts

| Concept               | Why It Matters                                            |
|-----------------------|-----------------------------------------------------------|
| Hugging Face Pipeline | One-line model loading & inference                        |
| bfloat16 Precision    | Lower memory usage, faster inference on supported hardware |
| pycountry Mapping     | Converts ISO codes to human-readable language names       |
| Gradio Blocks         | Builds interactive web apps with pure Python              |

---

## 🔄 Intended Uses & Limitations

You can use this model directly as a language detector, that is, for sequence classification. It currently supports the following 20 languages:

- Arabic (ar)
- Bulgarian (bg)
- German (de)
- Modern Greek (el)
- English (en)
- Spanish (es)
- French (fr)
- Hindi (hi)
- Italian (it)
- Japanese (ja)
- Dutch (nl)
- Polish (pl)
- Portuguese (pt)
- Russian (ru)
- Swahili (sw)
- Thai (th)
- Turkish (tr)
- Urdu (ur)
- Vietnamese (vi)
- Chinese (zh)

Because the classifier has a fixed label set, text in any other language is still assigned one of these 20 labels.

---
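
The name lookup and its fallback in `detect_language` can be tried without downloading the model. Below is a minimal sketch, assuming a hypothetical helper `format_detection` and a hand-written `SUPPORTED_LANGUAGES` table copied from the list above. Neither is part of the app, and the `min_confidence` default of 0.5 is an arbitrary illustrative threshold, not a value recommended by the model authors.

```python
# Hypothetical helper, not part of the original app: formats a prediction
# using only the 20 language codes listed above (no model or pycountry needed).
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "bg": "Bulgarian", "de": "German", "el": "Modern Greek",
    "en": "English", "es": "Spanish", "fr": "French", "hi": "Hindi",
    "it": "Italian", "ja": "Japanese", "nl": "Dutch", "pl": "Polish",
    "pt": "Portuguese", "ru": "Russian", "sw": "Swahili", "th": "Thai",
    "tr": "Turkish", "ur": "Urdu", "vi": "Vietnamese", "zh": "Chinese",
}


def format_detection(code: str, score: float, min_confidence: float = 0.5) -> str:
    """Return 'Name (code), confidence X.XX', flagging low-confidence results."""
    # Fall back to the upper-cased code for labels outside the table,
    # mirroring the fallback used in detect_language.
    name = SUPPORTED_LANGUAGES.get(code, code.upper())
    formatted = f"{name} ({code}), confidence {score:.2f}"
    # min_confidence is an assumed threshold chosen for illustration only.
    if score < min_confidence:
        formatted += " (low confidence; worth double-checking)"
    return formatted


if __name__ == "__main__":
    print(format_detection("fr", 0.99))
    print(format_detection("sw", 0.31))
```

A low score is only a hint: a classifier with a fixed label set can also be confidently wrong on text in a language it was never trained on, so a threshold like this reduces, but does not remove, misleading output.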