
	# 📖 Code Explanation: Text Language Detector

	This document explains the Text Language Detector app, detailing each part of the provided code and intended use cases.

	---

	## 📝 Overview

	Purpose
	Detect the language of any input text and return the full language name, ISO code, and confidence score.

	Tech Stack
	- Model: `papluca/xlm-roberta-base-language-detection` (Hugging Face Transformers)
	- Model Precision: `torch_dtype=torch.bfloat16` for reduced memory usage
	- Language Mapping: `pycountry` to convert ISO codes to full language names
	- Interface: Gradio Blocks + Buttons

	---

	## ⚙️ Setup & Dependencies

	Install required libraries:

	```bash
	pip install transformers gradio torch pycountry
	```
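
After installing, it can help to confirm that each package is importable before launching the app. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the app.

```python
from importlib.util import find_spec
from typing import List

# Top-level import names of the libraries installed above.
REQUIRED = ["transformers", "gradio", "torch", "pycountry"]


def missing_packages(names: List[str]) -> List[str]:
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

`find_spec` only locates a top-level package without importing it, so the check is quick even for large libraries such as `torch`.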

	---

	## 🔍 Detailed Block-by-Block Code Explanation

	```python
	import torch
	import gradio as gr
	from transformers import pipeline
	import pycountry

	# Load the language-detection pipeline with bfloat16 precision
	language_detector = pipeline(
	    "text-classification",
	    model="papluca/xlm-roberta-base-language-detection",
	    torch_dtype=torch.bfloat16,
	)

	def detect_language(text: str) -> str:
	    # The pipeline returns a list of predictions; keep the top one
	    result = language_detector(text)[0]
	    code = result["label"]  # e.g. "en", "fr", "hi"
	    score = result["score"]

	    # Map the ISO 639-1 code to a full language name using pycountry.
	    # A failed lookup either returns None (AttributeError on .name) or
	    # raises a LookupError, depending on the pycountry version.
	    try:
	        lang = pycountry.languages.get(alpha_2=code).name
	    except (AttributeError, LookupError):
	        lang = code.upper()

	    return f"{lang} ({code}) — {score:.2f}"

	# Build the Gradio interface
	with gr.Blocks(theme=gr.themes.Default()) as demo:
	    gr.Markdown("## 🌐 Text Language Detector")
	    gr.Markdown("Type or paste text below to detect its language (name + code + confidence).")

	    # Input and output boxes sit side by side
	    with gr.Row():
	        text_input = gr.Textbox(label="📝 Input Text", placeholder="Type or paste text here...", lines=4)
	        lang_output = gr.Textbox(label="✅ Detected Language", placeholder="Language & confidence", lines=1, interactive=False)

	    # Clicking the button runs detect_language on the input text
	    detect_btn = gr.Button("🔍 Detect Language")
	    detect_btn.click(fn=detect_language, inputs=text_input, outputs=lang_output)

	    gr.Markdown("---")
	    gr.Markdown("Built with 🤗 Transformers (`papluca/xlm-roberta-base-language-detection`), `pycountry`, and 🚀 Gradio")

	demo.launch()
	```
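
The `detect_language` function above passes whatever it receives straight to the model, including blank input. The following is a minimal sketch of one way to guard against that, written so it can be tested without downloading the model. The names `detect_language_safe`, `fake_detector`, and `LANGUAGE_NAMES` are hypothetical; `LANGUAGE_NAMES` is a tiny stand-in for the `pycountry` lookup, and the detector is assumed to return a list of `{"label", "score"}` dictionaries like the pipeline does.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for pycountry: a tiny code-to-name table.
LANGUAGE_NAMES: Dict[str, str] = {"en": "English", "fr": "French", "de": "German"}

# Any callable that mimics the pipeline: text in, list of predictions out.
Detector = Callable[[str], List[dict]]


def detect_language_safe(text: str, detector: Detector) -> str:
    """Format the top prediction, or return a prompt when the input is blank."""
    if not text or not text.strip():
        return "Please enter some text."
    result = detector(text)[0]
    code = result["label"]
    score = result["score"]
    # Fall back to the upper-cased code when the name is unknown.
    name = LANGUAGE_NAMES.get(code, code.upper())
    return f"{name} ({code}), {score:.2f}"


def fake_detector(text: str) -> List[dict]:
    """Test double that always reports French with a fixed score."""
    return [{"label": "fr", "score": 0.9912}]


print(detect_language_safe("Bonjour tout le monde", fake_detector))
print(detect_language_safe("   ", fake_detector))
```

In the app, the real `language_detector` pipeline could be passed in place of `fake_detector`; injecting the detector keeps the formatting and validation logic testable on its own.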

	---

	## 🚀 Core Concepts

	| Concept               | Why It Matters                                             |
	|-----------------------|------------------------------------------------------------|
	| Hugging Face Pipeline | One-line model loading & inference                         |
	| bfloat16 Precision    | Lower memory usage, faster inference on supported hardware |
	| pycountry Mapping     | Converts ISO codes to human-readable language names        |
	| Gradio Blocks         | Builds interactive web apps with pure Python               |
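
To make the memory claim for bfloat16 concrete, the sketch below estimates weight storage from bytes per parameter: 4 for float32 and 2 for bfloat16. The parameter count is an assumption (roughly 278 million for an XLM-RoBERTa base model), and the estimate covers weights only, not activations or framework overhead.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

# Assumption: approximate parameter count of an XLM-RoBERTa base model.
APPROX_PARAMS = 278_000_000


def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Estimate the memory taken by model weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**20


for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: about {weight_memory_mib(APPROX_PARAMS, dtype):.0f} MiB")
```

Under these assumptions, bfloat16 weights take half the space of float32 weights. Whether inference is also faster depends on hardware support for bfloat16.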

	---

	## 🔄 Intended Uses & Limitations

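	The underlying model is a sequence classifier that can be used directly as a language detector. It currently supports the 20 languages listed below. Because the classifier always returns one of these labels, text in any other language is assigned to one of them.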

	- Arabic (ar)
	- Bulgarian (bg)
	- German (de)
	- Modern Greek (el)
	- English (en)
	- Spanish (es)
	- French (fr)
	- Hindi (hi)
	- Italian (it)
	- Japanese (ja)
	- Dutch (nl)
	- Polish (pl)
	- Portuguese (pt)
	- Russian (ru)
	- Swahili (sw)
	- Thai (th)
	- Turkish (tr)
	- Urdu (ur)
	- Vietnamese (vi)
	- Chinese (zh)
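
Since the classifier always picks one of the labels above, an application may want to flag low-confidence predictions instead of displaying them as fact. The following is a minimal sketch of that idea; the helper `accept_prediction` and the `0.80` threshold are hypothetical choices, not values taken from the app or the model card.

```python
from typing import Optional

# ISO 639-1 codes of the 20 languages listed above.
SUPPORTED_CODES = frozenset(
    "ar bg de el en es fr hi it ja nl pl pt ru sw th tr ur vi zh".split()
)


def accept_prediction(result: dict, min_score: float = 0.80) -> Optional[str]:
    """Return the language code if the prediction looks reliable, else None.

    `result` is assumed to have the pipeline's shape: {"label": ..., "score": ...}.
    `min_score` is an illustrative threshold that should be tuned on real data.
    """
    code = result["label"]
    if code not in SUPPORTED_CODES:
        return None
    if result["score"] < min_score:
        return None
    return code


print(accept_prediction({"label": "en", "score": 0.99}))
print(accept_prediction({"label": "pt", "score": 0.41}))
```

A `None` result could be shown in the interface as "uncertain" so that input in an unsupported language is less likely to be reported with false confidence.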

	---