# 🦅 Nickup Swallow (v1)
"Swallows spam, leaves the essence."
Nickup Swallow is a high-performance, multilingual text classification model specifically engineered to act as a Gatekeeper for modern AI search engines and browsers.
It filters out aggressive spam, SEO content farms, adult content, and scams, ensuring that downstream LLMs (like GPT/T5) process only high-quality, relevant data.
## ✨ Key Features
- 🌍 True Multilingualism: Built on XLM-RoBERTa-Large.
  - Verified & Tested: English, Russian, Chinese, German, Spanish, French, Japanese.
  - Supported: 100+ other languages covered by the base architecture.
- 🏎️ Blazing Fast: Optimized for low-latency inference. Ideal for real-time filtering layers.
- 💎 Exclusive Dataset: Trained on a unique, custom-parsed dataset of 230,000+ search snippets. The data was meticulously collected and labeled using Knowledge Distillation techniques specifically for this project.
- 🛡️ High Recall Philosophy: The model is tuned to be a strict filter against "Hard Spam" (Casinos, Malware, Adult) while being lenient enough to preserve valuable information.
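The "strict against hard spam, lenient toward borderline content" trade-off boils down to where you set the usefulness cutoff. A minimal sketch of such a thresholding layer, using hypothetical scores rather than real model outputs (the 0.15 default mirrors the threshold in the usage example below):

```python
def gatekeeper(snippets_with_scores, threshold=0.15):
    """Split snippets by a usefulness cutoff.

    `snippets_with_scores` pairs each text with a usefulness probability
    (e.g. from the classifier in the Usage section). Raising `threshold`
    blocks more aggressively; lowering it preserves more recall.
    """
    kept, blocked = [], []
    for text, score in snippets_with_scores:
        (kept if score >= threshold else blocked).append(text)
    return kept, blocked

# Illustrative scores, not real model outputs
batch = [
    ("Python is a high-level programming language.", 0.999),
    ("BUY VIAGRA!!! BEST CASINO 100% FREE SPINS", 0.006),
]
kept, blocked = gatekeeper(batch)
```

Because the filter is a single comparison per snippet, it adds effectively no latency on top of model inference.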
## 📊 Model Performance
| Metric | Value | Notes |
|---|---|---|
| Accuracy | 89.32% | On a strict, balanced validation set |
| Training Time | ~3 hours | Trained on NVIDIA T4 (Google Colab FREE!) |
| Base Model | XLM-RoBERTa-Large | 550M params |
## 🧪 Examples (Real-world Tests)
The model is highly confident in distinguishing academic/technical content from low-quality spam.
| Input Text | Language | Verdict | Confidence |
|---|---|---|---|
| "Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability." | EN 🇺🇸 | ✅ Useful | 99.9% |
| "История правления Петра I: краткая биография и реформы" | RU 🇷🇺 | ✅ Useful | 97.4% |
| "Die Relativitätstheorie beschäftigt sich mit der Struktur von Raum und Zeit." | DE 🇩🇪 | ✅ Useful | 99.7% |
| "人工智能是计算机科学的一个分支。" | CN 🇨🇳 | ✅ Useful | 99.6% |
| "BUY VIAGRA!!! BEST CASINO 100% FREE SPINS CLICK HERE" | EN 🇺🇸 | 🗑️ Spam | 99.4% (Spam) |
| "СКАЧАТЬ БЕСПЛАТНО БЕЗ СМС РЕГИСТРАЦИИ КЛЮЧИ АКТИВАЦИИ" | RU 🇷🇺 | 🗑️ Spam | 97.5% (Spam) |
| "ALARGA TU PENE 5 CM EN UNA SEMANA" | ES 🇪🇸 | 🗑️ Spam | 92.3% (Spam) |
## 💻 Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

# Load from Hugging Face
model_name = "NickupAI/Nickup-Swallow-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def classify(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = F.softmax(outputs.logits, dim=-1)
    # Label 1 = Useful, Label 0 = Spam
    useful_prob = probs[0][1].item()
    return useful_prob

# Try it out
text = "Download free cracked software no virus 2024"
score = classify(text)
if score < 0.15:  # Threshold can be adjusted for higher recall
    print(f"⛔ Blocked (Confidence: {1 - score:.2%})")
else:
    print(f"✅ Allowed (Confidence: {score:.2%})")
```
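The two class probabilities come from a softmax over two logits, so they always sum to 1 and thresholding on `useful_prob` alone is sufficient. A minimal sketch of that arithmetic, with hypothetical logit values (not real model outputs):

```python
import math

def softmax2(spam_logit: float, useful_logit: float):
    """Two-class softmax, mirroring F.softmax over the model's logits."""
    e_spam = math.exp(spam_logit)
    e_useful = math.exp(useful_logit)
    total = e_spam + e_useful
    return e_spam / total, e_useful / total

# Hypothetical logits for a clearly useful snippet
spam_p, useful_p = softmax2(-2.0, 3.0)
print(f"spam={spam_p:.4f} useful={useful_p:.4f}")
```

This is why the code above reads only `probs[0][1]`: the spam probability is always its complement.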
Base model: FacebookAI/xlm-roberta-large