🦅 Nickup Swallow (v1)

"Swallows spam, leaves the essence."

Nickup Swallow is a high-performance, multilingual text classification model specifically engineered to act as a Gatekeeper for modern AI search engines and browsers.

It filters out aggressive spam, SEO content farms, adult content, and scams, ensuring that downstream LLMs (like GPT/T5) process only high-quality, relevant data.

✨ Key Features

  • 🌍 True Multilingualism: Built on XLM-RoBERTa-Large.
    • Verified & Tested: English, Russian, Chinese, German, Spanish, French, Japanese.
    • Supported: 100+ additional languages covered by the base architecture.
  • 🏎️ Blazing Fast: Optimized for low-latency inference. Ideal for real-time filtering layers.
  • 💎 Exclusive Dataset: Trained on a unique, custom-parsed dataset of 230,000+ search snippets. The data was meticulously collected and labeled using Knowledge Distillation techniques specifically for this project.
  • 🛡️ High Recall Philosophy: The model is tuned to be a strict filter against "Hard Spam" (Casinos, Malware, Adult) while being lenient enough to preserve valuable information.

📊 Model Performance

| Metric | Value | Notes |
| --- | --- | --- |
| Accuracy | 89.32% | On a strict, balanced validation set |
| Training Time | ~3 hours | Trained on NVIDIA T4 (Google Colab FREE!) |
| Base Model | XLM-RoBERTa-Large | 550M params |
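
The accuracy above was measured on the project's own validation split, which is not published here. As a rough guide, a minimal sketch of how you could run a comparable evaluation on your own labeled snippets might look like the following (the file `val.csv` and its `text`/`label` columns are hypothetical, not part of this release):

# Evaluation sketch (assumes a hypothetical val.csv with "text" and "label" columns)
import pandas as pd
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "NickupAI/Nickup-Swallow-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

df = pd.read_csv("val.csv")  # columns: text, label (1 = useful, 0 = spam)

correct = 0
for text, label in zip(df["text"], df["label"]):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        probs = F.softmax(model(**inputs).logits, dim=-1)
    pred = int(probs[0][1] > 0.5)  # predict "useful" if its probability exceeds 0.5
    correct += int(pred == label)

print(f"Accuracy: {correct / len(df):.2%}")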

🧪 Examples (Real-world Tests)

The model is highly confident in distinguishing academic/technical content from low-quality spam.

| Input Text | Language | Verdict | Confidence |
| --- | --- | --- | --- |
| "Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability." | EN 🇺🇸 | ✅ Useful | 99.9% |
| "История правления Петра I: краткая биография и реформы" | RU 🇷🇺 | ✅ Useful | 97.4% |
| "Die Relativitätstheorie beschäftigt sich mit der Struktur von Raum und Zeit." | DE 🇩🇪 | ✅ Useful | 99.7% |
| "人工智能是计算机科学的一个分支。" | ZH 🇨🇳 | ✅ Useful | 99.6% |
| "BUY VIAGRA!!! BEST CASINO 100% FREE SPINS CLICK HERE" | EN 🇺🇸 | 🗑️ Spam | 99.4% (Spam) |
| "СКАЧАТЬ БЕСПЛАТНО БЕЗ СМС РЕГИСТРАЦИИ КЛЮЧИ АКТИВАЦИИ" | RU 🇷🇺 | 🗑️ Spam | 97.5% (Spam) |
| "ALARGA TU PENE 5 CM EN UNA SEMANA" | ES 🇪🇸 | 🗑️ Spam | 92.3% (Spam) |

💻 Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

# Load from Hugging Face
model_name = "NickupAI/Nickup-Swallow-v1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def classify(text):
    """Return the probability that `text` is useful (Label 1 = Useful, Label 0 = Spam)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = F.softmax(outputs.logits, dim=-1)

    useful_prob = probs[0][1].item()
    return useful_prob

# Try it out
text = "Download free cracked software no virus 2024"
score = classify(text)

if score < 0.15:  # Raise the threshold to block more aggressively (higher spam recall)
    print(f"⛔ Blocked (Confidence: {1 - score:.2%})")
else:
    print(f"✅ Allowed (Confidence: {score:.2%})")