🛡️ MLP Cybersecurity Classifier
This repository hosts a lightweight scikit-learn-based MLP classifier trained to distinguish cybersecurity-related content from other text, using sentence-transformer embeddings. It supports English and German input texts.
📊 Training Data
The model was trained on a multilingual dataset of cybersecurity and non-cybersecurity news articles. The dataset is publicly available on Zenodo:
🔗 https://zenodo.org/records/16417939
📦 Model Details
- Architecture: MLPClassifier with hidden layers (128, 64)
- Embedding model: intfloat/multilingual-e5-large
- Input: Cleaned article (removed stopwords) or report text
- Output: Binary label (e.g., Cybersecurity, Not Cybersecurity)
- Languages: English, German
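To make the architecture line concrete, here is a minimal sketch of how a scikit-learn MLPClassifier with hidden layers (128, 64) could be configured on 1024-dimensional inputs, which is the embedding size of intfloat/multilingual-e5-large. The training data is synthetic noise standing in for real embeddings. The label strings, `max_iter`, and `random_state` are assumptions for illustration, not the settings used to train the published model.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

EMBEDDING_DIM = 1024  # output size of intfloat/multilingual-e5-large

# Synthetic stand-ins for sentence embeddings: two well-separated clusters.
# Real training would use embeddings of the Zenodo articles instead.
rng = np.random.default_rng(0)
cyber = rng.normal(loc=0.5, scale=0.1, size=(40, EMBEDDING_DIM))
other = rng.normal(loc=-0.5, scale=0.1, size=(40, EMBEDDING_DIM))
X = np.vstack([cyber, other])
y = np.array(["Cybersecurity"] * 40 + ["Not Cybersecurity"] * 40)  # assumed label strings

# Two hidden layers of 128 and 64 units, as listed under Model Details.
# max_iter and random_state are illustrative choices only.
clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=200, random_state=0)
clf.fit(X, y)

# Weight matrices: input -> 128 -> 64 -> single output unit (binary case).
print([w.shape for w in clf.coefs_])
print(clf.classes_)
```

For a binary problem scikit-learn uses one logistic output unit, so the last weight matrix has a single column.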
🔧 Usage
from sentence_transformers import SentenceTransformer
from huggingface_hub import hf_hub_download
import joblib
# 1. Load the embedding model
embedder = SentenceTransformer("intfloat/multilingual-e5-large")
# 2. Load the pretrained MLP classifier from Hugging Face Hub
model_path = hf_hub_download(repo_id="selfconstruct3d/cybersec_classifier", filename="cybersec_classifier.pkl")
model = joblib.load(model_path)
# 3. Example input texts (can be in English or German)
texts = [
    "A new ransomware attack has affected critical infrastructure in Germany.",
    "The local sports club hosted its annual summer festival this weekend.",
]
# 4. Generate embeddings
embeddings = embedder.encode(texts, convert_to_numpy=True, show_progress_bar=False)
# 5. Predict cybersecurity relevance
predictions = model.predict(embeddings)
# 6. Output results
for text, label in zip(texts, predictions):
    print(f"Text: {text}\nPrediction: {label}\n")
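The usage example returns hard labels only. If the loaded model is a scikit-learn MLPClassifier, as Model Details states, it should also expose `predict_proba`, so a confidence score can be attached to each prediction. The sketch below assumes that. `label_with_confidence` is a hypothetical helper, not part of this repository, and `demo_model` is a tiny stand-in trained on synthetic data so the snippet runs without downloading anything.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def label_with_confidence(model, embeddings):
    """Return one (label, probability) pair per embedding row.

    Hypothetical helper: works with any fitted scikit-learn classifier
    that provides predict_proba and classes_.
    """
    proba = model.predict_proba(embeddings)
    best = proba.argmax(axis=1)  # index of the most likely class per row
    return [(str(model.classes_[i]), float(proba[row, i]))
            for row, i in enumerate(best)]


# Stand-in classifier on synthetic 8-dimensional data (illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.5, 0.1, (20, 8)), rng.normal(-0.5, 0.1, (20, 8))])
y = np.array(["Cybersecurity"] * 20 + ["Not Cybersecurity"] * 20)
demo_model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0).fit(X, y)

results = label_with_confidence(demo_model, X[:2])
for label, p in results:
    print(f"{label}: {p:.2f}")
```

With the real classifier, the call would be `label_with_confidence(model, embeddings)` using the variables from the usage example. Low-confidence predictions could then be routed to manual review.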