Update README.md
README.md
CHANGED
@@ -9,6 +9,11 @@ language:
 
 This repository hosts a lightweight `scikit-learn`-based MLP classifier trained to distinguish cybersecurity-related content from other text, using sentence-transformer embeddings. It supports English and German input texts.
 
+## 📊 Training Data
+
+The model was trained on a multilingual dataset of cybersecurity and non-cybersecurity news articles. The dataset is publicly available on Zenodo:
+🔗 [https://zenodo.org/records/16417939](https://zenodo.org/records/16417939)
+
 ## 📦 Model Details
 
 - **Architecture**: `MLPClassifier` with hidden layers `(128, 64)`
@@ -52,7 +57,7 @@ X_train_emb = embedder.encode(X_train.tolist(), convert_to_numpy=True, show_prog
 X_test_emb = embedder.encode(X_test.tolist(), convert_to_numpy=True, show_progress_bar=True)
 
 # Load the trained classifier
-model_path = hf_hub_download(repo_id="selfconstruct3d/
+model_path = hf_hub_download(repo_id="selfconstruct3d/cybersec_classifier", filename="cybersec_classifier.pkl")
 model = joblib.load(model_path)
 
 # Predict
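The corrected `hf_hub_download` call is easiest to check in the context of the full load-and-predict flow. The following is a minimal sketch of that flow, not the repository's own script: the helper names `load_classifier` and `classify_texts` are hypothetical, and the embedder is passed in as a parameter because the visible part of the README does not show which sentence-transformer model was used. The `repo_id` and `filename` defaults are taken from the added line in the diff.

```python
from typing import Any, List, Sequence


def load_classifier(
    repo_id: str = "selfconstruct3d/cybersec_classifier",
    filename: str = "cybersec_classifier.pkl",
) -> Any:
    """Download the pickled classifier from the Hugging Face Hub and load it.

    Hypothetical helper; defaults mirror the values shown in the diff.
    """
    # Imported lazily so classify_texts() can be used without these packages.
    from huggingface_hub import hf_hub_download
    import joblib

    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return joblib.load(model_path)


def classify_texts(texts: Sequence[str], embedder: Any, model: Any) -> List[Any]:
    """Embed the texts and return the classifier's raw predictions, one per text.

    `embedder` is assumed to expose `encode(list_of_str, convert_to_numpy=True)`
    (as a SentenceTransformer does); `model` is assumed to expose `predict`.
    """
    if not texts:
        return []
    embeddings = embedder.encode(list(texts), convert_to_numpy=True)
    return list(model.predict(embeddings))
```

To use it with the real model, one would pass `load_classifier()` as `model` and a `SentenceTransformer` instance as `embedder`; the embedding model must be the same one used during training, which this excerpt does not name. Since `joblib.load` unpickles arbitrary Python objects, the file should only be loaded from a source you trust.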