MrZaper/LiteModel
MrZaper/LiteModel is a lightweight sentence-transformers model fine-tuned for semantic search and retrieval of academic articles in cybersecurity.
It maps queries and article phrases into a 384-dimensional dense vector space for similarity search, clustering, and semantic matching.
This model is specifically trained for the journal: Cybersecurity: Education, Science, Technology
Website: https://csecurity.kubg.edu.ua
What does it do?
Given a query in English, Ukrainian, or any other language, the retrieval pipeline built around the model:
- Translates the query to English (using Google Translate).
- Encodes the query into a dense embedding using Sentence-BERT.
- Computes cosine similarity between the query embedding and precomputed article embeddings.
- Returns the top unique article codes with the highest similarity scores.
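The similarity and de-duplication steps above can be sketched on toy data, without downloading the model. This is a minimal illustration rather than the repository's code: the vectors are 3-dimensional stand-ins for the 384-dimensional embeddings, the integer codes are made up for the example, and `top_unique_codes` is a hypothetical helper name.

```python
import numpy as np

def top_unique_codes(query_vec, embeddings, labels, top_n=5):
    """Rank labels by cosine similarity to query_vec, keeping each label once."""
    # Normalize both sides so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q
    results, seen = [], set()
    for idx in np.argsort(sims)[::-1]:  # highest similarity first
        if labels[idx] not in seen:
            seen.add(labels[idx])
            results.append((labels[idx], float(sims[idx])))
        if len(results) >= top_n:
            break
    return results

# Toy data (assumption): two phrases belong to article 560, one to article 532.
toy_embeddings = np.array([[1.0, 0.0, 0.0],
                           [0.9, 0.1, 0.0],
                           [0.0, 1.0, 0.0]])
toy_labels = [560, 560, 532]
toy_results = top_unique_codes(np.array([1.0, 0.0, 0.0]),
                               toy_embeddings, toy_labels, top_n=2)
print(toy_results)
```

Because several phrases can map to the same article, the second row (also article 560) is skipped and the next distinct code is returned instead.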
Returned article codes can be viewed at:
https://csecurity.kubg.edu.ua/index.php/journal/article/view/{CODE}
For example:
560 → https://csecurity.kubg.edu.ua/index.php/journal/article/view/560
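Building the link from a returned code is a plain string substitution. A small sketch, where `article_url` and `BASE_URL` are hypothetical names introduced only for illustration:

```python
# URL pattern taken from the journal link format shown above.
BASE_URL = "https://csecurity.kubg.edu.ua/index.php/journal/article/view/"

def article_url(code):
    """Return the journal page URL for an article code (int or str)."""
    return f"{BASE_URL}{code}"

print(article_url(560))
```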
Model Files
The repository includes:
- LiteModel → SBERT-based semantic encoder
- sbert_embeddings.npy → Precomputed embeddings for articles
- sbert_labels.pkl → Corresponding article codes (e.g., 560, 532)
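Since each row of the embeddings file is paired with the label at the same position, it can be worth checking that the two files line up before searching. The following is a sketch under stated assumptions: `check_alignment` is a hypothetical helper, the 384 comes from the embedding dimension given above, and the arrays here are stand-ins rather than the real files.

```python
import numpy as np

def check_alignment(embeddings, labels, expected_dim=384):
    """Raise ValueError if embeddings and labels cannot be paired row by row.

    Returns (number of embedding rows, number of distinct article codes).
    """
    if embeddings.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {embeddings.shape}")
    if embeddings.shape[1] != expected_dim:
        raise ValueError(
            f"expected {expected_dim} columns, got {embeddings.shape[1]}")
    if embeddings.shape[0] != len(labels):
        raise ValueError(
            f"{embeddings.shape[0]} embedding rows but {len(labels)} labels")
    return embeddings.shape[0], len(set(labels))

# Stand-in data (assumption): three phrase embeddings covering two articles.
fake_embeddings = np.zeros((3, 384))
fake_labels = [560, 560, 532]
n_rows, n_articles = check_alignment(fake_embeddings, fake_labels)
print(n_rows, n_articles)
```

With the real files, the same call would take the array loaded from sbert_embeddings.npy and the list loaded from sbert_labels.pkl.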
Usage (Sentence-Transformers)
Install the required packages:
pip install -U sentence-transformers deep-translator huggingface-hub scikit-learn
Example usage:
from sentence_transformers import SentenceTransformer
import numpy as np
import pickle
from huggingface_hub import snapshot_download
from deep_translator import GoogleTranslator
import os
from sklearn.metrics.pairwise import cosine_similarity
# Load model and data from Hugging Face
model_name = 'MrZaper/LiteModel'
model_dir = snapshot_download(repo_id=model_name)
# Load SBERT model
sbert_model = SentenceTransformer(model_dir)
# Load precomputed article embeddings
embeddings = np.load(os.path.join(model_dir, "sbert_embeddings.npy"))
# Load article codes (labels)
with open(os.path.join(model_dir, "sbert_labels.pkl"), 'rb') as f:
    labels = pickle.load(f)

def preprocess_query(query: str) -> str:
    """Translate the query to English using Google Translate."""
    try:
        return GoogleTranslator(source="auto", target="en").translate(query)
    except Exception as e:
        # Fall back to the untranslated query if translation fails
        print(f"Translation error: {e}")
        return query

def predict_semantic(query, model, embeddings, labels, top_n=5):
    """Find top-N most semantically similar unique article codes."""
    query_emb = model.encode([preprocess_query(query)])
    similarities = cosine_similarity(query_emb, embeddings)[0]
    seen_keys = set()
    results = []
    # Sort results by similarity (descending)
    sorted_indices = np.argsort(similarities)[::-1]
    for idx in sorted_indices:
        label = labels[idx]
        sim = similarities[idx]
        if label not in seen_keys:
            seen_keys.add(label)
            results.append({
                "article_code": label,
                "similarity": float(sim)
            })
            print(f"Article {label} - similarity: {sim * 100:.2f}%")
        if len(results) >= top_n:
            break
    return results

# Example query
query = "sql injection in websites"
results = predict_semantic(query, sbert_model, embeddings, labels)
print("\nTop article codes:")
for res in results:
    print(f"Article {res['article_code']} - similarity: {res['similarity']*100:.2f}%")
Example Output
Article 560 - similarity: 92.15%
Article 532 - similarity: 89.34%
Article 475 - similarity: 85.22%
Corresponding links:
https://csecurity.kubg.edu.ua/index.php/journal/article/view/560
https://csecurity.kubg.edu.ua/index.php/journal/article/view/532
https://csecurity.kubg.edu.ua/index.php/journal/article/view/475