Shuu12121
/

CodeCloneDetection-ModernBERT-Owl

Sentence Similarity

sentence-transformers

dataset_size:901028

loss:CosineSimilarityLoss

text-embeddings-inference


Shuu12121 committed on Apr 3

Commit 632d07c · verified · 1 Parent(s): d90aaa2

Update README.md

Files changed (1)

README.md +22 -1


@@ -39,7 +39,28 @@ datasets:
----
 ## 📌 Model Overview

 This model is a SentenceTransformer fine-tuned from [`Shuu12121/CodeModernBERT-Owl🦉`](https://huggingface.co/Shuu12121/CodeModernBERT-Owl) on the [BigCloneBench](https://huggingface.co/datasets/google/code_x_glue_cc_clone_detection_big_clone_bench) dataset for **code clone detection**. It maps code snippets into a 768-dimensional dense vector space for semantic similarity tasks.
+## 🎯 Distinctive Performance and Stability
+This model achieves **high accuracy and F1 scores** in code clone detection: across the thresholds tested below, accuracy stays between 0.9879 and 0.9903 and F1 between 0.9540 and 0.9641.
+A particularly noteworthy characteristic is that **changing the similarity threshold has minimal impact on classification performance**.
+This indicates that the model has learned to **clearly separate clones from non-clones**, which yields a **stable and reliable similarity score distribution**.
+| Threshold         | Accuracy          | F1 Score           |
+|-------------------|-------------------|--------------------|
+| 0.5               | 0.9900            | 0.9633             |
+| 0.85              | 0.9903            | 0.9641             |
+| 0.90              | 0.9902            | 0.9637             |
+| 0.95              | 0.9887            | 0.9579             |
+| 0.98              | 0.9879            | 0.9540             |
+- **High Stability**: Between thresholds of 0.85 and 0.98, accuracy changes by less than 0.003 and F1 by about 0.01; results at 0.5 are almost identical to those at 0.85.
+  _(This suggests that code pairs considered clones generally score between 0.9 and 1.0 in cosine similarity.)_
+- **Reliable in Real-World Applications**: If the similarity threshold is adjusted for a different task or environment, performance stays consistent, with no significant degradation within the tested range.
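
A threshold sweep like the one in the table can be sketched as follows. This is a minimal illustration, not the evaluation script used for the reported numbers: the function name `sweep_thresholds`, the `>=` decision rule, and the toy scores and labels are all assumptions made for the example. In practice the scores would be cosine similarities between the embeddings of each code pair.

```python
from typing import Dict, Sequence, Tuple


def sweep_thresholds(
    scores: Sequence[float],
    labels: Sequence[int],
    thresholds: Sequence[float],
) -> Dict[float, Tuple[float, float]]:
    """Return {threshold: (accuracy, f1)} for binary clone detection.

    A pair is predicted as a clone when its similarity score is >= threshold
    (an assumed decision rule; the model card does not state the exact rule).
    """
    results = {}
    for t in thresholds:
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
        correct = sum(1 for p, y in zip(preds, labels) if p == y)
        accuracy = correct / len(labels)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        results[t] = (accuracy, f1)
    return results


# Toy, made-up similarity scores and labels (1 = clone, 0 = non-clone).
scores = [0.99, 0.97, 0.93, 0.40, 0.10, 0.96, 0.20, 0.05]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

results = sweep_thresholds(scores, labels, [0.5, 0.85, 0.95])
for threshold, (accuracy, f1) in results.items():
    print(f"threshold={threshold:.2f}  accuracy={accuracy:.4f}  f1={f1:.4f}")
```

Running the same sweep on a labeled validation set is one way to check whether the threshold insensitivity reported above also holds for a different code corpus.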
 ## 📌 Model Overview