Commit d90aaa2 by Shuu12121 (verified) · 1 Parent(s): edf421e

Update README.md

  1. README.md +54 -4
README.md CHANGED
@@ -67,8 +67,8 @@ This model is a SentenceTransformer fine-tuned from [`Shuu12121/CodeModernBERT-O
  | Metric | Score |
  |---------------------------|--------------------|
  | Pearson Cosine (Train) | `0.9481` |
- | Accuracy (Test) | `0.9900` |
- | F1 Score (Test) | `0.9633` |
+ | Accuracy (Test) | `0.9902` |
+ | F1 Score (Test) | `0.9637` |
 
  ---
 
@@ -100,13 +100,63 @@ similarity_score = cosine_similarity(embeddings[0].unsqueeze(0), embeddings[1].u
 
  # Print the result
  print(f"Cosine Similarity: {similarity_score:.4f}")
- if similarity_score >= 0.5:
+ if similarity_score >= 0.9:
      print("🟢 These code snippets are considered CLONES.")
  else:
      print("🔴 These code snippets are NOT considered clones.")
  ```
+ ## 🧪 How to Test
+ ```python
+ !pip install -U sentence-transformers datasets
 
- ---
+ from sentence_transformers import SentenceTransformer
+ from datasets import load_dataset
+ import torch
+ from sklearn.metrics import accuracy_score, f1_score
+
+ # --- Load the dataset ---
+ ds_test = load_dataset("google/code_x_glue_cc_clone_detection_big_clone_bench", split="test")
+
+ model = SentenceTransformer("Shuu12121/CodeCloneDetection-ModernBERT-Owl")
+ model.to("cuda")
+
+
+ test_sentences1 = ds_test["func1"]
+ test_sentences2 = ds_test["func2"]
+ test_labels = ds_test["label"]
+
+ batch_size = 256 # adjust to fit GPU memory
+
+ print("Encoding sentences1...")
+
+ embeddings1 = model.encode(
+     test_sentences1,
+     convert_to_tensor=True,
+     batch_size=batch_size,
+     show_progress_bar=True
+ )
+
+ print("Encoding sentences2...")
+ embeddings2 = model.encode(
+     test_sentences2,
+     convert_to_tensor=True,
+     batch_size=batch_size,
+     show_progress_bar=True
+ )
+
+ print("Calculating cosine scores...")
+ cosine_scores = torch.nn.functional.cosine_similarity(embeddings1, embeddings2)
+
+ # Threshold setting (0.9 is used here)
+ threshold = 0.9
+ print(f"Using threshold: {threshold}")
+ predictions = (cosine_scores > threshold).long().cpu().numpy()
+
+ accuracy = accuracy_score(test_labels, predictions)
+ f1 = f1_score(test_labels, predictions)
+ print("Test Accuracy:", accuracy)
+ print("Test F1 Score:", f1)
+
+ ```
 
  ## 🛠️ Model Architecture
 
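
The test script in this diff hard-codes `threshold = 0.9` when turning cosine scores into clone / not-clone predictions. As a minimal sketch of how such a cutoff could be sanity-checked, the snippet below sweeps a few candidate thresholds over a list of cosine scores and labels and reports the F1 score for each. The scores and labels are made-up toy values, not outputs of the model, and the helper names `f1_at_threshold` and `sweep_thresholds` are hypothetical. It uses only the standard library and the same strict `score > threshold` rule as the script.

```python
from typing import List, Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 of the positive (clone) class when predicting 1 for score > threshold."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        # No true positives: precision/recall are zero or undefined, report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sweep_thresholds(
    scores: Sequence[float], labels: Sequence[int], candidates: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (threshold, F1) pairs for every candidate threshold."""
    return [(t, f1_at_threshold(scores, labels, t)) for t in candidates]


# Toy data (assumption): stand-ins for cosine_scores and test_labels.
toy_scores = [0.97, 0.93, 0.91, 0.88, 0.75, 0.62, 0.55, 0.30]
toy_labels = [1, 1, 1, 0, 0, 1, 0, 0]
candidates = [0.5, 0.6, 0.7, 0.8, 0.9]

results = sweep_thresholds(toy_scores, toy_labels, candidates)
for t, f1 in results:
    print(f"threshold={t:.1f}  F1={f1:.4f}")

best_threshold, best_f1 = max(results, key=lambda r: r[1])
print(f"Best threshold on toy data: {best_threshold} (F1={best_f1:.4f})")
```

With real data, the scores would come from `cosine_scores.cpu().tolist()` in the script, and the sweep would ideally run on a validation split rather than the test split, so that the reported test metrics are not tuned on the data they are measured on.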