---
license: mit
tags:
- privacy
- policy-analysis
- classification
- text-classification
- transformers
- distilbert
library_name: transformers
datasets:
- opp-115
model-index:
- name: Privacy Clause Classifier (DistilBERT - OPP-115)
  results: []
---

# Privacy Clause Classifier (DistilBERT - OPP-115)

This is a DistilBERT model fine-tuned to classify **privacy policy clauses** into one of ten privacy-practice categories from the [OPP-115 dataset](https://privacy-hosting.isi.edu/data/OPP-115.pdf).

Label IDs map to categories as follows:

| ID | Category                             |
|----|--------------------------------------|
| 0  | Data Retention                       |
| 1  | Data Security                        |
| 2  | Do Not Track                         |
| 3  | First Party Collection/Use           |
| 4  | International and Specific Audiences |
| 5  | Other                                |
| 6  | Policy Change                        |
| 7  | Third Party Sharing/Collection       |
| 8  | User Access, Edit and Deletion       |
| 9  | User Choice/Control                  |
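
For downstream code, the table above can be mirrored as a plain mapping. The sketch below is illustrative only: `ID2LABEL`, `LABEL2ID`, and `label_for` are hypothetical names, not part of the released model, and it assumes the model's output indices follow the ID order of the table.

```python
# Hypothetical helper: mirrors the ID/category table in this model card.
ID2LABEL = {
    0: "Data Retention",
    1: "Data Security",
    2: "Do Not Track",
    3: "First Party Collection/Use",
    4: "International and Specific Audiences",
    5: "Other",
    6: "Policy Change",
    7: "Third Party Sharing/Collection",
    8: "User Access, Edit and Deletion",
    9: "User Choice/Control",
}

# Reverse lookup, useful when preparing labels for evaluation.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def label_for(class_id: int) -> str:
    """Return the category name for a predicted class ID."""
    try:
        return ID2LABEL[class_id]
    except KeyError:
        raise ValueError(f"Unknown class ID: {class_id}") from None


print(label_for(3))  # First Party Collection/Use
```

Passing the integer produced by `torch.argmax(...)` to `label_for` turns the numeric prediction into a readable category name.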

---

## Model Details

- **Architecture**: DistilBERT (pretrained)
- **Fine-tuning Dataset**: [OPP-115 Dataset](https://privacy-hosting.isi.edu/data/OPP-115.pdf)
- **Input Format**: Text snippets from privacy policies
- **Output Format**: Logits over the ten categories (argmax gives the predicted class label; softmax gives per-class probabilities)
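
To make the output format concrete: a sequence-classification head returns one raw score (logit) per category, and a softmax turns those scores into probabilities. The following is a minimal pure-Python sketch. The function name `softmax` and the example logits are made up for illustration and are not real model output; with the actual model, `torch.softmax(outputs.logits, dim=-1)` performs the same computation on the tensor.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for the 10 categories (illustrative, not real model output).
example_logits = [0.1, -1.2, -2.0, 3.4, -0.5, 0.0, -1.1, 1.9, -0.7, 0.3]

probs = softmax(example_logits)

# The predicted class is the index with the highest probability.
predicted_id = max(range(len(probs)), key=probs.__getitem__)
print(predicted_id, round(probs[predicted_id], 3))
```

Because softmax preserves ordering, taking the argmax of the logits and the argmax of the probabilities yields the same class ID.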

---

## Intended Uses

- Automatic **privacy policy clause classification**
- **Regulatory technology (RegTech)** tools
- **Privacy policy summarization** and simplification
- **Risk analysis** for data sharing and collection practices

---

## How to Use

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

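# Load the tokenizer and the fine-tuned model (replace the placeholder repo ID)
tokenizer = DistilBertTokenizerFast.from_pretrained("your-hf-username/your-model-name")
model = DistilBertForSequenceClassification.from_pretrained("your-hf-username/your-model-name")

# Tokenize one clause and run inference without tracking gradients
text = "We may collect your location data to provide customized services."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# The index of the highest logit is the predicted category ID (see the table above)
predicted_class = torch.argmax(outputs.logits, dim=-1).item()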

print(f"Predicted Category: {predicted_class}")
```