---
license: apache-2.0
language:
- en
base_model:
- google/siglip2-base-patch16-256
pipeline_tag: image-classification
library_name: transformers
tags:
- siglip2
- '256'
- patch16
- adult-content-detection
- explicit-content-detection
---

![1.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/_3_jghGne_Ezr3VdhrSsP.png)

# **siglip2-x256-explicit-content**

> **siglip2-x256-explicit-content** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-256** for **multi-class image classification**. Built on the **SiglipForImageClassification** architecture, the model assigns each image to one of five content categories and is intended for **filtering explicit, suggestive, and safe media**.

> [!note]
> *SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features* https://arxiv.org/pdf/2502.14786

```
Classification Report:
                     precision    recall  f1-score   support

      Anime Picture     0.8940    0.8718    0.8827      5600
             Hentai     0.8961    0.8935    0.8948      4180
             Normal     0.9100    0.8895    0.8997      5503
        Pornography     0.9496    0.9654    0.9574      5600
Enticing or Sensual     0.9132    0.9429    0.9278      5600

           accuracy                         0.9137     26483
          macro avg     0.9126    0.9126    0.9125     26483
       weighted avg     0.9135    0.9137    0.9135     26483
```
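
As a quick sanity check on how the summary rows relate to the per-class rows, the sketch below recomputes the macro and weighted F1 averages from the F1 scores and supports listed in the report. The dictionary and helper names are illustrative only and are not part of the model or of any library; because the inputs are already rounded to four decimals, the results should agree with the report only to about that precision.

```python
# Per-class (f1-score, support) pairs copied from the classification report.
report = {
    "Anime Picture": (0.8827, 5600),
    "Hentai": (0.8948, 4180),
    "Normal": (0.8997, 5503),
    "Pornography": (0.9574, 5600),
    "Enticing or Sensual": (0.9278, 5600),
}


def macro_avg(rows):
    """Unweighted mean of the per-class scores."""
    return sum(score for score, _ in rows.values()) / len(rows)


def weighted_avg(rows):
    """Mean of the per-class scores, weighted by class support."""
    total = sum(support for _, support in rows.values())
    return sum(score * support for score, support in rows.values()) / total


total_support = sum(support for _, support in report.values())
macro_f1 = macro_avg(report)
weighted_f1 = weighted_avg(report)

print(f"support={total_support} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
```

The macro average treats every class equally, while the weighted average gives the smaller Hentai class (4180 images) proportionally less influence than the others.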

![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/psonZ0OXSjqgLRDkFtRTh.png)

---

## **Label Space: 5 Classes**

The model classifies each image into one of the following content categories:

```
Class 0: "Anime Picture"  
Class 1: "Hentai"  
Class 2: "Normal"  
Class 3: "Pornography"  
Class 4: "Enticing or Sensual"
```
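
To illustrate how a class index maps back to a label, here is a minimal, dependency-free sketch that turns a vector of five logits into probabilities with softmax and picks the top class. The `ID2LABEL` table mirrors the list above; the logit values and the helper names (`softmax`, `top_label`) are made up for illustration and do not come from the model.

```python
import math

# Index-to-label table, mirroring the five classes listed above.
ID2LABEL = {
    0: "Anime Picture",
    1: "Hentai",
    2: "Normal",
    3: "Pornography",
    4: "Enticing or Sensual",
}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits, not real model output.
example_logits = [0.2, -1.0, 3.1, -0.5, 0.4]
label, confidence = top_label(example_logits)
print(label, round(confidence, 3))
```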

---

## **Install Dependencies**

```bash
pip install -q transformers torch pillow gradio
```

---

## **Inference Code**

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/siglip2-x256-explicit-content"  # Replace with your model path if needed
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# ID to Label mapping
id2label = {
    "0": "Anime Picture",
    "1": "Hentai",
    "2": "Normal",
    "3": "Pornography",
    "4": "Enticing or Sensual"
}

def classify_explicit_content(image):
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_explicit_content,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Predicted Content Type"),
    title="siglip2-x256-explicit-content",
    description="Classifies images into explicit, suggestive, or safe categories (e.g., Hentai, Pornography, Normal)."
)

if __name__ == "__main__":
    iface.launch()
```

---

## **Intended Use**

This model is intended for applications such as:

- **Content Moderation**: Automatically detect NSFW or suggestive content.
- **Parental Controls**: Enable AI-based filtering for safe media browsing.
- **Dataset Preprocessing**: Clean and categorize image datasets for research or deployment.
- **Online Platforms**: Help enforce content guidelines for uploads and user-generated media.
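
For the content moderation and platform use cases above, one possible way to turn the classifier's per-label scores into an action is a simple threshold policy. The sketch below assumes a score dictionary shaped like the one returned by `classify_explicit_content`; the label groupings, the threshold values, and the `moderation_decision` helper are hypothetical choices for illustration, not recommendations from the model author, and would need tuning on your own data.

```python
# Hypothetical policy: which labels trigger which action is an assumption.
BLOCK_LABELS = {"Hentai", "Pornography"}
REVIEW_LABELS = {"Enticing or Sensual"}


def moderation_decision(scores, block_threshold=0.5, review_threshold=0.5):
    """Map per-label probabilities to 'block', 'review', or 'allow'.

    `scores` is a dict of label -> probability, as produced by the
    inference function in this card. Thresholds are illustrative.
    """
    block_score = sum(scores.get(label, 0.0) for label in BLOCK_LABELS)
    review_score = sum(scores.get(label, 0.0) for label in REVIEW_LABELS)

    if block_score >= block_threshold:
        return "block"
    # Suggestive content, alone or combined with explicit mass, goes to a human.
    if block_score + review_score >= review_threshold:
        return "review"
    return "allow"


# Made-up score dictionaries for demonstration.
explicit = {"Anime Picture": 0.02, "Hentai": 0.05, "Normal": 0.03,
            "Pornography": 0.85, "Enticing or Sensual": 0.05}
suggestive = {"Anime Picture": 0.05, "Hentai": 0.05, "Normal": 0.2,
              "Pornography": 0.1, "Enticing or Sensual": 0.6}
safe = {"Anime Picture": 0.1, "Hentai": 0.01, "Normal": 0.85,
        "Pornography": 0.01, "Enticing or Sensual": 0.03}

for name, scores in [("explicit", explicit), ("suggestive", suggestive), ("safe", safe)]:
    print(name, moderation_decision(scores))
```

Summing the probabilities of related labels, rather than looking only at the top class, keeps an image from slipping through when the explicit probability mass is split between two categories.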