---
license: apache-2.0
tags:
- image-classification
- surgical
- computer-vision
- mobileNet
- contaminants
- smoke
- medical-imaging
- transformers
---
# Surgical Contaminant Classifier-Mix
This repository contains a PyTorch-based image classifier for identifying visual contaminants in surgical footage. The model distinguishes between five classes: `blur`, `smoke`, `clear`, `fluid`, and `oob` (out-of-body). It uses a MobileNetV2 backbone via [timm](https://github.com/huggingface/pytorch-image-models), and is compatible with Hugging Face Transformers' `AutoModel` and `AutoConfig` using `trust_remote_code=True`.
The name **"classifier-mix"** refers to the training data source: a mix of DaVinci and Medtronic RARP surgical frames.
> Training log:
> `gs://noee/mobileNet/Medtronic_28-04-2025/Run_13h20_Finetune_lr0.0001_ReduceLROnPlateau/training.log`
## Files
- `classifier.py`: Model and config implementation.
- `config.json`: Hugging Face model configuration.
- `pytorch_model.bin`: Model weights.
- `sample_img.png`: Example image for inference.
- `example_inference.py`: Example script for running inference.
## Usage
### Installation
Install required dependencies:
```sh
pip install torch torchvision timm transformers pillow
```
### Model Details
- **Backbone:** MobileNetV2 (`mobilenetv2_100`)
- **Classes:** blur, smoke, clear, fluid, oob
- **Input size:** 224x224 RGB images
- **Normalization:** mean=[0.6075, 0.4093, 0.3609], std=[0.2066, 0.2036, 0.1991] (a manual preprocessing sketch follows the output example below)
- **Output:** a list of dictionaries, each of the form:
```python
{
    "label": <predicted_class>,  # e.g., "blur", "smoke", etc.
    "confidences": {
        "blur": 0.01,
        "smoke": 0.97,
        "clear": 0.01,
        "fluid": 0.00,
        "oob": 0.01
    }
}
```
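The model's custom code in `classifier.py` handles resizing and normalization internally when you pass a PIL image, so manual preprocessing is normally unnecessary. If you need to reproduce it outside the model, here is a minimal sketch using torchvision with the statistics listed above (the exact transform order inside `classifier.py` is an assumption):

```python
from torchvision import transforms
from PIL import Image

# Hypothetical manual preprocessing; the model's remote code normally
# performs an equivalent transform when given a PIL image.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),      # model expects 224x224 RGB input
    transforms.ToTensor(),              # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(
        mean=[0.6075, 0.4093, 0.3609],  # dataset statistics from above
        std=[0.2066, 0.2036, 0.1991],
    ),
])

img = Image.open("sample_img.png").convert("RGB")
batch = preprocess(img).unsqueeze(0)    # shape: (1, 3, 224, 224)
```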
### Inference Example
You can run the provided script:
```python
# example_inference.py
from transformers import AutoModel
from PIL import Image

# Load the model (executes the custom code from the repo)
model = AutoModel.from_pretrained(
    "vopeai/classifier-mix",
    trust_remote_code=True,
)
model.eval()

# Load the image; preprocessing is handled by the model's custom code
img = Image.open("sample_img.png").convert("RGB")

# Run inference
outputs = model(img)
print("Predicted class:", outputs[0]['label'])
print("Confidences:", outputs[0]['confidences'])
```
Expected output for the sample image:
<p align="center">
<img src="sample_img.png" alt="Sample surgical frame" width="300"/>
</p>
```text
Predicted class: smoke
Confidences: {'blur': 0.0, 'smoke': 1.0, 'clear': 0.0, 'fluid': 0.0, 'oob': 0.0}
```
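Downstream, the `confidences` dictionary can be used to gate frames on prediction certainty. A small sketch continuing the example above (the 0.8 cutoff is a hypothetical value, not one shipped with the model):

```python
# Continuing from example_inference.py above.
# Flag a frame as contaminated only when the top class is not "clear"
# and the model is sufficiently confident (hypothetical 0.8 cutoff).
THRESHOLD = 0.8
result = outputs[0]
top_score = result["confidences"][result["label"]]
if result["label"] != "clear" and top_score >= THRESHOLD:
    print(f"Contaminated frame: {result['label']} ({top_score:.2f})")
```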
Or load the model directly in your own code:
```python
from transformers import AutoModel
# Load model
model = AutoModel.from_pretrained("vopeai/classifier-mix", trust_remote_code=True)
```
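Once loaded, the model can be applied frame by frame to extracted video stills. A sketch assuming the same calling convention as above (the `frames/` directory is a placeholder):

```python
from pathlib import Path
from PIL import Image

# Classify every extracted frame in a (hypothetical) frames/ directory
for frame_path in sorted(Path("frames").glob("*.png")):
    img = Image.open(frame_path).convert("RGB")
    outputs = model(img)
    print(frame_path.name, "->", outputs[0]["label"])
```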
For more details, see the code files in this repository.