|
--- |
|
license: apache-2.0 |
|
tags: |
|
- image-classification |
|
- surgical |
|
- computer-vision |
|
- mobileNet |
|
- contaminants |
|
- smoke |
|
- medical-imaging |
|
- transformers |
|
--- |
|
|
|
# Surgical Contaminant Classifier-Mix
|
|
|
This repository contains a PyTorch-based image classifier for identifying visual contaminants in surgical footage. The model distinguishes between five classes: `blur`, `smoke`, `clear`, `fluid`, and `oob` (out-of-body). It uses a MobileNetV2 backbone via [timm](https://github.com/huggingface/pytorch-image-models) and is compatible with Hugging Face Transformers' `AutoModel` and `AutoConfig` (using `trust_remote_code=True`).
|
|
|
The name **"classifier-mix"** refers to the training data source: a mix of DaVinci and Medtronic RARP surgical frames.
|
|
|
> Training log: |
|
> `gs://noee/mobileNet/Medtronic_28-04-2025/Run_13h20_Finetune_lr0.0001_ReduceLROnPlateau/training.log` |
|
|
## Files |
|
|
|
- `classifier.py`: Model and config implementation. |
|
- `config.json`: Hugging Face model configuration. |
|
- `pytorch_model.bin`: Model weights. |
|
- `sample_img.png`: Example image for inference. |
|
- `example_inference.py`: Example script for running inference. |
|
|
|
## Usage |
|
|
|
### Installation |
|
|
|
Install required dependencies: |
|
```sh |
|
pip install torch torchvision timm transformers pillow |
|
``` |
|
|
|
### Model Details |
|
|
|
- **Backbone:** MobileNetV2 (`mobilenetv2_100`) |
|
- **Classes:** blur, smoke, clear, fluid, oob |
|
- **Input size:** 224x224 RGB images |
|
- **Normalization:** mean=[0.6075, 0.4093, 0.3609], std=[0.2066, 0.2036, 0.1991] (see the preprocessing sketch below)
|
- **Output:** A list of dictionaries, each of the form:
|
```python |
|
{ |
|
"label": <predicted_class>, # e.g., "blur", "smoke", etc. |
|
"confidences": { |
|
"blur": 0.01, |
|
"smoke": 0.97, |
|
"clear": 0.01, |
|
"fluid": 0.00, |
|
"oob": 0.01 |
|
} |
|
} |
|
|
|
``` |
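The packaged model accepts PIL images directly (see the inference example below), so manual preprocessing is usually unnecessary. If you need to reproduce the input pipeline yourself, for instance to call the timm backbone directly, here is a minimal sketch with torchvision, assuming the standard resize-then-normalize ordering:

```python
from PIL import Image
from torchvision import transforms

# Preprocessing per the model details above: 224x224 RGB with
# dataset-specific normalization (resize -> tensor -> normalize order assumed).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),  # HWC uint8 image -> CHW float tensor in [0, 1]
    transforms.Normalize(
        mean=[0.6075, 0.4093, 0.3609],
        std=[0.2066, 0.2036, 0.1991],
    ),
])

img = Image.open("sample_img.png").convert("RGB")
batch = preprocess(img).unsqueeze(0)  # shape: (1, 3, 224, 224)
```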
|
|
|
### Inference Example |
|
You can run the provided script `example_inference.py`:
|
|
|
|
|
```python |
|
# example_inference.py |
|
from transformers import AutoModel |
|
from PIL import Image |
|
|
|
# Load model |
|
model = AutoModel.from_pretrained( |
|
"vopeai/classifier-mix", |
|
trust_remote_code=True |
|
) |
|
model.eval() |
|
|
|
# Load and preprocess image |
|
img = Image.open("sample_img.png").convert("RGB") |
|
|
|
# Run inference |
|
outputs = model(img) |
|
|
|
print("Predicted class:", outputs[0]['label']) |
|
print("Confidences:", outputs[0]['confidences']) |
|
``` |
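Since the classifier targets surgical footage, you will often want per-frame predictions. Below is a minimal sketch that reuses the same API over a directory of pre-extracted frames; the `frames/` directory and `*.png` pattern are hypothetical placeholders:

```python
from pathlib import Path

from PIL import Image
from transformers import AutoModel

model = AutoModel.from_pretrained("vopeai/classifier-mix", trust_remote_code=True)
model.eval()

# "frames/" is a hypothetical directory of frames already extracted from video.
for frame_path in sorted(Path("frames").glob("*.png")):
    img = Image.open(frame_path).convert("RGB")
    outputs = model(img)  # same call as in example_inference.py
    print(frame_path.name, outputs[0]["label"])
```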
|
|
|
Expected output for the sample image:
|
<p align="center"> |
|
<img src="sample_img.png" alt="Sample surgical frame" width="300"/> |
|
</p> |
|
|
|
```bash |
|
Predicted class: smoke |
|
Confidences: {'blur': 0.0, 'smoke': 1.0, 'clear': 0.0, 'fluid': 0.0, 'oob': 0.0} |
|
``` |
|
|
|
Or load the model directly in your own code:
|
|
|
```python |
|
from transformers import AutoModel |
|
|
|
# Load model |
|
model = AutoModel.from_pretrained("vopeai/classifier-mix", trust_remote_code=True) |
|
``` |
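As noted above, the repository is also compatible with `AutoConfig`, which lets you inspect the configuration (backed by `config.json`) without loading the weights:

```python
from transformers import AutoConfig

# Load only the configuration; trust_remote_code resolves the custom config class.
config = AutoConfig.from_pretrained("vopeai/classifier-mix", trust_remote_code=True)
print(config)
```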
|
|
|
For more details, see the code files in this repository. |
|
|