|
--- |
|
license: apache-2.0 |
|
tags: |
|
- image-classification |
|
- surgical |
|
- computer-vision |
|
- mobileNet |
|
- contaminants |
|
- smoke |
|
- medical-imaging |
|
- transformers |
|
--- |
|
|
|
# Surgical Contaminant Classifier-Mix
|
|
|
This repository contains a PyTorch-based image classifier for identifying visual contaminants in surgical footage. The model distinguishes between five classes: `blur`, `smoke`, `clear`, `fluid`, and `oob` (out-of-body). It uses a MobileNetV2 backbone via [timm](https://github.com/huggingface/pytorch-image-models) and is compatible with Hugging Face Transformers' `AutoModel` and `AutoConfig` (using `trust_remote_code=True`).
|
|
|
The name **"classifier-mix"** refers to the training data source: a mix of DaVinci and Medtronic RARP surgical frames.
|
|
|
> Training log: |
|
> `gs://noee/mobileNet/Medtronic_28-04-2025/Run_13h20_Finetune_lr0.0001_ReduceLROnPlateau/training.log` |
|
|
## Files |
|
|
|
- `classifier.py`: Model and config implementation. |
|
- `config.json`: Hugging Face model configuration. |
|
- `pytorch_model.bin`: Model weights. |
|
- `sample_img.png`: Example image for inference. |
|
- `example_inference.py`: Example script for running inference. |
|
|
|
## Usage |
|
|
|
### Installation |
|
|
|
Install required dependencies: |
|
```sh |
|
pip install torch torchvision timm transformers pillow |
|
``` |
|
|
|
### Model Details |
|
|
|
- **Backbone:** MobileNetV2 (`mobilenetv2_100`) |
|
- **Classes:** blur, smoke, clear, fluid, oob |
|
- **Input size:** 224x224 RGB images |
|
- **Normalization:** mean=[0.6075, 0.4093, 0.3609], std=[0.2066, 0.2036, 0.1991] (see the preprocessing sketch below)
|
- **Output:** A list of dictionaries, each of the form:
|
```python |
|
{ |
|
"label": <predicted_class>, # e.g., "blur", "smoke", etc. |
|
"confidences": { |
|
"blur": 0.01, |
|
"smoke": 0.97, |
|
"clear": 0.01, |
|
"fluid": 0.00, |
|
"oob": 0.01 |
|
} |
|
} |
|
|
|
``` |
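The packaged model accepts PIL images directly (see the inference example below), so manual preprocessing is usually unnecessary. If you need to reproduce the input pipeline yourself, for instance to call the timm backbone directly, here is a minimal sketch with torchvision, assuming the standard resize-then-normalize ordering:

```python
from PIL import Image
from torchvision import transforms

# Preprocessing per the model details above: 224x224 RGB with
# dataset-specific normalization (resize -> tensor -> normalize order assumed).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),  # HWC uint8 image -> CHW float tensor in [0, 1]
    transforms.Normalize(
        mean=[0.6075, 0.4093, 0.3609],
        std=[0.2066, 0.2036, 0.1991],
    ),
])

img = Image.open("sample_img.png").convert("RGB")
batch = preprocess(img).unsqueeze(0)  # shape: (1, 3, 224, 224)
```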
|
|
|
### Inference Example |
|
You can run the provided script `example_inference.py`:
|
|
|
|
|
```python |
|
# example_inference.py |
|
from transformers import AutoModel |
|
from PIL import Image |
|
|
|
# Load model |
|
model = AutoModel.from_pretrained( |
|
"vopeai/classifier-mix", |
|
trust_remote_code=True |
|
) |
|
model.eval() |
|
|
|
# Load and preprocess image |
|
img = Image.open("sample_img.png").convert("RGB") |
|
|
|
# Run inference |
|
outputs = model(img) |
|
|
|
print("Predicted class:", outputs[0]['label']) |
|
print("Confidences:", outputs[0]['confidences']) |
|
``` |
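Since the classifier targets surgical footage, you will often want per-frame predictions. Below is a minimal sketch that reuses the same API over a directory of pre-extracted frames; the `frames/` directory and `*.png` pattern are hypothetical placeholders:

```python
from pathlib import Path

from PIL import Image
from transformers import AutoModel

model = AutoModel.from_pretrained("vopeai/classifier-mix", trust_remote_code=True)
model.eval()

# "frames/" is a hypothetical directory of frames already extracted from video.
for frame_path in sorted(Path("frames").glob("*.png")):
    img = Image.open(frame_path).convert("RGB")
    outputs = model(img)  # same call as in example_inference.py
    print(frame_path.name, outputs[0]["label"])
```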
|
|
|
Expected output for the sample image:
|
<p align="center"> |
|
<img src="sample_img.png" alt="Sample surgical frame" width="300"/> |
|
</p> |
|
|
|
```bash |
|
Predicted class: smoke |
|
Confidences: {'blur': 0.0, 'smoke': 1.0, 'clear': 0.0, 'fluid': 0.0, 'oob': 0.0} |
|
``` |
|
|
|
Or load the model directly in your own code:
|
|
|
```python |
|
from transformers import AutoModel |
|
|
|
# Load model |
|
model = AutoModel.from_pretrained("vopeai/classifier-mix", trust_remote_code=True) |
|
``` |
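As noted above, the repository is also compatible with `AutoConfig`, which lets you inspect the configuration (backed by `config.json`) without loading the weights:

```python
from transformers import AutoConfig

# Load only the configuration; trust_remote_code resolves the custom config class.
config = AutoConfig.from_pretrained("vopeai/classifier-mix", trust_remote_code=True)
print(config)
```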
|
|
|
For more details, see the code files in this repository. |
|
|