---
license: apache-2.0
---
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/hr0txL0zblj3i2cV77OYQ.png">
</p>
[📄 Paper](TODO) - [🤗 GitHub](https://github.com/visurg-ai/surg-3m)
We provide the models used in the data curation pipeline of [📄 Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings](TODO), which were used to construct the Surg-3M dataset. For more details about the Surg-3M dataset and our SurgFM foundation model, please visit our [🤗 GitHub repository](https://github.com/visurg-ai/surg-3m).
This Hugging Face repository includes video storyboard classification models, frame classification models, and non-surgical object detection models. The model loader can be found at [model_loader.py](https://huggingface.co/visurg/Surg3M_curation_models/blob/main/model_loader.py).
<div align="center">
<table style="margin-left: auto; margin-right: auto;">
<tr>
<th>Model</th>
<th>Architecture</th>
<th colspan="5">Download</th>
</tr>
<tr>
<td>Video storyboard classification models</td>
<td>ResNet-18</td>
<td><a href="https://huggingface.co/visurg/Surg3M_curation_models/tree/main/video_storyboard_classification">Full ckpt</a></td>
</tr>
<tr>
<td>Frame classification models</td>
<td>ResNet-18</td>
<td><a href="https://huggingface.co/visurg/Surg3M_curation_models/tree/main/frame_classification">Full ckpt</a></td>
</tr>
<tr>
<td>Non-surgical object detection models</td>
<td>YOLOv8-Nano</td>
<td><a href="https://huggingface.co/visurg/Surg3M_curation_models/tree/main/nonsurgical_object_detection">Full ckpt</a></td>
</tr>
</table>
</div>
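To fetch a checkpoint programmatically, the `huggingface_hub` client can be used. In the sketch below, the subfolder names come from the table above, but the checkpoint file name (`model.pth`) is a placeholder — replace it with the actual file inside the folder you want:

```python
from huggingface_hub import hf_hub_download

REPO_ID = "visurg/Surg3M_curation_models"

def checkpoint_path(subfolder: str, filename: str) -> str:
    # Build the repo-relative path of a checkpoint file.
    # Subfolder names come from the table above.
    return f"{subfolder}/{filename}"

def download_checkpoint(subfolder: str, filename: str) -> str:
    # Download one curation checkpoint and return its local cache path.
    return hf_hub_download(repo_id=REPO_ID,
                           filename=checkpoint_path(subfolder, filename))

# e.g. download_checkpoint("frame_classification", "model.pth")
```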
## Usage
### Video storyboard classification model
```python
import numpy as np
import torch
from PIL import Image

from model_loader import build_model

# Load the model
net = build_model(mode='classify')
model_path = 'path/to/downloaded/checkpoint.pth'  # checkpoint from the video storyboard classification folder

# Enable multi-GPU support
net = torch.nn.DataParallel(net).to('cuda')
torch.backends.cudnn.benchmark = True
state = torch.load(model_path, map_location=torch.device('cuda'))
net.load_state_dict(state['net'])
net.eval()

# Load the video storyboard and convert it to a PyTorch tensor
img_path = 'path/to/your/image.jpg'
img = Image.open(img_path).convert('RGB')
img = img.resize((224, 224))
img_tensor = torch.from_numpy(np.array(img)).permute(2, 0, 1).float().div(255.0)
img_tensor = img_tensor.unsqueeze(0).to('cuda')

# Classify the storyboard with the ResNet-18 model
with torch.no_grad():
    outputs = net(img_tensor)
```
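The raw `outputs` above are logits; a small helper can turn them into a human-readable prediction. Note that the label list below is illustrative only — the actual class order depends on how the checkpoint was trained:

```python
import torch

def predict_label(logits: torch.Tensor, labels: list[str]) -> tuple[str, float]:
    # Softmax over the class dimension, then pick the most likely class.
    probs = torch.softmax(logits, dim=1)
    idx = int(probs.argmax(dim=1)[0])
    return labels[idx], float(probs[0, idx])

# Illustrative label order (not confirmed against the checkpoint):
labels = ["non-surgical", "surgical"]
# label, confidence = predict_label(outputs, labels)
```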
The video processing pipeline leading to the clean videos in the Surg-3M dataset is as follows:
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/yj2S0GMJm2C2AYwbr1p6G.png">
</div>