---
license: apache-2.0
---
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/hr0txL0zblj3i2cV77OYQ.png">
</p>

[📚 Paper](TODO) - [🤖 GitHub](https://github.com/visurg-ai/surg-3m)

We provide the models used in the data curation pipeline of [📚 Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings](TODO), which assisted in constructing the Surg-3M dataset (for more details about the Surg-3M dataset and our SurgFM foundation model, please visit our GitHub repository at [🤖 GitHub](https://github.com/visurg-ai/surg-3m)).
This Hugging Face repository includes video storyboard classification models, frame classification models, and non-surgical object detection models. The model loader can be found at [model_loader.py](https://huggingface.co/visurg/Surg3M_curation_models/blob/main/model_loader.py).


<div align="center">
<table style="margin-left: auto; margin-right: auto;">
  <tr>
    <th>Model</th>
    <th>Architecture</th>
    <th>Download</th>
  </tr>
  <tr>
    <td>Video storyboard classification models</td>
    <td>ResNet-18</td>
    <td><a href="https://huggingface.co/visurg/Surg3M_curation_models/tree/main/video_storyboard_classification">Full ckpt</a></td>
  </tr>
  <tr>
    <td>Frame classification models</td>
    <td>ResNet-18</td>
    <td><a href="https://huggingface.co/visurg/Surg3M_curation_models/tree/main/frame_classification">Full ckpt</a></td>
  </tr>
  <tr>
    <td>Non-surgical object detection models</td>
    <td>YOLOv8-Nano</td>
    <td><a href="https://huggingface.co/visurg/Surg3M_curation_models/tree/main/nonsurgical_object_detection">Full ckpt</a></td>
  </tr>
</table>
</div>

## Usage

### Video classification model

   ```python
   import numpy as np
   import torch
   from PIL import Image
   from model_loader import build_model

   # Load the model
   net = build_model(mode='classify')
   model_path = 'path/to/video_storyboard_classification/checkpoint.pth'  # downloaded checkpoint

   # Enable multi-GPU support
   net = torch.nn.DataParallel(net)
   torch.backends.cudnn.benchmark = True
   state = torch.load(model_path, map_location=torch.device('cuda'))
   net.load_state_dict(state['net'])
   net.eval()

   # Load the video storyboard and convert it to a PyTorch tensor
   img_path = 'path/to/your/image.jpg'
   img = Image.open(img_path).convert('RGB')
   img = img.resize((224, 224))
   img_tensor = torch.from_numpy(np.array(img)).float().permute(2, 0, 1).unsqueeze(0).to('cuda')

   # Classify the storyboard with the ResNet-18 model
   with torch.no_grad():
       outputs = net(img_tensor)
   ```
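The frame classification checkpoints are loaded the same way. The image-to-tensor conversion used above can be factored into a small standalone helper; this is a sketch (the `preprocess` name and the plain [0, 1] scaling are illustrative and may differ from the normalization used during training):

   ```python
   import numpy as np
   import torch
   from PIL import Image

   def preprocess(img: Image.Image, size: int = 224) -> torch.Tensor:
       """Resize a PIL image and convert it to a (1, 3, size, size) float tensor."""
       img = img.convert('RGB').resize((size, size))
       x = torch.from_numpy(np.array(img)).float() / 255.0  # HWC, values in [0, 1]
       x = x.permute(2, 0, 1).unsqueeze(0)                  # NCHW with batch dim
       return x

   # Example with a synthetic frame
   dummy = Image.fromarray(np.zeros((480, 640, 3), dtype=np.uint8))
   print(preprocess(dummy).shape)  # torch.Size([1, 3, 224, 224])
   ```

The same tensor can then be passed to any of the classification checkpoints listed in the table above.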

The video processing pipeline leading to the clean videos in the Surg-3M dataset is as follows:
<div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/yj2S0GMJm2C2AYwbr1p6G.png">
</div>