license: apache-2.0

📚 Paper - 🤖 GitHub

We provide the models used in the data curation pipeline of 📚 Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings, which we used to construct the Surg-3M dataset. For more details about the Surg-3M dataset and our SurgFM foundation model, please visit our GitHub repository at 🤖 GitHub. This Hugging Face repository includes video storyboard classification models, frame classification models, and non-surgical object detection models. The model loader can be found in model_loader.py.

Model	Architecture	Download
Video storyboard classification models	ResNet-18	Full ckpt
Frame classification models	ResNet-18	Full ckpt
Non-surgical object detection models	YOLOv8-Nano	Full ckpt

Usage

Video classification model

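import numpy as np
import torch
from PIL import Image
from model_loader import build_model

# Build the classification network (see model_loader.py)
net = build_model(mode='classify')
# Path to the downloaded video storyboard classification checkpoint
model_path = 'Video storyboard classification models'

# Move the network to the GPU and enable multi-GPU support
net = torch.nn.DataParallel(net.to('cuda'))
torch.backends.cudnn.benchmark = True
state = torch.load(model_path, map_location=torch.device('cuda'))
net.load_state_dict(state['net'])
net.eval()

# Load the video storyboard as RGB and resize it to 224x224
img_path = 'path/to/your/image.jpg'
img = Image.open(img_path).convert('RGB')
img = img.resize((224, 224))

# Convert HWC uint8 pixels to a float NCHW tensor of shape (1, 3, 224, 224).
# Apply here any scaling or normalization the checkpoint was trained with.
img_tensor = torch.from_numpy(np.array(img)).permute(2, 0, 1).unsqueeze(0).float().to('cuda')

# Run the ResNet-18 classifier on the storyboard without tracking gradients
with torch.no_grad():
    outputs = net(img_tensor)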
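
The snippet above stops at the raw network output. As a minimal sketch of one way to turn that output into a prediction, the helper below applies a softmax and picks the highest-scoring class. It assumes that `outputs` holds unnormalized class scores of shape (1, num_classes); the README does not state the number or meaning of the classes, so `logits_to_prediction` and the example scores are hypothetical and not part of model_loader.py.

```python
import math


def logits_to_prediction(logits):
    """Return (predicted_index, probabilities) for one sample's raw scores.

    Assumes `logits` is a flat list of unnormalized class scores.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class.
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted, probs


# Hypothetical two-class scores, used only to illustrate the call.
pred, probs = logits_to_prediction([0.3, 2.1])
print(pred, [round(p, 3) for p in probs])
```

With the inference code above, the call would look like `logits_to_prediction(outputs[0].tolist())`. Which index corresponds to which label depends on how the checkpoint was trained, so check the GitHub repository before relying on the mapping.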

The video processing pipeline leading to the clean videos in the Surg-3M dataset is as follows: