CovisPose Model

This model estimates relative poses between panoramic images using the CovisPose framework.

Model Details

  • Architecture: CovisPose with resnet50 backbone
  • Transformer Layers: 6
  • FFN Dimension: 2048
  • Input Size: [512, 1024]
  • Parameters: 121,890,467 (estimated)

Training Information

Configuration

  • Epochs: N/A
  • Batch Size: N/A
  • Learning Rate: N/A
  • Backbone: resnet50

Performance Metrics

  • Final Training Loss: N/A
  • Training Rotation Error: N/A
  • Final Validation Loss: N/A
  • Validation Rotation Error: N/A
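
No metric values were recorded for this checkpoint, and the card does not define how rotation error is computed. As a rough sketch only, one common choice is the smallest absolute angle between predicted and ground-truth headings. The function names below are hypothetical, and treating rotation as a single yaw angle in degrees is an assumption, not something this card confirms.

```python
def rotation_error_deg(pred_deg: float, gt_deg: float) -> float:
    """Smallest absolute angle between two headings, in degrees (0 to 180).

    Assumption: rotation is a single yaw angle, as for upright panoramas.
    """
    # Python's % always returns a value in [0, 360) for a positive modulus.
    diff = (pred_deg - gt_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_rotation_error_deg(preds, gts) -> float:
    """Average wrapped rotation error over paired predictions and labels."""
    errors = [rotation_error_deg(p, g) for p, g in zip(preds, gts)]
    if not errors:
        raise ValueError("need at least one prediction/label pair")
    return sum(errors) / len(errors)
```

Wrapping matters here: a prediction of 350 degrees against a label of 10 degrees is 20 degrees off, not 340.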

Usage

import torch
import json
from huggingface_hub import hf_hub_download

# Download model files
model_path = hf_hub_download(
    repo_id="SGEthan/covis_toy",
    filename="pytorch_model.bin"
)

config_path = hf_hub_download(
    repo_id="SGEthan/covis_toy",
    filename="config.json"
)

# Load configuration
with open(config_path, 'r') as f:
    config = json.load(f)

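# Initialize the model. The COVIS class is not included in this download;
# it comes from the original training repository, which must be importable.
from models.covispose_model import COVIS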

model = COVIS(
    backbone=config['backbone'],
    num_transformer_layers=config['num_transformer_layers'],
    transformer_ffn_dim=config['transformer_ffn_dim']
)

# Load weights
checkpoint = torch.load(model_path, map_location='cpu')
if "model_state_dict" in checkpoint:
    model.load_state_dict(checkpoint["model_state_dict"])
else:
    model.load_state_dict(checkpoint)

model.eval()

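# Use for inference
with torch.no_grad():
    # Replace `pass` with a call on your own pair of panorama tensors,
    # each sized to the 512 x 1024 input listed under Model Details:
    # outputs1, outputs2 = model(pano1_tensor, pano2_tensor)
    pass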
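
The snippet above leaves tensor preparation open. A minimal sketch of one way to turn a decoded panorama into a batch of the listed input size follows. The helper name to_model_input is hypothetical, and the [0, 1] scaling and channel-first layout are assumptions; the preprocessing actually used in training is not documented in this card.

```python
import numpy as np

EXPECTED_HW = (512, 1024)  # height, width from the "Input Size" entry


def to_model_input(pano: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 panorama to a 1 x 3 x H x W float32 array."""
    if pano.ndim != 3 or pano.shape[2] != 3:
        raise ValueError(f"expected an H x W x 3 image, got shape {pano.shape}")
    if pano.shape[:2] != EXPECTED_HW:
        raise ValueError(f"expected height, width {EXPECTED_HW}, got {pano.shape[:2]}")
    # Assumption: pixel values are scaled to [0, 1]; training may differ.
    scaled = pano.astype(np.float32) / 255.0
    # Move channels first, then add a leading batch dimension.
    chw = np.ascontiguousarray(np.transpose(scaled, (2, 0, 1)))
    return chw[np.newaxis, ...]
```

The result can be passed to torch.from_numpy to obtain a tensor for the model call shown in the usage snippet.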

Model Architecture

The CovisPose model consists of:

  1. Backbone Network: resnet50 for feature extraction
  2. Transformer Encoder: 6 layers for processing image features
  3. Prediction Heads:
    • Covisibility mask prediction
    • Relative pose estimation
    • Boundary detection

Task Description

CovisPose estimates the relative pose between two panoramic images by:

  1. Covisibility Estimation: Predicting which parts of the images overlap
  2. Pose Regression: Estimating relative rotation and translation
  3. Boundary Detection: Finding floor-wall boundaries for scale estimation
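
To illustrate why a floor-wall boundary carries scale information, here is a small geometric sketch. It assumes an equirectangular panorama from an upright camera at a known height above a flat floor; none of these assumptions, nor the function name, come from this card, and the model's own scale recovery may work differently.

```python
import math


def boundary_row_to_floor_distance(row: float, image_height: int,
                                   camera_height: float) -> float:
    """Horizontal distance to a floor-wall boundary seen at a given image row.

    Assumptions: equirectangular projection, upright camera, flat floor.
    `row` is a continuous coordinate, 0 at the top edge and `image_height`
    at the bottom edge, so the horizon sits at image_height / 2.
    """
    # Latitude is +pi/2 at the top edge and -pi/2 at the bottom edge.
    latitude = (0.5 - row / image_height) * math.pi
    if latitude >= 0:
        raise ValueError("a floor boundary must lie below the horizon")
    # Right triangle: camera height over the tangent of the depression angle.
    return camera_height / math.tan(-latitude)
```

For a 512-row panorama, a boundary at row 384 lies 45 degrees below the horizon, so the wall is as far away as the camera is high.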

Training Data

This model was trained on panoramic image pairs with:

  • Relative pose annotations
  • Covisibility masks
  • Floor-wall boundary labels

Limitations

  • Designed specifically for indoor panoramic images
  • Requires significant visual overlap between image pairs for reliable pose estimation
  • Performance may degrade on outdoor scenes or images with minimal overlap
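
Because pose estimates depend on overlap, it may help to check the predicted covisibility before trusting a pose. The sketch below assumes the covisibility output can be reduced to a binary per-column mask. The function names and the 0.1 threshold are hypothetical placeholders, not values taken from this model.

```python
def covisible_fraction(column_mask) -> float:
    """Fraction of panorama columns flagged as covisible (truthy entries)."""
    if len(column_mask) == 0:
        raise ValueError("mask must contain at least one column")
    return sum(1 for value in column_mask if value) / len(column_mask)


def has_enough_overlap(column_mask, min_fraction: float = 0.1) -> bool:
    """Return True when the covisible fraction reaches the chosen threshold.

    The default threshold is an arbitrary placeholder; tune it on your data.
    """
    return covisible_fraction(column_mask) >= min_fraction
```

Pairs that fail the check could be skipped or flagged for review rather than passed on with an unreliable pose.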

Citation

If you use this model, please cite the CovisPose work:

@article{covispose2024,
  title={CovisPose: Co-visibility Pose Estimation for Panoramic Images},
  author={Your Authors},
  journal={Conference/Journal},
  year={2024}
}

License

This model is released under the MIT License.

Repository

  • Training Code: Available in the original repository
  • Model Upload: Generated automatically from local checkpoint

Model uploaded on 2025-06-27T09:05:36.254713 using upload_model.py
