CovisPose Model

This model estimates relative poses between panoramic images using the CovisPose framework.

Model Details

Architecture: CovisPose with resnet50 backbone
Transformer Layers: 6
FFN Dimension: 2048
Input Size: [512, 1024]
Parameters: 121,890,467 (estimated)
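
The parameter figure above is marked as estimated. Once the model has been instantiated as shown under Usage, the count can be checked directly. The helper below is a minimal sketch: `count_parameters` is a hypothetical name, not part of the CovisPose code, and it assumes only the standard `torch.nn.Module` interface (`parameters()`, `numel()`, `requires_grad`).

```python
def count_parameters(model, trainable_only=False):
    """Sum the element counts of a model's parameter tensors.

    Assumes `model` follows the torch.nn.Module interface: `parameters()`
    yields tensors that expose `numel()` and `requires_grad`.
    """
    return sum(
        p.numel()
        for p in model.parameters()
        # Keep every parameter unless only trainable ones were requested.
        if p.requires_grad or not trainable_only
    )
```

Calling `count_parameters(model)` on the loaded model and comparing the result with the number listed here is a quick way to confirm that the configuration matches the checkpoint.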

Training Information

Configuration

Epochs: N/A
Batch Size: N/A
Learning Rate: N/A
Backbone: resnet50 (as listed under Model Details)

Performance Metrics

Final Training Loss: N/A
Training Rotation Error: N/A
Final Validation Loss: N/A
Validation Rotation Error: N/A
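
This card does not state how rotation error is computed. One common convention for panorama pairs with planar camera motion, assumed here and not confirmed by the source, is the absolute difference between predicted and ground-truth yaw, wrapped into [0, 180] degrees. The function names below are hypothetical.

```python
import math

def yaw_error_deg(pred_yaw_rad, gt_yaw_rad):
    """Absolute yaw difference in degrees, wrapped into [0, 180]."""
    diff = pred_yaw_rad - gt_yaw_rad
    # atan2(sin, cos) maps any angle into (-pi, pi], so 350 deg vs 10 deg
    # is reported as 20 deg rather than 340 deg.
    wrapped = math.atan2(math.sin(diff), math.cos(diff))
    return abs(math.degrees(wrapped))

def mean_yaw_error_deg(pred_yaws, gt_yaws):
    """Average the wrapped yaw error over paired predictions and labels."""
    errors = [yaw_error_deg(p, g) for p, g in zip(pred_yaws, gt_yaws)]
    return sum(errors) / len(errors)
```

Wrapping before taking the absolute value matters for panoramas, where headings near 0 and 360 degrees are physically close.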

Usage

import torch
import json
from huggingface_hub import hf_hub_download

# Download model files
model_path = hf_hub_download(
    repo_id="SGEthan/covis_toy",
    filename="pytorch_model.bin"
)

config_path = hf_hub_download(
    repo_id="SGEthan/covis_toy",
    filename="config.json"
)

# Load configuration
with open(config_path, 'r') as f:
    config = json.load(f)

# Initialize the model. The COVIS class is not included in this download;
# it comes from the CovisPose training code, which must be importable.
from models.covispose_model import COVIS

model = COVIS(
    backbone=config['backbone'],
    num_transformer_layers=config['num_transformer_layers'],
    transformer_ffn_dim=config['transformer_ffn_dim']
)

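# Load weights. The checkpoint is either a bare state dict or a dict that
# wraps the state dict under the "model_state_dict" key.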
checkpoint = torch.load(model_path, map_location='cpu')
if "model_state_dict" in checkpoint:
    model.load_state_dict(checkpoint["model_state_dict"])
else:
    model.load_state_dict(checkpoint)

model.eval()

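# Run inference on a pair of panoramas
with torch.no_grad():
    # pano1_tensor and pano2_tensor are placeholders for your own preprocessed
    # panorama batches (the input size listed above is [512, 1024]).
    # outputs1, outputs2 = model(pano1_tensor, pano2_tensor)
    pass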
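
The snippet above reads three keys from config.json. Checking for them first turns a missing key into a clear error before the model is built. This is a minimal sketch: `check_config` is a hypothetical helper, and the example values are taken from the Model Details section rather than from the actual file, which may contain additional keys.

```python
# Keys that the usage snippet reads from config.json.
REQUIRED_KEYS = ("backbone", "num_transformer_layers", "transformer_ffn_dim")

def check_config(config):
    """Return the constructor arguments, or raise if any key is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing keys: {missing}")
    return {key: config[key] for key in REQUIRED_KEYS}

# Assumed example content, based on the values listed under Model Details.
example_config = {
    "backbone": "resnet50",
    "num_transformer_layers": 6,
    "transformer_ffn_dim": 2048,
}
model_kwargs = check_config(example_config)
```

Because the returned keys match the keyword arguments used above, the model could then be built with `COVIS(**model_kwargs)`.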

Model Architecture

The CovisPose model consists of:

Backbone Network: resnet50 for feature extraction
Transformer Encoder: 6 layers for processing image features
Prediction Heads:
- Covisibility mask prediction
- Relative pose estimation
- Boundary detection

Task Description

CovisPose estimates the relative pose between two panoramic images by combining three tasks:

Covisibility Estimation: Predicting which parts of the images overlap
Pose Regression: Estimating relative rotation and translation
Boundary Detection: Finding floor-wall boundaries for scale estimation
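
The card does not describe how floor-wall boundaries are converted into scale. A standard geometric relation for upright equirectangular panoramas, assumed here rather than taken from the CovisPose code, is that a floor point seen at latitude φ below the horizon lies at horizontal distance h / tan|φ| from a camera at height h. The function name and the camera height are illustrative assumptions.

```python
import math

def boundary_distance(row, image_height, camera_height):
    """Horizontal distance to the floor point seen at pixel `row`.

    Assumes an upright equirectangular image whose rows span latitudes
    from +pi/2 (top edge) to -pi/2 (bottom edge).
    """
    # Latitude of the pixel centre; negative values are below the horizon.
    latitude = (0.5 - (row + 0.5) / image_height) * math.pi
    if latitude >= 0:
        raise ValueError("floor points must lie below the horizon")
    return camera_height / math.tan(-latitude)

# With the 512-pixel input height listed above, a boundary 45 degrees below
# the horizon is as far away as the camera is high.
distance = boundary_distance(row=383.5, image_height=512, camera_height=1.6)
```

Under these assumptions, a known or assumed camera height fixes metric distances to the walls, which is one way boundary predictions can constrain the scale of the translation.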

Training Data

This model was trained on panoramic image pairs with:

Relative pose annotations
Covisibility masks
Floor-wall boundary labels

Limitations

Designed specifically for indoor panoramic images
Requires significant visual overlap between image pairs for reliable pose estimation
Performance may degrade on outdoor scenes or images with minimal overlap

Citation

If you use this model, please cite the CovisPose work:

@article{covispose2024,
  title={CovisPose: Co-visibility Pose Estimation for Panoramic Images},
  author={Your Authors},
  journal={Conference/Journal},
  year={2024}
}

License

This model is released under the MIT License.

Repository

Training Code: Available in the original repository
Model Upload: Generated automatically from local checkpoint

Model uploaded on 2025-06-27T09:05:36.254713 using upload_model.py