Spaces: Runtime error
Nan Xue committed on
Commit · 4c954ae
Parent(s): 3132f36
update
This view is limited to 50 files because it contains too many changes.
See raw diff
- LEGAL.md +0 -0
- LICENSE +21 -0
- README.md +117 -12
- gradio_demo/inference.py +252 -0
- gradio_demo/line_mat_gluestick.py +386 -0
- line_matching/run.py +191 -0
- line_matching/run_list.py +144 -0
- line_matching/two_view_pipeline.py +167 -0
- line_matching/wireframe.py +341 -0
- predictor/predict.py +131 -0
- requirements.txt +21 -0
- scalelsd/.gitignore +10 -0
- scalelsd/__init__.py +2 -0
- scalelsd/base/__init__.py +13 -0
- scalelsd/base/csrc/__init__.py +19 -0
- scalelsd/base/csrc/binding.cpp +5 -0
- scalelsd/base/csrc/linesegment.cu +139 -0
- scalelsd/base/csrc/linesegment.h +26 -0
- scalelsd/base/show/__init__.py +3 -0
- scalelsd/base/show/canvas.py +153 -0
- scalelsd/base/show/cli.py +24 -0
- scalelsd/base/show/painters.py +80 -0
- scalelsd/base/utils/__init__.py +1 -0
- scalelsd/base/utils/logger.py +30 -0
- scalelsd/base/utils/metric_logger.py +77 -0
- scalelsd/base/wireframe.py +110 -0
- scalelsd/encoder/__init__.py +1 -0
- scalelsd/encoder/hafm.py +152 -0
- scalelsd/ssl/backbones/__init__.py +1 -0
- scalelsd/ssl/backbones/build.py +28 -0
- scalelsd/ssl/backbones/dpt/__init__.py +0 -0
- scalelsd/ssl/backbones/dpt/base_model.py +16 -0
- scalelsd/ssl/backbones/dpt/blocks.py +388 -0
- scalelsd/ssl/backbones/dpt/midas_net.py +77 -0
- scalelsd/ssl/backbones/dpt/models.py +115 -0
- scalelsd/ssl/backbones/dpt/transforms.py +231 -0
- scalelsd/ssl/backbones/dpt/vit.py +586 -0
- scalelsd/ssl/backbones/multi_task_head.py +52 -0
- scalelsd/ssl/config/__init__.py +2 -0
- scalelsd/ssl/config/dataset/hpatches_dataset.yaml +105 -0
- scalelsd/ssl/config/dataset/nyu_dataset.yaml +77 -0
- scalelsd/ssl/config/dataset/official_yorkurban_dataset.yaml +75 -0
- scalelsd/ssl/config/dataset/rdnim_dataset.yaml +77 -0
- scalelsd/ssl/config/dataset/synthetic_dataset-1024.yaml +49 -0
- scalelsd/ssl/config/dataset/synthetic_dataset-2k.yaml +50 -0
- scalelsd/ssl/config/dataset/synthetic_dataset-4k.yaml +50 -0
- scalelsd/ssl/config/dataset/synthetic_dataset-large.yaml +50 -0
- scalelsd/ssl/config/dataset/synthetic_dataset.yaml +51 -0
- scalelsd/ssl/config/dataset/wireframe_official_gt copy.yaml +86 -0
- scalelsd/ssl/config/dataset/wireframe_official_gt.yaml +86 -0
LEGAL.md
ADDED
File without changes
LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Nan Xue

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md
CHANGED
@@ -1,12 +1,117 @@
<div align="center">

# ScaleLSD: Scalable Deep Line Segment Detection Streamlined

<!-- <a href="https://code.alipay.com/kezeran.kzr/ScaleLSD"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a> <a href="https://code.alipay.com/kezeran.kzr/ScaleLSD"><img src="https://img.shields.io/badge/ArXiv-250x.xxxxx-brightgreen"></a> <a href="https://code.alipay.com/kezeran.kzr/ScaleLSD"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Model_Card-Huggingface-orange"></a> <a href="https://code.alipay.com/kezeran.kzr/ScaleLSD"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Gradio%20Demo-Huggingface-orange"></a> -->

<a href="https://ant-research.github.io/scalelsd"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a> <a href="https://arxiv.org/abs/2506.09369"><img src="https://img.shields.io/badge/ArXiv-2506.09369-brightgreen"></a> <a href="https://huggingface.co/cherubicxn/scalelsd"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Model_Card-Huggingface-orange"></a>

[Zeran Ke](https://calmke.github.io/)<sup>1,2</sup>, [Bin Tan](https://icetttb.github.io/)<sup>2</sup>, [Xianwei Zheng](https://jszy.whu.edu.cn/zhengxianwei/zh_CN/index.htm)<sup>1</sup>, [Yujun Shen](https://shenyujun.github.io/)<sup>2</sup>, [Tianfu Wu](https://research.ece.ncsu.edu/ivmcl/)<sup>3</sup>, [Nan Xue](https://xuenan.net/)<sup>2†</sup>

<sup>1</sup>Wuhan University &emsp; <sup>2</sup>Ant Group &emsp; <sup>3</sup>NC State University

</div>

<!-- <img src="assets/teaser.jpg" width="100%"> -->

## ⚙️ Installation

All code has been successfully tested on:

- Ubuntu 22.04.5 LTS
- CUDA 12.1
- Python 3.10
- PyTorch 2.5.1

First, clone this repo:

```bash
git clone https://github.com/ant-research/scalelsd.git
```

Then create the conda environment and install the dependencies:

```bash
conda create -n scalelsd python=3.10
conda activate scalelsd
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install -e .  # Install scalelsd locally
```

## 🔥🔍 Gradio Demo

### Line Segment Detection

Before you start, please download our pre-trained [models](https://huggingface.co/cherubicxn/scalelsd) and place them in the `models` folder. Then run the Gradio demo:

```bash
python -m gradio_demo.inference
```

### Line Matching

Because our line matching app builds on GlueStick with our ScaleLSD, you need to install [GlueStick](https://github.com/cvg/GlueStick) and download the GlueStick model weights (one possible route is sketched after the command below). Then run the Gradio demo:

```bash
python -m gradio_demo.line_mat_gluestick
```
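If GlueStick is not yet set up, one possible route is sketched below; defer to the GlueStick README for the authoritative steps. The weight filename `checkpoint_GlueStick_MD.tar` under `resources/weights/` is the path this repo's matching config expects:

```bash
git clone https://github.com/cvg/GlueStick.git
cd GlueStick
pip install -e .
# Download the GlueStick weights and place them where the matching
# config looks for them:
#   <GLUESTICK_ROOT>/resources/weights/checkpoint_GlueStick_MD.tar
```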

## 🚗 Inference

Quickly try our models for line segment detection by running:

```bash
python -m predictor.predict --img $[IMAGE_PATH_OR_FOLDER]
```

You can also specify more parameters:

```bash
python -m predictor.predict \
    --ckpt $[MODEL_PATH] \
    --img $[IMAGE_PATH_OR_FOLDER] \
    --ext $[png/pdf/json] \
    --threshold 10 \
    --junction-hm 0.1 \
    --disable-show
```

```bash
OPTIONS:
    --ckpt CKPT, -c CKPT
                        Path to the checkpoint file.
    --img IMG, -i IMG   Path to the image or folder containing images.
    --ext EXT, -e EXT   Output file extension (png/pdf/json).
    --threshold THRESHOLD, -t THRESHOLD
                        Threshold for line segment detection.
    --junction-hm JUNCTION_HM, -jh JUNCTION_HM
                        Junction heatmap threshold.
    --num-junctions NUM_JUNCTIONS, -nj NUM_JUNCTIONS
                        Max number of junctions to detect.
    --disable-show      Disable showing the results.
    --use_lsd           Use LSD-Rectifier for line segment detection.
    --use_nms           Use Non-Maximum Suppression (NMS) for junction detection.
```
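As a concrete illustration, a hypothetical invocation could look like the following (the checkpoint name mirrors the default used by the Gradio demo, and the image folder is the sample directory shipped with the repo):

```bash
python -m predictor.predict \
    --ckpt models/scalelsd-vitbase-v2-train-sa1b.pt \
    --img assets/figs \
    --ext json \
    --threshold 10 \
    --disable-show
```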

## 📖 Related Third-party Projects

- [HAWPv3](https://github.com/cherubicXN/hawp/tree/main)
- [DeepLSD](https://github.com/cvg/DeepLSD)
- [Progressive-x](https://github.com/danini/progressive-x/tree/vanishing-points)
- [GlueStick](https://github.com/cvg/GlueStick)
- [GlueFactory](https://github.com/cvg/glue-factory)
- [LiMAP](https://github.com/cvg/limap)

## 📝 Citation

If you find our work useful in your research, please consider citing:

```bibtex
@inproceedings{ScaleLSD,
  title     = {ScaleLSD: Scalable Deep Line Segment Detection Streamlined},
  author    = {Zeran Ke and Bin Tan and Xianwei Zheng and Yujun Shen and Tianfu Wu and Nan Xue},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025},
}
```
gradio_demo/inference.py
ADDED
@@ -0,0 +1,252 @@
import torch
import cv2
import os
import gradio as gr
import numpy as np
import random
from pathlib import Path
import json

from scalelsd.ssl.models.detector import ScaleLSD
from scalelsd.base import show, WireframeGraph
from scalelsd.ssl.misc.train_utils import fix_seeds, load_scalelsd_model

# Title for the Gradio interface
_TITLE = 'Gradio Demo of ScaleLSD for Structured Representation of Images'
MAX_SEED = 1000


def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    """Return a random seed if randomization is requested."""
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed


def stop_run():
    """Reset the Run/Stop buttons when a run is cancelled."""
    return (
        gr.update(value="Run", variant="primary", visible=True),
        gr.update(visible=False),
    )


def process_image(
    input_image,
    model_name='scalelsd-vitbase-v2-train-sa1b.pt',
    save_name='temp_output',
    threshold=10,
    junction_threshold_hm=0.008,
    num_junctions_inference=512,
    width=512,
    height=512,
    line_width=2,
    juncs_size=4,
    whitebg=0.0,
    draw_junctions_only=False,
    use_lsd=False,
    use_nms=False,
    edge_color='orange',
    vertex_color='Cyan',
    output_format='png',
    seed=0,
    randomize_seed=False
):
    """Core processing function for image inference."""
    # set random seed
    seed = int(randomize_seed_fn(seed, randomize_seed))
    fix_seeds(seed)

    # initialize model
    ckpt = "models/" + model_name
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = load_scalelsd_model(ckpt, device)

    # set model parameters
    model.junction_threshold_hm = junction_threshold_hm
    model.num_junctions_inference = num_junctions_inference

    # transform input image to grayscale
    if isinstance(input_image, np.ndarray):
        image = cv2.cvtColor(input_image, cv2.COLOR_RGB2GRAY)
    else:
        image = cv2.imread(input_image, 0)

    # resize
    ori_shape = image.shape[:2]
    image_resized = cv2.resize(image.copy(), (width, height))
    image_tensor = torch.from_numpy(image_resized).float() / 255.0
    image_tensor = image_tensor[None, None].to(device)

    # meta data
    meta = {
        'width': ori_shape[1],
        'height': ori_shape[0],
        'filename': '',
        'use_lsd': use_lsd,
        'use_nms': use_nms,
    }

    # inference
    with torch.no_grad():
        outputs, _ = model(image_tensor, meta)
        outputs = outputs[0]

    # visualize results
    painter = show.painters.HAWPainter()
    painter.confidence_threshold = threshold
    painter.line_width = line_width
    painter.marker_size = juncs_size
    if whitebg > 0.0:
        show.Canvas.white_overlay = whitebg

    temp_folder = "temp_output"
    os.makedirs(temp_folder, exist_ok=True)
    fig_file = f"{temp_folder}/{save_name}.png"
    with show.image_canvas(input_image, fig_file=fig_file) as ax:
        if draw_junctions_only:
            painter.draw_junctions(ax, outputs)
        else:
            painter.draw_wireframe(ax, outputs, edge_color=edge_color, vertex_color=vertex_color)
    # read the result image
    result_image = cv2.imread(fig_file)

    if output_format != 'png':
        fig_file = f"{temp_folder}/{save_name}.{output_format}"
        with show.image_canvas(input_image, fig_file=fig_file) as ax:
            if draw_junctions_only:
                painter.draw_junctions(ax, outputs)
            else:
                painter.draw_wireframe(ax, outputs, edge_color=edge_color, vertex_color=vertex_color)

    # export the wireframe graph as JSON
    json_file = f"{temp_folder}/{save_name}.json"
    indices = WireframeGraph.xyxy2indices(outputs['juncs_pred'], outputs['lines_pred'])
    wireframe = WireframeGraph(outputs['juncs_pred'], outputs['juncs_score'], indices, outputs['lines_score'], outputs['width'], outputs['height'])
    with open(json_file, 'w') as f:
        json.dump(wireframe.jsonize(), f)

    return result_image[:, :, ::-1], json_file, fig_file


def run_demo():
    """Create the Gradio demo interface."""
    css = """
    #col-container {
        margin: 0 auto;
        max-width: 800px;
    }
    """

    with gr.Blocks(css=css, title=_TITLE) as demo:
        with gr.Column(elem_id="col-container"):
            gr.Markdown(f'# {_TITLE}')
            gr.Markdown("Detect wireframe structures in images using the ScaleLSD model")

            pid = gr.State()
            figs_root = "assets/figs"
            example_images = [os.path.join(figs_root, iname) for iname in os.listdir(figs_root)]

            with gr.Row():
                input_image = gr.Image(example_images[0], label="Input Image", type="numpy")
                output_image = gr.Image(label="Detection Result")

            with gr.Row():
                run_btn = gr.Button(value="Run", variant="primary")
                stop_btn = gr.Button(value="Stop", variant="stop", visible=False)

            with gr.Row():
                json_file = gr.File(label="Download JSON Output", type="filepath")
                image_file = gr.File(label="Download Image Output", type="filepath")

            with gr.Accordion("Advanced Settings", open=True):
                with gr.Row():
                    model_name = gr.Dropdown(
                        [ckpt for ckpt in os.listdir('models') if ckpt.endswith('.pt')],
                        value='scalelsd-vitbase-v1-train-sa1b.pt',
                        label="Model Selection"
                    )

                with gr.Row():
                    save_name = gr.Textbox('temp_output', label="Save Name", placeholder="Name for saving output files")

                with gr.Row():
                    with gr.Column():
                        threshold = gr.Number(10, label="Line Threshold")
                        junction_threshold_hm = gr.Number(0.008, label="Junction Threshold")
                        num_junctions_inference = gr.Number(1024, label="Max Number of Junctions")
                        width = gr.Number(512, label="Input Width")
                        height = gr.Number(512, label="Input Height")

                    with gr.Column():
                        draw_junctions_only = gr.Checkbox(False, label="Show Junctions Only")
                        use_lsd = gr.Checkbox(False, label="Use LSD-Rectifier")
                        use_nms = gr.Checkbox(True, label="Use NMS")
                        output_format = gr.Dropdown(
                            ['png', 'jpg', 'pdf'],
                            value='png',
                            label="Output Format"
                        )
                        whitebg = gr.Slider(0.0, 1.0, value=0.7, label="White Background Opacity")
                        line_width = gr.Number(2, label="Line Width")
                        juncs_size = gr.Number(8, label="Junctions Size")

                with gr.Row():
                    edge_color = gr.Dropdown(
                        ['orange', 'midnightblue', 'red', 'green'],
                        value='orange',
                        label="Edge Color"
                    )
                    vertex_color = gr.Dropdown(
                        ['Cyan', 'deeppink', 'yellow', 'purple'],
                        value='Cyan',
                        label="Vertex Color"
                    )

                with gr.Row():
                    randomize_seed = gr.Checkbox(False, label="Randomize Seed")
                    seed = gr.Slider(0, MAX_SEED, value=42, step=1, label="Seed")

            gr.Examples(
                examples=example_images,
                inputs=input_image,
            )

            # start event handlers
            run_event = run_btn.click(
                fn=process_image,
                inputs=[
                    input_image,
                    model_name,
                    save_name,
                    threshold,
                    junction_threshold_hm,
                    num_junctions_inference,
                    width,
                    height,
                    line_width,
                    juncs_size,
                    whitebg,
                    draw_junctions_only,
                    use_lsd,
                    use_nms,
                    edge_color,
                    vertex_color,
                    output_format,
                    seed,
                    randomize_seed
                ],
                outputs=[output_image, json_file, image_file],
            )

            # stop event handlers
            stop_btn.click(
                fn=stop_run,
                outputs=[run_btn, stop_btn],
                cancels=[run_event],
                queue=False,
            )

    return demo


if __name__ == "__main__":
    run_demo().launch()
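Outside Gradio, `process_image` can also be driven headlessly. A minimal sketch, assuming the default checkpoint sits under `models/` and the sample image referenced elsewhere in this commit exists on disk:

```python
import cv2
import numpy as np
from gradio_demo.inference import process_image

bgr = cv2.imread('assets/figs/sa_1119229.jpg')
rgb = bgr[:, :, ::-1]                 # the Gradio Image component feeds RGB arrays
vis, json_path, fig_path = process_image(rgb, threshold=10)

# vis comes back as RGB; flip to BGR for OpenCV's writer
cv2.imwrite('wireframe_vis.png', np.ascontiguousarray(vis[:, :, ::-1]))
print('wireframe JSON written to', json_path)
```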
gradio_demo/line_mat_gluestick.py
ADDED
@@ -0,0 +1,386 @@
import argparse
import os
from os.path import join
import sys
import numpy as np
import cv2
import torch
from matplotlib import pyplot as plt
from tqdm import tqdm
import gradio as gr
import random

from gluestick import batch_to_np, numpy_image_to_torch, GLUESTICK_ROOT
from gluestick.drawing import plot_images, plot_lines, plot_color_line_matches, plot_keypoints, plot_matches

from scalelsd.ssl.models.detector import ScaleLSD
from scalelsd.base import show, WireframeGraph
from scalelsd.ssl.datasets.transforms.homographic_transforms import sample_homography
from scalelsd.ssl.misc.train_utils import fix_seeds
from line_matching.two_view_pipeline import TwoViewPipeline

from kornia.geometry import warp_perspective, transform_points


class HADConfig:
    num_iter = 1
    valid_border_margin = 3
    translation = True
    rotation = True
    scale = True
    perspective = True
    scaling_amplitude = 0.2
    perspective_amplitude_x = 0.2
    perspective_amplitude_y = 0.2
    allow_artifacts = False
    patch_ratio = 0.85
had_cfg = HADConfig()

# Evaluation config
default_conf = {
    'name': 'two_view_pipeline',
    'use_lines': True,
    'extractor': {
        'name': 'wireframe',
        'sp_params': {
            'force_num_keypoints': False,
            'max_num_keypoints': 2048,
        },
        'wireframe_params': {
            'merge_points': True,
            'merge_line_endpoints': True,
            # 'merge_line_endpoints': False,
        },
        'max_n_lines': 512,
    },
    'matcher': {
        'name': 'gluestick',
        'weights': str(GLUESTICK_ROOT / 'resources' / 'weights' / 'checkpoint_GlueStick_MD.tar'),
        'trainable': False,
    },
    'ground_truth': {
        'from_pose_depth': False,
    }
}

# Title for the Gradio interface
_TITLE = 'ScaleLSD-GlueStick Line Matching'
MAX_SEED = 1000


def sample_homographics(height, width):

    def scale_homography(H, stride):
        H_scaled = H.clone()
        H_scaled[:, :, 2, :2] *= stride
        H_scaled[:, :, :2, 2] /= stride
        return H_scaled

    homographic = sample_homography(
        shape=(height, width),
        perspective=had_cfg.perspective,
        scaling=had_cfg.scale,
        rotation=had_cfg.rotation,
        translation=had_cfg.translation,
        scaling_amplitude=had_cfg.scaling_amplitude,
        perspective_amplitude_x=had_cfg.perspective_amplitude_x,
        perspective_amplitude_y=had_cfg.perspective_amplitude_y,
        patch_ratio=had_cfg.patch_ratio,
        allow_artifacts=False
    )[0]

    homographic = torch.from_numpy(homographic[None]).float().cuda()
    homographic_inv = torch.inverse(homographic)

    H = {
        'h.1': homographic,
        'ih.1': homographic_inv,
    }

    return H


def trans_image_with_homograpy(image):
    h, w = image.shape[:2]
    H = sample_homographics(height=h, width=w)

    image_warped = warp_perspective(torch.Tensor(image).permute(2, 0, 1)[None].cuda(), H['h.1'], (h, w))
    image_warped_ = image_warped[0].permute(1, 2, 0).cpu().numpy().astype(np.uint8)
    plt.imshow(image_warped_)
    plt.show()
    return image_warped_


def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    """Return a random seed if randomization is requested."""
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed


def stop_run():
    """Reset the Run/Stop buttons when a run is cancelled."""
    return (
        gr.update(value="Run", variant="primary", visible=True),
        gr.update(visible=False),
    )


def clear_image2():
    return None  # returning None will clear the image component


def process_image(
    input_image1='assets/figs/sa_1119229.jpg',
    input_image2=None,
    model_name='scalelsd-vitbase-v1-train-sa1b.pt',
    save_name='temp',
    threshold=5,
    junction_threshold_hm=0.008,
    num_junctions_inference=4096,
    width=512,
    height=512,
    line_width=2,
    juncs_size=4,
    whitebg=1.0,
    draw_junctions_only=False,
    use_lsd=False,
    use_nms=False,
    edge_color='midnightblue',
    vertex_color='deeppink',
    output_format='png',
    seed=0,
    randomize_seed=False
):
    """Core processing function for two-view line matching."""
    # set random seed
    seed = int(randomize_seed_fn(seed, randomize_seed))
    fix_seeds(seed)

    conf = {
        'model_name': model_name,
        'threshold': threshold,
        'junction_threshold_hm': junction_threshold_hm,
        'num_junctions_inference': num_junctions_inference,
        'use_lsd': use_lsd,
        'use_nms': use_nms,
        'width': width,
        'height': height,
    }

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    pipeline_model = TwoViewPipeline(default_conf).to(device).eval()
    pipeline_model.extractor.update_conf(conf)

    saveto = f'temp_output/matching_results'
    os.makedirs(saveto, exist_ok=True)  # ensure the output directory exists
    image1 = cv2.cvtColor(input_image1, cv2.COLOR_BGR2RGB)
    cv2.imwrite(f'{saveto}/image.png', image1)
    input_image1 = f'{saveto}/image.png'
    if input_image2 is None:
        # no second view given: synthesize one with a random homography
        image2 = trans_image_with_homograpy(image1)
    else:
        image2 = cv2.cvtColor(input_image2, cv2.COLOR_BGR2RGB)
    cv2.imwrite(f'{saveto}/image2.png', image2)
    input_image2 = f'{saveto}/image2.png'

    gray0 = cv2.imread(input_image1, 0)
    gray1 = cv2.imread(input_image2, 0)

    torch_gray0, torch_gray1 = numpy_image_to_torch(gray0), numpy_image_to_torch(gray1)
    torch_gray0, torch_gray1 = torch_gray0.to(device)[None], torch_gray1.to(device)[None]

    x = {'image0': torch_gray0, 'image1': torch_gray1}
    pred = pipeline_model(x)

    pred = batch_to_np(pred)
    kp0, kp1 = pred["keypoints0"], pred["keypoints1"]
    m0 = pred["matches0"]

    line_seg0, line_seg1 = pred["lines0"], pred["lines1"]
    line_matches = pred["line_matches0"]

    valid_matches = m0 != -1
    match_indices = m0[valid_matches]
    matched_kps0 = kp0[valid_matches]
    matched_kps1 = kp1[match_indices]

    valid_matches = line_matches != -1
    match_indices = line_matches[valid_matches]
    matched_lines0 = line_seg0[valid_matches]
    matched_lines1 = line_seg1[match_indices]

    img0, img1 = cv2.cvtColor(gray0, cv2.COLOR_GRAY2BGR), cv2.cvtColor(gray1, cv2.COLOR_GRAY2BGR)

    # detected lines in both views
    det_file = f'{saveto}/{save_name}_det.png'
    plot_images([img0, img1], dpi=200, pad=2.0)
    plot_lines([line_seg0, line_seg1], ps=4, lw=2)
    plt.gcf().canvas.manager.set_window_title('Detected Lines')
    # plt.tight_layout()
    plt.savefig(det_file)
    det_image = cv2.imread(det_file)[:, :, ::-1]

    # color-coded line matches
    mat_file = f'{saveto}/{save_name}_mat.png'
    plot_images([img0, img1], dpi=200, pad=2.0)
    plot_color_line_matches([matched_lines0, matched_lines1], lw=3)
    plt.gcf().canvas.manager.set_window_title('Line Matches')
    # plt.tight_layout()
    plt.savefig(mat_file)
    mat_image = cv2.imread(mat_file)[:, :, ::-1]

    show.Canvas.white_overlay = whitebg
    painter = show.painters.HAWPainter()

    fig_file = f'{saveto}/{save_name}_det1.png'
    outputs = {'lines_pred': line_seg0.reshape(-1, 4)}
    with show.image_canvas(input_image1, fig_file=fig_file) as ax:
        painter.draw_wireframe(ax, outputs, edge_color=edge_color, vertex_color=vertex_color)
    det1_image = cv2.imread(fig_file)[:, :, ::-1]

    fig_file = f'{saveto}/{save_name}_det2.png'
    outputs = {'lines_pred': line_seg1.reshape(-1, 4)}
    with show.image_canvas(input_image2, fig_file=fig_file) as ax:
        painter.draw_wireframe(ax, outputs, edge_color=edge_color, vertex_color=vertex_color)
    det2_image = cv2.imread(fig_file)[:, :, ::-1]

    return image2[:, :, ::-1], mat_image, det_image, det1_image, det2_image, mat_file, det_file


def demo():
    """Create the Gradio demo interface."""
    css = """
    #col-container {
        margin: 0 auto;
        max-width: 800px;
    }
    """

    with gr.Blocks(css=css, title=_TITLE) as demo:
        with gr.Column(elem_id="col-container"):
            gr.Markdown(f'# {_TITLE}')
            gr.Markdown("Match line segments across two views using ScaleLSD detections and GlueStick")

            pid = gr.State()
            figs_root = "assets/mat_figs"
            example_single = [os.path.join(figs_root, 'single', iname) for iname in os.listdir(figs_root + '/single')]
            example_pairs = [[img, None] for img in example_single]
            example_pairs += [
                [os.path.join(figs_root, 'pairs', f'ref_{i}.png'),
                 os.path.join(figs_root, 'pairs', f'tgt_{i}.png')]
                for i in [10, 72, 76, 95, 149, 151]
            ]

            with gr.Row():
                input_image1 = gr.Image(example_pairs[0][0], label="Input Image1", type="numpy")
                input_image2 = gr.Image(label="Input Image2", type="numpy")

            with gr.Row():
                mat_images = gr.Image(label="Matching Results")
            with gr.Row():
                det_images = gr.Image(label="Detection Results")
            with gr.Row():
                det_image1 = gr.Image(label="Detection1")
                det_image2 = gr.Image(label="Detection2")

            with gr.Row():
                run_btn = gr.Button(value="Run", variant="primary")
                stop_btn = gr.Button(value="Stop", variant="stop", visible=False)

            with gr.Row():
                mat_file = gr.File(label="Download Matching Result", type="filepath")
                det_file = gr.File(label="Download Detection Result", type="filepath")

            with gr.Accordion("Advanced Settings", open=True):
                with gr.Row():
                    model_name = gr.Dropdown(
                        [ckpt for ckpt in os.listdir('models') if ckpt.endswith('.pt')],
                        value='scalelsd-vitbase-v1-train-sa1b.pt',
                        label="Model Selection"
                    )

                with gr.Row():
                    save_name = gr.Textbox('temp_output', label="Save Name", placeholder="Name for saving output files")

                with gr.Row():
                    with gr.Column():
                        threshold = gr.Number(10, label="Line Threshold")
                        junction_threshold_hm = gr.Number(0.008, label="Junction Threshold")
                        num_junctions_inference = gr.Number(1024, label="Max Number of Junctions")
                        width = gr.Number(512, label="Input Width")
                        height = gr.Number(512, label="Input Height")

                    with gr.Column():
                        draw_junctions_only = gr.Checkbox(False, label="Show Junctions Only")
                        use_lsd = gr.Checkbox(False, label="Use LSD-Rectifier")
                        use_nms = gr.Checkbox(True, label="Use NMS")
                        output_format = gr.Dropdown(
                            ['png', 'jpg', 'pdf'],
                            value='png',
                            label="Output Format"
                        )
                        whitebg = gr.Slider(0.0, 1.0, value=1.0, label="White Background Opacity")
                        line_width = gr.Number(2, label="Line Width")
                        juncs_size = gr.Number(8, label="Junctions Size")

                with gr.Row():
                    edge_color = gr.Dropdown(
                        ['orange', 'midnightblue', 'red', 'green'],
                        value='midnightblue',
                        label="Edge Color"
                    )
                    vertex_color = gr.Dropdown(
                        ['Cyan', 'deeppink', 'yellow', 'purple'],
                        value='deeppink',
                        label="Vertex Color"
                    )

                with gr.Row():
                    randomize_seed = gr.Checkbox(False, label="Randomize Seed")
                    seed = gr.Slider(0, MAX_SEED, value=42, step=1, label="Seed")

            gr.Examples(
                examples=example_pairs,
                inputs=[input_image1, input_image2]
            )

            # start event handlers
            run_event = run_btn.click(
                fn=process_image,
                inputs=[
                    input_image1,
                    input_image2,
                    model_name,
                    save_name,
                    threshold,
                    junction_threshold_hm,
                    num_junctions_inference,
                    width,
                    height,
                    line_width,
                    juncs_size,
                    whitebg,
                    draw_junctions_only,
                    use_lsd,
                    use_nms,
                    edge_color,
                    vertex_color,
                    output_format,
                    seed,
                    randomize_seed
                ],
                outputs=[input_image2, mat_images, det_images, det_image1, det_image2, mat_file, det_file],
            )

            # stop event handlers
            stop_btn.click(
                fn=stop_run,
                outputs=[run_btn, stop_btn],
                cancels=[run_event],
                queue=False,
            )

            # When image1 changes, image2 is cleared
            input_image1.change(
                fn=clear_image2,
                outputs=input_image2
            )

    return demo


if __name__ == "__main__":
    # Launch the app
    demo = demo()
    demo.launch()
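The homography path above is what lets the demo run on a single image: the second view is fabricated by warping the first. A standalone sketch of the core operation, using the same warp_perspective(src, M, dsize) call the file relies on, with an identity homography and a dummy tensor so it is self-checking (not repo code):

```python
import torch
from kornia.geometry import warp_perspective

img = torch.rand(1, 3, 128, 128)        # BxCxHxW float tensor, as kornia expects
H = torch.eye(3)[None]                  # identity homography: a no-op warp
warped = warp_perspective(img, H, (128, 128))
assert torch.allclose(warped, img, atol=1e-4)
```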
line_matching/run.py
ADDED
@@ -0,0 +1,191 @@
import argparse
import os
from os.path import join
import sys
import numpy as np
import cv2
import torch
from matplotlib import pyplot as plt
from tqdm import tqdm

from gluestick import batch_to_np, numpy_image_to_torch, GLUESTICK_ROOT
from gluestick.drawing import plot_images, plot_lines, plot_color_line_matches, plot_keypoints, plot_matches
from line_matching.two_view_pipeline import TwoViewPipeline

from scalelsd.base import show, WireframeGraph
from scalelsd.ssl.datasets.transforms.homographic_transforms import sample_homography
from kornia.geometry import warp_perspective, transform_points


class HADConfig:
    num_iter = 1
    valid_border_margin = 3
    translation = True
    rotation = True
    scale = True
    perspective = True
    scaling_amplitude = 0.2
    perspective_amplitude_x = 0.2
    perspective_amplitude_y = 0.2
    allow_artifacts = False
    patch_ratio = 0.85
had_cfg = HADConfig()


def sample_homographics(height, width):

    def scale_homography(H, stride):
        H_scaled = H.clone()
        H_scaled[:, :, 2, :2] *= stride
        H_scaled[:, :, :2, 2] /= stride
        return H_scaled

    homographic = sample_homography(
        shape=(height, width),
        perspective=had_cfg.perspective,
        scaling=had_cfg.scale,
        rotation=had_cfg.rotation,
        translation=had_cfg.translation,
        scaling_amplitude=had_cfg.scaling_amplitude,
        perspective_amplitude_x=had_cfg.perspective_amplitude_x,
        perspective_amplitude_y=had_cfg.perspective_amplitude_y,
        patch_ratio=had_cfg.patch_ratio,
        allow_artifacts=False
    )[0]

    homographic = torch.from_numpy(homographic[None]).float().cuda()
    homographic_inv = torch.inverse(homographic)

    H = {
        'h.1': homographic,
        'ih.1': homographic_inv,
    }

    return H


def trans_image_with_homograpy(image):
    h, w = image.shape[:2]
    H = sample_homographics(height=h, width=w)

    image_warped = warp_perspective(torch.Tensor(image).permute(2, 0, 1)[None].cuda(), H['h.1'], (h, w))
    image_warped_ = image_warped[0].permute(1, 2, 0).cpu().numpy().astype(np.uint8)
    plt.imshow(image_warped_)
    plt.show()
    return image_warped_


def main():
    # Parse input parameters
    parser = argparse.ArgumentParser(
        prog='GlueStick Demo',
        description='Demo app to show the point and line matches obtained by GlueStick')
    parser.add_argument('-img1', default='assets/figs/sa_1119229.jpg')
    parser.add_argument('-img2', default=None)
    parser.add_argument('--max_pts', type=int, default=1000)
    parser.add_argument('--max_lines', type=int, default=300)
    parser.add_argument('--model', type=str, default='models/paper-sa1b-997pkgs-model.pt')
    args = parser.parse_args()

    # at least one input image is required
    if args.img1 is None and args.img2 is None:
        raise ValueError("Input at least one path of image1 or image2")

    # Evaluation config
    conf = {
        'name': 'two_view_pipeline',
        'use_lines': True,
        'extractor': {
            'name': 'wireframe',
            'sp_params': {
                'force_num_keypoints': False,
                'max_num_keypoints': args.max_pts,
            },
            'wireframe_params': {
                'merge_points': True,
                'merge_line_endpoints': True,
                # 'merge_line_endpoints': False,
            },
            'max_n_lines': args.max_lines,
        },
        'matcher': {
            'name': 'gluestick',
            'weights': str(GLUESTICK_ROOT / 'resources' / 'weights' / 'checkpoint_GlueStick_MD.tar'),
            'trainable': False,
        },
        'ground_truth': {
            'from_pose_depth': False,
        }
    }

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    pipeline_model = TwoViewPipeline(conf).to(device).eval()
    pipeline_model.extractor.update_conf(None)

    saveto = f'temp_output/matching_results'
    os.makedirs(saveto, exist_ok=True)

    image1 = cv2.cvtColor(cv2.imread(args.img1), cv2.COLOR_BGR2RGB)
    if args.img2 is None:
        # no second view given: synthesize one with a random homography
        image2 = trans_image_with_homograpy(image1)
        cv2.imwrite(f'{saveto}/warped_image.png', image2)
        args.img2 = f'{saveto}/warped_image.png'

    gray0 = cv2.imread(args.img1, 0)
    gray1 = cv2.imread(args.img2, 0)

    torch_gray0, torch_gray1 = numpy_image_to_torch(gray0), numpy_image_to_torch(gray1)
    torch_gray0, torch_gray1 = torch_gray0.to(device)[None], torch_gray1.to(device)[None]

    x = {'image0': torch_gray0, 'image1': torch_gray1}
    pred = pipeline_model(x)

    pred = batch_to_np(pred)
    kp0, kp1 = pred["keypoints0"], pred["keypoints1"]
    m0 = pred["matches0"]

    line_seg0, line_seg1 = pred["lines0"], pred["lines1"]
    line_matches = pred["line_matches0"]

    valid_matches = m0 != -1
    match_indices = m0[valid_matches]
    matched_kps0 = kp0[valid_matches]
    matched_kps1 = kp1[match_indices]

    valid_matches = line_matches != -1
    match_indices = line_matches[valid_matches]
    matched_lines0 = line_seg0[valid_matches]
    matched_lines1 = line_seg1[match_indices]

    # Plot the matches
    gray0 = cv2.imread(args.img1, 0)
    gray1 = cv2.imread(args.img2, 0)
    img0, img1 = cv2.cvtColor(gray0, cv2.COLOR_GRAY2BGR), cv2.cvtColor(gray1, cv2.COLOR_GRAY2BGR)

    plot_images([img0, img1], dpi=200, pad=2.0)
    plot_lines([line_seg0, line_seg1], ps=4, lw=2)
    plt.gcf().canvas.manager.set_window_title('Detected Lines')
    # plt.tight_layout()
    plt.savefig(f'{saveto}/det.png')

    plot_images([img0, img1], dpi=200, pad=2.0)
    plot_color_line_matches([matched_lines0, matched_lines1], lw=3)
    plt.gcf().canvas.manager.set_window_title('Line Matches')
    # plt.tight_layout()
    plt.savefig(f'{saveto}/mat.png')

    whitebg = 1
    show.Canvas.white_overlay = whitebg
    painter = show.painters.HAWPainter()

    fig_file = f'{saveto}/det1.png'
    outputs = {'lines_pred': line_seg0.reshape(-1, 4)}
    with show.image_canvas(args.img1, fig_file=fig_file) as ax:
        # painter.draw_wireframe(ax, outputs, edge_color='orange', vertex_color='Cyan')
        painter.draw_wireframe(ax, outputs, edge_color='midnightblue', vertex_color='deeppink')
    fig_file = f'{saveto}/det2.png'
    outputs = {'lines_pred': line_seg1.reshape(-1, 4)}
    with show.image_canvas(args.img2, fig_file=fig_file) as ax:
        painter.draw_wireframe(ax, outputs, edge_color='midnightblue', vertex_color='deeppink')


if __name__ == '__main__':
    main()
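A hypothetical module-style invocation, assuming the repo root is on the path (with -img2 omitted, the second view is synthesized by a random homography as described above):

```bash
python -m line_matching.run -img1 assets/figs/sa_1119229.jpg
```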
line_matching/run_list.py
ADDED
@@ -0,0 +1,144 @@
import argparse
import os
from os.path import join
import sys

import cv2
import torch
from matplotlib import pyplot as plt
from tqdm import tqdm

from gluestick import batch_to_np, numpy_image_to_torch, GLUESTICK_ROOT
from gluestick.drawing import plot_images, plot_lines, plot_color_line_matches, plot_keypoints, plot_matches
# from gluestick.models.two_view_pipeline import TwoViewPipeline
from line_matching.two_view_pipeline import TwoViewPipeline

from scalelsd.base import show, WireframeGraph


def main():
    # Parse input parameters
    parser = argparse.ArgumentParser(
        prog='GlueStick Demo',
        description='Demo app to show the point and line matches obtained by GlueStick')
    parser.add_argument('-inum', default=None, type=int)
    parser.add_argument('-imax', default=None, type=int)
    parser.add_argument('-img1', default=join('resources' + os.path.sep + 'img1.jpg'))
    parser.add_argument('-img2', default=join('resources' + os.path.sep + 'img2.jpg'))
    parser.add_argument('--max_pts', type=int, default=1000)
    parser.add_argument('--max_lines', type=int, default=300)
    parser.add_argument('--model', default='scalelsd', type=str)
    parser.add_argument('--test_root', type=str, default='data-ssl/0images-pre/')
    args = parser.parse_args()

    # Evaluation config
    conf = {
        'name': 'two_view_pipeline',
        'use_lines': True,
        'extractor': {
            'name': 'wireframe',
            'sp_params': {
                'force_num_keypoints': False,
                'max_num_keypoints': args.max_pts,
            },
            'wireframe_params': {
                'merge_points': True,
                'merge_line_endpoints': True,
                # 'merge_line_endpoints': False,
            },
            'max_n_lines': args.max_lines,
        },
        'matcher': {
            'name': 'gluestick',
            'weights': str(GLUESTICK_ROOT / 'resources' / 'weights' / 'checkpoint_GlueStick_MD.tar'),
            'trainable': False,
        },
        'ground_truth': {
            'from_pose_depth': False,
        }
    }

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    pipeline_model = TwoViewPipeline(conf).to(device).eval()

    pipeline_model.extractor.update_conf(None)

    md = args.model

    # Select the image ids to process: a range when both -inum and -imax are
    # given, a single id when only -inum is given, otherwise all pairs in root.
    root = args.test_root
    if args.inum is not None and args.imax is not None:
        ids = range(args.inum, args.imax + 1)
    elif args.inum is not None:
        ids = [args.inum]
    else:
        l_imgs = int(len(os.listdir(root)) / 2)
        ids = range(l_imgs)

    for id in tqdm(ids):
        saveto = f'temp_output/matching_results/{md}/{id}'
        os.makedirs(saveto, exist_ok=True)

        args.img1 = root + f'ref_{str(id)}.png'
        args.img2 = root + f'tgt_{str(id)}.png'

        gray0 = cv2.imread(args.img1, 0)
        gray1 = cv2.imread(args.img2, 0)

        torch_gray0, torch_gray1 = numpy_image_to_torch(gray0), numpy_image_to_torch(gray1)
        torch_gray0, torch_gray1 = torch_gray0.to(device)[None], torch_gray1.to(device)[None]

        x = {'image0': torch_gray0, 'image1': torch_gray1}
        pred = pipeline_model(x)

        pred = batch_to_np(pred)
        kp0, kp1 = pred["keypoints0"], pred["keypoints1"]
        m0 = pred["matches0"]

        line_seg0, line_seg1 = pred["lines0"], pred["lines1"]
        line_matches = pred["line_matches0"]

        valid_matches = m0 != -1
        match_indices = m0[valid_matches]
        matched_kps0 = kp0[valid_matches]
        matched_kps1 = kp1[match_indices]

        valid_matches = line_matches != -1
        match_indices = line_matches[valid_matches]
        matched_lines0 = line_seg0[valid_matches]
        matched_lines1 = line_seg1[match_indices]

        # Plot the matches
        gray0 = cv2.imread(args.img1, 0)
        gray1 = cv2.imread(args.img2, 0)
        img0, img1 = cv2.cvtColor(gray0, cv2.COLOR_GRAY2BGR), cv2.cvtColor(gray1, cv2.COLOR_GRAY2BGR)

        plot_images([img0, img1], dpi=200, pad=2.0)
        plot_lines([line_seg0, line_seg1], ps=4, lw=2)
        plt.gcf().canvas.manager.set_window_title('Detected Lines')
        # plt.tight_layout()
        plt.savefig(f'{saveto}/{md}_det_{id}.png')

        plot_images([img0, img1], dpi=200, pad=2.0)
        plot_color_line_matches([matched_lines0, matched_lines1], lw=3)
        plt.gcf().canvas.manager.set_window_title('Line Matches')
        # plt.tight_layout()
        plt.savefig(f'{saveto}/{md}_mat_{id}.png')

        whitebg = 1
        show.Canvas.white_overlay = whitebg
        painter = show.painters.HAWPainter()

        fig_file = f'{saveto}/{md}_det1.png'
        outputs = {'lines_pred': line_seg0.reshape(-1, 4)}
        with show.image_canvas(args.img1, fig_file=fig_file) as ax:
            # painter.draw_wireframe(ax, outputs, edge_color='orange', vertex_color='Cyan')
            painter.draw_wireframe(ax, outputs, edge_color='midnightblue', vertex_color='deeppink')
        fig_file = f'{saveto}/{md}_det2.png'
        outputs = {'lines_pred': line_seg1.reshape(-1, 4)}
        with show.image_canvas(args.img2, fig_file=fig_file) as ax:
            # painter.draw_wireframe(ax, outputs, edge_color='orange', vertex_color='Cyan')
            painter.draw_wireframe(ax, outputs, edge_color='midnightblue', vertex_color='deeppink')


if __name__ == '__main__':
    main()
line_matching/two_view_pipeline.py
ADDED
@@ -0,0 +1,167 @@
"""
A two-view sparse feature matching pipeline.

This model contains sub-models for each step:
feature extraction, feature matching, outlier filtering, pose estimation.
Each step is optional, and the features or matches can be provided as input.
Default: SuperPoint with nearest neighbor matching.

Convention for the matches: m0[i] is the index of the keypoint in image 1
that corresponds to the keypoint i in image 0. m0[i] = -1 if i is unmatched.
"""

import numpy as np
import torch

from gluestick import get_model
from gluestick.models.base_model import BaseModel
from line_matching.wireframe import SPWireframeDescriptor


def keep_quadrant_kp_subset(keypoints, scores, descs, h, w):
    """Keep only keypoints in one of the four quadrants of the image."""
    h2, w2 = h // 2, w // 2
    w_x = np.random.choice([0, w2])
    w_y = np.random.choice([0, h2])
    valid_mask = ((keypoints[..., 0] >= w_x)
                  & (keypoints[..., 0] < w_x + w2)
                  & (keypoints[..., 1] >= w_y)
                  & (keypoints[..., 1] < w_y + h2))
    keypoints = keypoints[valid_mask][None]
    scores = scores[valid_mask][None]
    descs = descs.permute(0, 2, 1)[valid_mask].t()[None]
    return keypoints, scores, descs


def keep_random_kp_subset(keypoints, scores, descs, num_selected):
    """Keep a random subset of keypoints."""
    num_kp = keypoints.shape[1]
    selected_kp = torch.randperm(num_kp)[:num_selected]
    keypoints = keypoints[:, selected_kp]
    scores = scores[:, selected_kp]
    descs = descs[:, :, selected_kp]
    return keypoints, scores, descs


def keep_best_kp_subset(keypoints, scores, descs, num_selected):
    """Keep the top num_selected best keypoints."""
    sorted_indices = torch.sort(scores, dim=1)[1]
    selected_kp = sorted_indices[:, -num_selected:]
    keypoints = torch.gather(keypoints, 1,
                             selected_kp[:, :, None].repeat(1, 1, 2))
    scores = torch.gather(scores, 1, selected_kp)
    descs = torch.gather(descs, 2,
                         selected_kp[:, None].repeat(1, descs.shape[1], 1))
    return keypoints, scores, descs


class TwoViewPipeline(BaseModel):
    default_conf = {
        'extractor': {
            'name': 'superpoint',
            'trainable': False,
        },
        'use_lines': False,
        'use_points': True,
        'randomize_num_kp': False,
        'detector': {'name': None},
        'descriptor': {'name': None},
        'matcher': {'name': 'nearest_neighbor_matcher'},
        'filter': {'name': None},
        'solver': {'name': None},
        'ground_truth': {
            'from_pose_depth': False,
            'from_homography': False,
            'th_positive': 3,
            'th_negative': 5,
            'reward_positive': 1,
            'reward_negative': -0.25,
            'is_likelihood_soft': True,
            'p_random_occluders': 0,
            'n_line_sampled_pts': 50,
            'line_perp_dist_th': 5,
            'overlap_th': 0.2,
            'min_visibility_th': 0.5
        },
    }
    required_data_keys = ['image0', 'image1']
    strict_conf = False  # need to pass new confs to children models
    components = [
        'extractor', 'detector', 'descriptor', 'matcher', 'filter', 'solver']

    def _init(self, conf):
        if conf.extractor.name:
            self.extractor = SPWireframeDescriptor(conf.extractor)

        if conf.matcher.name:
            self.matcher = get_model(conf.matcher.name)(conf.matcher)
        else:
            self.required_data_keys += ['matches0']

        if conf.filter.name:
            self.filter = get_model(conf.filter.name)(conf.filter)

        if conf.solver.name:
            self.solver = get_model(conf.solver.name)(conf.solver)

    def _forward(self, data):

        def process_siamese(data, i):
            data_i = {k[:-1]: v for k, v in data.items() if k[-1] == i}
            if self.conf.extractor.name:
                pred_i = self.extractor(data_i)
            else:
                pred_i = {}
                if self.conf.detector.name:
                    pred_i = self.detector(data_i)
                else:
                    for k in ['keypoints', 'keypoint_scores', 'descriptors',
                              'lines', 'line_scores', 'line_descriptors',
                              'valid_lines']:
                        if k in data_i:
                            pred_i[k] = data_i[k]
                if self.conf.descriptor.name:
                    pred_i = {
                        **pred_i, **self.descriptor({**data_i, **pred_i})}
            return pred_i

        pred0 = process_siamese(data, '0')
        pred1 = process_siamese(data, '1')

        pred = {**{k + '0': v for k, v in pred0.items()},
                **{k + '1': v for k, v in pred1.items()}}

        if self.conf.matcher.name:
            pred = {**pred, **self.matcher({**data, **pred})}

        if self.conf.filter.name:
            pred = {**pred, **self.filter({**data, **pred})}

        if self.conf.solver.name:
            pred = {**pred, **self.solver({**data, **pred})}

        return pred

    def loss(self, pred, data):
        losses = {}
        total = 0
        for k in self.components:
            if self.conf[k].name:
                try:
                    losses_ = getattr(self, k).loss(pred, {**pred, **data})
                except NotImplementedError:
                    continue
                losses = {**losses, **losses_}
                total = losses_['total'] + total
        return {**losses, 'total': total}

    def metrics(self, pred, data):
        metrics = {}
        for k in self.components:
            if self.conf[k].name:
                try:
                    metrics_ = getattr(self, k).metrics(pred, {**pred, **data})
                except NotImplementedError:
                    continue
                metrics = {**metrics, **metrics_}
        return metrics
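The matches0 convention spelled out in the module docstring above is easy to misread; a small self-contained sketch of turning it into explicit index pairs (synthetic values, not repo code):

```python
import numpy as np

# matches0[i] = index in keypoints1 matched to keypoint i of image 0, or -1.
m0 = np.array([2, -1, 0, -1, 1])
valid = m0 != -1
pairs = np.stack([np.nonzero(valid)[0], m0[valid]], axis=1)
print(pairs)  # [[0 2] [2 0] [4 1]] -> (index in image 0, index in image 1)
```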
line_matching/wireframe.py
ADDED
@@ -0,0 +1,341 @@
import numpy as np
import torch
from pytlsd import lsd
from sklearn.cluster import DBSCAN
import sys

from gluestick.models.base_model import BaseModel
from gluestick.models.superpoint import SuperPoint, sample_descriptors
from gluestick.geometry import warp_lines_torch

from pathlib import Path
import copy, cv2
import os, glob
import scalelsd
from scalelsd.ssl.models.detector import ScaleLSD
from scalelsd.ssl.misc.train_utils import fix_seeds, load_scalelsd_model


def lines_to_wireframe(lines, line_scores, all_descs, conf):
    """ Given a set of lines, their score and dense descriptors,
        merge close-by endpoints and compute a wireframe defined by
        its junctions and connectivity.
    Returns:
        junctions: list of [num_junc, 2] tensors listing all wireframe junctions
        junc_scores: list of [num_junc] tensors with the junction score
        junc_descs: list of [dim, num_junc] tensors with the junction descriptors
        connectivity: list of [num_junc, num_junc] bool arrays with True when 2 junctions are connected
        new_lines: the new set of [b_size, num_lines, 2, 2] lines
        lines_junc_idx: a [b_size, num_lines, 2] tensor with the indices of the junctions of each endpoint
        num_true_junctions: a list of the number of valid junctions for each image in the batch,
                            i.e. before filling with random ones
    """
    b_size, _, _, _ = all_descs.shape
    device = lines.device
    endpoints = lines.reshape(b_size, -1, 2)

    (junctions, junc_scores, junc_descs, connectivity, new_lines,
     lines_junc_idx, num_true_junctions) = [], [], [], [], [], [], []
    for bs in range(b_size):
        # Cluster the junctions that are close-by
        db = DBSCAN(eps=conf.nms_radius, min_samples=1).fit(
            endpoints[bs].cpu().numpy())
        clusters = db.labels_
        n_clusters = len(set(clusters))
        num_true_junctions.append(n_clusters)

        # Compute the average junction and score for each cluster
        clusters = torch.tensor(clusters, dtype=torch.long,
                                device=device)
        new_junc = torch.zeros(n_clusters, 2, dtype=torch.float,
                               device=device)
        new_junc.scatter_reduce_(0, clusters[:, None].repeat(1, 2),
                                 endpoints[bs], reduce='mean',
                                 include_self=False)
        junctions.append(new_junc)
        new_scores = torch.zeros(n_clusters, dtype=torch.float, device=device)
        new_scores.scatter_reduce_(
            0, clusters, torch.repeat_interleave(line_scores[bs], 2),
            reduce='mean', include_self=False)
        junc_scores.append(new_scores)

        # Compute the new lines
        new_lines.append(junctions[-1][clusters].reshape(-1, 2, 2))
        lines_junc_idx.append(clusters.reshape(-1, 2))

        # Compute the junction connectivity
        junc_connect = torch.eye(n_clusters, dtype=torch.bool,
                                 device=device)
        pairs = clusters.reshape(-1, 2)  # these pairs are connected by a line
        junc_connect[pairs[:, 0], pairs[:, 1]] = True
        junc_connect[pairs[:, 1], pairs[:, 0]] = True
        connectivity.append(junc_connect)

        # Interpolate the new junction descriptors
        junc_descs.append(sample_descriptors(
            junctions[-1][None], all_descs[bs:(bs + 1)], 8)[0])

    new_lines = torch.stack(new_lines, dim=0)
    lines_junc_idx = torch.stack(lines_junc_idx, dim=0)
    return (junctions, junc_scores, junc_descs, connectivity,
            new_lines, lines_junc_idx, num_true_junctions)


class SPWireframeDescriptor(BaseModel):
    default_conf = {
        'sp_params': {
            'has_detector': True,
            'has_descriptor': True,
            'descriptor_dim': 256,
            'trainable': False,

            # Inference
            'return_all': True,
            'sparse_outputs': True,
            'nms_radius': 4,
            'detection_threshold': 0.005,
            'max_num_keypoints': 1000,
            'force_num_keypoints': True,
            'remove_borders': 4,
        },
        'wireframe_params': {
            'merge_points': True,
            'merge_line_endpoints': True,
            'nms_radius': 3,
            'max_n_junctions': 500,
        },
        'max_n_lines': 250,
        'min_length': 15,
    }
    required_data_keys = ['image']

    def _init(self, conf):
        self.conf = conf
        self.sp = SuperPoint(conf.sp_params)
        self.extr_conf = {}

    def detect_lsd_lines(self, x, max_n_lines=None):
        if max_n_lines is None:
            max_n_lines = self.conf.max_n_lines
        lines, scores, valid_lines = [], [], []
        for b in range(len(x)):
            # For each image on batch
            img = (x[b].squeeze().cpu().numpy() * 255).astype(np.uint8)
            if max_n_lines is None:
                b_segs = lsd(img)
            else:
                for s in [0.3, 0.4, 0.5, 0.7, 0.8, 1.0]:
                    b_segs = lsd(img, scale=s)
                    if len(b_segs) >= max_n_lines:
                        break

            segs_length = np.linalg.norm(b_segs[:, 2:4] - b_segs[:, 0:2], axis=1)
            # Remove short lines
            b_segs = b_segs[segs_length >= self.conf.min_length]
            segs_length = segs_length[segs_length >= self.conf.min_length]
            b_scores = b_segs[:, -1] * np.sqrt(segs_length)
            # Take the most relevant segments with
            indices = np.argsort(-b_scores)
            if max_n_lines is not None:
                indices = indices[:max_n_lines]
            lines.append(torch.from_numpy(b_segs[indices, :4].reshape(-1, 2, 2)))
            scores.append(torch.from_numpy(b_scores[indices]))
            valid_lines.append(torch.ones_like(scores[-1], dtype=torch.bool))

        lines = torch.stack(lines).to(x)
        scores = torch.stack(scores).to(x)
        valid_lines = torch.stack(valid_lines).to(x.device)
        return lines, scores, valid_lines

    def update_conf(self, conf):
        self.extr_conf = conf

    def _forward(self, data):
        b_size, _, h, w = data['image'].shape
        device = data['image'].device
        # device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        if not self.conf.sp_params.force_num_keypoints:
            assert b_size == 1, "Only batch size of 1 accepted for non padded inputs"

        # Line detection
        if 'lines' not in data or 'line_scores' not in data:
            if self.extr_conf is None:
                ckpt = 'models/scalelsd-vitbase-v1-train-sa1b.pt'
                model = load_scalelsd_model(ckpt, device)
                model.junction_threshold_hm = 0.008
                threshold = 5
                model.num_junctions_inference = 4096
                size = 512
                image = data['image']
                image_size = image.shape[-2:]
                image_np = image[0,0].cpu().numpy()
                image_cp = copy.deepcopy(image_np)
                image_torch = torch.from_numpy(cv2.resize(image_cp, (size, size))).float()
                image_cuda = image_torch[None,None].to(device)
                meta = {
                    'width': image_size[1],
                    'height':image_size[0],
                    'filename': '',
                    'use_lsd': False,
                    'use_nms': False,
                }
                outputs, _ = model(image_cuda, meta)
                lines = outputs[0]['lines_pred']
                line_scores = outputs[0]['lines_score']
                lines = lines[line_scores>=threshold]
                line_scores = line_scores[line_scores>=threshold][None]
            elif self.extr_conf['model_name'] != 'lsd':
                # initialize model
                ckpt = "models/" + self.extr_conf['model_name']
                model = load_scalelsd_model(ckpt, device)
                # set model parameters
                model.junction_threshold_hm = self.extr_conf['junction_threshold_hm']
                model.num_junctions_inference = self.extr_conf['num_junctions_inference']
                width, height = self.extr_conf['width'], self.extr_conf['height']

                image = data['image']
                image_size = image.shape[-2:]
                image_np = image[0,0].cpu().numpy()
                image_cp = copy.deepcopy(image_np)
                image_torch = torch.from_numpy(cv2.resize(image_cp, (width, height))).float()
                image_cuda = image_torch[None,None].to(device)
                meta = {
                    'width': image_size[1],
                    'height':image_size[0],
                    'filename': '',
                    'use_lsd': self.extr_conf['use_lsd'],
                    'use_nms': self.extr_conf['use_nms'],
                }
                outputs, _ = model(image_cuda, meta)
                lines = outputs[0]['lines_pred']
                line_scores = outputs[0]['lines_score']
                lines = lines[line_scores>=self.extr_conf['threshold']]
                line_scores = line_scores[line_scores>=self.extr_conf['threshold']][None]
            else:
                if 'original_img' in data:
                    # Detect more lines, because when projecting them to the image most of them will be discarded
                    lines, line_scores, valid_lines = self.detect_lsd_lines(
                        data['original_img'], self.conf.max_n_lines * 3)
                    # Apply the same transformation that is applied in homography_adaptation
                    lines, valid_lines2 = warp_lines_torch(lines, data['H'], False, data['image'].shape[-2:])
                    valid_lines = valid_lines & valid_lines2
                    lines[~valid_lines] = -1
                    line_scores[~valid_lines] = 0
                    # Re-sort the line segments to pick the ones that are inside the image and have bigger score
                    sorted_scores, sorting_indices = torch.sort(line_scores, dim=-1, descending=True)
                    line_scores = sorted_scores[:, :self.conf.max_n_lines]
                    sorting_indices = sorting_indices[:, :self.conf.max_n_lines]
                    lines = torch.take_along_dim(lines, sorting_indices[..., None, None], 1)
                    valid_lines = torch.take_along_dim(valid_lines, sorting_indices, 1)
                else:
                    lines, line_scores, valid_lines = self.detect_lsd_lines(data['image'],max_n_lines=1000000)

        else:
            lines, line_scores, valid_lines = data['lines'], data['line_scores'], data['valid_lines']
        if line_scores.shape[-1] != 0:
            line_scores /= (line_scores.new_tensor(1e-8) + line_scores.max(dim=1).values[:, None])

        # SuperPoint prediction
        pred = self.sp(data)

        # Remove keypoints that are too close to line endpoints
        if self.conf.wireframe_params.merge_points:
            kp = pred['keypoints']
            line_endpts = lines.reshape(b_size, -1, 2)
            dist_pt_lines = torch.norm(
                kp[:, :, None] - line_endpts[:, None], dim=-1)
            # For each keypoint, mark it as valid or to remove
            pts_to_remove = torch.any(
                dist_pt_lines < self.conf.sp_params.nms_radius, dim=2)
            # Simply remove them (we assume batch_size = 1 here)
            assert len(kp) == 1
            pred['keypoints'] = pred['keypoints'][0][~pts_to_remove[0]][None]
            pred['keypoint_scores'] = pred['keypoint_scores'][0][~pts_to_remove[0]][None]
            pred['descriptors'] = pred['descriptors'][0].T[~pts_to_remove[0]].T[None]

        # Connect the lines together to form a wireframe
        orig_lines = lines.clone()
        if self.conf.wireframe_params.merge_line_endpoints and len(lines[0]) > 0:
            # Merge first close-by endpoints to connect lines
            (line_points, line_pts_scores, line_descs, line_association,
             lines, lines_junc_idx, num_true_junctions) = lines_to_wireframe(
                lines, line_scores, pred['all_descriptors'],
                conf=self.conf.wireframe_params)

            # Add the keypoints to the junctions and fill the rest with random keypoints
            (all_points, all_scores, all_descs,
             pl_associativity) = [], [], [], []
            for bs in range(b_size):
                all_points.append(torch.cat(
                    [line_points[bs], pred['keypoints'][bs]], dim=0))
                all_scores.append(torch.cat(
                    [line_pts_scores[bs], pred['keypoint_scores'][bs]], dim=0))
                all_descs.append(torch.cat(
                    [line_descs[bs], pred['descriptors'][bs]], dim=1))

                associativity = torch.eye(len(all_points[-1]), dtype=torch.bool, device=device)
                associativity[:num_true_junctions[bs], :num_true_junctions[bs]] = \
                    line_association[bs][:num_true_junctions[bs], :num_true_junctions[bs]]
                pl_associativity.append(associativity)

            all_points = torch.stack(all_points, dim=0)
            all_scores = torch.stack(all_scores, dim=0)
            all_descs = torch.stack(all_descs, dim=0)
            pl_associativity = torch.stack(pl_associativity, dim=0)
        else:
            # Lines are independent
            all_points = torch.cat([lines.reshape(b_size, -1, 2),
                                    pred['keypoints']], dim=1)
            n_pts = all_points.shape[1]
            num_lines = lines.shape[1]
            num_true_junctions = [num_lines * 2] * b_size
            all_scores = torch.cat([
                torch.repeat_interleave(line_scores, 2, dim=1),
                pred['keypoint_scores']], dim=1)
            pred['line_descriptors'] = self.endpoints_pooling(
                lines, pred['all_descriptors'], (h, w))
            all_descs = torch.cat([
                pred['line_descriptors'].reshape(b_size, self.conf.sp_params.descriptor_dim, -1),
                pred['descriptors']], dim=2)
            pl_associativity = torch.eye(
                n_pts, dtype=torch.bool,
                device=device)[None].repeat(b_size, 1, 1)
            lines_junc_idx = torch.arange(
                num_lines * 2, device=device).reshape(1, -1, 2).repeat(b_size, 1, 1)

        del pred['all_descriptors']  # Remove dense descriptors to save memory
        torch.cuda.empty_cache()

        return {'keypoints': all_points,
                'keypoint_scores': all_scores,
                'descriptors': all_descs,
                'pl_associativity': pl_associativity,
                'num_junctions': torch.tensor(num_true_junctions),
                'lines': lines,
                'orig_lines': orig_lines,
                'lines_junc_idx': lines_junc_idx,
                'line_scores': line_scores,
                # 'valid_lines': valid_lines,
                }

    @staticmethod
    def endpoints_pooling(segs, all_descriptors, img_shape):
        assert segs.ndim == 4 and segs.shape[-2:] == (2, 2)
        filter_shape = all_descriptors.shape[-2:]
        scale_x = filter_shape[1] / img_shape[1]
        scale_y = filter_shape[0] / img_shape[0]

        scaled_segs = torch.round(segs * torch.tensor([scale_x, scale_y]).to(segs)).long()
        scaled_segs[..., 0] = torch.clip(scaled_segs[..., 0], 0, filter_shape[1] - 1)
        scaled_segs[..., 1] = torch.clip(scaled_segs[..., 1], 0, filter_shape[0] - 1)
        line_descriptors = [all_descriptors[None, b, ..., torch.squeeze(b_segs[..., 1]), torch.squeeze(b_segs[..., 0])]
                            for b, b_segs in enumerate(scaled_segs)]
        line_descriptors = torch.cat(line_descriptors)
        return line_descriptors  # Shape (1, 256, 308, 2)

    def loss(self, pred, data):
        raise NotImplementedError

    def metrics(self, pred, data):
        return {}
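A rough stand-alone sketch of the descriptor above. In practice it runs inside the two-view pipeline as conf.extractor; here update_conf() selects the pytlsd branch of _forward so no ScaleLSD checkpoint is needed. The conf values and image shape are illustrative assumptions.

# Hypothetical stand-alone use of SPWireframeDescriptor.
import torch

extractor = SPWireframeDescriptor({'max_n_lines': 100}).eval()
extractor.update_conf({'model_name': 'lsd'})  # take the pytlsd detection branch

image = torch.rand(1, 1, 512, 512)  # grayscale in [0, 1], batch size 1
with torch.no_grad():
    pred = extractor({'image': image})
# pred holds 'keypoints', 'descriptors', 'lines', 'lines_junc_idx', ...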
predictor/predict.py
ADDED
@@ -0,0 +1,131 @@
import torch
import random
import numpy as np
import os
import os.path as osp
import glob
from tqdm import tqdm

from scalelsd.base import setup_logger, MetricLogger, show, WireframeGraph

from scalelsd.ssl.datasets import dataset_util
from scalelsd.ssl.models.detector import ScaleLSD
from scalelsd.ssl.misc.train_utils import load_scalelsd_model

from torch.utils.data import DataLoader
import torch.utils.data.dataloader as torch_loader

from pathlib import Path
import argparse, yaml, logging, time, datetime, cv2, copy, sys, json
from easydict import EasyDict
import accelerate
from accelerate import load_checkpoint_and_dispatch
import matplotlib
import matplotlib.pyplot as plt

def parse_args():
    aparser = argparse.ArgumentParser()
    aparser.add_argument('-c', '--ckpt', default='models/scalelsd-vitbase-v1-train-sa1b.pt', type=str, help='the path for loading checkpoints')
    aparser.add_argument('-t', '--threshold', default=10, type=float)
    aparser.add_argument('-i', '--img', required=True, type=str)
    aparser.add_argument('--width', default=512, type=int)
    aparser.add_argument('--height', default=512, type=int)
    aparser.add_argument('--whitebg', default=0.0, type=float)
    aparser.add_argument('--saveto', default=None, type=str)
    aparser.add_argument('-e', '--ext', default='pdf', type=str, choices=['pdf', 'png', 'json'])  # only these are handled by the save branch below
    aparser.add_argument('--device', default='cuda', type=str, choices=['cuda', 'cpu', 'mps'])
    aparser.add_argument('--disable-show', default=False, action='store_true')
    aparser.add_argument('--draw-junctions-only', default=False, action='store_true')
    aparser.add_argument('--use_lsd', default=False, action='store_true')
    aparser.add_argument('--use_nms', default=False, action='store_true')

    ScaleLSD.cli(aparser)

    args = aparser.parse_args()

    ScaleLSD.configure(args)

    return args


def main():
    args = parse_args()

    model = load_scalelsd_model(args.ckpt, device=args.device)

    # Set up output directory and painter
    if args.saveto is None:
        print('No output directory specified, saving outputs to folder: temp_output/ScaleLSD')
        args.saveto = 'temp_output/ScaleLSD'
    os.makedirs(args.saveto, exist_ok=True)

    show.painters.HAWPainter.confidence_threshold = args.threshold
    # show.painters.HAWPainter.line_width = 2
    # show.painters.HAWPainter.marker_size = 4
    show.Canvas.show = not args.disable_show
    if args.whitebg > 0.0:
        show.Canvas.white_overlay = args.whitebg
    painter = show.painters.HAWPainter()
    edge_color = 'orange'  # 'midnightblue'
    vertex_color = 'Cyan'  # 'deeppink'

    # Prepare images
    all_images = []
    if os.path.isfile(args.img) and args.img.endswith(('.jpg', '.png')):
        all_images.append(args.img)
    elif os.path.isdir(args.img):
        for file in os.listdir(args.img):
            if file.endswith(('.jpg', '.png')):
                fname = os.path.join(args.img, file)
                all_images.append(fname)
        all_images = sorted(all_images)
    else:
        raise ValueError('Input must be a file or a directory containing images.')

    # Inference
    for fname in tqdm(all_images):
        pname = Path(fname)
        image = cv2.imread(fname, 0)

        # for resize input, default shape is [512, 512]
        ori_shape = image.shape[:2]
        image_cp = copy.deepcopy(image)
        image_ = cv2.resize(image_cp, (args.width, args.height))
        image_ = torch.from_numpy(image_).float() / 255.0
        image_ = image_[None, None].to(args.device)

        meta = {
            'width': ori_shape[1],
            'height': ori_shape[0],
            'filename': '',
            'use_lsd': args.use_lsd,
            'use_nms': args.use_nms,
        }

        with torch.no_grad():
            outputs, _ = model(image_, meta)
            outputs = outputs[0]

        if args.saveto is not None:

            if args.ext in ['png', 'pdf']:
                fig_file = osp.join(args.saveto, pname.with_suffix('.' + args.ext).name)
                with show.image_canvas(fname, fig_file=fig_file) as ax:
                    if args.draw_junctions_only:
                        painter.draw_junctions(ax, outputs)
                    else:
                        # painter.draw_wireframe(ax, outputs)
                        painter.draw_wireframe(ax, outputs, edge_color=edge_color, vertex_color=vertex_color)
            elif args.ext == 'json':
                indices = WireframeGraph.xyxy2indices(outputs['juncs_pred'], outputs['lines_pred'])
                wireframe = WireframeGraph(outputs['juncs_pred'], outputs['juncs_score'], indices, outputs['lines_score'], outputs['width'], outputs['height'])
                outpath = osp.join(args.saveto, pname.with_suffix('.json').name)
                with open(outpath, 'w') as f:
                    json.dump(wireframe.jsonize(), f)
            else:
                raise ValueError('Unsupported extension: {} is not in [png, pdf, json]'.format(args.ext))


if __name__ == "__main__":
    main()
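Typical invocations of the script above, with illustrative paths (only pdf, png, and json outputs are handled by the save branch):

# Single image, wireframe plot saved as PDF (the default --ext):
#   python predictor/predict.py -i assets/example.png --saveto temp_output/ScaleLSD
# Directory of images, JSON wireframes at a lower confidence threshold:
#   python predictor/predict.py -i assets/ -e json -t 5 --disable-show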
requirements.txt
ADDED
@@ -0,0 +1,21 @@
opencv-python
cython
matplotlib
yacs
scikit-image
tqdm
python-json-logger
h5py
shapely
pycolmap
seaborn
kornia
easydict
pynvml
timm
einops==0.7.0
numpy==1.26.4
gradio
pydantic==2.10.6
pytlsd@git+https://github.com/iago-suarez/pytlsd.git@4180ab8
scalelsd/.gitignore
ADDED
@@ -0,0 +1,10 @@
__pycache__/
*/__pycache__/
**/__pycache__/

data-ssl
exp
exp-ssl
temp_output
third_party
./models
scalelsd/__init__.py
ADDED
@@ -0,0 +1,2 @@
from . import base
from . import ssl
scalelsd/base/__init__.py
ADDED
@@ -0,0 +1,13 @@
from .csrc import _C
from . import utils
from .utils.logger import setup_logger
from .utils.metric_logger import MetricLogger
from .wireframe import WireframeGraph

__all__ = [
    "_C",
    "utils",
    "setup_logger",
    "MetricLogger",
    "WireframeGraph",
]
scalelsd/base/csrc/__init__.py
ADDED
@@ -0,0 +1,19 @@
from torch.utils.cpp_extension import load
import glob
import os.path as osp

__this__ = osp.dirname(__file__)

try:
    _C = load(name='_C', sources=[
        osp.join(__this__, 'binding.cpp'),
        osp.join(__this__, 'linesegment.cu'),
        ]
    )
except:
    _C = None

_C = load(name='_C', sources=[osp.join(__this__, 'binding.cpp'), osp.join(__this__, 'linesegment.cu')])
__all__ = ["_C"]

#_C = load(name='base._C', sources=['lltm_cuda.cpp', 'lltm_cuda_kernel.cu'])
scalelsd/base/csrc/binding.cpp
ADDED
@@ -0,0 +1,5 @@
#include "linesegment.h"

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("encodels", &encodels, "Encoding line segments to maps");
}
scalelsd/base/csrc/linesegment.cu
ADDED
@@ -0,0 +1,139 @@
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>

// #include <THC/THC.h>
// #include <THC/THCDeviceUtils.cuh>
#include <torch/torch.h>
#include <torch/extension.h>

#include <vector>
#include <iostream>

int const CUDA_NUM_THREADS = 1024;

inline int CUDA_GET_BLOCKS(const int N) {
  return (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS;
}

#define CUDA_1D_KERNEL_LOOP(i, n)                              \
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); \
       i += blockDim.x * gridDim.x)


__global__ void encode_kernel(const int nthreads, const float* lines,
    const int input_height, const int input_width, const int num,
    const int height, const int width, float* map,
    bool* label, float* tmap)
{
    CUDA_1D_KERNEL_LOOP(index, nthreads){
        int w = index % width;
        int h = (index / width) % height;
        int x_index = h*width + w;
        int y_index = height*width + h*width + w;
        int ux_index = 2*height*width + h*width + w;
        int uy_index = 3*height*width + h*width + w;
        int vx_index = 4*height*width + h*width + w;
        int vy_index = 5*height*width + h*width + w;
        int label_index = h*width + w;

        float px = (float) w;
        float py = (float) h;
        float min_dis = 1e30;
        int minp = -1;
        bool flagp = true;
        for(int i = 0; i < num; ++i) {
            float xs = (float)width /(float)input_width;
            float ys = (float)height /(float)input_height;
            float x1 = lines[4*i  ]*xs;
            float y1 = lines[4*i+1]*ys;
            float x2 = lines[4*i+2]*xs;
            float y2 = lines[4*i+3]*ys;

            float dx = x2 - x1;
            float dy = y2 - y1;
            float ux = x1 - px;
            float uy = y1 - py;
            float vx = x2 - px;
            float vy = y2 - py;
            float norm2 = dx*dx + dy*dy;
            bool flag = false;
            float t = ((px-x1)*dx + (py-y1)*dy)/(norm2+1e-6);
            if (t<=1 && t>=0.0)
                flag = true;

            t = t<0.0? 0.0:t;
            t = t>1.0? 1.0:t;

            float ax = x1 + t*(x2-x1) - px;
            float ay = y1 + t*(y2-y1) - py;

            float dis = ax*ax + ay*ay;
            if (dis < min_dis) {
                min_dis = dis;
                map[x_index] = ax;
                map[y_index] = ay;
                float norm_u2 = ux*ux+uy*uy;
                float norm_v2 = vx*vx+vy*vy;

                if (norm_u2 < norm_v2){
                    map[ux_index] = ux;
                    map[uy_index] = uy;
                    map[vx_index] = vx;
                    map[vy_index] = vy;
                }
                else{
                    map[ux_index] = vx;
                    map[uy_index] = vy;
                    map[vx_index] = ux;
                    map[vy_index] = uy;
                }

                minp = i;
                if (flag)
                    flagp = true;
                else
                    flagp = false;

                tmap[index] = t;
            }
        }
        // label[label_index+minp*height*width] = flagp;

    }
}


std::tuple<at::Tensor, at::Tensor, at::Tensor> lsencode_cuda(
    const at::Tensor& lines,
    const int input_height,
    const int input_width,
    const int height,
    const int width,
    const int num_lines)

{
    auto map = at::zeros({6,height,width}, lines.options());
    auto tmap = at::zeros({1,height,width}, lines.options());
    auto label = at::zeros({1,height,width}, lines.options().dtype(at::kBool));
    auto nthreads = height*width;

    cudaStream_t stream = at::cuda::getCurrentCUDAStream();

    float* map_data = map.data<float>();
    float* tmap_data = tmap.data<float>();
    bool* label_data = label.data<bool>();

    encode_kernel<<<CUDA_GET_BLOCKS(nthreads), CUDA_NUM_THREADS >>>(
        nthreads,
        lines.contiguous().data<float>(),
        input_height, input_width,
        num_lines,
        height, width,
        map_data,
        label_data,
        tmap_data);

    // THCudaCheck(cudaGetLastError());

    return std::make_tuple(map, label, tmap);
}
scalelsd/base/csrc/linesegment.h
ADDED
@@ -0,0 +1,26 @@
// #pragma once
#include <torch/extension.h>

std::tuple<at::Tensor, at::Tensor, at::Tensor> lsencode_cuda(
    const at::Tensor& lines,
    const int input_height,
    const int input_width,
    const int height,
    const int width,
    const int num_lines);

std::tuple<at::Tensor,at::Tensor,at::Tensor> encodels(
    const at::Tensor& lines,
    const int input_height,
    const int input_width,
    const int height,
    const int width,
    const int num_lines)
{
    return lsencode_cuda(lines,
                         input_height,
                         input_width,
                         height,
                         width,
                         num_lines);
}
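A small sketch of calling the compiled extension from Python, following the signature above (a CUDA build is required; the segment coordinates and map sizes are illustrative, with one row of (x1, y1, x2, y2) per line):

# Hypothetical call into the JIT-compiled extension (CUDA required).
import torch
from scalelsd.base import _C

lines = torch.tensor([[10., 20., 200., 220.],
                      [50., 60., 50., 300.]], device='cuda')  # (N, 4) float32
# Encode onto a 128x128 grid for a 512x512 input image.
hafm_map, label, tmap = _C.encodels(lines, 512, 512, 128, 128, lines.size(0))
print(hafm_map.shape)  # torch.Size([6, 128, 128])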
scalelsd/base/show/__init__.py
ADDED
@@ -0,0 +1,3 @@
from .canvas import Canvas, image_canvas, canvas
from .painters import HAWPainter
from .cli import cli, configure
scalelsd/base/show/canvas.py
ADDED
@@ -0,0 +1,153 @@
from contextlib import contextmanager
import logging
import os

from matplotlib.pyplot import figimage, margins
import numpy as np
import cv2

try:
    import matplotlib.pyplot as plt  # pylint: disable=import-error

except ModuleNotFoundError as err:
    if err.name != 'matplotlib':
        raise err
    plt = None


LOG = logging.getLogger(__name__)

class Canvas:
    """Canvas for plotting.
    All methods expose Axes objects. To get Figure objects, you can ask the axis
    `ax.get_figure()`.
    """

    all_images_directory = None
    all_images_count = 0
    show = False
    image_width = 7.0
    image_height = None
    blank_dpi = 200
    image_dpi_factor = 1.0
    image_min_dpi = 50.0
    out_file_extension = 'pdf'
    white_overlay = False

    @classmethod
    def generic_name(cls):
        if cls.all_images_directory is None:
            return None
        os.makedirs(cls.all_images_directory, exist_ok=True)

        cls.all_images_count += 1
        return os.path.join(cls.all_images_directory,
                            '{:04}.{}'.format(cls.all_images_count, cls.out_file_extension))

    @classmethod
    @contextmanager
    def blank(cls, fig_file=None, *, dpi=None, nomargin=False, **kwargs):
        if plt is None:
            raise Exception('please install matplotlib')
        if fig_file is None:
            fig_file = cls.generic_name()

        if dpi is None:
            dpi = cls.blank_dpi

        if 'figsize' not in kwargs:
            kwargs['figsize'] = (10, 6)

        if nomargin:
            if 'gridspec_kw' not in kwargs:
                kwargs['gridspec_kw'] = {}
            kwargs['gridspec_kw']['wspace'] = 0
            kwargs['gridspec_kw']['hspace'] = 0
            kwargs['gridspec_kw']['left'] = 0.0
            kwargs['gridspec_kw']['right'] = 1.0
            kwargs['gridspec_kw']['top'] = 1.0
            kwargs['gridspec_kw']['bottom'] = 0.0

        fig, ax = plt.subplots(dpi=dpi, **kwargs)

        yield ax

        fig.set_tight_layout(not margins)
        if fig_file:
            LOG.debug('writing image to %s', fig_file)
            fig.savefig(fig_file)

        if cls.show:
            plt.show()
        plt.close(fig)


    @classmethod
    @contextmanager
    def image(cls, image, fig_file=None, *, margin=None, **kwargs):
        if plt is None:
            raise Exception('please install matplotlib')
        if fig_file is None:
            fig_file = cls.generic_name()

        if isinstance(image, str):
            image = cv2.imread(image)[...,::-1]
        else:
            image = np.asarray(image)

        if margin is None:
            margin = [0.0, 0.0, 0.0, 0.0]
        elif isinstance(margin, float):
            margin = [margin, margin, margin, margin]
        assert len(margin) == 4

        if 'figsize' not in kwargs:
            # compute figure size: use image ratio and take the drawable area
            # into account that is left after subtracting margins.
            image_ratio = image.shape[0] / image.shape[1]
            image_area_ratio = (1.0 - margin[1] - margin[3]) / (1.0 - margin[0] - margin[2])
            if cls.image_width is not None:
                kwargs['figsize'] = (
                    cls.image_width,
                    cls.image_width * image_ratio / image_area_ratio
                )
            elif cls.image_height:
                kwargs['figsize'] = (
                    cls.image_height * image_area_ratio / image_ratio,
                    cls.image_height
                )

        # dpi = max(cls.image_min_dpi, image.shape[1] / kwargs['figsize'][0] * cls.image_dpi_factor)
        dpi = 200
        # import pdb; pdb.set_trace()
        fig = plt.figure(dpi=dpi, **kwargs)
        ax = plt.Axes(fig, [0.0 + margin[0],
                            0.0 + margin[1],
                            1.0 - margin[2],
                            1.0 - margin[3]])

        ax.set_axis_off()
        ax.set_xlim(-0.5, image.shape[1] - 0.5)  # imshow uses center-pixel-coordinates
        ax.set_ylim(image.shape[0] - 0.5, -0.5)
        fig.add_axes(ax)
        ax.imshow(image)
        if cls.white_overlay:
            white_screen(ax, cls.white_overlay)
        yield ax

        if fig_file:
            LOG.debug('writing image to %s', fig_file)
            fig.savefig(fig_file)
        if cls.show:
            plt.show()
            # import pdb; pdb.set_trace()  # leftover debugging breakpoint, disabled
        plt.close(fig)

def white_screen(ax, alpha=0.9):
    ax.add_patch(
        plt.Rectangle((0, 0), 1, 1, transform=ax.transAxes, alpha=alpha,
                      facecolor='white')
    )

canvas = Canvas.blank
image_canvas = Canvas.image
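Both context managers are exposed at module level; a minimal sketch of each (the file names below are illustrative, and image_canvas expects an existing image on disk or an array):

# Hypothetical usage of the canvas helpers above.
from scalelsd.base.show import canvas, image_canvas

with canvas('plot.pdf') as ax:        # blank figure, saved on exit
    ax.plot([0, 1], [0, 1])

with image_canvas('input.png', fig_file='overlay.png') as ax:
    ax.plot([10, 100], [20, 80], '-')  # draw in image pixel coordinates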
scalelsd/base/show/cli.py
ADDED
@@ -0,0 +1,24 @@
# from hawp.config import defaults
import logging

from .canvas import Canvas
from .painters import HAWPainter
import matplotlib
LOG = logging.getLogger(__name__)

def cli(parser):
    group = parser.add_argument_group('show')

    assert not Canvas.show
    group.add_argument('--show', default=False, action='store_true',
                       help='show every plot, i.e., call matplotlib show()')

    group.add_argument('--edge-threshold', default=None, type=float,
                       help='show the wireframe edges whose confidences are greater than [edge_threshold]')
    group.add_argument('--out-ext', default='png', type=str,
                       help='save the plot in specific format')

def configure(args):
    Canvas.show = args.show
    Canvas.out_file_extension = args.out_ext
    if args.edge_threshold is not None:
        HAWPainter.confidence_threshold = args.edge_threshold
scalelsd/base/show/painters.py
ADDED
@@ -0,0 +1,80 @@
import logging

import numpy as np
import torch


try:
    import matplotlib
    import matplotlib.animation
    import matplotlib.collections
    import matplotlib.patches
except ImportError:
    matplotlib = None


LOG = logging.getLogger(__name__)


class HAWPainter:
    # line_width = None
    # marker_size = None
    line_width = 2
    marker_size = 4

    confidence_threshold = 0.05

    def __init__(self):

        if self.line_width is None:
            self.line_width = 1

        if self.marker_size is None:
            self.marker_size = max(1, int(self.line_width * 0.5))

    def draw_junctions(self, ax, wireframe, *,
                       edge_color=None, vertex_color=None):
        if wireframe is None:
            return

        if edge_color is None:
            edge_color = 'b'
        if vertex_color is None:
            vertex_color = 'c'

        if 'lines_score' in wireframe.keys():
            line_segments = wireframe['lines_pred'][wireframe['lines_score']>self.confidence_threshold]
        else:
            line_segments = wireframe['lines_pred']

        if isinstance(line_segments, torch.Tensor):
            line_segments = line_segments.cpu().numpy()

        ax.plot(line_segments[:,0],line_segments[:,1],'.',color=vertex_color)
        ax.plot(line_segments[:,2],line_segments[:,3],'.',
                color=vertex_color)

    def draw_wireframe(self, ax, wireframe, *,
                       edge_color=None, vertex_color=None):
        if wireframe is None:
            return

        if edge_color is None:
            edge_color = 'b'
        if vertex_color is None:
            vertex_color = 'c'

        if 'lines_score' in wireframe.keys():
            line_segments = wireframe['lines_pred'][wireframe['lines_score']>self.confidence_threshold]
        else:
            line_segments = wireframe['lines_pred']

        # import pdb;pdb.set_trace()
        if isinstance(line_segments, torch.Tensor):
            line_segments = line_segments.cpu().numpy()

        # import pdb;pdb.set_trace()
        # line_segments = wireframe.line_segments(threshold=self.confidence_threshold)
        # line_segments = line_segments.cpu().numpy()
        ax.plot([line_segments[:,0],line_segments[:,2]],[line_segments[:,1],line_segments[:,3]],'-',color=edge_color,linewidth=self.line_width)
        ax.plot(line_segments[:,0],line_segments[:,1],'.',color=vertex_color,markersize=self.marker_size)
        ax.plot(line_segments[:,2],line_segments[:,3],'.',color=vertex_color,markersize=self.marker_size)
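The painter consumes a detector-style output dict directly ('lines_pred' as (N, 4) x1 y1 x2 y2 rows, optionally filtered by 'lines_score'). A sketch tying it to the image canvas above, with illustrative tensors and file names:

# Hypothetical: draw a detector-style output dict onto an image canvas.
import torch
from scalelsd.base.show import HAWPainter, image_canvas

outputs = {
    'lines_pred': torch.tensor([[10., 20., 200., 220.]]),  # (N, 4)
    'lines_score': torch.tensor([0.9]),
}
painter = HAWPainter()
with image_canvas('input.png', fig_file='wireframe.png') as ax:
    painter.draw_wireframe(ax, outputs, edge_color='orange', vertex_color='cyan')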
scalelsd/base/utils/__init__.py
ADDED
@@ -0,0 +1 @@
scalelsd/base/utils/logger.py
ADDED
@@ -0,0 +1,30 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import logging
import os
import sys
from pythonjsonlogger import jsonlogger


def setup_logger(name, save_dir, out_file='log.txt', json_format=False, rank=0):
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)

    if json_format:
        formatter = jsonlogger.JsonFormatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    else:
        formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")

    if rank == 0:
        ch = logging.StreamHandler(stream=sys.stdout)
        ch.setLevel(logging.DEBUG)
        ch.setFormatter(formatter)
        logger.addHandler(ch)

    if save_dir:
        os.makedirs(save_dir, exist_ok=True)
        fh = logging.FileHandler(os.path.join(save_dir, out_file))
        fh.setLevel(logging.DEBUG)
        fh.setFormatter(formatter)
        logger.addHandler(fh)

    return logger
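Usage is a one-liner; the directory below is illustrative:

# Hypothetical usage: console + file logging (set json_format=True for JSON lines).
from scalelsd.base import setup_logger

logger = setup_logger('scalelsd', save_dir='exp/logs', out_file='train.log')
logger.info('starting run')  # goes to stdout and exp/logs/train.log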
scalelsd/base/utils/metric_logger.py
ADDED
@@ -0,0 +1,77 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
from collections import defaultdict
from collections import deque

import torch


class SmoothedValue(object):
    """Track a series of values and provide access to smoothed values over a
    window or the global series average.
    """

    def __init__(self, window_size=20):
        self.deque = deque(maxlen=window_size)
        self.series = []
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.deque.append(value)
        self.series.append(value)
        self.count += 1
        self.total += value

    @property
    def median(self):
        d = torch.tensor(list(self.deque))
        return d.median().item()

    @property
    def avg(self):
        d = torch.tensor(list(self.deque))
        return d.mean().item()

    @property
    def global_avg(self):
        return self.total / self.count


class MetricLogger(object):
    def __init__(self, delimiter="\t"):
        self.meters = defaultdict(SmoothedValue)
        self.delimiter = delimiter

    def update(self, **kwargs):
        for k, v in kwargs.items():
            if isinstance(v, torch.Tensor):
                v = v.item()
            assert isinstance(v, (float, int))
            self.meters[k].update(v)

    def __getattr__(self, attr):
        if attr in self.meters:
            return self.meters[attr]
        if attr in self.__dict__:
            return self.__dict__[attr]
        raise AttributeError("'{}' object has no attribute '{}'".format(
            type(self).__name__, attr))

    def __str__(self):
        loss_str = []
        keys = sorted(self.meters)
        # for name, meter in self.meters.items():
        for name in keys:
            meter = self.meters[name]
            loss_str.append(
                "{}: {:.4f} ({:.4f})".format(name, meter.median, meter.global_avg)
            )
        return self.delimiter.join(loss_str)

    def tensorborad(self, iteration, writter, phase='train'):
        for name, meter in self.meters.items():
            if 'loss' in name:
                # writter.add_scalar('average/{}'.format(name), meter.avg, iteration)
                writter.add_scalar('{}/global/{}'.format(phase, name), meter.global_avg, iteration)
                # writter.add_scalar('median/{}'.format(name), meter.median, iteration)
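A short training-loop sketch for the logger above; values are illustrative:

# Hypothetical usage of MetricLogger.
import torch
from scalelsd.base import MetricLogger

meters = MetricLogger(delimiter='  ')
for step in range(3):
    meters.update(loss=torch.rand(1).item(), lr=1e-4)
print(meters)           # "loss: median (global_avg)  lr: ..."
print(meters.loss.avg)  # per-meter access via __getattr__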
scalelsd/base/wireframe.py
ADDED
@@ -0,0 +1,110 @@
import copy
import math
import numpy as np
import torch
import json

class WireframeGraph:
    def __init__(self,
                 vertices: torch.Tensor,
                 v_confidences: torch.Tensor,
                 edges: torch.Tensor,
                 edge_weights: torch.Tensor,
                 frame_width: int,
                 frame_height: int):
        self.vertices = vertices
        self.v_confidences = v_confidences
        self.edges = edges
        self.weights = edge_weights
        self.frame_width = frame_width
        self.frame_height = frame_height

    @classmethod
    def xyxy2indices(cls, junctions, lines):
        # junctions: (N,2)
        # lines: (M,4)
        # return: (M,2)
        dist1 = torch.norm(junctions[None,:,:]-lines[:,None,:2],dim=-1)
        dist2 = torch.norm(junctions[None,:,:]-lines[:,None,2:],dim=-1)
        idx1 = torch.argmin(dist1,dim=-1)
        idx2 = torch.argmin(dist2,dim=-1)
        return torch.stack((idx1,idx2),dim=-1)

    @classmethod
    def load_json(cls, fname):
        with open(fname,'r') as f:
            data = json.load(f)

        vertices = torch.tensor(data['vertices'])
        v_confidences = torch.tensor(data['vertices-score'])
        edges = torch.tensor(data['edges'])
        edge_weights = torch.tensor(data['edges-weights'])
        height = data['height']
        width = data['width']

        return WireframeGraph(vertices,v_confidences,edges,edge_weights,width,height)

    @property
    def is_empty(self):
        for key, val in self.__dict__.items():
            if val is None:
                return True
        return False

    @property
    def num_vertices(self):
        if self.is_empty:
            return 0
        return self.vertices.shape[0]

    @property
    def num_edges(self):
        if self.is_empty:
            return 0
        return self.edges.shape[0]


    def line_segments(self, threshold=0.05, device=None, to_np=False):
        is_valid = self.weights>threshold
        p1 = self.vertices[self.edges[is_valid,0]]
        p2 = self.vertices[self.edges[is_valid,1]]
        ps = self.weights[is_valid]

        lines = torch.cat((p1,p2,ps[:,None]),dim=-1)
        if device is not None:
            lines = lines.to(device)
        if to_np:
            lines = lines.cpu().numpy()

        return lines
        # if device != self.device:

    def rescale(self, image_width, image_height):
        scale_x = float(image_width)/float(self.frame_width)
        scale_y = float(image_height)/float(self.frame_height)

        self.vertices[:,0] *= scale_x
        self.vertices[:,1] *= scale_y
        self.frame_width = image_width
        self.frame_height = image_height

    def jsonize(self):
        return {
            'vertices': self.vertices.cpu().tolist(),
            'vertices-score': self.v_confidences.cpu().tolist(),
            'edges': self.edges.cpu().tolist(),
            'edges-weights': self.weights.cpu().tolist(),
            'height': self.frame_height,
            'width': self.frame_width,
        }
    def __repr__(self) -> str:
        return "WireframeGraph\n"+\
               "Vertices: {}\n".format(self.num_vertices)+\
               "Edges: {}\n".format(self.num_edges,) + \
               "Frame size (HxW): {}x{}".format(self.frame_height,self.frame_width)

#graph = WireframeGraph()
if __name__ == "__main__":
    graph = WireframeGraph.load_json('NeuS/public_data/bmvs_clock/hawp/000.json')
    print(graph)
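A round-trip sketch for the graph container above; all tensor values are illustrative:

# Hypothetical: build a two-edge graph, filter, serialize.
import json, torch
from scalelsd.base import WireframeGraph

graph = WireframeGraph(
    vertices=torch.tensor([[0., 0.], [100., 0.], [100., 50.]]),
    v_confidences=torch.tensor([0.9, 0.8, 0.7]),
    edges=torch.tensor([[0, 1], [1, 2]]),
    edge_weights=torch.tensor([0.95, 0.40]),
    frame_width=128, frame_height=64)

print(graph)                      # vertex/edge counts and frame size
segs = graph.line_segments(0.5)   # (M, 5): x1 y1 x2 y2 score, weight-filtered
with open('graph.json', 'w') as f:
    json.dump(graph.jsonize(), f)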
scalelsd/encoder/__init__.py
ADDED
@@ -0,0 +1 @@
from .hafm import HAFMencoder
scalelsd/encoder/hafm.py
ADDED
@@ -0,0 +1,152 @@
1 |
+
import torch
|
2 |
+
import numpy as np
|
3 |
+
from torch.utils.data.dataloader import default_collate
|
4 |
+
|
5 |
+
from halt import _C
|
6 |
+
|
7 |
+
class HAFMencoder(object):
|
8 |
+
def __init__(self, cfg):
|
9 |
+
self.dis_th = cfg.ENCODER.DIS_TH
|
10 |
+
self.ang_th = cfg.ENCODER.ANG_TH
|
11 |
+
self.num_static_pos_lines = cfg.ENCODER.NUM_STATIC_POS_LINES
|
12 |
+
self.num_static_neg_lines = cfg.ENCODER.NUM_STATIC_NEG_LINES
|
13 |
+
def __call__(self,annotations):
|
14 |
+
targets = []
|
15 |
+
metas = []
|
16 |
+
for ann in annotations:
|
17 |
+
t,m = self._process_per_image(ann)
|
18 |
+
targets.append(t)
|
19 |
+
metas.append(m)
|
20 |
+
|
21 |
+
return default_collate(targets),metas
|
22 |
+
|
23 |
+
def adjacent_matrix(self, n, edges, device):
|
24 |
+
mat = torch.zeros(n+1,n+1,dtype=torch.bool,device=device)
|
25 |
+
if edges.size(0)>0:
|
26 |
+
mat[edges[:,0], edges[:,1]] = 1
|
27 |
+
mat[edges[:,1], edges[:,0]] = 1
|
28 |
+
return mat
|
29 |
+
|
30 |
+
def _process_per_image(self,ann):
|
31 |
+
junctions = ann['junctions']
|
32 |
+
device = junctions.device
|
33 |
+
height, width = ann['height'], ann['width']
|
34 |
+
jmap = torch.zeros((height,width),device=device)
|
35 |
+
joff = torch.zeros((2,height,width),device=device,dtype=torch.float32)
|
36 |
+
# junctions[:,0] = junctions[:,0].clamp(min=0,max=width-1)
|
37 |
+
# junctions[:,1] = junctions[:,1].clamp(min=0,max=height-1)
|
38 |
+
xint,yint = junctions[:,0].long(), junctions[:,1].long()
|
39 |
+
off_x = junctions[:,0] - xint.float()-0.5
|
40 |
+
off_y = junctions[:,1] - yint.float()-0.5
|
41 |
+
|
42 |
+
jmap[yint,xint] = 1
|
43 |
+
joff[0,yint,xint] = off_x
|
44 |
+
joff[1,yint,xint] = off_y
|
45 |
+
|
46 |
+
edges_positive = ann['edges_positive']
|
47 |
+
edges_negative = ann['edges_negative']
|
48 |
+
|
49 |
+
pos_mat = self.adjacent_matrix(junctions.size(0),edges_positive,device)
|
50 |
+
neg_mat = self.adjacent_matrix(junctions.size(0),edges_negative,device)
|
51 |
+
lines = torch.cat((junctions[edges_positive[:,0]], junctions[edges_positive[:,1]]),dim=-1)
|
52 |
+
lines_neg = torch.cat((junctions[edges_negative[:2000,0]],junctions[edges_negative[:2000,1]]),dim=-1)
|
53 |
+
lmap, _, _ = _C.encodels(lines,height,width,height,width,lines.size(0))
|
54 |
+
|
55 |
+
center_points = (lines[:,:2] + lines[:,2:])/2.0
|
56 |
+
cmap = torch.zeros((height,width),device=device)
|
57 |
+
cxint, cyint = center_points[:,0].long(), center_points[:,1].long()
|
58 |
+
cmap[cyint,cxint] = 1
|
59 |
+
|
60 |
+
# yy,xx = torch.meshgrid(torch.arange(width,device=device),torch.arange(width,device=device))
|
61 |
+
        # gaussian = torch.exp(-((yy[:,:,None]-center_points[None,None,:,1])**2 + (xx[:,:,None]-center_points[None,None,:,0])**2)/(2*(2*2)))
        # cmap = gaussian.max(dim=-1)[0]

        # sample a fixed number of positive and negative line proposals
        lpos = np.random.permutation(lines.cpu().numpy())[:self.num_static_pos_lines]
        lneg = np.random.permutation(lines_neg.cpu().numpy())[:self.num_static_neg_lines]
        # lpos = lines[torch.randperm(lines.size(0),device=device)][:self.num_static_pos_lines]
        # lneg = lines_neg[torch.randperm(lines_neg.size(0),device=device)][:self.num_static_neg_lines]
        lpos = torch.from_numpy(lpos).to(device)
        lneg = torch.from_numpy(lneg).to(device)

        lpre = torch.cat((lpos, lneg), dim=0)
        # randomly swap the two endpoints of roughly half of the proposals
        _swap = (torch.rand(lpre.size(0)) > 0.5).to(device)
        lpre[_swap] = lpre[_swap][:, [2, 3, 0, 1]]
        lpre_label = torch.cat(
            [
                torch.ones(lpos.size(0), device=device),
                torch.zeros(lneg.size(0), device=device)
            ])

        meta = {
            'junc': junctions,
            'Lpos': pos_mat,
            'Lneg': neg_mat,
            'lpre': lpre,
            'lpre_label': lpre_label,
            'lines': lines,
        }

        dismap = torch.sqrt(lmap[0]**2 + lmap[1]**2)[None]

        def _normalize(inp):
            mag = torch.sqrt(inp[0]*inp[0] + inp[1]*inp[1])
            return inp / (mag + 1e-6)

        md_map = _normalize(lmap[:2])
        st_map = _normalize(lmap[2:4])
        ed_map = _normalize(lmap[4:])
        # the normalized st/ed maps are overwritten here: the raw offsets are used
        st_map = lmap[2:4]
        ed_map = lmap[4:]

        md_ = md_map.reshape(2, -1).t()
        st_ = st_map.reshape(2, -1).t()
        ed_ = ed_map.reshape(2, -1).t()
        Rt = torch.cat(
            (torch.cat((md_[:, None, None, 0], md_[:, None, None, 1]), dim=2),
             torch.cat((-md_[:, None, None, 1], md_[:, None, None, 0]), dim=2)), dim=1)
        R = torch.cat(
            (torch.cat((md_[:, None, None, 0], -md_[:, None, None, 1]), dim=2),
             torch.cat((md_[:, None, None, 1], md_[:, None, None, 0]), dim=2)), dim=1)

        # rotate the start/end offset vectors into the local frame of the mid direction
        Rtst_ = torch.matmul(Rt, st_[:, :, None]).squeeze(-1).t()
        Rted_ = torch.matmul(Rt, ed_[:, :, None]).squeeze(-1).t()
        swap_mask = (Rtst_[1] < 0) * (Rted_[1] > 0)
        pos_ = Rtst_.clone()
        neg_ = Rted_.clone()
        temp = pos_[:, swap_mask]
        pos_[:, swap_mask] = neg_[:, swap_mask]
        neg_[:, swap_mask] = temp

        pos_[0] = pos_[0].clamp(min=1e-9)
        pos_[1] = pos_[1].clamp(min=1e-9)
        neg_[0] = neg_[0].clamp(min=1e-9)
        neg_[1] = neg_[1].clamp(max=-1e-9)

        mask = (dismap.view(-1) <= self.dis_th).float()

        pos_map = pos_.reshape(-1, height, width)
        neg_map = neg_.reshape(-1, height, width)

        md_angle = torch.atan2(md_map[1], md_map[0])
        pos_angle = torch.atan2(pos_map[1], pos_map[0])
        neg_angle = torch.atan2(neg_map[1], neg_map[0])

        mask *= (pos_angle.reshape(-1) > self.ang_th*np.pi/2.0)
        mask *= (neg_angle.reshape(-1) < -self.ang_th*np.pi/2.0)

        # normalize all angles into [0, 1] for regression
        pos_angle_n = pos_angle / (np.pi/2)
        neg_angle_n = -neg_angle / (np.pi/2)
        md_angle_n = md_angle / (np.pi*2) + 0.5
        mask = mask.reshape(height, width)

        hafm_ang = torch.cat((md_angle_n[None], pos_angle_n[None], neg_angle_n[None]), dim=0)
        hafm_dis = dismap.clamp(max=self.dis_th) / self.dis_th
        mask = mask[None]
        target = {'jloc': jmap[None],
                  'joff': joff,
                  'cloc': cmap[None],
                  'md': hafm_ang,
                  'dis': hafm_dis,
                  'mask': mask
                  }
        return target, meta
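The per-pixel matrix Rt built above is the rotation that aligns each pixel's mid-point direction md with the x-axis, so the start/end offset vectors can be measured as signed angles relative to md. A minimal numeric sketch of that property (standalone, not repo code):

import torch

# For a single pixel with mid direction md = (cos t, sin t), the Rt above is
# the rotation by -t, so applying it to md itself lands on the x-axis.
t = torch.tensor(0.7)
md = torch.stack([torch.cos(t), torch.sin(t)])    # unit mid direction
Rt = torch.stack([torch.stack([md[0], md[1]]),
                  torch.stack([-md[1], md[0]])])  # 2x2 rotation by -t
print(Rt @ md)  # -> tensor([1., 0.]) up to float error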
scalelsd/ssl/backbones/__init__.py
ADDED
@@ -0,0 +1 @@
from .build import build_backbone
scalelsd/ssl/backbones/build.py
ADDED
@@ -0,0 +1,28 @@
from .dpt.models import DPTFieldModel

def build_dpt(
        basemodel="vitb_rn50_384",
        features=256,
        readout="project",
        channels_last=False,
        use_bn=True,
        enable_attention_hooks=False,
        head_size=[[3], [1], [1], [2], [2]],
        use_layer_scale=False,
        **kwargs):

    model = DPTFieldModel(
        features=features,
        backbone=basemodel,
        readout=readout,
        channels_last=channels_last,
        use_bn=use_bn,
        enable_attention_hooks=enable_attention_hooks,
        head_size=head_size,
        use_layer_scale=use_layer_scale
    )

    return model

def build_backbone(**kwargs):
    return build_dpt(**kwargs)
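Since build_backbone simply forwards **kwargs to build_dpt, any DPTFieldModel option can be passed through a flat keyword dict. A minimal construction sketch (assumes timm is installed and provides the vit_base_resnet50_384 model; weights are randomly initialized here, nothing is downloaded):

from scalelsd.ssl.backbones import build_backbone

# These keyword arguments are exactly the build_dpt defaults shown above.
model = build_backbone(basemodel="vitb_rn50_384", features=256, use_bn=True)
print(sum(p.numel() for p in model.parameters()))  # parameter count of the DPT backbone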
scalelsd/ssl/backbones/dpt/__init__.py
ADDED
File without changes
|
scalelsd/ssl/backbones/dpt/base_model.py
ADDED
@@ -0,0 +1,16 @@
import torch


class BaseModel(torch.nn.Module):
    def load(self, path):
        """Load model from file.

        Args:
            path (str): file path
        """
        parameters = torch.load(path, map_location=torch.device("cpu"))

        if "optimizer" in parameters:
            parameters = parameters["model"]

        self.load_state_dict(parameters)
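BaseModel.load accepts either a bare state dict or a full training checkpoint: if the saved dict also carries an "optimizer" entry, the weights are taken from its "model" key. A small sketch (the checkpoint file name is hypothetical):

import torch
from scalelsd.ssl.backbones.dpt.base_model import BaseModel

class TinyModel(BaseModel):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

m = TinyModel()
# Either format works: a bare state_dict, or a training checkpoint that also
# stores the optimizer state (then the weights live under the "model" key).
torch.save({"model": m.state_dict(), "optimizer": {}}, "ckpt.pth")  # hypothetical file
m.load("ckpt.pth")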
scalelsd/ssl/backbones/dpt/blocks.py
ADDED
@@ -0,0 +1,388 @@
import torch
import torch.nn as nn

from .vit import (
    _make_pretrained_vitb_rn50_384,
    _make_pretrained_vitl16_384,
    _make_pretrained_vitb16_384,
    forward_vit,
)


def _make_encoder(
    backbone,
    features,
    use_pretrained,
    groups=1,
    expand=False,
    exportable=True,
    hooks=None,
    use_vit_only=False,
    use_readout="ignore",
    enable_attention_hooks=False,
    use_layer_scale=False,
):
    if backbone == "vitl16_384":
        pretrained = _make_pretrained_vitl16_384(
            use_pretrained,
            hooks=hooks,
            use_readout=use_readout,
            enable_attention_hooks=enable_attention_hooks,
        )
        scratch = _make_scratch(
            [256, 512, 1024, 1024], features, groups=groups, expand=expand
        )  # ViT-L/16 - 85.0% Top1 (backbone)
    elif backbone == "vitb_rn50_384":
        pretrained = _make_pretrained_vitb_rn50_384(
            use_pretrained,
            hooks=hooks,
            use_vit_only=use_vit_only,
            use_readout=use_readout,
            enable_attention_hooks=enable_attention_hooks,
            use_layer_scale=use_layer_scale,
        )
        scratch = _make_scratch(
            [256, 512, 768, 768], features, groups=groups, expand=expand
        )  # ViT-H/16 - 85.0% Top1 (backbone)
    elif backbone == "vitb16_384":
        pretrained = _make_pretrained_vitb16_384(
            use_pretrained,
            hooks=hooks,
            use_readout=use_readout,
            enable_attention_hooks=enable_attention_hooks,
        )
        scratch = _make_scratch(
            [96, 192, 384, 768], features, groups=groups, expand=expand
        )  # ViT-B/16 - 84.6% Top1 (backbone)
    elif backbone == "resnext101_wsl":
        pretrained = _make_pretrained_resnext101_wsl(use_pretrained)
        scratch = _make_scratch(
            [256, 512, 1024, 2048], features, groups=groups, expand=expand
        )  # efficientnet_lite3
    else:
        print(f"Backbone '{backbone}' not implemented")
        assert False

    return pretrained, scratch


def _make_scratch(in_shape, out_shape, groups=1, expand=False):
    scratch = nn.Module()

    out_shape1 = out_shape
    out_shape2 = out_shape
    out_shape3 = out_shape
    out_shape4 = out_shape
    if expand == True:
        out_shape1 = out_shape
        out_shape2 = out_shape * 2
        out_shape3 = out_shape * 4
        out_shape4 = out_shape * 8

    scratch.layer1_rn = nn.Conv2d(
        in_shape[0],
        out_shape1,
        kernel_size=3,
        stride=1,
        padding=1,
        bias=False,
        groups=groups,
    )
    scratch.layer2_rn = nn.Conv2d(
        in_shape[1],
        out_shape2,
        kernel_size=3,
        stride=1,
        padding=1,
        bias=False,
        groups=groups,
    )
    scratch.layer3_rn = nn.Conv2d(
        in_shape[2],
        out_shape3,
        kernel_size=3,
        stride=1,
        padding=1,
        bias=False,
        groups=groups,
    )
    scratch.layer4_rn = nn.Conv2d(
        in_shape[3],
        out_shape4,
        kernel_size=3,
        stride=1,
        padding=1,
        bias=False,
        groups=groups,
    )

    return scratch


def _make_resnet_backbone(resnet):
    pretrained = nn.Module()
    pretrained.layer1 = nn.Sequential(
        resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool, resnet.layer1
    )

    pretrained.layer2 = resnet.layer2
    pretrained.layer3 = resnet.layer3
    pretrained.layer4 = resnet.layer4

    return pretrained


def _make_pretrained_resnext101_wsl(use_pretrained):
    resnet = torch.hub.load("facebookresearch/WSL-Images", "resnext101_32x8d_wsl")
    return _make_resnet_backbone(resnet)


class Interpolate(nn.Module):
    """Interpolation module."""

    def __init__(self, scale_factor, mode, align_corners=False):
        """Init.

        Args:
            scale_factor (float): scaling
            mode (str): interpolation mode
        """
        super(Interpolate, self).__init__()

        self.interp = nn.functional.interpolate
        self.scale_factor = scale_factor
        self.mode = mode
        self.align_corners = align_corners

    def forward(self, x):
        """Forward pass.

        Args:
            x (tensor): input

        Returns:
            tensor: interpolated data
        """

        x = self.interp(
            x,
            scale_factor=self.scale_factor,
            mode=self.mode,
            align_corners=self.align_corners,
        )

        # x = self.interp(x, scale_factor=self.scale_factor)
        # x = self.interp(x, scale_factor=self.scale_factor, mode='bilinear', align_corners=True)

        return x


class ResidualConvUnit(nn.Module):
    """Residual convolution module."""

    def __init__(self, features):
        """Init.

        Args:
            features (int): number of features
        """
        super().__init__()

        self.conv1 = nn.Conv2d(
            features, features, kernel_size=3, stride=1, padding=1, bias=True
        )

        self.conv2 = nn.Conv2d(
            features, features, kernel_size=3, stride=1, padding=1, bias=True
        )

        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        """Forward pass.

        Args:
            x (tensor): input

        Returns:
            tensor: output
        """
        out = self.relu(x)
        out = self.conv1(out)
        out = self.relu(out)
        out = self.conv2(out)

        return out + x


class FeatureFusionBlock(nn.Module):
    """Feature fusion block."""

    def __init__(self, features):
        """Init.

        Args:
            features (int): number of features
        """
        super(FeatureFusionBlock, self).__init__()

        self.resConfUnit1 = ResidualConvUnit(features)
        self.resConfUnit2 = ResidualConvUnit(features)

    def forward(self, *xs):
        """Forward pass.

        Returns:
            tensor: output
        """
        output = xs[0]

        if len(xs) == 2:
            output += self.resConfUnit1(xs[1])

        output = self.resConfUnit2(output)

        output = nn.functional.interpolate(
            output, scale_factor=2, mode="bilinear", align_corners=True
        )

        return output


class ResidualConvUnit_custom(nn.Module):
    """Residual convolution module."""

    def __init__(self, features, activation, bn):
        """Init.

        Args:
            features (int): number of features
        """
        super().__init__()

        self.bn = bn

        self.groups = 1

        self.conv1 = nn.Conv2d(
            features,
            features,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=not self.bn,
            groups=self.groups,
        )

        self.conv2 = nn.Conv2d(
            features,
            features,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=not self.bn,
            groups=self.groups,
        )

        if self.bn == True:
            self.bn1 = nn.BatchNorm2d(features)
            self.bn2 = nn.BatchNorm2d(features)

        self.activation = activation

        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, x):
        """Forward pass.

        Args:
            x (tensor): input

        Returns:
            tensor: output
        """

        out = self.activation(x)
        out = self.conv1(out)
        if self.bn == True:
            out = self.bn1(out)

        out = self.activation(out)
        out = self.conv2(out)
        if self.bn == True:
            out = self.bn2(out)

        if self.groups > 1:
            out = self.conv_merge(out)

        return self.skip_add.add(out, x)

        # return out + x


class FeatureFusionBlock_custom(nn.Module):
    """Feature fusion block."""

    def __init__(
        self,
        features,
        activation,
        deconv=False,
        bn=False,
        expand=False,
        align_corners=True,
    ):
        """Init.

        Args:
            features (int): number of features
        """
        super(FeatureFusionBlock_custom, self).__init__()

        self.deconv = deconv
        self.align_corners = align_corners

        self.groups = 1

        self.expand = expand
        out_features = features
        if self.expand == True:
            out_features = features // 2

        self.out_conv = nn.Conv2d(
            features,
            out_features,
            kernel_size=1,
            stride=1,
            padding=0,
            bias=True,
            groups=1,
        )

        self.resConfUnit1 = ResidualConvUnit_custom(features, activation, bn)
        self.resConfUnit2 = ResidualConvUnit_custom(features, activation, bn)

        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, *xs):
        """Forward pass.

        Returns:
            tensor: output
        """
        output = xs[0]

        if len(xs) == 2:
            res = self.resConfUnit1(xs[1])
            output = self.skip_add.add(output, res)
            # output += res

        output = self.resConfUnit2(output)

        output = nn.functional.interpolate(
            output, scale_factor=2, mode="bilinear", align_corners=self.align_corners
        )

        output = self.out_conv(output)

        return output
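FeatureFusionBlock_custom adds an optional lateral input to the decoder path, refines the sum with two residual units, upsamples by a factor of 2, and projects with a 1x1 convolution. A quick shape check (standalone sketch, not repo code):

import torch
import torch.nn as nn
from scalelsd.ssl.backbones.dpt.blocks import FeatureFusionBlock_custom

block = FeatureFusionBlock_custom(64, nn.ReLU(False), bn=True).eval()
deep = torch.zeros(1, 64, 12, 12)     # feature from the deeper decoder stage
lateral = torch.zeros(1, 64, 12, 12)  # skip feature from the encoder
with torch.no_grad():
    fused = block(deep, lateral)
print(fused.shape)  # torch.Size([1, 64, 24, 24]) -- fused and upsampled x2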
scalelsd/ssl/backbones/dpt/midas_net.py
ADDED
@@ -0,0 +1,77 @@
"""MidasNet: Network for monocular depth estimation trained by mixing several datasets.
This file contains code that is adapted from
https://github.com/thomasjpfan/pytorch_refinenet/blob/master/pytorch_refinenet/refinenet/refinenet_4cascade.py
"""
import torch
import torch.nn as nn

from .base_model import BaseModel
from .blocks import FeatureFusionBlock, Interpolate, _make_encoder


class MidasNet_large(BaseModel):
    """Network for monocular depth estimation."""

    def __init__(self, path=None, features=256, non_negative=True):
        """Init.

        Args:
            path (str, optional): Path to saved model. Defaults to None.
            features (int, optional): Number of features. Defaults to 256.
            backbone (str, optional): Backbone network for encoder. Defaults to resnet50
        """
        print("Loading weights: ", path)

        super(MidasNet_large, self).__init__()

        use_pretrained = False if path is None else True

        self.pretrained, self.scratch = _make_encoder(
            backbone="resnext101_wsl", features=features, use_pretrained=use_pretrained
        )

        self.scratch.refinenet4 = FeatureFusionBlock(features)
        self.scratch.refinenet3 = FeatureFusionBlock(features)
        self.scratch.refinenet2 = FeatureFusionBlock(features)
        self.scratch.refinenet1 = FeatureFusionBlock(features)

        self.scratch.output_conv = nn.Sequential(
            nn.Conv2d(features, 128, kernel_size=3, stride=1, padding=1),
            Interpolate(scale_factor=2, mode="bilinear"),
            nn.Conv2d(128, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(True),
            nn.Conv2d(32, 1, kernel_size=1, stride=1, padding=0),
            nn.ReLU(True) if non_negative else nn.Identity(),
        )

        if path:
            self.load(path)

    def forward(self, x):
        """Forward pass.

        Args:
            x (tensor): input data (image)

        Returns:
            tensor: depth
        """

        layer_1 = self.pretrained.layer1(x)
        layer_2 = self.pretrained.layer2(layer_1)
        layer_3 = self.pretrained.layer3(layer_2)
        layer_4 = self.pretrained.layer4(layer_3)

        layer_1_rn = self.scratch.layer1_rn(layer_1)
        layer_2_rn = self.scratch.layer2_rn(layer_2)
        layer_3_rn = self.scratch.layer3_rn(layer_3)
        layer_4_rn = self.scratch.layer4_rn(layer_4)

        path_4 = self.scratch.refinenet4(layer_4_rn)
        path_3 = self.scratch.refinenet3(path_4, layer_3_rn)
        path_2 = self.scratch.refinenet2(path_3, layer_2_rn)
        path_1 = self.scratch.refinenet1(path_2, layer_1_rn)

        out = self.scratch.output_conv(path_1)

        return torch.squeeze(out, dim=1)
scalelsd/ssl/backbones/dpt/models.py
ADDED
@@ -0,0 +1,115 @@
import torch
import torch.nn as nn
import torch.nn.functional as F

from .base_model import BaseModel
from .blocks import (
    FeatureFusionBlock,
    FeatureFusionBlock_custom,
    Interpolate,
    _make_encoder,
    forward_vit,
)
from ..multi_task_head import MultitaskHead


def _make_fusion_block(features, use_bn):
    return FeatureFusionBlock_custom(
        features,
        nn.ReLU(False),
        deconv=False,
        bn=use_bn,
        expand=False,
        align_corners=True,
    )


class DPT(BaseModel):
    def __init__(
        self,
        head,
        features=256,
        backbone="vitb_rn50_384",
        readout="project",
        channels_last=False,
        use_bn=False,
        enable_attention_hooks=False,
        use_layer_scale=False,
    ):

        super(DPT, self).__init__()

        self.channels_last = channels_last

        hooks = {
            "vitb_rn50_384": [0, 1, 8, 11],
            "vitb16_384": [2, 5, 8, 11],
            "vitl16_384": [5, 11, 17, 23],
        }

        # Instantiate backbone and reassemble blocks
        self.pretrained, self.scratch = _make_encoder(
            backbone,
            features,
            False,  # use_pretrained: set to True to start from ImageNet-pretrained backbone weights
            groups=1,
            expand=False,
            exportable=False,
            hooks=hooks[backbone],
            use_readout=readout,
            enable_attention_hooks=enable_attention_hooks,
            use_layer_scale=use_layer_scale,
        )

        self.scratch.refinenet1 = _make_fusion_block(features, use_bn)
        self.scratch.refinenet2 = _make_fusion_block(features, use_bn)
        self.scratch.refinenet3 = _make_fusion_block(features, use_bn)
        self.scratch.refinenet4 = _make_fusion_block(features, use_bn)

        self.scratch.output_conv = head

    def forward(self, x):
        if self.channels_last == True:
            x.contiguous(memory_format=torch.channels_last)

        layer_1, layer_2, layer_3, layer_4 = forward_vit(self.pretrained, x)

        layer_1_rn = self.scratch.layer1_rn(layer_1)
        layer_2_rn = self.scratch.layer2_rn(layer_2)
        layer_3_rn = self.scratch.layer3_rn(layer_3)
        layer_4_rn = self.scratch.layer4_rn(layer_4)

        path_4 = self.scratch.refinenet4(layer_4_rn)
        path_3 = self.scratch.refinenet3(path_4, layer_3_rn)
        path_2 = self.scratch.refinenet2(path_3, layer_2_rn)
        path_1 = self.scratch.refinenet1(path_2, layer_1_rn)

        out = self.scratch.output_conv(path_1)

        return out


class DPTFieldModel(DPT):
    def __init__(self, path=None, non_negative=True, head_size=[[3], [1], [1], [2], [2]], **kwargs):
        features = kwargs["features"] if "features" in kwargs else 256

        kwargs["use_bn"] = True

        num_class = sum(sum(head_size, []))
        head = nn.Sequential(
            nn.Conv2d(features, features//2, kernel_size=3, stride=1, padding=1),
            # nn.BatchNorm2d(features//2),
            nn.ReLU(True),
            MultitaskHead(features//2, num_class, head_size=head_size),
        )

        super().__init__(head, **kwargs)

        self.stride = 2

    def forward(self, x):
        if x.shape[1] == 1:
            x = torch.cat([x, x, x], dim=1)

        out = super().forward(x)
        return out, None
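DPTFieldModel tiles a single-channel input to three channels and appends a MultitaskHead whose per-task channel counts come from head_size, so the default [[3],[1],[1],[2],[2]] yields 9 output channels at stride 2. A minimal forward-pass sketch (assumes a timm version that provides vit_base_resnet50_384; weights are randomly initialized, nothing is downloaded):

import torch
from scalelsd.ssl.backbones.dpt.models import DPTFieldModel

model = DPTFieldModel(features=256, backbone="vitb_rn50_384", readout="project").eval()
with torch.no_grad():
    out, _ = model(torch.zeros(1, 1, 384, 384))  # gray image, tiled to RGB internally
print(out.shape)  # expected (1, 9, 192, 192): 3+1+1+2+2 channels at half resolution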
scalelsd/ssl/backbones/dpt/transforms.py
ADDED
@@ -0,0 +1,231 @@
import numpy as np
import cv2
import math


def apply_min_size(sample, size, image_interpolation_method=cv2.INTER_AREA):
    """Resize the sample to ensure the given size. Keeps aspect ratio.

    Args:
        sample (dict): sample
        size (tuple): image size

    Returns:
        tuple: new size
    """
    shape = list(sample["disparity"].shape)

    if shape[0] >= size[0] and shape[1] >= size[1]:
        return sample

    scale = [0, 0]
    scale[0] = size[0] / shape[0]
    scale[1] = size[1] / shape[1]

    scale = max(scale)

    shape[0] = math.ceil(scale * shape[0])
    shape[1] = math.ceil(scale * shape[1])

    # resize
    sample["image"] = cv2.resize(
        sample["image"], tuple(shape[::-1]), interpolation=image_interpolation_method
    )

    sample["disparity"] = cv2.resize(
        sample["disparity"], tuple(shape[::-1]), interpolation=cv2.INTER_NEAREST
    )
    sample["mask"] = cv2.resize(
        sample["mask"].astype(np.float32),
        tuple(shape[::-1]),
        interpolation=cv2.INTER_NEAREST,
    )
    sample["mask"] = sample["mask"].astype(bool)

    return tuple(shape)


class Resize(object):
    """Resize sample to given size (width, height)."""

    def __init__(
        self,
        width,
        height,
        resize_target=True,
        keep_aspect_ratio=False,
        ensure_multiple_of=1,
        resize_method="lower_bound",
        image_interpolation_method=cv2.INTER_AREA,
    ):
        """Init.

        Args:
            width (int): desired output width
            height (int): desired output height
            resize_target (bool, optional):
                True: Resize the full sample (image, mask, target).
                False: Resize image only.
                Defaults to True.
            keep_aspect_ratio (bool, optional):
                True: Keep the aspect ratio of the input sample.
                Output sample might not have the given width and height, and
                resize behaviour depends on the parameter 'resize_method'.
                Defaults to False.
            ensure_multiple_of (int, optional):
                Output width and height is constrained to be multiple of this parameter.
                Defaults to 1.
            resize_method (str, optional):
                "lower_bound": Output will be at least as large as the given size.
                "upper_bound": Output will be at most as large as the given size. (Output size might be smaller than given size.)
                "minimal": Scale as little as possible. (Output size might be smaller than given size.)
                Defaults to "lower_bound".
        """
        self.__width = width
        self.__height = height

        self.__resize_target = resize_target
        self.__keep_aspect_ratio = keep_aspect_ratio
        self.__multiple_of = ensure_multiple_of
        self.__resize_method = resize_method
        self.__image_interpolation_method = image_interpolation_method

    def constrain_to_multiple_of(self, x, min_val=0, max_val=None):
        y = (np.round(x / self.__multiple_of) * self.__multiple_of).astype(int)

        if max_val is not None and y > max_val:
            y = (np.floor(x / self.__multiple_of) * self.__multiple_of).astype(int)

        if y < min_val:
            y = (np.ceil(x / self.__multiple_of) * self.__multiple_of).astype(int)

        return y

    def get_size(self, width, height):
        # determine new height and width
        scale_height = self.__height / height
        scale_width = self.__width / width

        if self.__keep_aspect_ratio:
            if self.__resize_method == "lower_bound":
                # scale such that output size is lower bound
                if scale_width > scale_height:
                    # fit width
                    scale_height = scale_width
                else:
                    # fit height
                    scale_width = scale_height
            elif self.__resize_method == "upper_bound":
                # scale such that output size is upper bound
                if scale_width < scale_height:
                    # fit width
                    scale_height = scale_width
                else:
                    # fit height
                    scale_width = scale_height
            elif self.__resize_method == "minimal":
                # scale as little as possible
                if abs(1 - scale_width) < abs(1 - scale_height):
                    # fit width
                    scale_height = scale_width
                else:
                    # fit height
                    scale_width = scale_height
            else:
                raise ValueError(
                    f"resize_method {self.__resize_method} not implemented"
                )

        if self.__resize_method == "lower_bound":
            new_height = self.constrain_to_multiple_of(
                scale_height * height, min_val=self.__height
            )
            new_width = self.constrain_to_multiple_of(
                scale_width * width, min_val=self.__width
            )
        elif self.__resize_method == "upper_bound":
            new_height = self.constrain_to_multiple_of(
                scale_height * height, max_val=self.__height
            )
            new_width = self.constrain_to_multiple_of(
                scale_width * width, max_val=self.__width
            )
        elif self.__resize_method == "minimal":
            new_height = self.constrain_to_multiple_of(scale_height * height)
            new_width = self.constrain_to_multiple_of(scale_width * width)
        else:
            raise ValueError(f"resize_method {self.__resize_method} not implemented")

        return (new_width, new_height)

    def __call__(self, sample):
        width, height = self.get_size(
            sample["image"].shape[1], sample["image"].shape[0]
        )

        # resize sample
        sample["image"] = cv2.resize(
            sample["image"],
            (width, height),
            interpolation=self.__image_interpolation_method,
        )

        if self.__resize_target:
            if "disparity" in sample:
                sample["disparity"] = cv2.resize(
                    sample["disparity"],
                    (width, height),
                    interpolation=cv2.INTER_NEAREST,
                )

            if "depth" in sample:
                sample["depth"] = cv2.resize(
                    sample["depth"], (width, height), interpolation=cv2.INTER_NEAREST
                )

            sample["mask"] = cv2.resize(
                sample["mask"].astype(np.float32),
                (width, height),
                interpolation=cv2.INTER_NEAREST,
            )
            sample["mask"] = sample["mask"].astype(bool)

        return sample


class NormalizeImage(object):
    """Normalize image by given mean and std."""

    def __init__(self, mean, std):
        self.__mean = mean
        self.__std = std

    def __call__(self, sample):
        sample["image"] = (sample["image"] - self.__mean) / self.__std

        return sample


class PrepareForNet(object):
    """Prepare sample for usage as network input."""

    def __init__(self):
        pass

    def __call__(self, sample):
        image = np.transpose(sample["image"], (2, 0, 1))
        sample["image"] = np.ascontiguousarray(image).astype(np.float32)

        if "mask" in sample:
            sample["mask"] = sample["mask"].astype(np.float32)
            sample["mask"] = np.ascontiguousarray(sample["mask"])

        if "disparity" in sample:
            disparity = sample["disparity"].astype(np.float32)
            sample["disparity"] = np.ascontiguousarray(disparity)

        if "depth" in sample:
            depth = sample["depth"].astype(np.float32)
            sample["depth"] = np.ascontiguousarray(depth)

        return sample
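These transforms operate on a plain dict sample, so they compose with an ordinary loop (or torchvision's Compose). A minimal sketch on a dummy sample (the mean/std values are illustrative, not the repo's):

import numpy as np
from scalelsd.ssl.backbones.dpt.transforms import Resize, NormalizeImage, PrepareForNet

sample = {"image": np.random.rand(480, 640, 3).astype(np.float32)}
for t in (Resize(384, 384, resize_target=False, keep_aspect_ratio=True,
                 ensure_multiple_of=32, resize_method="minimal"),
          NormalizeImage(mean=0.5, std=0.5),  # illustrative values
          PrepareForNet()):
    sample = t(sample)
print(sample["image"].shape)  # (3, 384, 512): CHW float32, sides multiples of 32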
scalelsd/ssl/backbones/dpt/vit.py
ADDED
@@ -0,0 +1,586 @@
import torch
import torch.nn as nn
import timm
import types
import math
import torch.nn.functional as F


activations = {}


def get_activation(name):
    def hook(model, input, output):
        activations[name] = output

    return hook


attention = {}


def get_attention(name):
    def hook(module, input, output):
        x = input[0]
        B, N, C = x.shape
        qkv = (
            module.qkv(x)
            .reshape(B, N, 3, module.num_heads, C // module.num_heads)
            .permute(2, 0, 3, 1, 4).contiguous()
        )
        q, k, v = (
            qkv[0],
            qkv[1],
            qkv[2],
        )  # make torchscript happy (cannot use tensor as tuple)

        attn = (q @ k.transpose(-2, -1).contiguous()) * module.scale

        attn = attn.softmax(dim=-1)  # [:,:,1,1:]
        attention[name] = attn

    return hook


def get_mean_attention_map(attn, token, shape):
    attn = attn[:, :, token, 1:]
    attn = attn.unflatten(2, torch.Size([shape[2] // 16, shape[3] // 16])).float()
    attn = torch.nn.functional.interpolate(
        attn, size=shape[2:], mode="bicubic", align_corners=False
    ).squeeze(0)

    all_attn = torch.mean(attn, 0)

    return all_attn


class Slice(nn.Module):
    def __init__(self, start_index=1):
        super(Slice, self).__init__()
        self.start_index = start_index

    def forward(self, x):
        return x[:, self.start_index :]


class AddReadout(nn.Module):
    def __init__(self, start_index=1):
        super(AddReadout, self).__init__()
        self.start_index = start_index

    def forward(self, x):
        if self.start_index == 2:
            readout = (x[:, 0] + x[:, 1]) / 2
        else:
            readout = x[:, 0]
        return x[:, self.start_index :] + readout.unsqueeze(1)


class ProjectReadout(nn.Module):
    def __init__(self, in_features, start_index=1):
        super(ProjectReadout, self).__init__()
        self.start_index = start_index

        self.project = nn.Sequential(nn.Linear(2 * in_features, in_features), nn.GELU())

    def forward(self, x):
        readout = x[:, 0].unsqueeze(1).expand_as(x[:, self.start_index :])
        features = torch.cat((x[:, self.start_index :], readout), -1)

        return self.project(features)


class Transpose(nn.Module):
    def __init__(self, dim0, dim1):
        super(Transpose, self).__init__()
        self.dim0 = dim0
        self.dim1 = dim1

    def forward(self, x):
        x = x.transpose(self.dim0, self.dim1).contiguous()
        return x


def forward_vit(pretrained, x):
    b, c, h, w = x.shape

    glob = pretrained.model.forward_flex(x)

    layer_1 = pretrained.activations["1"]
    layer_2 = pretrained.activations["2"]
    layer_3 = pretrained.activations["3"]
    layer_4 = pretrained.activations["4"]

    layer_1 = pretrained.act_postprocess1[0:2](layer_1)
    layer_2 = pretrained.act_postprocess2[0:2](layer_2)
    layer_3 = pretrained.act_postprocess3[0:2](layer_3)
    layer_4 = pretrained.act_postprocess4[0:2](layer_4)

    unflatten = nn.Sequential(
        nn.Unflatten(
            2,
            torch.Size(
                [
                    h // pretrained.model.patch_size[1],
                    w // pretrained.model.patch_size[0],
                ]
            ),
        )
    )

    if layer_1.ndim == 3:
        layer_1 = unflatten(layer_1)
    if layer_2.ndim == 3:
        layer_2 = unflatten(layer_2)
    if layer_3.ndim == 3:
        layer_3 = unflatten(layer_3)
    if layer_4.ndim == 3:
        layer_4 = unflatten(layer_4)

    layer_1 = pretrained.act_postprocess1[3 : len(pretrained.act_postprocess1)](layer_1)
    layer_2 = pretrained.act_postprocess2[3 : len(pretrained.act_postprocess2)](layer_2)
    layer_3 = pretrained.act_postprocess3[3 : len(pretrained.act_postprocess3)](layer_3)
    layer_4 = pretrained.act_postprocess4[3 : len(pretrained.act_postprocess4)](layer_4)

    return layer_1, layer_2, layer_3, layer_4


def _resize_pos_embed(self, posemb, gs_h, gs_w):
    posemb_tok, posemb_grid = (
        posemb[:, : self.start_index],
        posemb[0, self.start_index :],
    )

    gs_old = int(math.sqrt(len(posemb_grid)))

    posemb_grid = posemb_grid.reshape(1, gs_old, gs_old, -1).permute(0, 3, 1, 2)
    posemb_grid = F.interpolate(posemb_grid, size=(gs_h, gs_w), mode="bilinear")
    posemb_grid = posemb_grid.permute(0, 2, 3, 1).reshape(1, gs_h * gs_w, -1)

    posemb = torch.cat([posemb_tok, posemb_grid], dim=1)

    return posemb


def forward_flex(self, x):
    b, c, h, w = x.shape

    pos_embed = self._resize_pos_embed(
        self.pos_embed, h // self.patch_size[1], w // self.patch_size[0]
    )

    B = x.shape[0]

    if hasattr(self.patch_embed, "backbone"):
        x = self.patch_embed.backbone(x)
        if isinstance(x, (list, tuple)):
            x = x[-1]  # last feature if backbone outputs list/tuple of features

    x = self.patch_embed.proj(x).flatten(2).transpose(1, 2).contiguous()

    if getattr(self, "dist_token", None) is not None:
        cls_tokens = self.cls_token.expand(
            B, -1, -1
        )  # stole cls_tokens impl from Phil Wang, thanks
        dist_token = self.dist_token.expand(B, -1, -1)
        x = torch.cat((cls_tokens, dist_token, x), dim=1)
    else:
        cls_tokens = self.cls_token.expand(
            B, -1, -1
        )  # stole cls_tokens impl from Phil Wang, thanks
        x = torch.cat((cls_tokens, x), dim=1)

    x = x + pos_embed
    x = self.pos_drop(x)

    for blk in self.blocks:
        x = blk(x)

    x = self.norm(x)

    return x


def get_readout_oper(vit_features, features, use_readout, start_index=1):
    if use_readout == "ignore":
        readout_oper = [Slice(start_index)] * len(features)
    elif use_readout == "add":
        readout_oper = [AddReadout(start_index)] * len(features)
    elif use_readout == "project":
        readout_oper = [
            ProjectReadout(vit_features, start_index) for out_feat in features
        ]
    else:
        assert (
            False
        ), "wrong operation for readout token, use_readout can be 'ignore', 'add', or 'project'"

    return readout_oper


def _make_vit_b16_backbone(
    model,
    features=[96, 192, 384, 768],
    size=[384, 384],
    hooks=[2, 5, 8, 11],
    vit_features=768,
    use_readout="ignore",
    start_index=1,
    enable_attention_hooks=False,
):
    pretrained = nn.Module()

    pretrained.model = model
    pretrained.model.blocks[hooks[0]].register_forward_hook(get_activation("1"))
    pretrained.model.blocks[hooks[1]].register_forward_hook(get_activation("2"))
    pretrained.model.blocks[hooks[2]].register_forward_hook(get_activation("3"))
    pretrained.model.blocks[hooks[3]].register_forward_hook(get_activation("4"))

    pretrained.activations = activations

    if enable_attention_hooks:
        pretrained.model.blocks[hooks[0]].attn.register_forward_hook(
            get_attention("attn_1")
        )
        pretrained.model.blocks[hooks[1]].attn.register_forward_hook(
            get_attention("attn_2")
        )
        pretrained.model.blocks[hooks[2]].attn.register_forward_hook(
            get_attention("attn_3")
        )
        pretrained.model.blocks[hooks[3]].attn.register_forward_hook(
            get_attention("attn_4")
        )
        pretrained.attention = attention

    readout_oper = get_readout_oper(vit_features, features, use_readout, start_index)

    # 32, 48, 136, 384
    pretrained.act_postprocess1 = nn.Sequential(
        readout_oper[0],
        Transpose(1, 2),
        nn.Unflatten(2, torch.Size([size[0] // 16, size[1] // 16])),
        nn.Conv2d(
            in_channels=vit_features,
            out_channels=features[0],
            kernel_size=1,
            stride=1,
            padding=0,
        ),
        nn.ConvTranspose2d(
            in_channels=features[0],
            out_channels=features[0],
            kernel_size=4,
            stride=4,
            padding=0,
            bias=True,
            dilation=1,
            groups=1,
        ),
    )

    pretrained.act_postprocess2 = nn.Sequential(
        readout_oper[1],
        Transpose(1, 2),
        nn.Unflatten(2, torch.Size([size[0] // 16, size[1] // 16])),
        nn.Conv2d(
            in_channels=vit_features,
            out_channels=features[1],
            kernel_size=1,
            stride=1,
            padding=0,
        ),
        nn.ConvTranspose2d(
            in_channels=features[1],
            out_channels=features[1],
            kernel_size=2,
            stride=2,
            padding=0,
            bias=True,
            dilation=1,
            groups=1,
        ),
    )

    pretrained.act_postprocess3 = nn.Sequential(
        readout_oper[2],
        Transpose(1, 2),
        nn.Unflatten(2, torch.Size([size[0] // 16, size[1] // 16])),
        nn.Conv2d(
            in_channels=vit_features,
            out_channels=features[2],
            kernel_size=1,
            stride=1,
            padding=0,
        ),
    )

    pretrained.act_postprocess4 = nn.Sequential(
        readout_oper[3],
        Transpose(1, 2),
        nn.Unflatten(2, torch.Size([size[0] // 16, size[1] // 16])),
        nn.Conv2d(
            in_channels=vit_features,
            out_channels=features[3],
            kernel_size=1,
            stride=1,
            padding=0,
        ),
        nn.Conv2d(
            in_channels=features[3],
            out_channels=features[3],
            kernel_size=3,
            stride=2,
            padding=1,
        ),
    )

    pretrained.model.start_index = start_index
    pretrained.model.patch_size = [16, 16]

    # We inject this function into the VisionTransformer instances so that
    # we can use it with interpolated position embeddings without modifying the library source.
    pretrained.model.forward_flex = types.MethodType(forward_flex, pretrained.model)
    pretrained.model._resize_pos_embed = types.MethodType(
        _resize_pos_embed, pretrained.model
    )

    return pretrained


def _make_vit_b_rn50_backbone(
    model,
    features=[256, 512, 768, 768],
    size=[384, 384],
    hooks=[0, 1, 8, 11],
    vit_features=768,
    use_vit_only=False,
    use_readout="ignore",
    start_index=1,
    enable_attention_hooks=False,
    use_layer_scale=False,
):
    pretrained = nn.Module()

    ###
    if use_layer_scale:
        from timm.models.vision_transformer import LayerScale
        for i, block in enumerate(model.blocks):
            block.ls1 = LayerScale(vit_features)
            block.ls2 = LayerScale(vit_features)

    pretrained.model = model

    if use_vit_only == True:
        pretrained.model.blocks[hooks[0]].register_forward_hook(get_activation("1"))
        pretrained.model.blocks[hooks[1]].register_forward_hook(get_activation("2"))
    else:
        pretrained.model.patch_embed.backbone.stages[0].register_forward_hook(
            get_activation("1")
        )
        pretrained.model.patch_embed.backbone.stages[1].register_forward_hook(
            get_activation("2")
        )

    pretrained.model.blocks[hooks[2]].register_forward_hook(get_activation("3"))
    pretrained.model.blocks[hooks[3]].register_forward_hook(get_activation("4"))

    if enable_attention_hooks:
        pretrained.model.blocks[2].attn.register_forward_hook(get_attention("attn_1"))
        pretrained.model.blocks[5].attn.register_forward_hook(get_attention("attn_2"))
        pretrained.model.blocks[8].attn.register_forward_hook(get_attention("attn_3"))
        pretrained.model.blocks[11].attn.register_forward_hook(get_attention("attn_4"))
        pretrained.attention = attention

    pretrained.activations = activations

    readout_oper = get_readout_oper(vit_features, features, use_readout, start_index)

    if use_vit_only == True:
        pretrained.act_postprocess1 = nn.Sequential(
            readout_oper[0],
            Transpose(1, 2),
            nn.Unflatten(2, torch.Size([size[0] // 16, size[1] // 16])),
            nn.Conv2d(
                in_channels=vit_features,
                out_channels=features[0],
                kernel_size=1,
                stride=1,
                padding=0,
            ),
            nn.ConvTranspose2d(
                in_channels=features[0],
                out_channels=features[0],
                kernel_size=4,
                stride=4,
                padding=0,
                bias=True,
                dilation=1,
                groups=1,
            ),
        )

        pretrained.act_postprocess2 = nn.Sequential(
            readout_oper[1],
            Transpose(1, 2),
            nn.Unflatten(2, torch.Size([size[0] // 16, size[1] // 16])),
            nn.Conv2d(
                in_channels=vit_features,
                out_channels=features[1],
                kernel_size=1,
                stride=1,
                padding=0,
            ),
            nn.ConvTranspose2d(
                in_channels=features[1],
                out_channels=features[1],
                kernel_size=2,
                stride=2,
                padding=0,
                bias=True,
                dilation=1,
                groups=1,
            ),
        )
    else:
        pretrained.act_postprocess1 = nn.Sequential(
            nn.Identity(), nn.Identity(), nn.Identity()
        )
        pretrained.act_postprocess2 = nn.Sequential(
            nn.Identity(), nn.Identity(), nn.Identity()
        )

    pretrained.act_postprocess3 = nn.Sequential(
        readout_oper[2],
        Transpose(1, 2),
        nn.Unflatten(2, torch.Size([size[0] // 16, size[1] // 16])),
        nn.Conv2d(
            in_channels=vit_features,
            out_channels=features[2],
            kernel_size=1,
            stride=1,
            padding=0,
        ),
    )

    pretrained.act_postprocess4 = nn.Sequential(
        readout_oper[3],
        Transpose(1, 2),
        nn.Unflatten(2, torch.Size([size[0] // 16, size[1] // 16])),
        nn.Conv2d(
            in_channels=vit_features,
            out_channels=features[3],
            kernel_size=1,
            stride=1,
            padding=0,
        ),
        nn.Conv2d(
            in_channels=features[3],
            out_channels=features[3],
            kernel_size=3,
            stride=2,
            padding=1,
        ),
    )

    pretrained.model.start_index = start_index
    pretrained.model.patch_size = [16, 16]

    # We inject this function into the VisionTransformer instances so that
    # we can use it with interpolated position embeddings without modifying the library source.
    pretrained.model.forward_flex = types.MethodType(forward_flex, pretrained.model)

    # We inject this function into the VisionTransformer instances so that
    # we can use it with interpolated position embeddings without modifying the library source.
    pretrained.model._resize_pos_embed = types.MethodType(
        _resize_pos_embed, pretrained.model
    )

    return pretrained


def _make_pretrained_vitb_rn50_384(
    pretrained,
    use_readout="ignore",
    hooks=None,
    use_vit_only=False,
    enable_attention_hooks=False,
    use_layer_scale=False,
):
    model = timm.create_model("vit_base_resnet50_384", pretrained=pretrained)

    hooks = [0, 1, 8, 11] if hooks == None else hooks
    return _make_vit_b_rn50_backbone(
        model,
        features=[256, 512, 768, 768],
        size=[384, 384],
        hooks=hooks,
        use_vit_only=use_vit_only,
        use_readout=use_readout,
        enable_attention_hooks=enable_attention_hooks,
        use_layer_scale=use_layer_scale,
    )


def _make_pretrained_vitl16_384(
    pretrained, use_readout="ignore", hooks=None, enable_attention_hooks=False
):
    model = timm.create_model("vit_large_patch16_384", pretrained=pretrained)

    hooks = [5, 11, 17, 23] if hooks == None else hooks
    return _make_vit_b16_backbone(
        model,
        features=[256, 512, 1024, 1024],
        hooks=hooks,
        vit_features=1024,
        use_readout=use_readout,
        enable_attention_hooks=enable_attention_hooks,
    )


def _make_pretrained_vitb16_384(
    pretrained, use_readout="ignore", hooks=None, enable_attention_hooks=False
):
    model = timm.create_model("vit_base_patch16_384", pretrained=pretrained)

    hooks = [2, 5, 8, 11] if hooks == None else hooks
    return _make_vit_b16_backbone(
        model,
        features=[96, 192, 384, 768],
        hooks=hooks,
        use_readout=use_readout,
        enable_attention_hooks=enable_attention_hooks,
    )


def _make_pretrained_deitb16_384(
    pretrained, use_readout="ignore", hooks=None, enable_attention_hooks=False
):
    model = timm.create_model("vit_deit_base_patch16_384", pretrained=pretrained)

    hooks = [2, 5, 8, 11] if hooks == None else hooks
    return _make_vit_b16_backbone(
        model,
        features=[96, 192, 384, 768],
        hooks=hooks,
        use_readout=use_readout,
        enable_attention_hooks=enable_attention_hooks,
    )


def _make_pretrained_deitb16_distil_384(
    pretrained, use_readout="ignore", hooks=None, enable_attention_hooks=False
):
    model = timm.create_model(
        "vit_deit_base_distilled_patch16_384", pretrained=pretrained
    )

    hooks = [2, 5, 8, 11] if hooks == None else hooks
    return _make_vit_b16_backbone(
        model,
        features=[96, 192, 384, 768],
        hooks=hooks,
        use_readout=use_readout,
        start_index=2,
        enable_attention_hooks=enable_attention_hooks,
    )
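_resize_pos_embed is what lets the backbone accept input sizes other than 384x384: it splits off the class/readout tokens, reshapes the remaining position embeddings back into their original square grid, and bilinearly interpolates that grid to the new token layout. A standalone sketch of the same idea on dummy embeddings:

import torch
import torch.nn.functional as F

# Dummy posemb for a 24x24 token grid plus one cls token (as for 384x384, patch 16).
posemb = torch.randn(1, 1 + 24 * 24, 768)
tok, grid = posemb[:, :1], posemb[0, 1:]
grid = grid.reshape(1, 24, 24, -1).permute(0, 3, 1, 2)
grid = F.interpolate(grid, size=(32, 20), mode="bilinear")  # e.g. a 512x320 input
grid = grid.permute(0, 2, 3, 1).reshape(1, 32 * 20, -1)
resized = torch.cat([tok, grid], dim=1)
print(resized.shape)  # torch.Size([1, 641, 768])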
scalelsd/ssl/backbones/multi_task_head.py
ADDED
@@ -0,0 +1,52 @@
import torch
import torch.nn as nn


class MultitaskHead(nn.Module):
    def __init__(self, input_channels, num_class, head_size):
        super(MultitaskHead, self).__init__()

        m = int(input_channels / 4)
        heads = []
        for output_channels in sum(head_size, []):
            heads.append(
                nn.Sequential(
                    nn.Conv2d(input_channels, m, kernel_size=3, padding=1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(m, output_channels, kernel_size=1),
                )
            )
        self.heads = nn.ModuleList(heads)
        assert num_class == sum(sum(head_size, []))

    def forward(self, x):
        # import pdb;pdb.set_trace()
        return torch.cat([head(x) for head in self.heads], dim=1)


class AngleDistanceHead(nn.Module):
    def __init__(self, input_channels, num_class, head_size):
        super(AngleDistanceHead, self).__init__()

        m = int(input_channels / 4)

        heads = []
        for output_channels in sum(head_size, []):
            if output_channels != 2:
                heads.append(
                    nn.Sequential(
                        nn.Conv2d(input_channels, m, kernel_size=3, padding=1),
                        nn.ReLU(inplace=True),
                        nn.Conv2d(m, output_channels, kernel_size=1),
                    )
                )
            else:
                heads.append(
                    nn.Sequential(
                        nn.Conv2d(input_channels, m, kernel_size=3, padding=1),
                        nn.ReLU(inplace=True),
                        CosineSineLayer(m)  # NOTE: CosineSineLayer is neither defined nor imported in this file
                    )
                )
        self.heads = nn.ModuleList(heads)
        assert num_class == sum(sum(head_size, []))

    def forward(self, x):
        return torch.cat([head(x) for head in self.heads], dim=1)
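MultitaskHead flattens head_size with sum(head_size, []) and builds one small conv branch per entry, concatenating the branch outputs along the channel axis; the default [[3],[1],[1],[2],[2]] therefore yields 3+1+1+2+2 = 9 channels. A quick standalone check:

import torch
from scalelsd.ssl.backbones.multi_task_head import MultitaskHead

head = MultitaskHead(input_channels=128, num_class=9,
                     head_size=[[3], [1], [1], [2], [2]])
y = head(torch.zeros(2, 128, 32, 32))
print(y.shape)  # torch.Size([2, 9, 32, 32])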
scalelsd/ssl/config/__init__.py
ADDED
@@ -0,0 +1,2 @@
from .project_config import Config
from .utils import *
scalelsd/ssl/config/dataset/hpatches_dataset.yaml
ADDED
@@ -0,0 +1,105 @@
### General dataset parameters
dataset_name: "hpatches"
add_augmentation_to_all_splits: False
gray_scale: True
# Ground truth source ('official' or path to the exported h5 dataset)
# gt_source_train: ""  # Fill with your own export file
# gt_source_test: ""  # Fill with your own export file
# Return type: (1) single (to train the detector only)
# or (2) paired_desc (to train the detector + descriptor)
return_type: "single"
random_seed: 0

### Descriptor training parameters
# Number of points extracted per line
max_num_samples: 10
# Max number of training line points extracted in the whole image
max_pts: 1000
# Min distance between two points on a line (in pixels)
min_dist_pts: 10
# Small jittering of the sampled points during training
jittering: 0

alteration: "all"
max_side: 1200

### Data preprocessing configuration
preprocessing:
  resize: [512, 512]
  blur_size: 11
augmentation:
  random_scaling:
    enable: True
    range: [0.7, 1.5]
  photometric:
    enable: true
    primitives: ['random_brightness', 'random_contrast',
                 'additive_speckle_noise', 'additive_gaussian_noise',
                 'additive_shade', 'motion_blur']
    params:
      random_brightness: {brightness: 0.2}
      random_contrast: {contrast: [0.3, 1.5]}
      additive_gaussian_noise: {stddev_range: [0, 10]}
      additive_speckle_noise: {prob_range: [0, 0.0035]}
      additive_shade:
        transparency_range: [-0.5, 0.5]
        kernel_size_range: [100, 150]
      motion_blur: {max_kernel_size: 3}
    random_order: True
  homographic:
    enable: true
    params:
      translation: true
      rotation: true
      scaling: true
      perspective: true
      scaling_amplitude: 0.2
      perspective_amplitude_x: 0.2
      perspective_amplitude_y: 0.2
      patch_ratio: 0.85
      max_angle: 1.57
      allow_artifacts: true
    valid_border_margin: 3

## Homography adaptation configuration
homography_adaptation:
  num_iter: 10
  valid_border_margin: 3
  min_counts: 3
  homographies:
    translation: true
    rotation: true
    scaling: true
    perspective: true
    scaling_amplitude: 0.2
    perspective_amplitude_x: 0.2
    perspective_amplitude_y: 0.2
    allow_artifacts: true
    patch_ratio: 0.85

data:
  name: hpatches
  dataset_dir: HPatches_sequences
  alteration: all
  max_side: 1200
  batch_size: 1
  num_workers: 4
model:
  name: deeplsd
  tiny: False
  sharpen: True
  line_neighborhood: 5
  loss_weights:
    df: 1.
    angle: 1.
  detect_lines: True
  multiscale: False
  scale_factors: [1., 1.5]
  line_detection_params:
    grad_nfa: True
    merge: False
    optimize: False
    use_vps: False
    optimize_vps: False
    filtering: True
    grad_thresh: 3
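The homographic block above parameterizes random perspective warps by corner jitter around a centered patch. A rough sketch of one way such amplitudes can be turned into a 3x3 homography (illustrative only; the repo's actual sampler is not shown in this commit, and sample_homography is a hypothetical name):

# Assumes NumPy and OpenCV.
import numpy as np
import cv2

def sample_homography(shape, patch_ratio=0.85, perspective_amplitude=0.2,
                      rng=np.random.default_rng(0)):
    h, w = shape
    margin = (1.0 - patch_ratio) / 2.0
    # Corners of a centered patch covering patch_ratio of the image...
    pts1 = np.array([[margin, margin], [margin, 1 - margin],
                     [1 - margin, 1 - margin], [1 - margin, margin]],
                    dtype=np.float32)
    # ...each jittered by up to half the perspective amplitude.
    jitter = rng.uniform(-perspective_amplitude / 2, perspective_amplitude / 2,
                         pts1.shape).astype(np.float32)
    scale = np.array([[w, h]], dtype=np.float32)
    return cv2.getPerspectiveTransform(pts1 * scale, (pts1 + jitter) * scale)

H = sample_homography((512, 512))
warped = cv2.warpPerspective(np.zeros((512, 512), np.uint8), H, (512, 512))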
scalelsd/ssl/config/dataset/nyu_dataset.yaml
ADDED
@@ -0,0 +1,77 @@
### General dataset parameters
dataset_name: "nyu"
add_augmentation_to_all_splits: False
gray_scale: True
# Ground truth source ('official' or path to the exported h5 dataset)
# gt_source_train: ""  # Fill with your own export file
# gt_source_test: ""  # Fill with your own export file
# Return type: (1) single (to train the detector only)
# or (2) paired_desc (to train the detector + descriptor)
return_type: "single"
random_seed: 0

val_size: 49

### Descriptor training parameters
# Number of points extracted per line
max_num_samples: 10
# Max number of training line points extracted in the whole image
max_pts: 1000
# Min distance between two points on a line (in pixels)
min_dist_pts: 10
# Small jittering of the sampled points during training
jittering: 0

### Data preprocessing configuration
preprocessing:
  resize: [512, 512]
  blur_size: 11
augmentation:
  random_scaling:
    enable: True
    range: [0.7, 1.5]
  photometric:
    enable: true
    primitives: ['random_brightness', 'random_contrast',
                 'additive_speckle_noise', 'additive_gaussian_noise',
                 'additive_shade', 'motion_blur']
    params:
      random_brightness: {brightness: 0.2}
      random_contrast: {contrast: [0.3, 1.5]}
      additive_gaussian_noise: {stddev_range: [0, 10]}
      additive_speckle_noise: {prob_range: [0, 0.0035]}
      additive_shade:
        transparency_range: [-0.5, 0.5]
        kernel_size_range: [100, 150]
      motion_blur: {max_kernel_size: 3}
    random_order: True
  homographic:
    enable: true
    params:
      translation: true
      rotation: true
      scaling: true
      perspective: true
      scaling_amplitude: 0.2
      perspective_amplitude_x: 0.2
      perspective_amplitude_y: 0.2
      patch_ratio: 0.85
      max_angle: 1.57
      allow_artifacts: true
    valid_border_margin: 3

## Homography adaptation configuration
homography_adaptation:
  num_iter: 10
  valid_border_margin: 3
  min_counts: 3
  homographies:
    translation: true
    rotation: true
    scaling: true
    perspective: true
    scaling_amplitude: 0.2
    perspective_amplitude_x: 0.2
    perspective_amplitude_y: 0.2
    allow_artifacts: true
    patch_ratio: 0.85
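The photometric primitives above have direct array-level analogues. A sketch of three of them under the listed parameters (illustrative; the repo's augmentation pipeline is not part of this commit, and images are assumed normalized to [0, 1]):

import numpy as np

rng = np.random.default_rng(0)

def random_brightness(img, brightness=0.2):
    # Shift all intensities by up to +/- brightness.
    return np.clip(img + rng.uniform(-brightness, brightness), 0.0, 1.0)

def random_contrast(img, contrast=(0.3, 1.5)):
    # Rescale intensities around the mean by a random factor.
    mean = img.mean()
    return np.clip((img - mean) * rng.uniform(*contrast) + mean, 0.0, 1.0)

def additive_gaussian_noise(img, stddev_range=(0, 10)):
    # stddev_range is in 8-bit units, hence /255 for [0, 1] images.
    std = rng.uniform(*stddev_range) / 255.0
    return np.clip(img + rng.normal(0.0, std, img.shape), 0.0, 1.0)

img = rng.random((512, 512))
aug = additive_gaussian_noise(random_contrast(random_brightness(img)))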
scalelsd/ssl/config/dataset/official_yorkurban_dataset.yaml
ADDED
@@ -0,0 +1,75 @@
### General dataset parameters
dataset_name: "official_yorkurban"
add_augmentation_to_all_splits: False
gray_scale: True
# Ground truth source ('official' or path to the exported h5 dataset)
# gt_source_train: ""  # Fill with your own export file
# gt_source_test: ""  # Fill with your own export file
# Return type: (1) single (to train the detector only)
# or (2) paired_desc (to train the detector + descriptor)
return_type: "single"
random_seed: 0

### Descriptor training parameters
# Number of points extracted per line
max_num_samples: 10
# Max number of training line points extracted in the whole image
max_pts: 1000
# Min distance between two points on a line (in pixels)
min_dist_pts: 10
# Small jittering of the sampled points during training
jittering: 0

### Data preprocessing configuration
preprocessing:
  resize: [512, 512]
  blur_size: 11
augmentation:
  random_scaling:
    enable: True
    range: [0.7, 1.5]
  photometric:
    enable: true
    primitives: ['random_brightness', 'random_contrast',
                 'additive_speckle_noise', 'additive_gaussian_noise',
                 'additive_shade', 'motion_blur']
    params:
      random_brightness: {brightness: 0.2}
      random_contrast: {contrast: [0.3, 1.5]}
      additive_gaussian_noise: {stddev_range: [0, 10]}
      additive_speckle_noise: {prob_range: [0, 0.0035]}
      additive_shade:
        transparency_range: [-0.5, 0.5]
        kernel_size_range: [100, 150]
      motion_blur: {max_kernel_size: 3}
    random_order: True
  homographic:
    enable: true
    params:
      translation: true
      rotation: true
      scaling: true
      perspective: true
      scaling_amplitude: 0.2
      perspective_amplitude_x: 0.2
      perspective_amplitude_y: 0.2
      patch_ratio: 0.85
      max_angle: 1.57
      allow_artifacts: true
    valid_border_margin: 3

## Homography adaptation configuration
homography_adaptation:
  num_iter: 10
  valid_border_margin: 3
  min_counts: 3
  homographies:
    translation: true
    rotation: true
    scaling: true
    perspective: true
    scaling_amplitude: 0.2
    perspective_amplitude_x: 0.2
    perspective_amplitude_y: 0.2
    allow_artifacts: true
    patch_ratio: 0.85
scalelsd/ssl/config/dataset/rdnim_dataset.yaml
ADDED
@@ -0,0 +1,77 @@
### General dataset parameters
dataset_name: "rdnim"
add_augmentation_to_all_splits: False
gray_scale: True
# Ground truth source ('official' or path to the exported h5 dataset)
# gt_source_train: ""  # Fill with your own export file
# gt_source_test: ""  # Fill with your own export file
# Return type: (1) single (to train the detector only)
# or (2) paired_desc (to train the detector + descriptor)
return_type: "single"
random_seed: 0

### Descriptor training parameters
# Number of points extracted per line
max_num_samples: 10
# Max number of training line points extracted in the whole image
max_pts: 1000
# Min distance between two points on a line (in pixels)
min_dist_pts: 10
# Small jittering of the sampled points during training
jittering: 0

reference: "night"

### Data preprocessing configuration
preprocessing:
  resize: [512, 512]
  blur_size: 11
augmentation:
  random_scaling:
    enable: True
    range: [0.7, 1.5]
  photometric:
    enable: true
    primitives: ['random_brightness', 'random_contrast',
                 'additive_speckle_noise', 'additive_gaussian_noise',
                 'additive_shade', 'motion_blur']
    params:
      random_brightness: {brightness: 0.2}
      random_contrast: {contrast: [0.3, 1.5]}
      additive_gaussian_noise: {stddev_range: [0, 10]}
      additive_speckle_noise: {prob_range: [0, 0.0035]}
      additive_shade:
        transparency_range: [-0.5, 0.5]
        kernel_size_range: [100, 150]
      motion_blur: {max_kernel_size: 3}
    random_order: True
  homographic:
    enable: true
    params:
      translation: true
      rotation: true
      scaling: true
      perspective: true
      scaling_amplitude: 0.2
      perspective_amplitude_x: 0.2
      perspective_amplitude_y: 0.2
      patch_ratio: 0.85
      max_angle: 1.57
      allow_artifacts: true
    valid_border_margin: 3

## Homography adaptation configuration
homography_adaptation:
  num_iter: 10
  valid_border_margin: 3
  min_counts: 3
  homographies:
    translation: true
    rotation: true
    scaling: true
    perspective: true
    scaling_amplitude: 0.2
    perspective_amplitude_x: 0.2
    perspective_amplitude_y: 0.2
    allow_artifacts: true
    patch_ratio: 0.85
scalelsd/ssl/config/dataset/synthetic_dataset-1024.yaml
ADDED
@@ -0,0 +1,49 @@
### General dataset parameters
dataset_name: "synthetic_shape"
primitives: "all"
add_augmentation_to_all_splits: True
test_augmentation_seed: 200
# Shape generation configuration
generation:
  # split_sizes: {'train': 20000, 'val': 2000, 'test': 400}
  split_sizes: {'train': 2000, 'val': 2000, 'test': 400}
  random_seed: 10
  image_size: [960, 1280]
  min_len: 0.0985
  min_label_len: 0.099
  params:
    generate_background:
      min_kernel_size: 150
      max_kernel_size: 500
      min_rad_ratio: 0.02
      max_rad_ratio: 0.031
    draw_stripes:
      transform_params: [0.1, 0.1]
    draw_multiple_polygons:
      kernel_boundaries: [50, 100]

### Data preprocessing configuration.
preprocessing:
  resize: [1024, 1024]
  blur_size: 11
augmentation:
  photometric:
    enable: True
    primitives: 'all'
    params: {}
    random_order: True
  homographic:
    enable: True
    params:
      translation: true
      rotation: true
      scaling: true
      perspective: true
      scaling_amplitude: 0.2
      perspective_amplitude_x: 0.2
      perspective_amplitude_y: 0.2
      patch_ratio: 0.8
      max_angle: 1.57
      allow_artifacts: true
      translation_overflow: 0.05
    valid_border_margin: 0
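The generation block drives a synthetic renderer: random geometric primitives are drawn over a procedurally generated background, and segments below min_len are rejected. A much-simplified sketch of that idea (the repo's generator, with its stripes and polygon primitives, is far more elaborate; treating min_len as a fraction of the longer side is an assumption):

import numpy as np
import cv2

rng = np.random.default_rng(10)           # generation.random_seed
h, w = 960, 1280                          # generation.image_size
bg = rng.integers(0, 256, (h, w)).astype(np.uint8)
bg = cv2.GaussianBlur(bg, (11, 11), 0)    # crude stand-in for generate_background

min_len = 0.0985 * max(h, w)              # generation.min_len (relative)
segments = []
while len(segments) < 20:
    p1, p2 = rng.integers(0, (w, h), size=(2, 2))
    if np.linalg.norm(p2 - p1) < min_len:
        continue                          # reject too-short segments
    cv2.line(bg, tuple(map(int, p1)), tuple(map(int, p2)),
             int(rng.integers(0, 256)), 2)
    segments.append((p1, p2))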
scalelsd/ssl/config/dataset/synthetic_dataset-2k.yaml
ADDED
@@ -0,0 +1,50 @@
### General dataset parameters
dataset_name: "synthetic_shape"
primitives: "all"
add_augmentation_to_all_splits: True
test_augmentation_seed: 200
alias: 2k
# Shape generation configuration
generation:
  # split_sizes: {'train': 20000, 'val': 2000, 'test': 400}
  split_sizes: {'train': 2000, 'val': 200, 'test': 400}
  random_seed: 10
  image_size: [960, 1280]
  min_len: 0.0985
  min_label_len: 0.099
  params:
    generate_background:
      min_kernel_size: 150
      max_kernel_size: 500
      min_rad_ratio: 0.02
      max_rad_ratio: 0.031
    draw_stripes:
      transform_params: [0.1, 0.1]
    draw_multiple_polygons:
      kernel_boundaries: [50, 100]

### Data preprocessing configuration.
preprocessing:
  resize: [512, 512]
  blur_size: 11
augmentation:
  photometric:
    enable: True
    primitives: 'all'
    params: {}
    random_order: True
  homographic:
    enable: True
    params:
      translation: true
      rotation: true
      scaling: true
      perspective: true
      scaling_amplitude: 0.2
      perspective_amplitude_x: 0.2
      perspective_amplitude_y: 0.2
      patch_ratio: 0.8
      max_angle: 1.57
      allow_artifacts: true
      translation_overflow: 0.05
    valid_border_margin: 0
scalelsd/ssl/config/dataset/synthetic_dataset-4k.yaml
ADDED
@@ -0,0 +1,50 @@
### General dataset parameters
dataset_name: "synthetic_shape"
primitives: "all"
add_augmentation_to_all_splits: True
test_augmentation_seed: 200
alias: 4k
# Shape generation configuration
generation:
  # split_sizes: {'train': 20000, 'val': 2000, 'test': 400}
  split_sizes: {'train': 4000, 'val': 2000, 'test': 400}
  random_seed: 10
  image_size: [960, 1280]
  min_len: 0.0985
  min_label_len: 0.099
  params:
    generate_background:
      min_kernel_size: 150
      max_kernel_size: 500
      min_rad_ratio: 0.02
      max_rad_ratio: 0.031
    draw_stripes:
      transform_params: [0.1, 0.1]
    draw_multiple_polygons:
      kernel_boundaries: [50, 100]

### Data preprocessing configuration.
preprocessing:
  resize: [512, 512]
  blur_size: 11
augmentation:
  photometric:
    enable: True
    primitives: 'all'
    params: {}
    random_order: True
  homographic:
    enable: True
    params:
      translation: true
      rotation: true
      scaling: true
      perspective: true
      scaling_amplitude: 0.2
      perspective_amplitude_x: 0.2
      perspective_amplitude_y: 0.2
      patch_ratio: 0.8
      max_angle: 1.57
      allow_artifacts: true
      translation_overflow: 0.05
    valid_border_margin: 0
scalelsd/ssl/config/dataset/synthetic_dataset-large.yaml
ADDED
@@ -0,0 +1,50 @@
### General dataset parameters
dataset_name: "synthetic_shape"
primitives: "all"
add_augmentation_to_all_splits: True
test_augmentation_seed: 200
alias: "synthetic_shape_large"
# Shape generation configuration
generation:
  split_sizes: {'train': 20000, 'val': 2000, 'test': 400}
  # split_sizes: {'train': 2000, 'val': 2000, 'test': 400}
  random_seed: 10
  image_size: [960, 1280]
  min_len: 0.0985
  min_label_len: 0.099
  params:
    generate_background:
      min_kernel_size: 150
      max_kernel_size: 500
      min_rad_ratio: 0.02
      max_rad_ratio: 0.031
    draw_stripes:
      transform_params: [0.1, 0.1]
    draw_multiple_polygons:
      kernel_boundaries: [50, 100]

### Data preprocessing configuration.
preprocessing:
  resize: [512, 512]
  blur_size: 11
augmentation:
  photometric:
    enable: True
    primitives: 'all'
    params: {}
    random_order: True
  homographic:
    enable: True
    params:
      translation: true
      rotation: true
      scaling: true
      perspective: true
      scaling_amplitude: 0.2
      perspective_amplitude_x: 0.2
      perspective_amplitude_y: 0.2
      patch_ratio: 0.8
      max_angle: 1.57
      allow_artifacts: true
      translation_overflow: 0.05
    valid_border_margin: 0
scalelsd/ssl/config/dataset/synthetic_dataset.yaml
ADDED
@@ -0,0 +1,51 @@
### General dataset parameters
dataset_name: "synthetic_shape"
primitives: "all"
add_augmentation_to_all_splits: True
test_augmentation_seed: 200
# Shape generation configuration
generation:
  # split_sizes: {'train': 20000, 'val': 2000, 'test': 400}
  # split_sizes: {'train': 2000, 'val': 2000, 'test': 400}
  split_sizes: {'train': 100, 'val': 100, 'test': 100}
  random_seed: 10
  # image_size: [960, 1280]
  image_size: [1024, 1024]
  min_len: 0.0985
  min_label_len: 0.099
  params:
    generate_background:
      min_kernel_size: 150
      max_kernel_size: 500
      min_rad_ratio: 0.02
      max_rad_ratio: 0.031
    draw_stripes:
      transform_params: [0.1, 0.1]
    draw_multiple_polygons:
      kernel_boundaries: [50, 100]

### Data preprocessing configuration.
preprocessing:
  resize: [512, 512]
  blur_size: 11
augmentation:
  photometric:
    enable: True
    primitives: 'all'
    params: {}
    random_order: True
  homographic:
    enable: True
    params:
      translation: true
      rotation: true
      scaling: true
      perspective: true
      scaling_amplitude: 0.2
      perspective_amplitude_x: 0.2
      perspective_amplitude_y: 0.2
      patch_ratio: 0.8
      max_angle: 1.57
      allow_artifacts: true
      translation_overflow: 0.05
    valid_border_margin: 0
scalelsd/ssl/config/dataset/wireframe_official_gt copy.yaml
ADDED
@@ -0,0 +1,86 @@
dataset_name: "wireframe"
add_augmentation_to_all_splits: False
gray_scale: True
# return_type: "paired_desc"
random_seed: 0
# Ground truth source ('official' or path to the exported h5 dataset)
gt_source_train: "official"
gt_source_test: "official"
# Data preprocessing configuration.
preprocessing:
  resize: [512, 512]
  blur_size: 11
augmentation:
  random_scaling:
    enable: True
    range: [0.7, 1.5]
  photometric:
    enable: true
    primitives: ['random_brightness', 'random_contrast',
                 'additive_speckle_noise', 'additive_gaussian_noise',
                 'additive_shade', 'motion_blur']
    params:
      random_brightness: {brightness: 0.2}
      random_contrast: {contrast: [0.3, 1.5]}
      additive_gaussian_noise: {stddev_range: [0, 10]}
      additive_speckle_noise: {prob_range: [0, 0.0035]}
      additive_shade:
        transparency_range: [-0.5, 0.5]
        kernel_size_range: [100, 150]
      motion_blur: {max_kernel_size: 3}
    random_order: True
  homographic:
    enable: true
    params:
      translation: true
      rotation: true
      scaling: true
      perspective: true
      scaling_amplitude: 0.2
      perspective_amplitude_x: 0.2
      perspective_amplitude_y: 0.2
      patch_ratio: 0.85
      max_angle: 1.57
      allow_artifacts: true
    valid_border_margin: 3
# The homography adaptation configuration
homography_adaptation:
  num_iter: 100
  aggregation: 'sum'
  mode: 'ver1'
  valid_border_margin: 3
  min_counts: 30
  homographies:
    translation: true
    rotation: true
    scaling: true
    perspective: true
    scaling_amplitude: 0.2
    perspective_amplitude_x: 0.2
    perspective_amplitude_y: 0.2
    allow_artifacts: true
    patch_ratio: 0.85
# Evaluation related config
evaluation:
  repeatability:
    # Initial random seed used to sample homographic augmentations
    seed: 200
    # Parameters used to sample the illumination change evaluation set.
    photometric:
      enable: False
    # Parameters used to sample the viewpoint change evaluation set.
    homographic:
      enable: True
      num_samples: 2
      params:
        translation: true
        rotation: true
        scaling: true
        perspective: true
        scaling_amplitude: 0.2
        perspective_amplitude_x: 0.2
        perspective_amplitude_y: 0.2
        patch_ratio: 0.85
        max_angle: 1.57
        allow_artifacts: true
      valid_border_margin: 3
ADDED
@@ -0,0 +1,86 @@
|
dataset_name: "wireframe"
add_augmentation_to_all_splits: False
gray_scale: True
# return_type: "paired_desc"
random_seed: 0
# Ground truth source ('official' or path to the exported h5 dataset)
gt_source_train: "official"
gt_source_test: "official"
# Data preprocessing configuration.
preprocessing:
  resize: [512, 512]
  blur_size: 11
augmentation:
  random_scaling:
    enable: True
    range: [0.7, 1.5]
  photometric:
    enable: true
    primitives: ['random_brightness', 'random_contrast',
                 'additive_speckle_noise', 'additive_gaussian_noise',
                 'additive_shade', 'motion_blur']
    params:
      random_brightness: {brightness: 0.2}
      random_contrast: {contrast: [0.3, 1.5]}
      additive_gaussian_noise: {stddev_range: [0, 10]}
      additive_speckle_noise: {prob_range: [0, 0.0035]}
      additive_shade:
        transparency_range: [-0.5, 0.5]
        kernel_size_range: [100, 150]
      motion_blur: {max_kernel_size: 3}
    random_order: True
  homographic:
    enable: true
    params:
      translation: true
      rotation: true
      scaling: true
      perspective: true
      scaling_amplitude: 0.2
      perspective_amplitude_x: 0.2
      perspective_amplitude_y: 0.2
      patch_ratio: 0.85
      max_angle: 1.57
      allow_artifacts: true
    valid_border_margin: 3
# The homography adaptation configuration
homography_adaptation:
  num_iter: 100
  aggregation: 'sum'
  mode: 'ver1'
  valid_border_margin: 3
  min_counts: 30
  homographies:
    translation: true
    rotation: true
    scaling: true
    perspective: true
    scaling_amplitude: 0.2
    perspective_amplitude_x: 0.2
    perspective_amplitude_y: 0.2
    allow_artifacts: true
    patch_ratio: 0.85
# Evaluation related config
evaluation:
  repeatability:
    # Initial random seed used to sample homographic augmentations
    seed: 200
    # Parameters used to sample the illumination change evaluation set.
    photometric:
      enable: False
    # Parameters used to sample the viewpoint change evaluation set.
    homographic:
      enable: True
      num_samples: 2
      params:
        translation: true
        rotation: true
        scaling: true
        perspective: true
        scaling_amplitude: 0.2
        perspective_amplitude_x: 0.2
        perspective_amplitude_y: 0.2
        patch_ratio: 0.85
        max_angle: 1.57
        allow_artifacts: true
      valid_border_margin: 3
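Compared with the evaluation configs above, this training config runs a much heavier homography adaptation (num_iter: 100, aggregation: 'sum', min_counts: 30): detections are accumulated over many random warps, and only pixels observed often enough are kept. A sketch of that aggregation loop (detect and sample_h are stand-in callables; the repo's implementation and its mode: 'ver1' variant are not shown here):

import numpy as np
import cv2

def homography_adaptation(img, detect, sample_h, num_iter=100, min_counts=30):
    h, w = img.shape[:2]
    acc = np.zeros((h, w), np.float32)     # summed detection heatmap
    counts = np.zeros((h, w), np.float32)  # visibility count per pixel
    for _ in range(num_iter):
        H = sample_h((h, w))
        warped = cv2.warpPerspective(img, H, (w, h))
        heat = detect(warped).astype(np.float32)  # scores in the warped frame
        Hinv = np.linalg.inv(H)
        # Un-warp the scores and track how often each pixel was visible.
        acc += cv2.warpPerspective(heat, Hinv, (w, h))
        counts += cv2.warpPerspective(np.ones_like(heat), Hinv, (w, h))
    acc[counts < min_counts] = 0           # drop rarely-observed pixels
    return acc / np.maximum(counts, 1.0)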