ONNX
Aser Atawya committed on
Commit f5570ff · 0 Parent(s):

Google Summer of Code: Adding RAFT Optical Flow Model using ONNX Format (#197)


* RAFT ONNX GSoC

* use only opencv instead of onnx

* correct typo in help display message of demo.py

* add video functionality

* Add some clarity to README.md

Files changed (5)
  1. BSD-3-LICENSE.txt +29 -0
  2. MITLICENSE.txt +21 -0
  3. README.md +68 -0
  4. demo.py +310 -0
  5. raft.py +53 -0
BSD-3-LICENSE.txt ADDED
BSD 3-Clause License

Copyright (c) 2020, princeton-vl
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
MITLICENSE.txt ADDED
MIT License

Copyright (c) 2021 Jeong-gi Kwak

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md ADDED
# RAFT
This model was originally created by Zachary Teed and Jia Deng of Princeton University. The source code is available at [their repository on GitHub](https://github.com/princeton-vl/RAFT), and the original [research paper](https://arxiv.org/abs/2003.12039) is published on [arXiv](https://arxiv.org/abs/2003.12039). The model was converted to ONNX by [PINTO0309](https://github.com/PINTO0309) in his [model zoo](https://github.com/PINTO0309/PINTO_model_zoo/tree/main/252_RAFT). The ONNX model has several variations depending on the training dataset and input dimensions. The model used in this demo was trained on the Sintel dataset with an input size of 360 $\times$ 480.


## Demo

Run any of the following commands to try the demo:

```shell
# run on camera input
python demo.py

# run on two images and visualize result
python demo.py --input1 /path/to/image1 --input2 /path/to/image2 -vis

# run on two images and save result
python demo.py --input1 /path/to/image1 --input2 /path/to/image2 -s

# run on two images and both save and visualize result
python demo.py --input1 /path/to/image1 --input2 /path/to/image2 -s -vis

# run on one video and visualize result
python demo.py --video /path/to/video -vis

# run on one video and save result
python demo.py --video /path/to/video -s

# run on one video and both save and visualize result
python demo.py --video /path/to/video -s -vis

# get help regarding various parameters
python demo.py --help
```
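
If you prefer to call the model from your own Python code, the following is a minimal sketch of driving the `Raft` wrapper in raft.py directly. The image file names are placeholders, and the model path matches the default of demo.py's `--model` argument:

```python
import cv2 as cv

from raft import Raft

# Placeholder paths; substitute your own image pair and downloaded ONNX model.
model = Raft(modelPath='optical_flow_estimation_raft_2023aug.onnx')
image1 = cv.imread('frame1.jpg')
image2 = cv.imread('frame2.jpg')

# infer() resizes both images to the model's 360x480 input and returns the
# estimated flow map (horizontal and vertical components in the last dimension).
flow_map = model.infer(image1, image2)
print(flow_map.shape)
```

demo.py's `draw_flow()` can then turn this flow map into a color-coded image at the original image resolution.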

The demo runs on camera input, on a video, or on two images, and computes the optical flow across frames. While running on a video, you can press q at any time to stop. The save (-s) and visualize (-vis) arguments are only valid when the input is a video or two images. To run a different variation of the model, such as one trained on a different dataset or with a different input size, download your chosen model from [RAFT ONNX in PINTO Model Zoo](https://github.com/PINTO0309/PINTO_model_zoo/tree/main/252_RAFT). If your chosen model has an input shape other than 360 $\times$ 480, **change the input shape in raft.py line 15 to the new input shape** (see the snippet after the commands below). Then pass the model path through the --model argument, as in the following example commands:

```shell
# run on camera input
python demo.py --model /path/to/model
# run on two images
python demo.py --input1 /path/to/image1 --input2 /path/to/image2 --model /path/to/model
# run on video
python demo.py --video /path/to/video --model /path/to/model
```
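
For example, if the model you download hypothetically expects 480 $\times$ 640 inputs, line 15 of raft.py would become:

```python
self.input_shape = [480, 640]  # hypothetical [height, width] of the chosen ONNX model
```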

### Example outputs
The visualization argument displays both input images as well as the result.

![Visualization example](./example_outputs/vis.png)

The save argument saves the result only.

![Output example](./example_outputs/result.jpg)


## License

The original RAFT model is under the [BSD-3-Clause license](./BSD-3-LICENSE.txt). <br />
The conversion of the RAFT model to the ONNX format by [PINTO0309](https://github.com/PINTO0309/PINTO_model_zoo/tree/main/252_RAFT) is under the [MIT License](./MITLICENSE.txt). <br />
Some of the code in demo.py and raft.py is adapted from [ibaiGorordo's repository](https://github.com/ibaiGorordo/ONNX-RAFT-Optical-Flow-Estimation/tree/main), which is under the [BSD-3-Clause license](./BSD-3-LICENSE.txt). <br />

## Reference

- https://arxiv.org/abs/2003.12039
- https://github.com/princeton-vl/RAFT
- https://github.com/ibaiGorordo/ONNX-RAFT-Optical-Flow-Estimation/tree/main
- https://github.com/PINTO0309/PINTO_model_zoo/tree/main/252_RAFT
demo.py ADDED
import argparse

import cv2 as cv
import numpy as np

from raft import Raft

parser = argparse.ArgumentParser(description='RAFT (https://github.com/princeton-vl/RAFT)')
parser.add_argument('--input1', '-i1', type=str,
                    help='Usage: Set input1 path to first image, omit if using camera or video.')
parser.add_argument('--input2', '-i2', type=str,
                    help='Usage: Set input2 path to second image, omit if using camera or video.')
parser.add_argument('--video', '-vid', type=str,
                    help='Usage: Set video path to desired input video, omit if using camera or two image inputs.')
parser.add_argument('--model', '-m', type=str, default='optical_flow_estimation_raft_2023aug.onnx',
                    help='Usage: Set model path, defaults to optical_flow_estimation_raft_2023aug.onnx.')
parser.add_argument('--save', '-s', action='store_true',
                    help='Usage: Specify to save a file with results. Invalid in case of camera input.')
parser.add_argument('--visual', '-vis', action='store_true',
                    help='Usage: Specify to open a new window to show results. Invalid in case of camera input.')
args = parser.parse_args()

UNKNOWN_FLOW_THRESH = 1e7

def make_color_wheel():
    """Generate the color wheel according to the Middlebury color code.

    Returns:
        colorwheel (numpy.ndarray): Color wheel
    """
    RY = 15
    YG = 6
    GC = 4
    CB = 11
    BM = 13
    MR = 6

    ncols = RY + YG + GC + CB + BM + MR

    colorwheel = np.zeros([ncols, 3])

    col = 0

    # RY
    colorwheel[0:RY, 0] = 255
    colorwheel[0:RY, 1] = np.transpose(np.floor(255 * np.arange(0, RY) / RY))
    col += RY

    # YG
    colorwheel[col:col+YG, 0] = 255 - np.transpose(np.floor(255 * np.arange(0, YG) / YG))
    colorwheel[col:col+YG, 1] = 255
    col += YG

    # GC
    colorwheel[col:col+GC, 1] = 255
    colorwheel[col:col+GC, 2] = np.transpose(np.floor(255 * np.arange(0, GC) / GC))
    col += GC

    # CB
    colorwheel[col:col+CB, 1] = 255 - np.transpose(np.floor(255 * np.arange(0, CB) / CB))
    colorwheel[col:col+CB, 2] = 255
    col += CB

    # BM
    colorwheel[col:col+BM, 2] = 255
    colorwheel[col:col+BM, 0] = np.transpose(np.floor(255 * np.arange(0, BM) / BM))
    col += BM

    # MR
    colorwheel[col:col+MR, 2] = 255 - np.transpose(np.floor(255 * np.arange(0, MR) / MR))
    colorwheel[col:col+MR, 0] = 255

    return colorwheel

colorwheel = make_color_wheel()

def compute_color(u, v):
    """Compute the optical flow color map.

    Args:
        u (numpy.ndarray): Optical flow horizontal map
        v (numpy.ndarray): Optical flow vertical map

    Returns:
        img (numpy.ndarray): Optical flow in color code
    """
    [h, w] = u.shape
    img = np.zeros([h, w, 3])
    nanIdx = np.isnan(u) | np.isnan(v)
    u[nanIdx] = 0
    v[nanIdx] = 0

    ncols = np.size(colorwheel, 0)

    rad = np.sqrt(u**2 + v**2)

    a = np.arctan2(-v, -u) / np.pi

    fk = (a + 1) / 2 * (ncols - 1) + 1

    k0 = np.floor(fk).astype(int)

    k1 = k0 + 1
    k1[k1 == ncols + 1] = 1
    f = fk - k0

    for i in range(0, np.size(colorwheel, 1)):
        tmp = colorwheel[:, i]
        col0 = tmp[k0 - 1] / 255
        col1 = tmp[k1 - 1] / 255
        col = (1 - f) * col0 + f * col1

        idx = rad <= 1
        col[idx] = 1 - rad[idx] * (1 - col[idx])
        notidx = np.logical_not(idx)

        col[notidx] *= 0.75
        img[:, :, i] = np.uint8(np.floor(255 * col * (1 - nanIdx)))

    return img

def flow_to_image(flow):
    """Convert a flow map into a Middlebury color code image.

    Args:
        flow (np.ndarray): The computed flow map

    Returns:
        (np.ndarray): Image corresponding to the flow map.
    """
    u = flow[:, :, 0]
    v = flow[:, :, 1]

    maxu = -999.
    maxv = -999.
    minu = 999.
    minv = 999.

    idxUnknown = (abs(u) > UNKNOWN_FLOW_THRESH) | (abs(v) > UNKNOWN_FLOW_THRESH)
    u[idxUnknown] = 0
    v[idxUnknown] = 0

    maxu = max(maxu, np.max(u))
    minu = min(minu, np.min(u))

    maxv = max(maxv, np.max(v))
    minv = min(minv, np.min(v))

    rad = np.sqrt(u ** 2 + v ** 2)
    maxrad = max(-1, np.max(rad))

    u = u / (maxrad + np.finfo(float).eps)
    v = v / (maxrad + np.finfo(float).eps)

    img = compute_color(u, v)

    idx = np.repeat(idxUnknown[:, :, np.newaxis], 3, axis=2)
    img[idx] = 0

    return np.uint8(img)


def draw_flow(flow_map, img_width, img_height):
    """Convert a flow map to an image.

    Args:
        flow_map (np.ndarray): The computed flow map
        img_width (int): The width of the first input photo
        img_height (int): The height of the first input photo

    Returns:
        (np.ndarray): Image corresponding to the flow map.
    """
    # Convert flow to image
    flow_img = flow_to_image(flow_map)
    # Convert to BGR
    flow_img = cv.cvtColor(flow_img, cv.COLOR_RGB2BGR)
    # Resize the flow image to match the input image shape
    return cv.resize(flow_img, (img_width, img_height))


def visualize(image1, image2, flow_img):
    """Combine the two input images with the resulting flow image and display them together.

    Args:
        image1 (np.ndarray): The first input image.
        image2 (np.ndarray): The second input image.
        flow_img (np.ndarray): The output flow map drawn as an image

    Returns:
        combined_img (np.ndarray): The visualized result.
    """
    combined_img = np.hstack((image1, image2, flow_img))
    cv.namedWindow("Estimated flow", cv.WINDOW_NORMAL)
    cv.imshow("Estimated flow", combined_img)
    cv.waitKey(0)
    return combined_img


if __name__ == '__main__':
    # Instantiate RAFT
    model = Raft(modelPath=args.model)

    if args.input1 is not None and args.input2 is not None:
        # Read the two images
        image1 = cv.imread(args.input1)
        image2 = cv.imread(args.input2)
        img_height, img_width, img_channels = image1.shape

        # Inference
        result = model.infer(image1, image2)

        # Create the flow image based on the resulting flow map
        flow_image = draw_flow(result, img_width, img_height)

        # Save results if save is true
        if args.save:
            print('Results saved to result.jpg\n')
            cv.imwrite('result.jpg', flow_image)

        # Visualize results in a new window
        if args.visual:
            input_output_visualization = visualize(image1, image2, flow_image)

    elif args.video is not None:
        cap = cv.VideoCapture(args.video)
        FLOW_FRAME_OFFSET = 3  # Number of frames between the two frames used to estimate the optical flow

        if args.visual:
            cv.namedWindow("Estimated flow", cv.WINDOW_NORMAL)

        frame_list = []
        img_array = []
        frame_num = 0
        while cap.isOpened():
            try:
                # Read frame from the video
                ret, prev_frame = cap.read()
                if not ret:
                    break
                frame_list.append(prev_frame)
            except:
                continue

            frame_num += 1
            if frame_num <= FLOW_FRAME_OFFSET:
                continue
            else:
                frame_num = 0

            # Estimate the flow between the oldest and the newest buffered frames
            result = model.infer(frame_list[0], frame_list[-1])
            img_height, img_width, img_channels = frame_list[0].shape
            flow_img = draw_flow(result, img_width, img_height)

            # Blend the flow image over the original frame
            alpha = 0.6
            combined_img = cv.addWeighted(frame_list[0], alpha, flow_img, (1 - alpha), 0)

            if args.visual:
                cv.imshow("Estimated flow", combined_img)
            img_array.append(combined_img)
            # Remove the oldest frame
            frame_list.pop(0)

            # Press key q to stop
            if cv.waitKey(1) == ord('q'):
                break

        cap.release()

        if args.save:
            fourcc = cv.VideoWriter_fourcc(*'mp4v')
            height, width, layers = img_array[0].shape
            video = cv.VideoWriter('result.mp4', fourcc, 30.0, (width, height), isColor=True)
            for img in img_array:
                video.write(img)
            video.release()

        cv.destroyAllWindows()

    else:  # Omit input to call the default camera
        deviceId = 0
        cap = cv.VideoCapture(deviceId)
        w = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
        h = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))

        tm = cv.TickMeter()
        while cv.waitKey(30) < 0:
            hasFrame1, frame1 = cap.read()
            hasFrame2, frame2 = cap.read()
            if not hasFrame1:
                print('First frame was not grabbed!')
                break

            if not hasFrame2:
                print('Second frame was not grabbed!')
                break

            # Inference
            tm.start()
            result = model.infer(frame1, frame2)
            tm.stop()
            result = draw_flow(result, w, h)

            # Draw results on the input image
            frame = visualize(frame1, frame2, result)

            tm.reset()
raft.py ADDED
# This file is part of OpenCV Zoo project.

import cv2 as cv
import numpy as np


class Raft:
    def __init__(self, modelPath):
        self._modelPath = modelPath
        self.model = cv.dnn.readNet(self._modelPath)

        self.input_names = ['0', '1']
        self.first_input_name = self.input_names[0]
        self.second_input_name = self.input_names[1]
        self.input_shape = [360, 480]  # [height, width]; change if using a different model with a different input shape
        self.input_height = self.input_shape[0]
        self.input_width = self.input_shape[1]

    @property
    def name(self):
        return self.__class__.__name__

    def _preprocess(self, image):
        # Convert BGR to RGB, resize to the model input shape, and lay out as NCHW float32
        image = cv.cvtColor(image, cv.COLOR_BGR2RGB)
        img_input = cv.resize(image, (self.input_width, self.input_height))
        img_input = img_input.transpose(2, 0, 1)
        img_input = img_input[np.newaxis, :, :, :]
        img_input = img_input.astype(np.float32)
        return img_input

    def infer(self, image1, image2):
        # Preprocess
        input_1 = self._preprocess(image1)
        input_2 = self._preprocess(image2)

        # Forward
        self.model.setInput(input_1, self.first_input_name)
        self.model.setInput(input_2, self.second_input_name)
        layer_names = self.model.getLayerNames()
        output_layers = [layer_names[i - 1] for i in self.model.getUnconnectedOutLayers()]
        output = self.model.forward(output_layers)

        # Postprocess
        results = self._postprocess(output)

        return results

    def _postprocess(self, output):
        # Take the flow field from the second network output and reorder it to (height, width, 2)
        flow_map = output[1][0].transpose(1, 2, 0)
        return flow_map