GEN3C Graphical User Interface

GEN3C's GUI helps you visualize and author novel camera trajectories for GEN3C to generate. The GUI runs on your local machine, while inference takes place on either the same machine or a remote one. This repository contains all the code needed: model loading & inference, the inference server, and the local client GUI.

Starting the GEN3C inference server

On the machine that will run inference, start by following the general installation instructions of GEN3C: INSTALL.md. Then, install a few additional dependencies for the inference server:

conda activate cosmos-predict1
cd GEN3C/gui
pip install -r ./requirements.txt

Finally, start the inference server while optionally setting some parameters via GEN3C_* environment variables:

# If model checkpoints were not downloaded to `GEN3C/checkpoints`, set the path to the
# checkpoints directory:
# export GEN3C_CKPT_PATH="/path/to/checkpoints"
# Set if you would like to control the number of GPUs used by the inference server.
# By default, it will use as many as are available.
# export GEN3C_GPU_COUNT=1

CUDA_HOME=$CONDA_PREFIX fastapi dev --no-reload ./api/server.py --host 0.0.0.0

It may take a while to load the model weights. The server is ready when "Uvicorn running on ..." is printed.
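If you would rather not watch the log, the wait can be scripted. The following is a minimal sketch and not part of this repository: it polls a TCP port until it accepts connections. The function name `wait_for_server` is hypothetical, and port 8000 is an assumption (it matches the SSH examples in this document; adjust it if your server listens elsewhere). An open port only shows that the web server is listening, so treat the log message as the authoritative signal.

```python
import socket
import time


def wait_for_server(host: str = "localhost", port: int = 8000,
                    timeout_s: float = 600.0, poll_interval_s: float = 2.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    The default host and port are assumptions, not values read from the GEN3C code.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=poll_interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval_s)
```

For example, `wait_for_server("localhost", 8000)` could be called from a launch script before starting the client.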

SSH tunnel

If the inference server is running on a remote machine, you may need to open an SSH tunnel on your local machine:

# Usage:
#     ssh -NL <local_port>:<node_hostname>:<remote_port> <jump_hostname>

# Example 1: bind port 8000 of <remote_hostname> to your local port 8000
ssh -NL 8000:localhost:8000 <remote_hostname>

# Example 2: if <node_hostname> is only accessible through <jump_hostname>,
# bind port 8000 of <node_hostname> to your local port 8000, going through <jump_hostname>.
ssh -NL 8000:<node_hostname>:8000 <jump_hostname>
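If you open the tunnel from a script, the command line can be assembled programmatically. This is a small sketch under stated assumptions: the helper name `ssh_tunnel_command` and the host names are hypothetical placeholders, and only the `-N` and `-L` options already shown above are used.

```python
import shlex


def ssh_tunnel_command(ssh_host: str, local_port: int = 8000,
                       target_host: str = "localhost", remote_port: int = 8000) -> list[str]:
    """Build the argument list for `ssh -NL <local_port>:<target_host>:<remote_port> <ssh_host>`.

    target_host is resolved from ssh_host's point of view, so "localhost"
    means the machine you are connecting to.
    """
    return ["ssh", "-NL", f"{local_port}:{target_host}:{remote_port}", ssh_host]


# Example 1: the server runs directly on the SSH host (placeholder name).
print(shlex.join(ssh_tunnel_command("remote.example.com")))

# Example 2: the server runs on a node reachable only through a jump host (placeholder names).
print(shlex.join(ssh_tunnel_command("jump.example.com", target_host="node01")))
```

The returned list can be passed to `subprocess.Popen` to keep the tunnel open in the background while the GUI is running.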

Starting the GEN3C GUI on your local machine

Prerequisites: the GUI is written with CUDA and therefore requires an NVIDIA GPU as well as the CUDA Toolkit (version 11 or above).

On your local machine, clone this repository including submodules (--recursive) and enter the gui subdirectory:

git clone --recursive https://github.com/nv-tlabs/GEN3C
cd GEN3C/gui

Then, build the GUI:

cmake . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build --config RelWithDebInfo -j

Then, install the GUI's Python dependencies. Note that the Cosmos dependencies do not need to be installed here, since inference runs in the separate environment that was set up above.

# In GEN3C/gui
pip install -r ./requirements.txt

Finally, start the GUI. It should automatically connect to the inference server that you started above.

python api/client.py

The server was not tested for concurrent usage by multiple users. Only connect one client to each server instance.

Using the GEN3C GUI

Seeding the model

The model can be easily seeded (initialized) with an image or one of our pre-processed dynamic video examples. Simply drag & drop the image or pre-processed folder onto the GUI window to trigger seeding. Alternatively, you can specify the path in the "Seeding" section of the right-hand window, then click "Seed".

The seeding data is uploaded to the server, which initializes the Gen3C 3D cache. When seeding from a single image, depth is automatically estimated using an off-the-shelf model. The estimated depth is downloaded back to the client in order to display the image as a 3D point cloud in the viewport.

Authoring a camera path

Once the model is seeded, the camera trajectory is initialized with the camera pose and intrinsics estimated from the seeding data. Using the left-hand window, you can then tweak or replace the camera trajectory in order to generate the scene from novel viewpoints.

We explain the main camera editing features from top to bottom:

"Record camera path": when enabled, the camera movement in the viewport will be saved as the camera trajectory in real time.
"Clear": clear the current camera path, starting from scratch.
"Init from views": re-initialize the camera path from the seeding data.
"Load" / "Save": load or save the camera trajectory in JSON format from / to the specified path.
Keyframe manipulation:
- "Add from cam": add the current viewport camera pose to the camera path.
- "Split": add a new keyframe at the current point along the camera path, as specified by the "Camera path time" slider below.
- "|<": go to the first keyframe.
- "<": go to the previous keyframe.
- "Read": set the viewport camera to the current camera along the path.
- ">": go to the next keyframe.
- ">|": go to the last keyframe.
- "Dup": duplicate the current keyframe.
- "Del": delete the current keyframe.
- "Set": set the current keyframe to the current viewport camera.
Keyframe editing:
- Individual keyframes can be tweaked directly in 3D from the viewport using the red / green / blue gizmo. Select which keyframe to edit using the "Camera path time" slider.
- "Translation" / "Rotation": tweak the position or orientation of the camera at the current keyframe.
- "Local" / "World": edit in local or global coordinate space.
- "Loop path": when selected, the path is made looping.
Camera path playback:
- "Start": seek to the start of the path.
- "Rev": start playing the path in reverse.
- "Play": start playing the path normally.
- "End": seek to the end of the path.
- "Playback speed": controls how fast the path will play. Note this is unrelated to the framerate or speed at which the video will eventually be generated.
- "Camera path time": seek to specific points along the camera path. The selected point determines which keyframe will be edited when using the buttons above.
Intrinsics editing:
- "Field of view": field of view in degrees of the current keyframe. Changing this value over time can be used to "zoom in" or out.
- "Apply to all keyframes": apply the current FoV value to all keyframes.
"Batch keyframe editing": use this section to edit multiple keyframes at once. This is useful when a trajectory has many keyframes, e.g. after seeding from a video or using "Record camera path".
"Advanced camera path settings": used to control the path interpolation smoothness.

A preview of the scene from the current camera viewpoint along the trajectory is shown in the middle of the left-hand window.
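To build intuition for how the "Field of view" setting acts as a zoom, the standard pinhole-camera relation between field of view and focal length can be sketched as follows. This is a generic formula, not code taken from the GUI: the function name is hypothetical, the 1280-pixel image size is only an illustrative value, and which image axis the GUI measures the field of view along is not stated here.

```python
import math


def focal_length_px(fov_deg: float, image_size_px: float) -> float:
    """Pinhole relation: f = (size / 2) / tan(fov / 2), with f and size in pixels.

    image_size_px is the image extent along the axis the FoV is measured on
    (an assumption; check which axis applies in your setup).
    """
    return 0.5 * image_size_px / math.tan(math.radians(fov_deg) / 2.0)


# Narrowing the field of view increases the focal length, i.e. "zooms in".
for fov in (90.0, 60.0, 30.0):
    print(f"FoV {fov:5.1f} deg -> focal length {focal_length_px(fov, 1280):7.1f} px")
```

Interpolating the field of view between two keyframes therefore changes the effective focal length over time, which reads as a zoom in the generated video.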

For convenience, you can also load a camera path by dragging & dropping its JSON file (previously saved with the "Save" button above) directly onto the viewport.

Starting inference

Once a camera trajectory has been authored, Gen3C can generate a video based on the seeding data and the camera trajectory.

The video settings are located under "Video generation" in the left-hand window:

"Generate video": start inference with Gen3C on the server!
"Visualize rendered 3D cache": include a preview of the rendered Gen3C 3D cache at each frame in the output video.
"Add Gen3C keyframes to viewport after inference": once inference is complete, add the last frame of each batch of 121 frames to the viewport for preview.
"Video file": path to the output video file, where it will be saved once inference is complete. Supports Python's strftime() format codes.
"Duration": duration of the video to generate, in seconds.
"FPS": framerate of the video to render.
"Resolution": fixed based on the capability of the model.
"Export cameras": export the rendered camera trajectory, using a more portable JSON-based format.

Depending on the hardware used, inference can take a while. Once inference is complete, the resulting video will be automatically downloaded, written to disk, and opened with your default video player.
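Since the "Video file" setting accepts Python's strftime() format codes, a timestamped pattern keeps successive runs from overwriting each other. The sketch below shows how such a pattern expands using the standard library only. The pattern itself and the helper name `expand_video_path` are hypothetical, and the assumption that the client substitutes the current local time is ours.

```python
from datetime import datetime
from typing import Optional


def expand_video_path(pattern: str, now: Optional[datetime] = None) -> str:
    """Expand strftime() format codes in an output path pattern.

    Uses the current local time unless a specific datetime is given.
    A literal percent sign in the path must be written as "%%".
    """
    if now is None:
        now = datetime.now()
    return now.strftime(pattern)


# A fixed timestamp makes the result reproducible.
print(expand_video_path("gen3c_%Y-%m-%d_%H-%M-%S.mp4", datetime(2025, 1, 31, 14, 5, 9)))
# gen3c_2025-01-31_14-05-09.mp4
```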

Video duration: when setting the duration and framerate of the video to generate, please keep in mind that the model always generates frames in batches of 121. The default duration and framerate are set to correspond to exactly one batch of 121 frames. If the requested duration does not correspond to a multiple of 121 frames, the output video is trimmed to the requested duration, but the extra frames are still generated.
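The arithmetic behind this can be sketched as follows, to estimate how many frames a given setting will actually cost. This is an illustration, not the client's code: the function name is hypothetical, the rounding of `duration * fps` to a frame count is an assumption, and 24 FPS is used only as an example value.

```python
import math

FRAMES_PER_BATCH = 121  # batch size stated in this document


def frame_budget(duration_s: float, fps: float) -> tuple[int, int, int, int]:
    """Return (requested, batches, generated, discarded) frame counts.

    requested: frames kept in the final video (rounding is an assumption)
    batches:   number of 121-frame batches the model has to generate
    generated: total frames produced by the model
    discarded: frames generated but trimmed from the output
    """
    requested = max(1, round(duration_s * fps))
    batches = math.ceil(requested / FRAMES_PER_BATCH)
    generated = batches * FRAMES_PER_BATCH
    return requested, batches, generated, generated - requested


print(frame_budget(121 / 24, 24))  # (121, 1, 121, 0)
print(frame_budget(6, 24))         # (144, 2, 242, 98)
```

In the second example, asking for 6 seconds at 24 FPS costs two full batches even though 98 of the generated frames are thrown away, so picking a duration close to a multiple of 121 frames avoids wasted inference time.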

Note that when seeding from a dynamic video, the frame count of the generated video should match the frame count of the input video for best results.

Resetting the 3D cache: after generating a video, Gen3C's 3D cache may have been updated with new keyframes on the inference server. To discard the generated results from the cache, click the "Seed" button again; this resets the cache.