nvidia/Cosmos-UpsamplePrompt1-12B-Transfer


harrim-nv committed on Mar 18

Commit b57a0b5 (verified) · 1 parent: a3ea692

Update README.md


Files changed (1)

README.md +8 -11

README.md CHANGED

@@ -100,7 +100,6 @@ extra_gated_button_content: Submit
 Cosmos-UpsamplePrompt1-12B-Transfer is a multimodal model designed to transform original input prompts into more detailed and enriched versions based on the control video. It improves the prompts by adding more details and maintaining a consistent description structure before they are used in a conditional world generation model, which generally leads to higher quality outputs. This model is ready for commercial use.
 ### License:
-GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Apache License Version 2.0.
 GOVERNING TERMS: Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license).
 Additional Information: [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
@@ -131,15 +130,12 @@ Hugging Face 03/18/2025 via [https://huggingface.co/nvidia/Cosmos-UpsamplePrompt
 ## Input:
 **Input Type(s):** Text+Video <br>
 **Input Format:** Text: String, Video: mp4 <br>
-**Input Parameters:** One-dimensional (1D) <br>
-**Other Properties Related to Input:** Max of 512 tokens<br>
 ## Output:
 **Output Type(s):** Text <br>
 **Output Format:** String <br>
-**Output Parameters:** Text: One-dimensional (1D), Video: Three-dimensional (3D) <br>
-**Other Properties Related to Output:**  Max of 512 tokens <br>
 ## Software Integration:
 **Runtime Engine(s):**
@@ -148,7 +144,6 @@ Hugging Face 03/18/2025 via [https://huggingface.co/nvidia/Cosmos-UpsamplePrompt
 **Supported Hardware Microarchitecture Compatibility:** <br>
 * NVIDIA Ampere <br>
 * NVIDIA Hopper <br>
 **Supported Operating System(s):** Linux <br>
 ## Model Version:
@@ -159,11 +154,13 @@ The initial release (v1.0) of Cosmos Prompt Upsampler contains the following mod
 See [Cosmos-Transfer1](https://github.com/nvidia-cosmos/cosmos-transfer1) for on how to use the model.
 Example:
-* Input: `"A dancer gracefully executes ballet movements, showcasing extended arms and intricate leg positions."` + condition video.
-* Output: `"The video features a person in a dance studio with a wooden floor and a ballet barre along the wall. The individual is wearing a black sleeveless top, black pants, and black shoes with white soles. They are performing a series of ballet movements, including extending their arms out to the sides and then bringing them together in front of their body. The studio has a mirrored wall, and there is a person seated in the background, observing the dancer. The lighting in the studio is bright, and the walls are a neutral color."`
 ## Ethical Considerations
@@ -222,5 +219,5 @@ Field                                               |  Response
 :---------------------------------------------------|:----------------------------------
 Model Application(s):                               |  Prompt enrichment for world generation
 Describe the life critical impact (if present).   |  None Known
-Use Case Restrictions:                              |  [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license). Additional Information: [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
 Model and dataset restrictions:            |  The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development.  Restrictions enforce dataset access during training, and dataset license constraints adhered to. Model checkpoints are made available on Hugging Face, and may become available on cloud providers' model catalog.

 Cosmos-UpsamplePrompt1-12B-Transfer is a multimodal model designed to transform original input prompts into more detailed and enriched versions based on the control video. It improves the prompts by adding more details and maintaining a consistent description structure before they are used in a conditional world generation model, which generally leads to higher quality outputs. This model is ready for commercial use.
 ### License:
 GOVERNING TERMS: Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license).
 Additional Information: [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
 ## Input:
 **Input Type(s):** Text+Video <br>
 **Input Format:** Text: String, Video: mp4 <br>
+**Input Parameters:** Text: One-Dimensional (1D); Video: Three-Dimensional (3D) <br>
 ## Output:
 **Output Type(s):** Text <br>
 **Output Format:** String <br>
+**Output Parameters:** Text: One-dimensional (1D) <br>
 ## Software Integration:
 **Runtime Engine(s):**
 **Supported Hardware Microarchitecture Compatibility:** <br>
 * NVIDIA Ampere <br>
 * NVIDIA Hopper <br>
 **Supported Operating System(s):** Linux <br>
 ## Model Version:
 See [Cosmos-Transfer1](https://github.com/nvidia-cosmos/cosmos-transfer1) for on how to use the model.
 Example:
+* Input: `"A robot in the kitchen picks up a bottle from the floor and puts it on a table."`
++ condition video
+![condition video](seg_upsampler_example.png "My Project Logo")
+* Output: `"The video features a kitchen with wooden cabinets and a granite countertop. A robot with a white body, black joints, and a red light on its head is seen performing tasks. It moves its arms and legs to pick up a white bottle with a red label from the floor and place it on the countertop. The robot then moves to a dining area with a wooden table and chairs, where it picks up a white chair and places it back in its original position."`
 ## Ethical Considerations
 :---------------------------------------------------|:----------------------------------
 Model Application(s):                               |  Prompt enrichment for world generation
 Describe the life critical impact (if present).   |  None Known
+Use Case Restrictions:                              |  Abide by [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license). Additional Information: [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
 Model and dataset restrictions:            |  The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development.  Restrictions enforce dataset access during training, and dataset license constraints adhered to. Model checkpoints are made available on Hugging Face, and may become available on cloud providers' model catalog.
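
As a quick sanity check on the diff itself, the hunk headers can be compared against the `README.md +8 -11` summary. The sketch below is illustrative only: the helper names (`parse_hunk`, `net_line_change`, `offsets_consistent`) are hypothetical and not part of any Hugging Face or Git tooling. It assumes the five `@@ -a,b +c,d @@` headers shown in this commit and checks two things: that the summed change in hunk lengths equals 8 - 11 = -3, and that each hunk's new start line is shifted by the net change of the hunks before it.

```python
import re

# Hunk headers copied from the diff in this commit (trailing context text dropped).
HUNK_HEADERS = [
    "@@ -100,7 +100,6 @@",
    "@@ -131,15 +130,12 @@",
    "@@ -148,7 +144,6 @@",
    "@@ -159,11 +154,13 @@",
    "@@ -222,5 +219,5 @@",
]

# Summary line from the commit page: README.md +8 -11
ADDED, REMOVED = 8, 11

HUNK_RE = re.compile(r"@@ -(\d+),(\d+) \+(\d+),(\d+) @@")


def parse_hunk(header):
    """Return (old_start, old_len, new_start, new_len) for a unified-diff hunk header."""
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = map(int, match.groups())
    return old_start, old_len, new_start, new_len


def net_line_change(headers):
    """Sum of (new_len - old_len) over all hunks, i.e. added minus removed lines."""
    return sum(new_len - old_len for _, old_len, _, new_len in map(parse_hunk, headers))


def offsets_consistent(headers):
    """Check that each hunk's start shift equals the net change of all earlier hunks."""
    drift = 0
    for old_start, old_len, new_start, new_len in map(parse_hunk, headers):
        if new_start - old_start != drift:
            return False
        drift += new_len - old_len
    return True


if __name__ == "__main__":
    print("net change matches summary:", net_line_change(HUNK_HEADERS) == ADDED - REMOVED)
    print("hunk offsets consistent:", offsets_consistent(HUNK_HEADERS))
```

Per hunk the length changes are -1, -3, -1, +2 and 0, which sum to -3, in agreement with the +8 -11 summary.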