Update README.md
README.md
CHANGED
@@ -100,7 +100,6 @@ extra_gated_button_content: Submit
 Cosmos-UpsamplePrompt1-12B-Transfer is a multimodal model designed to transform original input prompts into more detailed and enriched versions based on the control video. It improves the prompts by adding more details and maintaining a consistent description structure before they are used in a conditional world generation model, which generally leads to higher quality outputs. This model is ready for commercial use.
 
 ### License:
-GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Apache License Version 2.0.
 
 GOVERNING TERMS: Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license).
 Additional Information: [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
@@ -131,15 +130,12 @@ Hugging Face 03/18/2025 via [https://huggingface.co/nvidia/Cosmos-UpsamplePrompt
 ## Input:
 **Input Type(s):** Text+Video <br>
 **Input Format:** Text: String, Video: mp4 <br>
-**Input Parameters:** One-
-**Other Properties Related to Input:** Max of 512 tokens<br>
+**Input Parameters:** Text: One-Dimensional (1D); Video: Three-Dimensional (3D) <br>
 
 ## Output:
 **Output Type(s):** Text <br>
 **Output Format:** String <br>
-**Output Parameters:** Text: One-dimensional (1D)
-**Other Properties Related to Output:** Max of 512 tokens <br>
-
+**Output Parameters:** Text: One-dimensional (1D) <br>
 
 ## Software Integration:
 **Runtime Engine(s):**
@@ -148,7 +144,6 @@ Hugging Face 03/18/2025 via [https://huggingface.co/nvidia/Cosmos-UpsamplePrompt
 **Supported Hardware Microarchitecture Compatibility:** <br>
 * NVIDIA Ampere <br>
 * NVIDIA Hopper <br>
-
 **Supported Operating System(s):** Linux <br>
 
 ## Model Version:
@@ -159,11 +154,13 @@ The initial release (v1.0) of Cosmos Prompt Upsampler contains the following mod
 
 See [Cosmos-Transfer1](https://github.com/nvidia-cosmos/cosmos-transfer1) for on how to use the model.
 
-
 Example:
 
-* Input: `"A
-
+* Input: `"A robot in the kitchen picks up a bottle from the floor and puts it on a table."`
++ condition video
+
+
+* Output: `"The video features a kitchen with wooden cabinets and a granite countertop. A robot with a white body, black joints, and a red light on its head is seen performing tasks. It moves its arms and legs to pick up a white bottle with a red label from the floor and place it on the countertop. The robot then moves to a dining area with a wooden table and chairs, where it picks up a white chair and places it back in its original position."`
 
 
 ## Ethical Considerations
@@ -222,5 +219,5 @@ Field | Response
 :---------------------------------------------------|:----------------------------------
 Model Application(s): | Prompt enrichment for world generation
 Describe the life critical impact (if present). | None Known
-Use Case Restrictions: | [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license). Additional Information: [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
+Use Case Restrictions: | Abide by [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license). Additional Information: [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
 Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to. Model checkpoints are made available on Hugging Face, and may become available on cloud providers' model catalog.