Cosmos
nvidia
harrim-nv committed on
Commit b57a0b5 · verified · 1 Parent(s): a3ea692

Update README.md

Files changed (1)
  1. README.md +8 -11
README.md CHANGED
@@ -100,7 +100,6 @@ extra_gated_button_content: Submit
  Cosmos-UpsamplePrompt1-12B-Transfer is a multimodal model designed to transform original input prompts into more detailed and enriched versions based on the control video. It improves the prompts by adding more details and maintaining a consistent description structure before they are used in a conditional world generation model, which generally leads to higher quality outputs. This model is ready for commercial use.

  ### License:
- GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Apache License Version 2.0.

  GOVERNING TERMS: Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license).
  Additional Information: [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
@@ -131,15 +130,12 @@ Hugging Face 03/18/2025 via [https://huggingface.co/nvidia/Cosmos-UpsamplePrompt
  ## Input:
  **Input Type(s):** Text+Video <br>
  **Input Format:** Text: String, Video: mp4 <br>
- **Input Parameters:** One-dimensional (1D) <br>
- **Other Properties Related to Input:** Max of 512 tokens<br>
+ **Input Parameters:** Text: One-Dimensional (1D); Video: Three-Dimensional (3D) <br>

  ## Output:
  **Output Type(s):** Text <br>
  **Output Format:** String <br>
- **Output Parameters:** Text: One-dimensional (1D), Video: Three-dimensional (3D) <br>
- **Other Properties Related to Output:** Max of 512 tokens <br>
-
+ **Output Parameters:** Text: One-dimensional (1D) <br>

  ## Software Integration:
  **Runtime Engine(s):**
@@ -148,7 +144,6 @@ Hugging Face 03/18/2025 via [https://huggingface.co/nvidia/Cosmos-UpsamplePrompt
  **Supported Hardware Microarchitecture Compatibility:** <br>
  * NVIDIA Ampere <br>
  * NVIDIA Hopper <br>
-
  **Supported Operating System(s):** Linux <br>

  ## Model Version:
@@ -159,11 +154,13 @@ The initial release (v1.0) of Cosmos Prompt Upsampler contains the following mod

  See [Cosmos-Transfer1](https://github.com/nvidia-cosmos/cosmos-transfer1) for on how to use the model.

-
  Example:

- * Input: `"A dancer gracefully executes ballet movements, showcasing extended arms and intricate leg positions."` + condition video.
- * Output: `"The video features a person in a dance studio with a wooden floor and a ballet barre along the wall. The individual is wearing a black sleeveless top, black pants, and black shoes with white soles. They are performing a series of ballet movements, including extending their arms out to the sides and then bringing them together in front of their body. The studio has a mirrored wall, and there is a person seated in the background, observing the dancer. The lighting in the studio is bright, and the walls are a neutral color."`
+ * Input: `"A robot in the kitchen picks up a bottle from the floor and puts it on a table."`
+ + condition video
+ ![condition video](seg_upsampler_example.png "My Project Logo")
+
+ * Output: `"The video features a kitchen with wooden cabinets and a granite countertop. A robot with a white body, black joints, and a red light on its head is seen performing tasks. It moves its arms and legs to pick up a white bottle with a red label from the floor and place it on the countertop. The robot then moves to a dining area with a wooden table and chairs, where it picks up a white chair and places it back in its original position."`


  ## Ethical Considerations
@@ -222,5 +219,5 @@ Field | Response
  :---------------------------------------------------|:----------------------------------
  Model Application(s): | Prompt enrichment for world generation
  Describe the life critical impact (if present). | None Known
- Use Case Restrictions: | [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license). Additional Information: [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
+ Use Case Restrictions: | Abide by [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license). Additional Information: [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).
  Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to. Model checkpoints are made available on Hugging Face, and may become available on cloud providers' model catalog.