Commit 2d8805f · Update README.md
Parent: 9249df4
README.md CHANGED
@@ -243,8 +243,7 @@ Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigco
 
 1. [Model Summary](##model-summary)
 2. [Use](##use)
-3. [
-4. [Training](##training)
+3. [Training](##training)
 5. [License](##license)
 6. [Citation](##citation)
 
@@ -309,16 +308,16 @@ outputs = model.generate(inputs)
 print(tokenizer.decode(outputs[0]))
 ```
 
-
+## Training
 
-
+### Model
 
 - **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
 - **Steps:** 250k pretraining & 30 instruction tuning
 - **Pretraining tokens:** 1 trillion pretraining & 2M instruction tuning
 - **Precision:** bfloat16
 
-
+### Hardware
 
 - **Pretraining:**
 - **GPUs:** 512 Tesla A100
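The hunk above shows only the tail of the README's usage snippet (`outputs = model.generate(inputs)` appears in the hunk header). For context, a minimal self-contained version of that snippet might look like the sketch below; the checkpoint id is a placeholder inferred from the octopack links elsewhere in the diff, not something this commit confirms.

```python
# Hypothetical, self-contained version of the usage snippet whose last two
# lines appear in the hunk above.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"  # placeholder id, assumed from the octopack links

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Same two steps the diff context shows: generate, then decode.
inputs = tokenizer.encode("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```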
@@ -327,17 +326,17 @@ print(tokenizer.decode(outputs[0]))
 - **GPUs:** 8 Tesla A100
 - **Training time:** 4 hours
 
-
+### Software
 
 - **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
 - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
-##
+## License
 
 The code in this repository is open-sourced under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license; use of the model weights must comply with the [Model License](MODEL_LICENSE).
 
 The code in this repository is open-source under the [MIT license](https://github.com/bigcode-project/octopack/blob/main/LICENSE). The model weights are licensed under the [Model License](MODEL_LICENSE).
 
-
+## Citation
 
 TODO
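The architecture bullet in the second hunk names multi-query attention. As a rough illustration of the idea (all query heads share a single key/value head, which shrinks the KV cache), here is a PyTorch sketch with made-up dimensions; it is not the model's actual attention implementation.

```python
# Multi-query attention sketch: 8 query heads, 1 shared key/value head.
import torch

batch, seq, n_heads, head_dim = 2, 16, 8, 64  # illustrative sizes only

q = torch.randn(batch, n_heads, seq, head_dim)  # one query projection per head
k = torch.randn(batch, 1, seq, head_dim)        # single shared key head
v = torch.randn(batch, 1, seq, head_dim)        # single shared value head

# Broadcasting over dim 1 reuses the shared K/V for every query head.
scores = q @ k.transpose(-1, -2) / head_dim ** 0.5  # (batch, n_heads, seq, seq)
weights = torch.softmax(scores, dim=-1)
out = weights @ v                                   # (batch, n_heads, seq, head_dim)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```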
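The same bullet also mentions the Fill-in-the-Middle objective, under which the model learns to complete a missing span given the code before and after it. The sketch below assembles such a prompt; the sentinel token strings follow the StarCoder-family convention and are an assumption, not something this commit confirms.

```python
# Hypothetical fill-in-the-middle prompt. The sentinel tokens below follow the
# StarCoder-family convention (<fim_prefix>, <fim_suffix>, <fim_middle>);
# whether this checkpoint uses exactly these strings is an assumption.
prefix = "def add(a, b):\n    "
suffix = "\n    return result"

prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
print(prompt)
# The model would then be asked to generate the missing middle, e.g.
# outputs = model.generate(tokenizer.encode(prompt, return_tensors="pt"))
```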