Muennighoff committed
Commit 2d8805f · 1 parent: 9249df4

Update README.md

Files changed (1): README.md (+7, -8)
README.md CHANGED
@@ -243,8 +243,7 @@ Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigco
 
 1. [Model Summary](##model-summary)
 2. [Use](##use)
-3. [Limitations](##limitations)
-4. [Training](##training)
+3. [Training](##training)
 5. [License](##license)
 6. [Citation](##citation)
 
@@ -309,16 +308,16 @@ outputs = model.generate(inputs)
 print(tokenizer.decode(outputs[0]))
 ```
 
-# Training
+## Training
 
-## Model
+### Model
 
 - **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
 - **Steps:** 250k pretraining & 30 instruction tuning
 - **Pretraining tokens:** 1 trillion pretraining & 2M instruction tuning
 - **Precision:** bfloat16
 
-## Hardware
+### Hardware
 
 - **Pretraining:**
   - **GPUs:** 512 Tesla A100
@@ -327,17 +326,17 @@ print(tokenizer.decode(outputs[0]))
   - **GPUs:** 8 Tesla A100
   - **Training time:** 4 hours
 
-## Software
+### Software
 
 - **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
 - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
-## 协议 | License
+## License
 
 The code in this repository is open-sourced under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license; use of the model weights must comply with the [Model License](MODEL_LICENSE).
 
 The code in this repository is open-source under the [MIT license](https://github.com/bigcode-project/octopack/blob/main/LICENSE). The model weights are licensed under the [Model License](MODEL_LICENSE).
 
-# Citation
+## Citation
 
 TODO
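The generation snippet quoted as context in the second hunk is only the tail of the README's usage example. For readers of this diff, here is a minimal sketch of the full pattern it implies, using the standard `transformers` API. The checkpoint id is a hypothetical placeholder, since the diff does not name the model.

```python
# Minimal sketch of the usage pattern implied by the README excerpt above.
# The checkpoint id below is a placeholder -- the actual model id is not
# given in this diff; substitute the real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/<model-id>"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Encode a prompt, generate a continuation, and decode it -- matching the
# `model.generate(inputs)` / `tokenizer.decode(outputs[0])` lines in the diff.
inputs = tokenizer.encode("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```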
 
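The **Architecture** bullet lists a Fill-in-the-Middle (FIM) objective. As a hedged illustration of how FIM-trained models are typically prompted, the sketch below reuses the `tokenizer` and `model` from the previous example. The sentinel token names are an assumption following the BigCode StarCoder convention; the diff does not specify this model's actual FIM tokens, so check the tokenizer's vocabulary for the real names.

```python
# Hedged sketch of Fill-in-the-Middle prompting. The sentinels
# <fim_prefix>/<fim_suffix>/<fim_middle> are assumed (StarCoder-style
# convention), not confirmed by this diff.
prompt = (
    "<fim_prefix>def fib(n):\n    "                       # code before the hole
    "<fim_suffix>\n    return fib(n - 1) + fib(n - 2)"    # code after the hole
    "<fim_middle>"                                        # model fills in the middle
)
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

With this prompt layout the model is expected to generate the missing middle span (here, a base case such as `if n < 2: return n`) rather than a left-to-right continuation.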