infosys/NT-Java-1.1B

BalajiInfosys committed on May 6, 2024

Commit 747372a (verified) · 1 Parent(s): 8740e1b

Update README.md (#1)

Files changed (1)

README.md

@@ -26,7 +26,7 @@ widget:
 ## Model Summary
-The JavaCoder models are !B parameter models trained on 80+ programming languages from [The Stack (v1.2)](https://huggingface.co/datasets/bigcode/the-stack), with opt-out requests excluded. The model uses [Multi Query Attention](https://arxiv.org/abs/1911.02150), [a context window of 8192 tokens](https://arxiv.org/abs/2205.14135),  and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 1 trillion tokens.
+The JavaCoder models are 1B parameter models trained on 80+ programming languages from [The Stack (v1.2)](https://huggingface.co/datasets/bigcode/the-stack), with opt-out requests excluded. The model uses [Multi Query Attention](https://arxiv.org/abs/1911.02150), [a context window of 8192 tokens](https://arxiv.org/abs/2205.14135),  and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 1 trillion tokens.
 - **Repository:**
 - **Project Website:**
@@ -88,7 +88,7 @@ The model has been trained on source code from 80+ programming languages. The pr
 ## Hardware
 - **GPUs:** 6 NVIDIA A100 80GB
-- **Training time:**  days
+- **Training time:**  4 days
 ## Software