rajabmondal committed
Commit da27032 · verified · 1 Parent(s): efadb81

updated model summary

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -26,13 +26,13 @@ widget:
 
 ## Model Summary
 
-The JavaCoder models are 1B parameter models trained on 80+ programming languages from [The Stack (v1.2)](https://huggingface.co/datasets/bigcode/the-stack), with opt-out requests excluded. The model uses [Multi Query Attention](https://arxiv.org/abs/1911.02150), [a context window of 8192 tokens](https://arxiv.org/abs/2205.14135), and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 1 trillion tokens.
+The Narrow Transformer (NT) model NT-Java-1.1B is an open-source, specialized code model built on StarCoderBase and designed for code-completion tasks in Java. The model is a decoder-only transformer with Multi-Query Attention and learned absolute positional embeddings, fine-tuned on the Java subset of the training data (starcoderdata), roughly 22B tokens, with a context of 8192 tokens.
 
-- **Repository:**
+- **Repository:** [bigcode/Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
 - **Project Website:**
 - **Paper:**
 - **Point of Contact:**
-- **Languages:** 80+ Programming languages
+- **Languages:** Java
 
 ## Use
 
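The summaries above describe a StarCoderBase-derived code-completion model trained with the Fill-in-the-Middle objective. A minimal sketch of how a FIM infilling prompt is typically assembled for StarCoder-family models follows; the sentinel token strings are assumptions carried over from StarCoderBase's tokenizer and are not confirmed by this commit:

```python
# Sketch: building a Fill-in-the-Middle (FIM) prompt in PSM
# (prefix-suffix-middle) order, as used by StarCoder-family models.
# The sentinel strings below are assumed from StarCoderBase's vocabulary.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange prefix and suffix so the model generates the missing middle.

    The model is expected to emit the code that belongs between `prefix`
    and `suffix`, stopping after the infilled span.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

# Hypothetical Java snippet with a hole between prefix and suffix.
prompt = build_fim_prompt(
    "public int add(int a, int b) {\n    return ",
    ";\n}",
)
```

The assembled string would then be tokenized and passed to the model's `generate` call like any ordinary prompt; only the sentinel ordering distinguishes infilling from left-to-right completion.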