Update README.md
README.md (changed)
@@ -138,7 +138,7 @@ Detailed examination and analysis are pending further development and results fr
 
 ### Model Architecture and Objective
 
-V1 is implemented using PyTorch. The architecture
+V1 is implemented using PyTorch. The architecture includes Transformer-based components but also **custom components designed to achieve state-of-the-art (SOTA) performance**. A key design principle was **parameter efficiency**, and the architecture aims to offer advantages over standard transformers in this regard.
 The training objective was next token prediction (standard language modeling).
 
 Further architectural details (e.g., number of layers, hidden size) are standard for its model class but are not disclosed for this base model release.
@@ -150,4 +150,4 @@ Training was performed on a GPU cluster.
 
 #### Software
 
-- PyTorch,
+- PyTorch, Core Transformers Components, and other standard machine learning libraries were utilized.
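The next-token prediction objective mentioned in the diff is the standard language-modeling loss. A minimal PyTorch sketch of that objective is below; the tensor shapes and random inputs are purely illustrative and do not reflect the actual V1 architecture or training code:

```python
import torch
import torch.nn.functional as F

# Illustrative dimensions only (not the real V1 configuration).
batch, seq_len, vocab_size = 4, 16, 100

# Stand-ins for a model's output logits and the input token ids.
logits = torch.randn(batch, seq_len, vocab_size)
tokens = torch.randint(0, vocab_size, (batch, seq_len))

# Next-token prediction: the logits at position t are scored against
# the token at position t+1, so both tensors are shifted by one step.
pred = logits[:, :-1, :].reshape(-1, vocab_size)  # (batch*(seq_len-1), vocab)
target = tokens[:, 1:].reshape(-1)                # (batch*(seq_len-1),)

loss = F.cross_entropy(pred, target)
```

In real training the logits would come from the model's forward pass and `loss.backward()` would drive the parameter updates; the shifting of predictions against targets is the part that makes this "next token prediction".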