Update README.md

---
license: apache-2.0
language:
- en
---

# Model Card for V1 by EtherAI

EtherAI from India introduces "V1," an AI text generation model implemented in PyTorch. The architecture builds on core Transformer components, incorporates elements designed to achieve state-of-the-art (SOTA) performance, and is optimized for parameter efficiency, with the aim of surpassing standard transformer capabilities. This model was trained on a static, offline dataset. Future versions and fine-tuned variants are anticipated as model behavior improves based on community feedback and further development.

This model card serves as a guide for developers and users looking to leverage or build upon V1.

V1 is a language model designed for English text generation. The architecture emphasizes **parameter efficiency** and includes components intended to be key to **SOTA performance**, all within the PyTorch framework. While suitable for general-purpose text generation, its primary potential lies in fine-tuning for specific applications.

**Note for Developers:** This release constitutes a static model (Version 1.0). Its initial training was conducted with limited computational resources, using a small dataset of roughly 3,500 to just under 16,000 samples. Consequently, the base model may exhibit signs of **overfitting**. The underlying architecture possesses robust characteristics, and its performance potential is best realized through fine-tuning with larger, high-quality datasets.

- **Developed by:** EtherAI Team
- **Model Version:** 1.0 (beta base model of V1)
- **Model Release Date (Published Date):** 2025-05-12
- **Model Type:** Generative AI for text generation. This is a non-recurrent, Transformer-based architecture.
- **Language(s) (NLP):** English (en)
- **License:** Apache 2.0. The full license text can be found in the `LICENSE.md` file in this repository. Use, fine-tuning, or building upon this model requires adherence to the terms of the Apache 2.0 license.
- **Primary Purpose:**
  - General-purpose English text generation.
  - Serving as a foundational model for fine-tuning on specialized tasks and datasets.
- **Key Features & Architectural Highlights:**
  - **Parameter Efficiency:** A core design principle.
  - **Advanced Architecture:** Incorporates components aiming to surpass standard transformer capabilities.
  - **Static Model (V1):** Trained on a static, offline dataset. Does not learn or update in real time after this release. Not a recurrent architecture.
- **Libraries Used:** PyTorch, Hugging Face Transformers.

### Direct Use Scenarios

V1 can be used directly for various text generation tasks, subject to its current limitations:

- **Content Creation:** Generating initial drafts of articles, blog posts, stories, poems, or scripts.
- **Brainstorming:** Assisting with idea generation.
- **Summarization:** Basic text condensation (fine-tuning is highly recommended for improved quality).
- **Question Answering:** Answering questions based on its limited training data (factual accuracy should always be verified).
- **Educational Exploration:** Assisting users in understanding language patterns.

**Considerations for Direct Use:** When using the model directly, be mindful of the potential for overfitting and for outputs that reflect the biases inherent in the small initial training set.
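
For direct use, a minimal loading and generation sketch with the Hugging Face Transformers library might look like the following. The repository id `EtherAI/v1`, the prompt, and the sampling settings are illustrative assumptions, not values published in this card; if the model uses custom architecture code, `trust_remote_code=True` may also be required in `from_pretrained`.

```python
# Hypothetical direct-use sketch: load V1 and sample a continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "EtherAI/v1"  # assumed placeholder; replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Write a short opening paragraph for a blog post about renewable energy."
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling keeps outputs varied; always verify factual claims in the output.
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```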

### Downstream Use and Fine-tuning

V1 provides a suitable foundation for fine-tuning. Potential applications include:

- Developing specialized chatbots.
- Creating tools for automated report generation.
- Building personalized content recommenders.
- Fine-tuning for improved factual accuracy in specific domains.
- Adapting for unique stylistic requirements.

**Note on Fine-tuning:** The performance of downstream applications is significantly dependent on the quality and size of the fine-tuning dataset. Fine-tuning is the recommended method to leverage the architecture's full potential.

*Fine-tuning and use of this model must comply with the Apache 2.0 license.*
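
As a rough starting point for downstream adaptation, the sketch below fine-tunes the model with the Hugging Face `Trainer` on a causal language modeling objective. The repository id, dataset name, and every hyperparameter shown are assumptions for illustration only; none of them are published in this card.

```python
# Hypothetical fine-tuning sketch (all names and hyperparameters are placeholders).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

repo_id = "EtherAI/v1"              # assumed placeholder repo id
dataset_name = "your_text_dataset"  # replace with a high-quality domain dataset ("text" column assumed)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # common fallback for causal LMs
model = AutoModelForCausalLM.from_pretrained(repo_id)

raw = load_dataset(dataset_name, split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

# mlm=False gives a causal LM collator: labels are copied from the input ids.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="v1-finetuned",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=2e-5,
    fp16=True,  # mixed precision, matching the regime noted under Training Hyperparameters
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```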

### Out-of-Scope Use

Use of this model is strongly advised against for the following purposes:

- Generating harmful, defamatory, harassing, hateful, or discriminatory content.
- Creating or spreading misinformation and disinformation.
- Impersonating individuals without consent.
- Making or influencing high-stakes decisions (medical, legal, financial, etc.).
- Any unlawful activities or spam generation.
- Surveillance and profiling without consent.

## Bias, Risks, and Limitations

All language models possess inherent biases, risks, and limitations. For V1, the following should be considered:

**Bias:**
- **Data Bias:** The initial training data was small and may contain societal biases, which the model will likely reflect.
- **Algorithmic Bias:** The architecture, while incorporating advanced components, is not immune to introducing or amplifying biases.
- **Cultural Bias:** The data is primarily English, likely reflecting a Western-centric perspective.

**Risks:**
- **Harmful/Offensive Content:** The model might generate inappropriate text.
- **Factual Inaccuracies (Hallucinations):** The model can produce plausible-sounding but incorrect information.
- **Over-Reliance:** Model outputs should not be accepted without critical assessment.
- **Misuse:** Potential for misuse as outlined in "Out-of-Scope Use."

**Limitations (Especially Relevant for Developers):**
- **Knowledge Cut-off:** The model's knowledge is static, fixed at the time its training data was collected.
- **No Real-Time Access:** It cannot access current information post-training.
- **Nuance/Sarcasm:** The model may have difficulty interpreting or generating nuanced language and sarcasm.
- **Overfitting Potential:** **This is a key consideration given the small initial training dataset (approx. 3.5k to <16k samples).** The architecture, designed for efficiency and performance, may appear to overfit when trained on such limited data. **The model's definitive performance characteristics will emerge upon fine-tuning with a sufficiently large, high-quality dataset.**

### Mitigation Recommendations

- **Critical Evaluation:** Always critically evaluate model outputs.
- **Human Oversight:** Keep a human in the loop for applications with significant impact.
- **Transparency:** Inform end-users when text is AI-generated.
- **Fine-tuning for Robustness:** **This is crucial.** Fine-tune on high-quality, diverse datasets to mitigate overfitting and enhance performance.
- **Content Filtering:** Implement content filtering mechanisms in user-facing applications.
- **Feedback Loop:** Feedback on model performance and behavior is encouraged to inform future development.

### Training Data

V1 was initially pre-trained on a limited dataset of roughly 3,500 to just under 16,000 English text samples. The initial training used limited computational resources, which prevented the use of a larger dataset for the base model release. Consequently, the base model should be viewed as a starting point; performance is expected to benefit significantly from fine-tuning with more comprehensive data.

Specific dataset sources are not disclosed.

### Training Procedure

Standard self-supervised learning techniques were employed for training.

#### Preprocessing

Standard text preprocessing and tokenization methods were applied.
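
Since the card only states that "standard" preprocessing was used, the following is a purely illustrative sketch of what such a step could look like: whitespace normalization followed by tokenization with truncation. The repo id and maximum length are assumptions.

```python
# Illustrative preprocessing sketch; not the pipeline actually used for V1.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EtherAI/v1")  # assumed placeholder repo id

def preprocess(text: str, max_length: int = 512):
    text = " ".join(text.split())  # collapse repeated whitespace and newlines
    return tokenizer(text, truncation=True, max_length=max_length)

encoded = preprocess("An example   sentence with  irregular   spacing.")
print(encoded["input_ids"])
```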

#### Training Hyperparameters

- **Training regime:** Mixed Precision
- **Optimizer:** AdamW
- Details on learning rate, batch size, epochs/steps are not provided for this base model release.
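
To make the stated regime concrete, here is a hypothetical single training step combining AdamW with mixed precision in PyTorch. The repo id, learning rate, device handling, and batch construction are all assumptions; the card does not publish them.

```python
# Hypothetical AdamW + mixed-precision training step (values are placeholders).
import torch
from torch.cuda.amp import GradScaler, autocast
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "EtherAI/v1"  # assumed placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(repo_id).cuda()

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr is an assumption
scaler = GradScaler()

def training_step(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True).to("cuda")
    batch["labels"] = batch["input_ids"].clone()  # next-token prediction targets
    optimizer.zero_grad(set_to_none=True)
    with autocast():                       # forward pass in mixed precision
        loss = model(**batch).loss
    scaler.scale(loss).backward()          # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```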

## Evaluation

Initial evaluations were conducted on internal validation splits derived from the training data. Due to the limited size of the initial data and the potential for overfitting, these results should be considered preliminary. **The definitive performance characteristics of this architecture are best assessed after fine-tuning with a suitable dataset.**
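
For anyone running a quick sanity check of their own, a common intrinsic metric for causal language models is perplexity on held-out text. The sketch below is illustrative only; the repo id and evaluation text are assumptions, and no official evaluation script is published with this card.

```python
# Hypothetical perplexity check on a held-out text sample.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "EtherAI/v1"  # assumed placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()

text = "A held-out validation paragraph goes here."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels equal to the input ids, the model returns the mean
    # next-token cross-entropy; exp(loss) is the perplexity.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"Perplexity: {math.exp(loss.item()):.2f}")
```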

### Results

The architecture demonstrates learning capability on the initial dataset. However, robust benchmarking against standard datasets is required after fine-tuning to establish comparative performance.

#### Summary

The model architecture shows promise. Its current state represents a baseline; significant performance improvements are anticipated with appropriate fine-tuning.

## Model Examination

Detailed examination and analysis are pending further development and results from community fine-tuning efforts.

## Environmental Impact

- **Hardware Type:** Enterprise-grade GPUs/TPUs were used for the initial training.
- Compute hours, cloud provider, region, and carbon emitted for the initial limited training run are not detailed. Users performing fine-tuning are encouraged to consider and report the environmental impact of their own runs, for example with the tracking sketch below.
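
One way to measure and report the footprint of a fine-tuning run (a suggestion only, not something used or endorsed by this card) is the `codecarbon` package. In this sketch, `run_finetuning()` is a hypothetical stand-in for your own training call.

```python
# Hypothetical emissions tracking around a fine-tuning run using codecarbon.
from codecarbon import EmissionsTracker

def run_finetuning():
    ...  # placeholder for your own training loop or Trainer.train() call

tracker = EmissionsTracker(project_name="v1-finetune")
tracker.start()
try:
    run_finetuning()
finally:
    emissions_kg = tracker.stop()  # estimated emissions in kg CO2-equivalent
    print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```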

### Model Architecture and Objective

V1 is implemented using PyTorch. The architecture is Transformer-based but includes **custom components designed to achieve SOTA (state-of-the-art) performance**. A key design principle was **parameter efficiency**, and the architecture aims to offer advantages over standard transformers in this regard.

The training objective was next-token prediction (standard language modeling).

Further architectural details (e.g., number of layers, hidden size) are standard for its model class but are not disclosed for this base model release.
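
For readers unfamiliar with the objective, next-token prediction can be written as a shifted cross-entropy: the logits at position t are scored against the token at position t+1. The sketch below is a generic illustration of that objective, not V1's internal implementation.

```python
# Generic next-token prediction (causal LM) loss; shapes are illustrative.
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size); input_ids: (batch, seq_len)
    shift_logits = logits[:, :-1, :]   # predictions for positions 0..T-2
    shift_labels = input_ids[:, 1:]    # targets are the next tokens
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )

# Toy check with random values
logits = torch.randn(2, 8, 100)
tokens = torch.randint(0, 100, (2, 8))
print(causal_lm_loss(logits, tokens))
```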

#### Hardware

Training was performed on a GPU cluster.

#### Software

- PyTorch, Hugging Face Transformers, and other standard machine learning libraries were utilized.