Ontocord.AI committed · Commit 4a51bc1 · 1 Parent(s): 92d5e4d

Update README.md
README.md CHANGED
@@ -12,6 +12,16 @@ This is a merge of the following MPT-7B models:
 - **e**mozilla/mpt-7b-storysummarizer
 - **n**omic-ai/gpt4all-mpt
 
+
+## Model License
+
+Apache 2.0
+
+
+## Purpose
+
+This model is for experimenting with merging and routing to expert layers.
+
 # Test eval on only 10% of eval set
 
 hf-causal (pretrained=Multi-Domain-Expert-Layers/given-mpt-7b,dtype=bfloat16,trust_remote_code=True), limit: 0.1, provide_description: False, num_fewshot: 0, batch_size: None
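The eval header above loads the checkpoint with dtype=bfloat16 and trust_remote_code=True. Below is a minimal sketch of loading the merged model the same way with Hugging Face Transformers; the prompt and sampling settings are illustrative and not taken from this repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Multi-Domain-Expert-Layers/given-mpt-7b"

# MPT uses custom modeling code, so trust_remote_code=True is required,
# matching the eval settings above; bfloat16 keeps memory use manageable.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.eval()

# Illustrative prompt, not taken from the model card.
inputs = tokenizer("Once upon a time,", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```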
@@ -142,19 +152,3 @@ hf-causal (pretrained=Multi-Domain-Expert-Layers/given-mpt-7b,dtype=bfloat16,tru
 |   |   |rougeL_diff|-8.5753|± |2.8259|
 
 
-## Model License
-
-Apache 2.0
-
-# Original Model Card From MPT-7B-StoryWriter-65k+
-
-MPT-7B-StoryWriter-65k+ is a model designed to read and write fictional stories with super long context lengths.
-It was built by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the [books3 dataset](https://huggingface.co/datasets/the_pile_books3).
-At inference time, thanks to [ALiBi](https://arxiv.org/abs/2108.12409), MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens.
-We demonstrate generations as long as 84k tokens on a single node of 8 A100-80GB GPUs in our [blogpost](https://www.mosaicml.com/blog/mpt-7b).
-* License: Apache 2.0
-
-This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.
-
-
-
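The rougeL_diff row in the hunk above is a fragment of the results table produced by the eval run named in the hunk header. A minimal sketch of reproducing that setup with the EleutherAI lm-evaluation-harness Python API, assuming a harness version whose simple_evaluate accepts these arguments; the task list is a placeholder, since the full set of tasks is not visible in this diff.

```python
import json
from lm_eval import evaluator

# Placeholder task names; substitute the tasks actually reported in the
# results table, which are not listed in this excerpt.
TASKS = ["lambada_openai", "hellaswag"]

results = evaluator.simple_evaluate(
    model="hf-causal",
    # Same model_args string as in the eval header above.
    model_args=(
        "pretrained=Multi-Domain-Expert-Layers/given-mpt-7b,"
        "dtype=bfloat16,trust_remote_code=True"
    ),
    tasks=TASKS,
    num_fewshot=0,   # num_fewshot: 0
    limit=0.1,       # evaluate on only 10% of each eval set
)
print(json.dumps(results["results"], indent=2, default=str))
```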
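The added "## Purpose" section says the model is for experimenting with merging and routing to expert layers. The diff does not say how the merge itself was performed, so the following is only an illustration of one common approach, uniform parameter averaging of same-architecture checkpoints, and not necessarily the recipe used for given-mpt-7b. The source model names come from the merge list above; the output directory is hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM

# Two of the MPT-7B fine-tunes named in the merge list above; any set of
# checkpoints with identical architecture could be averaged the same way.
SOURCES = [
    "emozilla/mpt-7b-storysummarizer",
    "nomic-ai/gpt4all-mpt",
]

state_dicts = []
base = None
for name in SOURCES:
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.float32, trust_remote_code=True
    )
    state_dicts.append(model.state_dict())
    if base is None:
        base = model  # keep the first model around to receive the merged weights

# Uniform average of every tensor, assuming all state dicts share keys and shapes.
merged_state = {
    key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    for key in state_dicts[0]
}

base.load_state_dict(merged_state)
base.save_pretrained("merged-mpt-7b")  # hypothetical local output directory
```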