Ontocord.AI committed · Commit 4a51bc1 · 1 Parent(s): 92d5e4d

Update README.md
README.md CHANGED
@@ -12,6 +12,16 @@ This is a merge of the following MPT-7B models:
 - **e**mozilla/mpt-7b-storysummarizer
 - **n**omic-ai/gpt4all-mpt
 
+
+## Model License
+
+Apache 2.0
+
+
+## Purpose
+
+This model is for experimenting with merging and routing to expert layers.
+
 # Test eval on only 10% of eval set
 
 hf-causal (pretrained=Multi-Domain-Expert-Layers/given-mpt-7b,dtype=bfloat16,trust_remote_code=True), limit: 0.1, provide_description: False, num_fewshot: 0, batch_size: None
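The eval header above loads the checkpoint with dtype=bfloat16 and trust_remote_code=True. Below is a minimal sketch of loading the merged model the same way with Hugging Face Transformers; the prompt and sampling settings are illustrative and not taken from this repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Multi-Domain-Expert-Layers/given-mpt-7b"

# MPT uses custom modeling code, so trust_remote_code=True is required,
# matching the eval settings above; bfloat16 keeps memory use manageable.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.eval()

# Illustrative prompt, not taken from the model card.
inputs = tokenizer("Once upon a time,", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```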
@@ -142,19 +152,3 @@ hf-causal (pretrained=Multi-Domain-Expert-Layers/given-mpt-7b,dtype=bfloat16,tru
 |   |   |rougeL_diff|-8.5753|± |2.8259|
 
 
-## Model License
-
-Apache 2.0
-
-# Original Model Card From MPT-7B-StoryWriter-65k+
-
-MPT-7B-StoryWriter-65k+ is a model designed to read and write fictional stories with super long context lengths.
-It was built by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the [books3 dataset](https://huggingface.co/datasets/the_pile_books3).
-At inference time, thanks to [ALiBi](https://arxiv.org/abs/2108.12409), MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens.
-We demonstrate generations as long as 84k tokens on a single node of 8 A100-80GB GPUs in our [blogpost](https://www.mosaicml.com/blog/mpt-7b).
-* License: Apache 2.0
-
-This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.
-
-
-
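The rougeL_diff row in the hunk above is a fragment of the results table produced by the eval run named in the hunk header. A minimal sketch of reproducing that setup with the EleutherAI lm-evaluation-harness Python API, assuming a harness version whose simple_evaluate accepts these arguments; the task list is a placeholder, since the full set of tasks is not visible in this diff.

```python
import json
from lm_eval import evaluator

# Placeholder task names; substitute the tasks actually reported in the
# results table, which are not listed in this excerpt.
TASKS = ["lambada_openai", "hellaswag"]

results = evaluator.simple_evaluate(
    model="hf-causal",
    # Same model_args string as in the eval header above.
    model_args=(
        "pretrained=Multi-Domain-Expert-Layers/given-mpt-7b,"
        "dtype=bfloat16,trust_remote_code=True"
    ),
    tasks=TASKS,
    num_fewshot=0,   # num_fewshot: 0
    limit=0.1,       # evaluate on only 10% of each eval set
)
print(json.dumps(results["results"], indent=2, default=str))
```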
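The added "## Purpose" section says the model is for experimenting with merging and routing to expert layers. The diff does not say how the merge itself was performed, so the following is only an illustration of one common approach, uniform parameter averaging of same-architecture checkpoints, and not necessarily the recipe used for given-mpt-7b. The source model names come from the merge list above; the output directory is hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM

# Two of the MPT-7B fine-tunes named in the merge list above; any set of
# checkpoints with identical architecture could be averaged the same way.
SOURCES = [
    "emozilla/mpt-7b-storysummarizer",
    "nomic-ai/gpt4all-mpt",
]

state_dicts = []
base = None
for name in SOURCES:
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.float32, trust_remote_code=True
    )
    state_dicts.append(model.state_dict())
    if base is None:
        base = model  # keep the first model around to receive the merged weights

# Uniform average of every tensor, assuming all state dicts share keys and shapes.
merged_state = {
    key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    for key in state_dicts[0]
}

base.load_state_dict(merged_state)
base.save_pretrained("merged-mpt-7b")  # hypothetical local output directory
```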