ptrdvn committed · verified
Commit 616b06f · 1 Parent(s): b4726bc

Update README.md

Files changed (1): README.md (+127 -0)
README.md CHANGED
 
@@ -3,6 +3,14 @@ library_name: transformers
  tags: []
  ---

+ # Model overview
+
+ This is a QLoRA finetune of the newly released [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1) base model.
+
+ As the base model has not been explicitly trained to chat, we trained this model on a multilingual chat dataset so that the LLM community can use it for conversations.
+
+ The accuracy of the model is surprisingly high, and it has decently fast inference speed (roughly 40 tokens/s single batch in our tests), so we believe it will be useful to the community.
+
  # How to use

  We have tested (and thus recommend) running this model on vLLM. We recommend running it from the vLLM OpenAI server, using the following command:
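The serving command itself sits outside this diff hunk, so it is not shown here. As a hedged sketch only (not taken from the card): once a vLLM OpenAI-compatible server is running for this model, it can be queried with the standard `openai` Python client roughly as follows; the port, the placeholder model id, and the prompt are assumptions.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server
# (vLLM's OpenAI-compatible server listens on http://localhost:8000/v1 by default).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

MODEL = "served-model-name"  # placeholder: use the model id the vLLM server was started with

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Bonjour ! Peux-tu te présenter en une phrase ?"}],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```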
 
@@ -59,6 +67,7 @@ We will be uploading a 4bit AWQ model soon to make it easier to run this model o

  # Inference examples

+ From qualitative testing, the model seems pretty smart, especially in English, and has very good recall of facts. It can still get confused by some logical questions, but it has also passed many of the logical questions we have thrown at it that other open source LLMs often fail.

  <details>
  <summary>Creative prompts</summary>
 
@@ -335,6 +344,124 @@ Ces joueurs sont souvent cités comme étant parmi les meilleurs du monde, mais

  </details>

+ # Training dataset
+
+ We trained this model on conversations between human users and GPT-4.
+
+ This consists of two datasets:
+
+ * 6,206 conversations from the [openchat/openchat_sharegpt4_dataset](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset) dataset ([link](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/resolve/main/sharegpt_gpt4.json?download=true))
+ * 3,011 conversations that we created. We wanted to increase the representation of non-English prompts in our training dataset, so we sampled initial prompts from [lmsys/lmsys-chat-1m](https://huggingface.co/datasets/lmsys/lmsys-chat-1m), stratifying by language. We then prompted gpt-4-0125 with these prompts and used the responses as training data (a rough sketch of this sampling step follows this section).
+
+ We plan to release more information on this second dataset soon, as we are using it in another dataset.
+
+ The complete data used to train this model can be found at [lightblue/gpt4_conversations_multilingual](https://huggingface.co/datasets/lightblue/gpt4_conversations_multilingual).
+
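As an illustration only (not the authors' actual script), sampling initial prompts from lmsys/lmsys-chat-1m stratified by language might look roughly like this; the per-language sample size and the random seed are arbitrary assumptions.

```python
import random
from collections import defaultdict

from datasets import load_dataset

# lmsys/lmsys-chat-1m is gated; accept its terms on the Hugging Face Hub first.
ds = load_dataset("lmsys/lmsys-chat-1m", split="train")

# Group row indices by the dataset's language tag.
by_language = defaultdict(list)
for idx, language in enumerate(ds["language"]):
    by_language[language].append(idx)

# Stratified sample: up to N opening prompts per language (N is an assumption).
random.seed(0)
samples_per_language = 50
initial_prompts = []
for language, indices in by_language.items():
    for idx in random.sample(indices, min(samples_per_language, len(indices))):
        first_user_turn = ds[idx]["conversation"][0]["content"]
        initial_prompts.append({"language": language, "prompt": first_user_turn})

# Each sampled prompt would then be sent to GPT-4 (gpt-4-0125 in the card)
# and the response kept as a training conversation.
print(len(initial_prompts))
```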
+ # Training details
+
+ We trained this model using Axolotl for roughly 100 minutes in an A100 (80GB) x 4 environment on the Azure cloud (Standard_NC96ads_A100_v4).
+
+ We used DeepSpeed ZeRO-2 to train across the 4 GPUs.
+
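For reference (a back-of-the-envelope check, not stated in the card), the effective batch size implied by the config below can be worked out from `micro_batch_size`, `gradient_accumulation_steps`, and the GPU count:

```python
# Effective batch size implied by the training config below (assumed calculation).
micro_batch_size = 1        # per-GPU batch size
gradient_accumulation = 2   # gradient_accumulation_steps
num_gpus = 4                # A100 (80GB) x 4
sequence_len = 2048         # with sample_packing: true, each sequence is packed full

global_batch = micro_batch_size * gradient_accumulation * num_gpus  # 8 packed sequences per step
tokens_per_step = global_batch * sequence_len                       # 16,384 tokens per optimizer step
print(global_batch, tokens_per_step)
```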
+ We used the following config to train the model:
+
+ <details>
+ <summary>Training config</summary>
+
+ ```yaml
+ base_model: mistral-community/Mixtral-8x22B-v0.1
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+ trust_remote_code: true
+
+ load_in_8bit: false
+ load_in_4bit: true
+ strict: false
+
+ datasets:
+   - path: lightblue/gpt4_conversations_multilingual
+     type: sharegpt
+     conversation: mistral
+ dataset_prepared_path: ./prepared_dataset_2048-multiling
+ val_set_size: 0
+ output_dir: ./qlora-out-2048-multiling
+
+ ## You can optionally freeze the entire model and unfreeze a subset of parameters
+ unfrozen_parameters:
+ # - ^lm_head.weight$
+ # - ^model.embed_tokens.weight$[:32000]
+ # - model.layers.2[0-9]+.block_sparse_moe.gate
+ # - model.layers.2[0-9]+.block_sparse_moe.experts
+ # - model.layers.3[0-9]+.block_sparse_moe.gate
+ # - model.layers.3[0-9]+.block_sparse_moe.experts
+
+ model_config:
+   output_router_logits: true
+
+ adapter: qlora
+ lora_model_dir:
+
+ sequence_len: 2048
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ lora_r: 16
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+ #lora_target_modules:
+ #  - gate
+ #  - q_proj
+ #  - k_proj
+ #  - v_proj
+ #  - o_proj
+ #  - w1
+ #  - w2
+ #  - w3
+
+ gradient_accumulation_steps: 2
+ micro_batch_size: 1
+ num_epochs: 1
+ optimizer: adamw_bnb_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.0002
+
+ use_wandb: true
+ wandb_project: wandb_project
+ wandb_entity: wandb_entity
+ wandb_name: wandb_name
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 10
+ evals_per_epoch: 0
+ eval_table_size:
+ eval_max_new_tokens: 128
+ saves_per_epoch: 5
+ debug:
+ deepspeed: /workspace/axolotl/deepspeed_configs/zero2.json
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+ ```
+ </details>
+
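The config above saves a QLoRA adapter (see `adapter: qlora` and `output_dir`) rather than full model weights. As a minimal sketch only, assuming the adapter directory from the config and not necessarily reflecting the authors' actual export step, such an adapter could be merged back into the base model with `peft` before serving:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model (8x22B needs a large amount of memory even in bf16).
base = AutoModelForCausalLM.from_pretrained(
    "mistral-community/Mixtral-8x22B-v0.1", torch_dtype="auto", device_map="auto"
)

# Attach the trained QLoRA adapter from the output_dir in the config above.
model = PeftModel.from_pretrained(base, "./qlora-out-2048-multiling")

# Fold the LoRA weights into the base weights and drop the adapter wrappers,
# giving a standalone checkpoint that can be saved and served (e.g. with vLLM).
merged = model.merge_and_unload()
merged.save_pretrained("./mixtral-8x22b-multilingual-chat-merged")  # hypothetical output path
AutoTokenizer.from_pretrained("mistral-community/Mixtral-8x22B-v0.1").save_pretrained(
    "./mixtral-8x22b-multilingual-chat-merged"
)
```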

  # Developers