manu committed on
Commit b11f4ae · 1 Parent(s): 5d7f6da

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -8,4 +8,5 @@ language:
 
 BPE Tokenizer fitted on a custom corpus, with digit separation, byte fallback and other features from LlamaTokenizer.
 
-Only fitted on 100,000 samples (7.5M words).
+Only fitted on 100,000 samples (7.5M words).
+# Warning - Dataset was not shuffled so fitted on code only, not usable as is !
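
The warning above describes a common pitfall: taking the first N samples of a corpus that is ordered by source yields a subset from a single domain (here, code only). The sketch below is one possible way to avoid that, not the method used for this tokenizer. It draws a uniform random subset from a stream with reservoir sampling, using only the Python standard library. The function name `reservoir_sample`, the seed, and the toy corpus are assumptions made for illustration.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Draw up to k items uniformly at random from a stream of unknown length.

    Hypothetical helper (not part of any tokenizer library): it implements
    reservoir sampling (Algorithm R), so the corpus never has to fit in memory.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                sample[j] = item
    # Shuffle so the returned order does not mirror the stream order.
    rng.shuffle(sample)
    return sample


# Toy corpus ordered by source, as an unshuffled dataset might be.
corpus = ["code"] * 1000 + ["prose"] * 1000

head = corpus[:100]                           # first 100 samples: one domain only
mixed = reservoir_sample(corpus, 100, seed=0)  # random subset: both domains

print(sorted(set(head)))   # domains present in the unshuffled head
print(sorted(set(mixed)))  # domains present in the random subset
```

A subset drawn this way could then be passed to whatever tokenizer training routine is in use in place of the unshuffled head of the dataset.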