Reafactoring of the tokenization pipeline, adjusted fasttext implementation 3011301 verified daniel-wojahn commited on May 21