# Increasing Speed
* Integrate Flash Attention 2.0 CUDA kernels for a significant speed-up (see the sketch below)
* Use the 8-bit optimizer from bitsandbytes (bnb) for a big speed-up; weakness: bnb is not compatible with all GPUs (see the sketch below)
* Use a better tokenizer, TokenMonster?
* Parallelize the transformer blocks, similar to [PaLM](https://github.com/conceptofmind/PaLM) (see the sketch below)
* Look into MPT's config for LION for pretraining; did they use a high batch size?
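A minimal sketch of the Flash Attention 2.0 item above, calling the fused CUDA kernel directly through the `flash-attn` package. The tensor shapes and dtypes are illustrative assumptions, not this repo's actual attention module.

```python
# Sketch: swapping a vanilla attention call for FlashAttention-2's fused CUDA kernel.
# Assumes the `flash-attn` (>= 2.0) package is installed and a bf16/fp16-capable GPU.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 2048, 16, 64  # illustrative sizes
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal self-attention in one fused kernel; the (seqlen x seqlen) score matrix
# is never materialized, which is where the speed and memory win comes from.
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)  # (batch, seqlen, nheads, headdim)
```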
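A hedged sketch of the bitsandbytes 8-bit optimizer item; the model and hyperparameters are placeholders. The compatibility weakness noted above is that bnb's CUDA kernels only support certain GPU architectures.

```python
# Sketch: using bitsandbytes' 8-bit Adam in place of torch.optim.AdamW.
# Requires a GPU architecture supported by bitsandbytes' CUDA kernels.
import torch.nn as nn
import bitsandbytes as bnb

model = nn.Linear(1024, 1024).cuda()  # placeholder for the actual model

# Optimizer states are stored in 8 bits, cutting optimizer memory and speeding up steps.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=3e-4, betas=(0.9, 0.95))

# The training step itself is unchanged:
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```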
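A minimal PyTorch sketch of the PaLM-style parallel block item: attention and the MLP both read the same normalized input and their outputs are summed into the residual stream, so the two branches can run concurrently instead of sequentially. Module names and sizes are illustrative, not taken from this repo.

```python
# Sketch: PaLM-style parallel transformer block.
# x = x + attn(norm(x)) + mlp(norm(x)) -- one LayerNorm shared by both branches.
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    def __init__(self, dim: int = 1024, heads: int = 16, mlp_mult: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_mult),
            nn.GELU(),
            nn.Linear(dim * mlp_mult, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)                                   # shared pre-norm
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + self.mlp(h)                  # sum both branches into the residual
```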