---
license: bsd-3-clause
tags:
- kernel
---
# Flash Attention 3
Flash Attention is a fast and memory-efficient implementation of the
attention mechanism, designed to work with large models and long sequences.
This is a Hugging Face-compliant kernel build of Flash Attention.
The original code is available at [https://github.com/Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention).
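
Below is a minimal usage sketch. It assumes the kernel is loaded with the `kernels` library's `get_kernel` entry point and exposes the upstream `flash_attn_func` interface; the Hub repo id shown is an illustrative placeholder, not confirmed by this README.

```python
import torch
from kernels import get_kernel

# Download and load the compiled kernel from the Hugging Face Hub.
# The repo id below is an assumed example for illustration.
flash_attn = get_kernel("kernels-community/flash-attn3")

# Tensors are laid out as (batch, seqlen, num_heads, head_dim),
# in half precision on a CUDA device, matching the upstream API.
q = torch.randn(1, 4096, 16, 128, dtype=torch.bfloat16, device="cuda")
k = torch.randn(1, 4096, 16, 128, dtype=torch.bfloat16, device="cuda")
v = torch.randn(1, 4096, 16, 128, dtype=torch.bfloat16, device="cuda")

# The upstream flash_attn_func returns the attention output (some
# versions also return the log-sum-exp), so unpack defensively.
out = flash_attn.flash_attn_func(q, k, v, causal=True)
if isinstance(out, tuple):
    out = out[0]
print(out.shape)  # torch.Size([1, 4096, 16, 128])
```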