lucalp committed on
Commit d90ad1e · 1 Parent(s): 810a93a

Adding link to paper's configurations

Files changed (1)
  1. app.py +2 -0
app.py CHANGED
@@ -249,6 +249,8 @@ with gr.Blocks(title="BLT vs BPE FLOPs Comparison") as demo:
  - **BPE (Byte Pair Encoding)**: Traditional transformer architecture
  - **BLT (Byte Latent Transformer)**: Novel architecture with Global and Local components with a dynamic patch size to segment bytes.
 
+ Have a look at the paper's [BLT architecture configurations](https://arxiv.org/html/2412.09871v1#:~:text=%5Cbeginappendix-,11,Table%C2%A010%20shows%20different%20hyper%20parameter%20settings%20for%20BLT%20models.,-Encoder) for some inspiration.
+
  A few things you'll notice:
  1. Patch size reduces global model FLOPs but not local model
  2. Increasing patch size and global model dimension doesn't change total FLOPs
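
The two observations in the help text above can be sanity-checked with a back-of-the-envelope model. The sketch below is not the calculation used in `app.py`. It assumes roughly 2 forward FLOPs per parameter per step and about `12 * layers * dim^2` non-embedding parameters per transformer stack, and it ignores attention and embedding terms. The function names and model sizes are hypothetical. Under these assumptions the totals match only when the patch size grows by the square of the factor applied to the global dimension (here 4x patch size for 2x dimension).

```python
def transformer_params(n_layers: int, dim: int) -> int:
    """Rough non-embedding parameter count (assumption: ~12 * layers * dim^2)."""
    return 12 * n_layers * dim * dim


def blt_flops_per_byte(patch_size, global_layers, global_dim, local_layers, local_dim):
    """Approximate forward FLOPs per byte for a BLT-style model.

    Assumption: the local stack runs once per byte, while the global
    model runs once per patch, so its cost is divided by the patch size.
    """
    local = 2 * transformer_params(local_layers, local_dim)
    global_ = 2 * transformer_params(global_layers, global_dim) / patch_size
    return {"local": local, "global": global_, "total": local + global_}


# Hypothetical sizes, chosen only for illustration.
base = blt_flops_per_byte(patch_size=4, global_layers=24, global_dim=1024,
                          local_layers=6, local_dim=512)

# Observation 1: a larger patch size cuts global FLOPs, local FLOPs are unchanged.
bigger_patch = blt_flops_per_byte(patch_size=8, global_layers=24, global_dim=1024,
                                  local_layers=6, local_dim=512)

# Observation 2: 4x patch size together with 2x global dimension keeps the total fixed.
scaled = blt_flops_per_byte(patch_size=16, global_layers=24, global_dim=2048,
                            local_layers=6, local_dim=512)

for name, result in [("base", base), ("bigger_patch", bigger_patch), ("scaled", scaled)]:
    print(name, result)
```

Comparing `base` with `bigger_patch` shows the first point, and comparing `base` with `scaled` shows the second.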