lucalp committed on
Commit
3140e72
·
1 Parent(s): 515c5ed

Updating description

Files changed (1)
  1. app.py +2 -4
app.py CHANGED
@@ -242,10 +242,7 @@ def create_visualization(blt_ps, d_model_slider, local_n_layers_slider):
  with gr.Blocks(title="BLT vs BPE FLOPs Comparison") as demo:
  gr.Markdown("""
  # BLT vs BPE FLOPs Comparison
-
- This interactive visualization compares the computational efficiency (FLOPs per byte) and total model parameters between:
- - **BPE (Byte Pair Encoding)**: Traditional transformer architecture
- - **BLT (Byte Latent Transformer)**: Novel architecture with Global and Local components with a dynamic patch size to segment bytes.
+ Companion blog post [can be found here](https://lucalp.dev/bitter-lesson-tokenization-and-blt).

  For inspiration, have a look at the paper's [BLT architecture configurations](https://arxiv.org/html/2412.09871v1#:~:text=%5Cbeginappendix-,11,Table%C2%A010%20shows%20different%20hyper%20parameter%20settings%20for%20BLT%20models.,-Encoder) for some inspiration.

@@ -253,6 +250,7 @@ with gr.Blocks(title="BLT vs BPE FLOPs Comparison") as demo:
  1. Patch size reduces global model FLOPs but not local model
  2. Increasing patch size and global model dimension doesn't change total FLOPs
  3. In smaller BLTs, local models constitute a larger portion of the total FLOPs
+
  Parameter counts are displayed below each bar.
  """)
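
The three insights listed in the Markdown text can be checked with a rough back-of-the-envelope sketch. This is not the formula used in `app.py`. It assumes the common approximation of about `2 * 12 * n_layers * d_model**2` forward FLOPs per position for a transformer, and it ignores attention over the sequence, embeddings, and BLT's cross-attention. The function names and the example sizes are hypothetical.

```python
def transformer_flops_per_position(n_layers: int, d_model: int) -> int:
    """Rough forward FLOPs per position: 2 * (12 * n_layers * d_model^2).

    Assumption: non-embedding parameters ~ 12 * n_layers * d_model^2,
    and each parameter costs about 2 FLOPs per position.
    """
    return 2 * 12 * n_layers * d_model ** 2


def blt_flops_per_byte(patch_size, global_layers, global_d_model,
                       local_layers, local_d_model):
    """Return (global, local) FLOPs per byte for a simplified BLT.

    The global model runs once per patch, so its cost is divided by the
    patch size. The local models run on every byte, so their cost is not.
    """
    global_per_byte = (
        transformer_flops_per_position(global_layers, global_d_model) / patch_size
    )
    local_per_byte = transformer_flops_per_position(local_layers, local_d_model)
    return global_per_byte, local_per_byte


# Insight 1: doubling the patch size halves global FLOPs, local stays fixed.
g4, l4 = blt_flops_per_byte(4, 24, 1024, 6, 512)
g8, l8 = blt_flops_per_byte(8, 24, 1024, 6, 512)

# Insight 2: 4x the patch size with 2x the global d_model (4x the FLOPs
# per patch) leaves global FLOPs per byte unchanged.
g16, l16 = blt_flops_per_byte(16, 24, 2048, 6, 512)

# Insight 3: with the same local models, a smaller global model leaves the
# local models with a larger share of the total.
small_share = l4 / (g4 + l4)
g_big, l_big = blt_flops_per_byte(4, 32, 4096, 6, 512)
big_share = l_big / (g_big + l_big)
```

Under these assumptions, `g8` is half of `g4`, `g16` equals `g4`, and `small_share` exceeds `big_share`, which matches the direction of the three statements above.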