Updating description
app.py
CHANGED
@@ -242,10 +242,7 @@ def create_visualization(blt_ps, d_model_slider, local_n_layers_slider):
 with gr.Blocks(title="BLT vs BPE FLOPs Comparison") as demo:
     gr.Markdown("""
 # BLT vs BPE FLOPs Comparison
-
-This interactive visualization compares the computational efficiency (FLOPs per byte) and total model parameters between:
-- **BPE (Byte Pair Encoding)**: Traditional transformer architecture
-- **BLT (Byte Latent Transformer)**: Novel architecture with Global and Local components with a dynamic patch size to segment bytes.
+Companion blog post [can be found here](https://lucalp.dev/bitter-lesson-tokenization-and-blt).
 
 For inspiration, have a look at the paper's [BLT architecture configurations](https://arxiv.org/html/2412.09871v1#:~:text=%5Cbeginappendix-,11,Table%C2%A010%20shows%20different%20hyper%20parameter%20settings%20for%20BLT%20models.,-Encoder) for some inspiration.
 
@@ -253,6 +250,7 @@ with gr.Blocks(title="BLT vs BPE FLOPs Comparison") as demo:
 1. Patch size reduces global model FLOPs but not local model
 2. Increasing patch size and global model dimension doesn't change total FLOPs
 3. In smaller BLTs, local models constitute a larger portion of the total FLOPs
+
 Parameter counts are displayed below each bar.
     """)
 
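The three observations listed in the description can be checked with a rough back-of-the-envelope sketch. This is not the code in app.py. It assumes the common approximations of about 12 * n_layers * d_model^2 non-embedding parameters per transformer and about 2 FLOPs per parameter per processed position, and it ignores attention-score, embedding, and cross-attention costs. The function names and all configuration numbers are hypothetical.

```python
def transformer_params(n_layers: int, d_model: int) -> int:
    """Approximate non-embedding parameter count (assumed 12 * L * d^2)."""
    return 12 * n_layers * d_model ** 2


def blt_flops_per_byte(patch_size, global_layers, global_d, local_layers, local_d):
    """Return (global, local) forward FLOPs per byte under the assumptions above.

    The global model runs once per patch, so its cost is spread over
    `patch_size` bytes. The local models run on every byte.
    """
    global_flops = 2 * transformer_params(global_layers, global_d) / patch_size
    local_flops = 2 * transformer_params(local_layers, local_d)
    return global_flops, local_flops


def local_fraction(patch_size, global_layers, global_d, local_layers, local_d):
    """Share of total FLOPs per byte spent in the local models."""
    g, l = blt_flops_per_byte(patch_size, global_layers, global_d, local_layers, local_d)
    return l / (g + l)


# Observation 1: a larger patch size cuts global FLOPs, local FLOPs stay put.
g4, l4 = blt_flops_per_byte(4, 24, 1024, 6, 512)
g8, l8 = blt_flops_per_byte(8, 24, 1024, 6, 512)

# Observation 2: 4x the patch size pays for 2x the global width (d^2 scaling).
g_wide, _ = blt_flops_per_byte(16, 24, 2048, 6, 512)

# Observation 3: hypothetical small vs. large configurations.
small = local_fraction(4, 12, 768, 6, 512)
large = local_fraction(4, 32, 4096, 6, 1024)
print(f"local share, small model: {small:.2f}; large model: {large:.2f}")
```

Under these assumptions, global cost scales with d_model^2 / patch_size, which is why a quadrupled patch size offsets a doubled global width.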