Does this mean the Qwen model passes tokens back into the context? At the retrieval level, are these tokens encoded into the MALM model's hashes, which are then used to retrieve information from the context as MALM tokens, with the result converted back into Qwen's tokens?

reacted to codelion's post with 🔥 17 days ago

Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models

I wrote a deep dive into how Magic AI's 100M-token context window might work, starting from their HashHop benchmark and building up to MALM, a Memory-Augmented Language Model.

Key insight: treating each key as a single token enables perfect retrieval at unlimited context lengths.

The article covers:

- How HashHop works and why its perfect accuracy is suspicious
- Building a tokenized solver that achieves 100% accuracy
- Scaling to MALM for real code search tasks
- Why this approach could handle 100M+ tokens
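A minimal, non-neural sketch of why whole-key lookup gives exact answers, assuming a HashHop-style context is a shuffled list of `hash = hash` pairs that form multi-hop chains. The names `make_hashhop_task` and `solve` are hypothetical, and this is not the code from the linked repo or the MALM model; it only mirrors the "each key is one token" idea with a plain dictionary.

```python
import random
import string
from typing import Tuple


def random_hash(rng: random.Random, length: int = 16) -> str:
    """Return a random alphanumeric string standing in for one HashHop hash."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def make_hashhop_task(num_chains: int, hops: int, seed: int = 0) -> Tuple[str, str, str]:
    """Hypothetical generator: `num_chains` chains of `hops + 1` hashes, written
    as shuffled 'a = b' lines, plus one query (start hash, expected end hash)."""
    rng = random.Random(seed)
    chains = [[random_hash(rng) for _ in range(hops + 1)] for _ in range(num_chains)]
    lines = [f"{chain[i]} = {chain[i + 1]}" for chain in chains for i in range(hops)]
    rng.shuffle(lines)  # line order carries no information about the chains
    target = rng.choice(chains)
    return "\n".join(lines), target[0], target[-1]


def solve(context: str, start: str, hops: int) -> str:
    """Treat every whole hash as one atomic key (the 'single token' idea):
    index each pair by its left-hand side, then follow `hops` exact lookups."""
    table = {}
    for line in context.splitlines():
        key, value = line.split(" = ")
        table[key] = value
    current = start
    for _ in range(hops):
        current = table[current]  # exact match: more pairs never make this ambiguous
    return current


if __name__ == "__main__":
    context, start, expected = make_hashhop_task(num_chains=1000, hops=3, seed=42)
    print(solve(context, start, hops=3) == expected)  # True (barring a hash collision)
```

Because `solve` only ever matches complete keys, adding more chains grows the table but cannot confuse a lookup. That appears to be the intuition behind the post's single-token keys; how a trained model approximates this is what the linked article discusses.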

Read the full article: https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop

Try the model: codelion/malm-165m

Code: https://github.com/codelion/hash-hop

upvoted an article 17 days ago

Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models

19 days ago

liked a model 19 days ago

Lightricks/LTX-2

Image-to-Video • Updated 8 days ago • 2.58M downloads • 1.5k likes

liked a Space about 1 month ago

AI Deadlines ⚡ • 648 likes

View upcoming AI conference deadlines in one place

upvoted a collection about 2 months ago

Transformers.js demos

A collection of my favorite WebML demos, built with Transformers.js! • 30 items • Updated Jul 11, 2024 • 138

liked a Space about 2 months ago

Remove Background Web 🖼 • 734 likes

In-browser background removal

reacted to Xenova's post with 🔥 2 months ago

Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! 😍

How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold

... et voilà! 🥳

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
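The four steps can be sketched in plain Python, assuming the per-patch feature vectors have already been extracted and cached (step 1; in the demo they come from DINOv3, here they are toy lists). `cosine` and `track` are hypothetical names, not part of Transformers.js or the demo's source, and scoring each patch by its maximum similarity to any selected patch (including ones picked in other frames) is one plausible way to combine multiple selections, not necessarily what the demo does.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def track(
    frames: List[List[Vector]],
    selections: List[Tuple[int, int]],
    threshold: float,
) -> Dict[int, List[int]]:
    """frames[f][p] holds the cached feature vector of patch p in frame f (step 1).
    selections are (frame, patch) pairs chosen by the user, possibly in several frames.
    Returns, per frame, the indices of patches to highlight."""
    if not selections:
        raise ValueError("select at least one patch")
    # Step 2: collect the embeddings of the selected patch(es).
    selected = [frames[f][p] for f, p in selections]
    highlights: Dict[int, List[int]] = {}
    for f, patches in enumerate(frames):
        hits = []
        for p, feature in enumerate(patches):
            # Step 3: score each patch; the max over selections is an assumption.
            score = max(cosine(feature, s) for s in selected)
            # Step 4: keep patches whose score clears the threshold.
            if score > threshold:
                hits.append(p)
        highlights[f] = hits
    return highlights


if __name__ == "__main__":
    # Toy 2-D "features" for two frames with two patches each.
    frames = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [0.9, 0.1]],
    ]
    print(track(frames, selections=[(0, 0)], threshold=0.8))  # {0: [0], 1: [1]}
```

In the toy run the selected patch "moves" from index 0 in frame 0 to index 1 in frame 1, and the similarity threshold still finds it. Adding a second selection from a later frame widens what counts as a match, which is the temporal-consistency effect the post describes.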

Excited to see what the community builds with it!

commented on Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H 8 months ago

The results are pretty impressive! Here are some example queries (the results are shown as small red circles):

Query #1

Where to click to upload a file?

Query #2

Where would the result of the request be presented as an image?

Worth noting: the points land at the logical centers of the UI components (not at their titles or visual centers).

I'd also suggest adding the license information to the article; it took me some time to find it on HF. For those who are curious, it's Apache 2.0 (very permissive).

liked a dataset 8 months ago