berndf committed
Commit b0f2134 · verified · Parent: b27e307

added description

Files changed (1): app.py (+48 −11)
app.py CHANGED
@@ -107,18 +107,55 @@ def goto(page: str):
  page = st.query_params.get("page", "demo")

  if page == "info":
- st.title("ℹ about this demo")
  st.write("""
- **embeddings** turn words (or longer text) into numerical vectors.
- in this vector space, **semantically related** items end up **near** each other.
- use cases:
- - semantic search & retrieval
- - clustering & topic discovery
- - recommendations & deduplication
- - measuring similarity and analogies
- this demo embeds single words with a selectable model, reduces to 2d/3d with pca,
- and shows how related words appear near each other in the projected space.
- """.strip())
+ # 🧠 About the Embedding Visualizer
+
+ This demo shows how **vector embeddings** can capture the meaning of words and place them in a **numerical space** where related items appear close together.
+
+ You can:
+ - Choose from predefined or mixed datasets (e.g., countries, animals, actors, sports)
+ - Select different embedding models to compare results
+ - Switch between 2D and 3D visualizations
+ - Edit the list of words directly and see the updated projection instantly
+
+ ---
+
+ ## 📌 What are Vector Embeddings?
+ A **vector embedding** is a way of representing text (words, sentences, or documents) as a list of numbers — a point in a high-dimensional space.
+ These numbers are produced by a trained **language model** that captures semantic meaning.
+
+ In this space:
+ - Words with **similar meanings** end up **near each other**
+ - Dissimilar words are placed **far apart**
+ - The model can detect relationships and groupings that aren’t obvious from spelling or grammar alone
+
+ Example:
+ `"cat"` and `"dog"` will likely be closer to each other than to `"table"`, because the model “knows” they are both animals.
+
+ ---
+
+ ## 🔍 How the Demo Works
+ 1. **Embedding step** – Each word is converted into a high-dimensional vector (e.g., 384, 768, or 1024 dimensions depending on the model).
+ 2. **Dimensionality reduction** – Since humans can’t visualize hundreds of dimensions, the vectors are projected to 2D or 3D using **PCA** (Principal Component Analysis).
+ 3. **Visualization** – The projected points are plotted, with labels showing the original words.
+ You can rotate the 3D view to explore groupings.
+
+ ---
+
+ ## 💡 Typical Applications of Embeddings
+ - **Semantic search** – Find relevant results even if exact keywords don’t match
+ - **Clustering & topic discovery** – Group related items automatically
+ - **Recommendations** – Suggest similar products, movies, or articles
+ - **Deduplication** – Detect near-duplicate content
+ - **Analogies** – Explore relationships like *"king" – "man" + "woman" ≈ "queen"*
+
+ ---
+
+ ## 🚀 Try it Yourself
+ - Pick a dataset or create your own by editing the list
+ - Switch models to compare how the embedding space changes
+ - Toggle between 2D and 3D to explore patterns
+ """.strip())
  if st.button("⬅ back to demo"):
  goto("demo")
  st.stop()
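
The three steps described under "How the Demo Works" (embed, reduce with PCA, plot) amount to a short pipeline. Below is a minimal sketch of that flow, assuming the `sentence-transformers` library, the `all-MiniLM-L6-v2` model (384 dimensions), scikit-learn's PCA, and matplotlib; the commit does not name its model or plotting stack, so all of these are stand-ins.

```python
# Hedged sketch of the About page's pipeline: embed -> PCA -> 2D plot.
# ASSUMPTIONS: sentence-transformers with "all-MiniLM-L6-v2" (384-dim),
# scikit-learn, and matplotlib are illustrative picks, not from the commit.
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

words = ["cat", "dog", "france", "brazil", "table"]

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(words)                         # shape (5, 384)

points = PCA(n_components=2).fit_transform(vectors)   # shape (5, 2)

fig, ax = plt.subplots()
ax.scatter(points[:, 0], points[:, 1])
for word, (x, y) in zip(words, points):
    ax.annotate(word, (x, y))                         # label each projected point
plt.show()
```

With a mixed list like the one above, the animal words should land near each other and apart from the country words, which is exactly the grouping effect the About text describes.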
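The nearness example ("cat"/"dog" vs. "table") and the analogy bullet can both be sanity-checked with cosine similarity. Another hedged sketch, reusing the same assumed model:

```python
# Cosine-similarity checks for the About page's two concrete claims.
# Same assumption as above: the model choice is illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def cos(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: closer to 1.0 means more similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Claim 1: "cat" and "dog" sit closer together than either does to "table".
cat, dog, table = model.encode(["cat", "dog", "table"])
print(cos(cat, dog), cos(cat, table))   # expect the first number to be larger

# Claim 2: king - man + woman ~ queen. Word2vec-style vector arithmetic is
# only approximate with sentence-transformer embeddings, so expect a rough,
# not exact, match.
king, man, woman, queen = model.encode(["king", "man", "woman", "queen"])
print(cos(king - man + woman, queen))
```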