added description
app.py
CHANGED
@@ -107,18 +107,55 @@ def goto(page: str):
 page = st.query_params.get("page", "demo")
 
 if page == "info":
-    st.title("ℹ about this demo")
     st.write("""
-
-
-
-
-
-
-
-
-
-
+# 🧠 Embedding Visualizer – About
+
+This demo shows how **vector embeddings** can capture the meaning of words and place them in a **numerical space** where related items appear close together.
+
+You can:
+- Choose from predefined or mixed datasets (e.g., countries, animals, actors, sports)
+- Select different embedding models to compare results
+- Switch between 2D and 3D visualizations
+- Edit the list of words directly and see the updated projection instantly
+
+---
+
+## 📌 What are Vector Embeddings?
+A **vector embedding** is a way of representing text (words, sentences, or documents) as a list of numbers — a point in a high-dimensional space.
+These numbers are produced by a trained **language model** that captures semantic meaning.
+
+In this space:
+- Words with **similar meanings** end up **near each other**
+- Dissimilar words are placed **far apart**
+- The model can detect relationships and groupings that aren’t obvious from spelling or grammar alone
+
+Example:
+`"cat"` and `"dog"` will likely be closer to each other than to `"table"`, because the model “knows” they are both animals.
+
+---
+
+## 🔍 How the Demo Works
+1. **Embedding step** – Each word is converted into a high-dimensional vector (e.g., 384, 768, or 1024 dimensions depending on the model).
+2. **Dimensionality reduction** – Since humans can’t visualize hundreds of dimensions, the vectors are projected to 2D or 3D using **PCA** (Principal Component Analysis).
+3. **Visualization** – The projected points are plotted, with labels showing the original words.
+You can rotate the 3D view to explore groupings.
+
+---
+
+## 💡 Typical Applications of Embeddings
+- **Semantic search** – Find relevant results even if exact keywords don’t match
+- **Clustering & topic discovery** – Group related items automatically
+- **Recommendations** – Suggest similar products, movies, or articles
+- **Deduplication** – Detect near-duplicate content
+- **Analogies** – Explore relationships like *"king" – "man" + "woman" ≈ "queen"*
+
+---
+
+## 🚀 Try it Yourself
+- Pick a dataset or create your own by editing the list
+- Switch models to compare how the embedding space changes
+- Toggle between 2D and 3D to explore patterns
+""".strip())
     if st.button("⬅ back to demo"):
         goto("demo")
     st.stop()
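The "embed → reduce with PCA → plot" pipeline that the new About text describes can be sketched roughly as below. This is a minimal, self-contained illustration, not the Space's actual code: the word list is illustrative, random vectors stand in for real model embeddings (which would come from a sentence-embedding model producing, e.g., 384-dimensional vectors), and PCA is done with a plain NumPy SVD rather than a library class.

```python
import numpy as np

def pca_project(vectors: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project high-dimensional vectors to n_components via PCA."""
    centered = vectors - vectors.mean(axis=0)      # PCA operates on centered data
    # Right singular vectors of the centered matrix are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T          # coordinates along the top axes

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (closeness measure)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in embeddings: in the real demo these come from an embedding model;
# random data keeps the sketch runnable without downloading a model.
rng = np.random.default_rng(0)
words = ["cat", "dog", "table"]
embeddings = rng.normal(size=(len(words), 384))

# One 2D point per word, ready to be plotted with labels.
points_2d = pca_project(embeddings, n_components=2)
print(points_2d.shape)  # (3, 2)
```

Swapping `n_components=2` for `3` gives the 3D view; with real embeddings instead of random vectors, the cosine similarity of `"cat"` and `"dog"` would typically exceed that of `"cat"` and `"table"`, which is what makes the clusters in the plot meaningful.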