fix
app.py
CHANGED
@@ -107,56 +107,17 @@ def goto(page: str):
 page = st.query_params.get("page", "demo")
 
 if page == "info":
-    st.title("about this demo")
+    st.title("ℹ about this demo")
     st.write("""
-
-
-
-
-
-
-
-
-
-
----
-
-## 📌 What are Vector Embeddings?
-A **vector embedding** is a way of representing text (words, sentences, or documents) as a list of numbers — a point in a high-dimensional space.
-These numbers are produced by a trained **language model** that captures semantic meaning.
-
-In this space:
-- Words with **similar meanings** end up **near each other**
-- Dissimilar words are placed **far apart**
-- The model can detect relationships and groupings that aren’t obvious from spelling or grammar alone
-
-Example:
-`"cat"` and `"dog"` will likely be closer to each other than to `"table"`, because the model “knows” they are both animals.
-
----
-
-## 🔍 How the Demo Works
-1. **Embedding step** – Each word is converted into a high-dimensional vector (e.g., 384, 768, or 1024 dimensions depending on the model).
-2. **Dimensionality reduction** – Since humans can’t visualize hundreds of dimensions, the vectors are projected to 2D or 3D using **PCA** (Principal Component Analysis).
-3. **Visualization** – The projected points are plotted, with labels showing the original words.
-You can rotate the 3D view to explore groupings.
-
----
-
-## 💡 Typical Applications of Embeddings
-- **Semantic search** – Find relevant results even if exact keywords don’t match
-- **Clustering & topic discovery** – Group related items automatically
-- **Recommendations** – Suggest similar products, movies, or articles
-- **Deduplication** – Detect near-duplicate content
-- **Analogies** – Explore relationships like *"king" – "man" + "woman" ≈ "queen"*
-
----
-
-## 🚀 Try it Yourself
-- Pick a dataset or create your own by editing the list
-- Switch models to compare how the embedding space changes
-- Toggle between 2D and 3D to explore patterns
-
+**embeddings** turn words (or longer text) into numerical vectors.
+in this vector space, **semantically related** items end up **near** each other.
+use cases:
+- semantic search & retrieval
+- clustering & topic discovery
+- recommendations & deduplication
+- measuring similarity and analogies
+this demo embeds single words with a selectable model, reduces to 2d/3d with pca,
+and shows how related words appear near each other in the projected space.
     """.strip())
     if st.button("⬅ back to demo"):
         goto("demo")
@@ -184,10 +145,10 @@ with c2:
     st.session_state.model_name = MODELS[chosen_label]
 
 with c3:
-    #
+    # Default to 3D on first render; single-click thereafter
    radio_kwargs = dict(options=["2D", "3D"], horizontal=True, key="proj_mode")
    if "proj_mode" not in st.session_state:
-        radio_kwargs["index"] = 1  #
+        radio_kwargs["index"] = 1  # 3D default
    st.radio("projection", **radio_kwargs)
 
 with c4:
@@ -311,4 +272,4 @@ with right:
     )]
    )
 
-    st.plotly_chart(fig, use_container_width=True)
+    st.plotly_chart(fig, use_container_width=True)