berndf committed on
Commit b27e307 · verified · 1 Parent(s): 1adab1d
Files changed (1)
  1. app.py +13 -52
app.py CHANGED
@@ -107,56 +107,17 @@ def goto(page: str):
  page = st.query_params.get("page", "demo")
 
  if page == "info":
- st.title("about this demo")
  st.write("""
- # 🧠 Embedding Visualizer About
-
- This demo shows how **vector embeddings** can capture the meaning of words and place them in a **numerical space** where related items appear close together.
-
- You can:
- - Choose from predefined or mixed datasets (e.g., countries, animals, actors, sports)
- - Select different embedding models to compare results
- - Switch between 2D and 3D visualizations
- - Edit the list of words directly and see the updated projection instantly
-
- ---
-
- ## 📌 What are Vector Embeddings?
- A **vector embedding** is a way of representing text (words, sentences, or documents) as a list of numbers — a point in a high-dimensional space.
- These numbers are produced by a trained **language model** that captures semantic meaning.
-
- In this space:
- - Words with **similar meanings** end up **near each other**
- - Dissimilar words are placed **far apart**
- - The model can detect relationships and groupings that aren’t obvious from spelling or grammar alone
-
- Example:
- `"cat"` and `"dog"` will likely be closer to each other than to `"table"`, because the model “knows” they are both animals.
-
- ---
-
- ## 🔍 How the Demo Works
- 1. **Embedding step** – Each word is converted into a high-dimensional vector (e.g., 384, 768, or 1024 dimensions depending on the model).
- 2. **Dimensionality reduction** – Since humans can’t visualize hundreds of dimensions, the vectors are projected to 2D or 3D using **PCA** (Principal Component Analysis).
- 3. **Visualization** – The projected points are plotted, with labels showing the original words.
- You can rotate the 3D view to explore groupings.
-
- ---
-
- ## 💡 Typical Applications of Embeddings
- - **Semantic search** – Find relevant results even if exact keywords don’t match
- - **Clustering & topic discovery** – Group related items automatically
- - **Recommendations** – Suggest similar products, movies, or articles
- - **Deduplication** – Detect near-duplicate content
- - **Analogies** – Explore relationships like *"king" – "man" + "woman" ≈ "queen"*
-
- ---
-
- ## 🚀 Try it Yourself
- - Pick a dataset or create your own by editing the list
- - Switch models to compare how the embedding space changes
- - Toggle between 2D and 3D to explore patterns
-
  """.strip())
  if st.button("⬅ back to demo"):
  goto("demo")
@@ -184,10 +145,10 @@ with c2:
  st.session_state.model_name = MODELS[chosen_label]
 
  with c3:
- # Single-click fix: stable key and only set index on first render
  radio_kwargs = dict(options=["2D", "3D"], horizontal=True, key="proj_mode")
  if "proj_mode" not in st.session_state:
- radio_kwargs["index"] = 1 # default to 3D initially
  st.radio("projection", **radio_kwargs)
 
  with c4:
@@ -311,4 +272,4 @@ with right:
  )]
  )
 
- st.plotly_chart(fig, use_container_width=True)
 
  page = st.query_params.get("page", "demo")
 
  if page == "info":
+ st.title("about this demo")
  st.write("""
+ **embeddings** turn words (or longer text) into numerical vectors.
+ in this vector space, **semantically related** items end up **near** each other.
+ use cases:
+ - semantic search & retrieval
+ - clustering & topic discovery
+ - recommendations & deduplication
+ - measuring similarity and analogies
+ this demo embeds single words with a selectable model, reduces to 2d/3d with pca,
+ and shows how related words appear near each other in the projected space.
  """.strip())
  if st.button("⬅ back to demo"):
  goto("demo")
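
Note on the shortened info text: "near each other" is commonly quantified with cosine similarity between embedding vectors. The following is a minimal sketch under stated assumptions, not code from app.py. The `cosine_similarity` helper and the `toy` vectors are illustrative; the three-dimensional numbers are invented for the example and are not output from any of the selectable models.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Invented toy vectors (assumption): real embeddings have hundreds of dimensions.
toy = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "table": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(toy["cat"], toy["dog"])
cat_table = cosine_similarity(toy["cat"], toy["table"])
print(f"cat~dog={cat_dog:.3f}  cat~table={cat_table:.3f}")
```

With these toy values the "cat"/"dog" pair scores higher than "cat"/"table", which is the relationship the info text describes.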
 
  st.session_state.model_name = MODELS[chosen_label]
 
  with c3:
+ # Default to 3D on first render; single-click thereafter
  radio_kwargs = dict(options=["2D", "3D"], horizontal=True, key="proj_mode")
  if "proj_mode" not in st.session_state:
+ radio_kwargs["index"] = 1 # 3D default
  st.radio("projection", **radio_kwargs)
 
  with c4:
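
The first-render default in this hunk can be isolated in a small helper, which makes the behaviour checkable without running Streamlit. This is a sketch only: `projection_radio_kwargs` is a hypothetical name that does not exist in app.py, and a plain dict stands in for `st.session_state`.

```python
from typing import Any, Dict, Mapping


def projection_radio_kwargs(session_state: Mapping[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for the projection radio widget.

    `index` is passed only while the widget key is absent from session state,
    so the 3D default applies on first render and later reruns keep whatever
    the user selected.
    """
    kwargs: Dict[str, Any] = dict(
        options=["2D", "3D"], horizontal=True, key="proj_mode"
    )
    if "proj_mode" not in session_state:
        kwargs["index"] = 1  # position of "3D" in options
    return kwargs


# Assumed usage inside the app (requires streamlit, so left as a comment):
#   st.radio("projection", **projection_radio_kwargs(st.session_state))

first_render = projection_radio_kwargs({})
after_choice = projection_radio_kwargs({"proj_mode": "2D"})
print(first_render)
print(after_choice)
```

Keeping `index` out of the call once the key exists avoids re-asserting a default on reruns after the user has already made a choice.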
 
  )]
  )
 
+ st.plotly_chart(fig, use_container_width=True)