berndf committed
Commit b0f2134 · verified · Parent: b27e307

added description

Files changed (1): app.py (+48 −11)
app.py CHANGED
@@ -107,18 +107,55 @@ def goto(page: str):
  page = st.query_params.get("page", "demo")

  if page == "info":
- st.title("ℹ about this demo")
  st.write("""
- **embeddings** turn words (or longer text) into numerical vectors.
- in this vector space, **semantically related** items end up **near** each other.
- use cases:
- - semantic search & retrieval
- - clustering & topic discovery
- - recommendations & deduplication
- - measuring similarity and analogies
- this demo embeds single words with a selectable model, reduces to 2d/3d with pca,
- and shows how related words appear near each other in the projected space.
- """.strip())
+ # 🧠 About the Embedding Visualizer
+
+ This demo shows how **vector embeddings** can capture the meaning of words and place them in a **numerical space** where related items appear close together.
+
+ You can:
+ - Choose from predefined or mixed datasets (e.g., countries, animals, actors, sports)
+ - Select different embedding models to compare results
+ - Switch between 2D and 3D visualizations
+ - Edit the list of words directly and see the updated projection instantly
+
+ ---
+
+ ## 📌 What are Vector Embeddings?
+ A **vector embedding** is a way of representing text (words, sentences, or documents) as a list of numbers — a point in a high-dimensional space.
+ These numbers are produced by a trained **language model** that captures semantic meaning.
+
+ In this space:
+ - Words with **similar meanings** end up **near each other**
+ - Dissimilar words are placed **far apart**
+ - The model can detect relationships and groupings that aren’t obvious from spelling or grammar alone
+
+ Example:
+ `"cat"` and `"dog"` will likely be closer to each other than to `"table"`, because the model “knows” they are both animals.
+
+ ---
+
+ ## 🔍 How the Demo Works
+ 1. **Embedding step** – Each word is converted into a high-dimensional vector (e.g., 384, 768, or 1024 dimensions depending on the model).
+ 2. **Dimensionality reduction** – Since humans can’t visualize hundreds of dimensions, the vectors are projected to 2D or 3D using **PCA** (Principal Component Analysis).
+ 3. **Visualization** – The projected points are plotted, with labels showing the original words.
+ You can rotate the 3D view to explore groupings.
+
+ ---
+
+ ## 💡 Typical Applications of Embeddings
+ - **Semantic search** – Find relevant results even if exact keywords don’t match
+ - **Clustering & topic discovery** – Group related items automatically
+ - **Recommendations** – Suggest similar products, movies, or articles
+ - **Deduplication** – Detect near-duplicate content
+ - **Analogies** – Explore relationships like *"king" – "man" + "woman" ≈ "queen"*
+
+ ---
+
+ ## 🚀 Try it Yourself
+ - Pick a dataset or create your own by editing the list
+ - Switch models to compare how the embedding space changes
+ - Toggle between 2D and 3D to explore patterns
+ """.strip())
  if st.button("⬅ back to demo"):
  goto("demo")
  st.stop()
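
The three steps described under "How the Demo Works" (embed, reduce with PCA, plot) amount to a short pipeline. Below is a minimal sketch of that flow, assuming the `sentence-transformers` library, the `all-MiniLM-L6-v2` model (384 dimensions), scikit-learn's PCA, and matplotlib; the commit does not name its model or plotting stack, so all of these are stand-ins.

```python
# Hedged sketch of the About page's pipeline: embed -> PCA -> 2D plot.
# ASSUMPTIONS: sentence-transformers with "all-MiniLM-L6-v2" (384-dim),
# scikit-learn, and matplotlib are illustrative picks, not from the commit.
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

words = ["cat", "dog", "france", "brazil", "table"]

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(words)                         # shape (5, 384)

points = PCA(n_components=2).fit_transform(vectors)   # shape (5, 2)

fig, ax = plt.subplots()
ax.scatter(points[:, 0], points[:, 1])
for word, (x, y) in zip(words, points):
    ax.annotate(word, (x, y))                         # label each projected point
plt.show()
```

With a mixed list like the one above, the animal words should land near each other and apart from the country words, which is exactly the grouping effect the About text describes.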
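The nearness example ("cat"/"dog" vs. "table") and the analogy bullet can both be sanity-checked with cosine similarity. Another hedged sketch, reusing the same assumed model:

```python
# Cosine-similarity checks for the About page's two concrete claims.
# Same assumption as above: the model choice is illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def cos(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: closer to 1.0 means more similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Claim 1: "cat" and "dog" sit closer together than either does to "table".
cat, dog, table = model.encode(["cat", "dog", "table"])
print(cos(cat, dog), cos(cat, table))   # expect the first number to be larger

# Claim 2: king - man + woman ~ queen. Word2vec-style vector arithmetic is
# only approximate with sentence-transformer embeddings, so expect a rough,
# not exact, match.
king, man, woman, queen = model.encode(["king", "man", "woman", "queen"])
print(cos(king - man + woman, queen))
```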