Space: SuperDan/Feelings_to_Emoji


Dan Mo committed on Apr 13

Commit 975f207 (parent: 712ecb4)

Add comprehensive technical reference documentation for the Feelings to Emoji application


Files changed (1)

REFERENCE.md +131 -0

REFERENCE.md ADDED

	@@ -0,0 +1,131 @@

+# Feelings to Emoji: Technical Reference
+This document provides technical details about the implementation of the Feelings to Emoji application.
+## Project Structure
+The application is organized into several Python modules:
+- `app.py` - Main application file with Gradio interface
+- `emoji_processor.py` - Core processing logic for emoji matching
+- `config.py` - Configuration settings
+- `utils.py` - Utility functions
+- `generate_embeddings.py` - Standalone tool to pre-generate embeddings
+## Embedding Models
+The application supports three sentence embedding models, loaded through the Sentence Transformers library:
+| Model Key | Model ID | Parameters | Description |
+|-----------|----------|------------|-------------|
+| mpnet | all-mpnet-base-v2 | 110M | Balanced general-purpose model; the smallest of the three |
+| gte | thenlper/gte-large | 335M | Context-rich; good for emotion and nuance |
+| bge | BAAI/bge-large-en-v1.5 | 335M | Tuned for ranking and high-precision similarity |
+## Emoji Matching Algorithm
+The application matches text to emojis using cosine similarity between sentence embeddings:
+1. Embed the input text with the selected model.
+2. For each emoji category (emotion and event):
+   - Calculate the cosine similarity between the input text embedding and the embedding of each emoji description (both produced by the same model)
+   - Select the emoji with the highest similarity score
+3. Combine the selected emotion and event emojis into a mashup image.
+
+The emoji description embeddings depend only on the model, so they are pre-computed and cached instead of being recomputed for every request:
+- Stored as pickle files in the `embeddings/` directory
+- Generated using `generate_embeddings.py`
+- Loaded at startup, so only the input sentence has to be embedded at request time
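The selection rule applied to each category can be written out explicitly. Here $q$ is the embedding of the input sentence, $C$ is the set of emojis in one category (emotion or event), and $v_e$ is the embedding of emoji $e$'s description:

$$
\cos(q, v_e) = \frac{q \cdot v_e}{\lVert q \rVert \, \lVert v_e \rVert},
\qquad
e^{*}_{C} = \operatorname*{arg\,max}_{e \in C} \cos(q, v_e)
$$

When the embeddings are L2-normalized, both norms equal 1 and the score reduces to the plain dot product $q \cdot v_e$.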
+## Module Reference
+### `config.py`
+Contains configuration settings including:
+- `CONFIG`: Dictionary with basic application settings (model name, file paths, etc.)
+- `EMBEDDING_MODELS`: Dictionary defining the available embedding models
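For illustration, `EMBEDDING_MODELS` could look something like the sketch below. This is a hypothetical shape: the `id` and `description` field names, `DEFAULT_MODEL_KEY`, and the `resolve_model_id` helper are assumptions, not taken from `config.py`; only the model keys and model IDs come from the embedding model table.

```python
# Hypothetical shape of EMBEDDING_MODELS; field names are illustrative only.
EMBEDDING_MODELS = {
    "mpnet": {"id": "all-mpnet-base-v2",
              "description": "Balanced general-purpose model"},
    "gte": {"id": "thenlper/gte-large",
            "description": "Context-rich; good for emotion and nuance"},
    "bge": {"id": "BAAI/bge-large-en-v1.5",
            "description": "Tuned for ranking and high-precision similarity"},
}

# Hypothetical: the reference names mpnet as the default model.
DEFAULT_MODEL_KEY = "mpnet"


def resolve_model_id(model_key=None):
    """Map a model key such as "gte" to its model ID, falling back to the default key."""
    key = model_key or DEFAULT_MODEL_KEY
    if key not in EMBEDDING_MODELS:
        raise KeyError(f"Unknown model key {key!r}; expected one of {sorted(EMBEDDING_MODELS)}")
    return EMBEDDING_MODELS[key]["id"]
```

Keeping the key-to-ID mapping in one dictionary means adding a model is a single config entry, and a lookup like this fails loudly on a mistyped key instead of silently loading the wrong model.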
+### `utils.py`
+Utility functions including:
+- `setup_logging()`: Configures application logging
+- `kitchen_txt_to_dict(filepath)`: Parses emoji dictionary files
+- `save_embeddings_to_pickle(embeddings, filepath)`: Saves embeddings to pickle files
+- `load_embeddings_from_pickle(filepath)`: Loads embeddings from pickle files
+- `get_embeddings_pickle_path(model_id, emoji_type)`: Generates consistent paths for embedding files
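A minimal sketch of the two pickle helpers follows. It assumes the cached object is a plain dictionary mapping each emoji to its embedding vector and that a missing cache file should yield `None`. Both are assumptions; the actual helpers in `utils.py` may behave differently.

```python
import pickle
from pathlib import Path


def save_embeddings_to_pickle(embeddings, filepath):
    """Serialize embeddings to filepath, creating parent directories as needed."""
    path = Path(filepath)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(embeddings, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_embeddings_from_pickle(filepath):
    """Return the cached embeddings, or None when no cache file exists yet."""
    path = Path(filepath)
    if not path.is_file():
        return None  # assumption: callers treat None as "compute and save"
    with path.open("rb") as f:
        return pickle.load(f)
```

Because `pickle.load` can execute arbitrary code embedded in a file, cache files should only be loaded from a trusted location such as the app's own `embeddings/` directory.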
+### `emoji_processor.py`
+Core processing logic:
+- `EmojiProcessor`: Main class for emoji matching and processing
+  - `__init__(model_name=None, model_key=None, use_cached_embeddings=True)`: Initializes the processor with a specific model
+  - `load_emoji_dictionaries(emotion_file, item_file)`: Loads emoji dictionaries from text files
+  - `switch_model(model_key)`: Switches to a different embedding model
+  - `sentence_to_emojis(sentence)`: Finds the best-matching emotion and event emojis for the text and generates their mashup; returns `(emotion, event, image)`
+  - `find_top_emojis(embedding, emoji_embeddings, top_n=1)`: Finds top matching emojis using cosine similarity
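One way `find_top_emojis` could implement the ranking with NumPy is sketched below. It assumes `emoji_embeddings` is a dictionary from emoji to vector and that the method returns `(emoji, score)` pairs, best first; the real argument and return types may differ. Cosine similarity is computed as the dot product of L2-normalized vectors.

```python
import numpy as np


def find_top_emojis(embedding, emoji_embeddings, top_n=1):
    """Return the top_n (emoji, cosine similarity) pairs, best match first.

    Assumes emoji_embeddings maps each emoji to a 1-D vector with the same
    dimensionality as `embedding`, and that no vector is all zeros.
    """
    emojis = list(emoji_embeddings)
    matrix = np.stack([np.asarray(emoji_embeddings[e], dtype=float) for e in emojis])
    query = np.asarray(embedding, dtype=float)

    # Cosine similarity = dot product of L2-normalized vectors.
    matrix_unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = matrix_unit @ query_unit

    # Indices of the highest scores, in descending order.
    best = np.argsort(scores)[::-1][:top_n]
    return [(emojis[i], float(scores[i])) for i in best]
```

For example, with the query `[0.9, 0.1]` and candidates `{"😀": [1.0, 0.0], "😢": [0.0, 1.0], "😐": [0.7, 0.7]}`, calling it with `top_n=2` ranks 😀 first and 😐 second. Normalizing the whole matrix at once turns each query into a single matrix-vector product.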
+### `app.py`
+Gradio interface:
+- `EmojiMashupApp`: Main application class
+  - `create_interface()`: Creates the Gradio interface
+  - `process_with_model(model_selection, text, use_cached_embeddings)`: Processes text with selected model
+  - `get_random_example()`: Gets a random example sentence for demonstration
+### `generate_embeddings.py`
+Standalone utility to pre-generate embeddings:
+- `generate_embeddings_for_model(model_key, model_info)`: Generates embeddings for a specific model
+- `main()`: Main function that processes all models and saves embeddings
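The pre-generation pass can be sketched as a loop over models and emoji categories. `EMOJI_DESCRIPTIONS`, the `out_dir` and `encode` parameters, the `{emoji: vector}` cache format, and the rule that turns `/` in a model ID into `_` are hypothetical additions, not taken from `generate_embeddings.py`. By default, `encode` is `SentenceTransformer(model_id).encode`. It can be replaced with any function that maps a list of strings to vectors, which keeps the sketch testable offline.

```python
import pickle
from pathlib import Path

# Hypothetical stand-ins: the real script reads the model list from config.py
# and the descriptions from the google-emoji-kitchen-*.txt files.
EMBEDDING_MODELS = {
    "mpnet": {"id": "all-mpnet-base-v2"},
    "gte": {"id": "thenlper/gte-large"},
    "bge": {"id": "BAAI/bge-large-en-v1.5"},
}
EMOJI_DESCRIPTIONS = {
    "emotion": {"😀": "grinning face, happy", "😢": "crying face, sad"},
    "event": {"🎂": "birthday cake, celebration", "🌧": "rain cloud, bad weather"},
}


def generate_embeddings_for_model(model_key, model_info, out_dir="embeddings", encode=None):
    """Embed every emoji description once and pickle one file per category."""
    if encode is None:
        # Downloads the model from the Hugging Face Hub on first use.
        from sentence_transformers import SentenceTransformer
        encode = SentenceTransformer(model_info["id"]).encode

    # Assumption: "/" in IDs such as "thenlper/gte-large" becomes "_" so each
    # cache file sits directly in out_dir as <model_id>_<category>.pkl.
    safe_id = model_info["id"].replace("/", "_")
    written = []
    for category, descriptions in EMOJI_DESCRIPTIONS.items():
        emojis = list(descriptions)
        vectors = encode([descriptions[e] for e in emojis])
        path = Path(out_dir) / f"{safe_id}_{category}.pkl"
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as f:
            # Assumed cache format: {emoji: embedding vector}
            pickle.dump(dict(zip(emojis, vectors)), f)
        print(f"[{model_key}] wrote {path}")
        written.append(path)
    return written


def main():
    """Pre-generate cache files for every configured model."""
    for model_key, model_info in EMBEDDING_MODELS.items():
        generate_embeddings_for_model(model_key, model_info)
```

Each description is encoded exactly once per model, which is why this cost is paid offline rather than on every request.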
+## Emoji Data Files
+- `google-emoji-kitchen-emotion.txt`: Emotion emojis with descriptions
+- `google-emoji-kitchen-item.txt`: Event/object emojis with descriptions
+- `google-emoji-kitchen-compatible.txt`: Compatibility information for emoji combinations
+## Embedding Cache Structure
+The `embeddings/` directory contains pre-generated embeddings in pickle format:
+- `[model_id]_emotion.pkl`: Embeddings for emotion emojis
+- `[model_id]_event.pkl`: Embeddings for event/object emojis
+## API Usage Examples
+### Using the EmojiProcessor Directly
+```python
+from emoji_processor import EmojiProcessor
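+# Initialize with the default model (mpnet)
+processor = EmojiProcessor()
+# Load the emoji dictionaries; called without arguments here, so this
+# presumably relies on default emotion/item file paths
+processor.load_emoji_dictionaries()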
+# Process a sentence
+emotion, event, image = processor.sentence_to_emojis("I'm feeling happy today!")
+print(f"Emotion emoji: {emotion}")
+print(f"Event emoji: {event}")
+# image contains the PIL Image object of the mashup
+```
+### Switching Models
+```python
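+# Continues the previous example: `processor` is an initialized EmojiProcessor
+# Switch to a different model by its key (mpnet, gte, or bge)
+processor.switch_model("gte")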
+# Process with the new model
+emotion, event, image = processor.sentence_to_emojis("I'm feeling anxious about tomorrow.")
+```
+## Performance Considerations
+- Embedding all emoji descriptions is computationally intensive, but it only has to happen once per model
+- Using cached embeddings significantly improves response time, because only the input sentence has to be embedded for each request
+- The larger models (GTE, BGE) have roughly three times as many parameters as MPNet; they may match more accurately but need more memory and compute
+- The MPNet model offers a good balance of speed and accuracy for most use cases