Noo88ear committed on
Commit 96e4f5d · verified · 1 Parent(s): 22a36c5

Update README.md

Files changed (1)
  1. README.md +108 -32
README.md CHANGED
@@ -15,8 +15,8 @@ A sophisticated AI-powered image generation system that creates high-quality mar

  ## Features

- - **AI-Powered Image Generation**: Create stunning marketing images from text prompts using Google's Imagen3
- - **Automated Quality Review**: Intelligent agents automatically review and refine generated images
  - **Marketing-Focused**: Optimized for marketing materials, social media, and promotional content
  - **Real-time Feedback**: Get instant quality scores and improvement suggestions
  - **Professional Workflow**: Streamlined process from concept to final image
@@ -37,8 +37,9 @@ A sophisticated AI-powered image generation system that creates high-quality mar

  3. **Set up Google Cloud authentication**
  ```bash
- export GOOGLE_SERVICE_ACCOUNT_JSON='{"type":"service_account",...}'
- # Or set GOOGLE_API_KEY for Google AI Studio
  ```

  4. **Run the Gradio app**
@@ -55,19 +56,85 @@ A sophisticated AI-powered image generation system that creates high-quality mar

  ### Core Components

- - **Image Generator Agent**: Creates images using Google's Imagen3
- - **Review Agent**: Analyzes image quality and provides feedback
- - **Orchestrator**: Manages workflow between agents
  - **Web Interface**: Gradio-based user interface optimized for Hugging Face
- - **Agent Integration**: Direct A2A protocol communication between agents

  ### Technology Stack

- - **AI Models**: Google Imagen3, Gemini Vision
  - **Framework**: Gradio (Web Interface)
- - **Orchestration**: Integrated A2A agent protocol
  - **Deployment**: Hugging Face Spaces
- - **Authentication**: Google Cloud Service Account

  ## Usage

@@ -102,8 +169,7 @@ quality_score = result["data"]["review"]["quality_score"]

  ### Environment Variables

- - `GOOGLE_API_KEY`: Your Google AI API key
- - `IMAGEN3_API_KEY`: Imagen3 API key (if different)
  - `LOG_LEVEL`: Logging level (DEBUG, INFO, WARNING, ERROR)
  - `PORT`: Web server port (default: 8000)
  - `STREAMLIT_PORT`: Streamlit port (default: 8501)
@@ -113,6 +179,7 @@ quality_score = result["data"]["review"]["quality_score"]
  - **Quality Threshold**: Minimum quality score for auto-approval
  - **Max Iterations**: Maximum refinement attempts
  - **Review Settings**: Customize review criteria

  ## Development

@@ -120,18 +187,17 @@ quality_score = result["data"]["review"]["quality_score"]

  ```
  MarketingImageGenerator/
- ├── agents/ # AI agents
- │   ├── generator/ # Image generation agent
- │   ├── reviewer/ # Quality review agent
- │   └── orchestrator/ # Workflow orchestration
- ├── api/ # FastAPI endpoints
- ├── web/ # Streamlit interface
- ├── tools/ # Utility tools
- ├── tests/ # Test suite
- ├── docs/ # Documentation
- └── deployment/ # Docker & K8s configs
  ```

  ### Running Tests

  ```bash
@@ -139,8 +205,8 @@ MarketingImageGenerator/
  pytest

  # Run specific test suite
- pytest tests/test_generation.py
- pytest tests/test_review.py
  ```

  ### Contributing
@@ -153,6 +219,15 @@ pytest tests/test_review.py

  ## Deployment

  ### Docker

  ```bash
@@ -160,7 +235,7 @@ pytest tests/test_review.py
  docker build -t marketing-image-generator .

  # Run the container
- docker run -p 8000:8000 -p 8501:8501 marketing-image-generator
  ```

  ### Kubernetes
@@ -178,7 +253,7 @@ kubectl get pods -n marketing-image-generator
  The system includes comprehensive monitoring:

  - **Health Checks**: Automatic service health monitoring
- - **Metrics**: Performance and usage metrics
  - **Logging**: Structured logging for debugging
  - **Alerts**: Automated alerting for issues

@@ -190,9 +265,10 @@ Access monitoring dashboards:

  ### Common Issues

- 1. **API Key Errors**: Ensure your Google API key is valid and has the necessary permissions
  2. **Image Generation Fails**: Check your internet connection and API quotas
- 3. **Review Not Working**: Verify the review agent is running and configured correctly

  ### Debug Mode

@@ -212,6 +288,6 @@ This project is licensed under the MIT License - see the LICENSE file for detail
  ## Acknowledgments

  - Google AI for Imagen3 and Gemini technologies
- - Streamlit for the web interface framework
- - FastAPI for the API framework
- - The open-source community for various dependencies
 
@@ -15,8 +15,8 @@

  ## Features

+ - **AI-Powered Image Generation**: Create stunning marketing images from text prompts using Google's Imagen3 via MCP server
+ - **Automated Quality Review**: Intelligent Gemini agent automatically reviews and refines generated images
  - **Marketing-Focused**: Optimized for marketing materials, social media, and promotional content
  - **Real-time Feedback**: Get instant quality scores and improvement suggestions
  - **Professional Workflow**: Streamlined process from concept to final image
 
@@ -37,8 +37,9 @@

  3. **Set up Google Cloud authentication**
  ```bash
+ # For Hugging Face deployment, set these as secrets:
+ # GOOGLE_API_KEY_1 through GOOGLE_API_KEY_6
+ # For local development, use .env file
  ```

  4. **Run the Gradio app**
 
@@ -55,19 +56,85 @@

  ### Core Components

+ - **Agent 1 (Image Generator)**: Creates images using Google's Imagen3 via MCP server integration
+ - **Agent 2 (Marketing Reviewer)**: Analyzes image quality and provides marketing-focused feedback using Gemini Vision
+ - **Orchestrator**: Manages workflow between agents and handles handover
  - **Web Interface**: Gradio-based user interface optimized for Hugging Face
+ - **MCP Server Integration**: Model Context Protocol for seamless Imagen3 access
+
+ ### System Architecture and Workflow
+
+ ```
+ ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────────┐
+ │    User     │    │  Gradio UI  │    │      AI Agents & Models     │
+ │             │    │             │    │                             │
+ │ Image Prompt│───▶│             │───▶│ Agent 1 (Gemini) Drafter    │
+ │             │    │             │    │                             │
+ │Reviewer     │───▶│             │───▶│ Agent 2 (Gemini) Marketing  │
+ │Prompt       │    │             │    │ Reviewer                    │
+ │             │    │             │    │                             │
+ │             │    │             │    │ ┌─────────────────────────┐ │
+ │             │    │             │    │ │ Ag1: Imagen3 (via MCP)  │ │
+ │             │    │             │    │ │                         │ │
+ │             │    │             │    │ │ Draft Image Creation    │ │
+ │             │    │             │    │ └─────────────────────────┘ │
+ │             │    │             │    │                             │
+ │             │    │             │    │ ┌─────────────────────────┐ │
+ │             │    │             │    │ │Ag2;Draft Image Reviewed │ │
+ │             │    │             │    │ │ & Changes Suggested     │ │
+ │             │    │             │    │ └─────────────────────────┘ │
+ │             │    │             │    │                             │
+ │ Image       │◀───│             │◀───│ Final Image Response        │
+ │ Response    │    │             │    │                             │
+ └─────────────┘    └─────────────┘    └─────────────────────────────┘
+ ```
+
+ ### Detailed Workflow:
+
+ 1. **User Interaction (Left)**:
+ - User sends **Image Prompt** (textual description for desired marketing image)
+ - User sends **Reviewer Prompt** (instructions/criteria for marketing review)
+ - User receives final **Image Response** (generated and reviewed image)
+
+ 2. **Gradio UI (Center)**:
+ - Acts as central interface receiving prompts from user
+ - Forwards **Image Prompt** to **Agent 1 (Gemini) Drafter**
+ - Forwards **Reviewer Prompt** to **Agent 2 (Gemini) Marketing Reviewer**
+ - Receives final **Image Response** from Agent 2 and presents to user
+
+ 3. **Image Generation and Drafting (Top Right)**:
+ - **Agent 1 (Gemini) Drafter**: Receives Image Prompt, orchestrates image generation
+ - **Imagen3 (via MCP)**: Agent 1 interacts with Imagen3 through MCP server to create initial image draft
+
+ 4. **Marketing Review and Refinement (Bottom Right)**:
+ - **Agent 2 (Gemini) Marketing Reviewer**: Receives Reviewer Prompt, evaluates generated image against marketing criteria
+ - **Draft Image Reviewed and Changes Suggested**: Agent 2's review process output
+ - **Iterative Refinement Loop**: Bidirectional feedback between Agent 2 and Imagen3 (via Agent 1) to refine image until it meets marketing standards
+ - Final **Image Response** sent back to Gradio UI
+
+ ### Summary of Flow:
+ User provides prompts → Gradio UI → Agent 1 drafts image with Imagen3 → Agent 2 reviews and suggests refinements → Iterative refinement loop → Final reviewed image → User receives result
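
The flow summarized above amounts to a generate, review, refine loop. The following is a minimal Python sketch of that loop, not the code in `app.py`: `refine_until_approved`, `Review`, and the stand-in `fake_generate` / `fake_review` callables are hypothetical names, and the 0 to 1 score scale and default threshold are assumptions. In the real system the two callables would presumably wrap Imagen3 (via the MCP server) and the Gemini reviewer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    """Hypothetical review result; the 0..1 score scale is an assumption."""
    quality_score: float
    suggestions: List[str] = field(default_factory=list)


def refine_until_approved(
    image_prompt: str,
    reviewer_prompt: str,
    generate: Callable[[str], str],
    review: Callable[[str, str], Review],
    quality_threshold: float = 0.8,
    max_iterations: int = 3,
) -> Tuple[str, Review, int]:
    """Draft an image, review it, and retry with the reviewer's suggestions
    until the score reaches the threshold or the attempts run out."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    prompt = image_prompt
    for attempt in range(1, max_iterations + 1):
        image = generate(prompt)                 # Agent 1: draft the image
        result = review(image, reviewer_prompt)  # Agent 2: score the draft
        if result.quality_score >= quality_threshold:
            break
        # Fold the suggestions back into the next generation prompt.
        prompt = image_prompt + " | " + "; ".join(result.suggestions)
    return image, result, attempt


# Stand-in agents so the sketch runs without any API access.
def fake_generate(prompt: str) -> str:
    return f"<image for: {prompt}>"


def fake_review(image: str, reviewer_prompt: str) -> Review:
    if "brighter" in image:
        return Review(0.9)
    return Review(0.5, ["make colours brighter"])


if __name__ == "__main__":
    image, result, attempts = refine_until_approved(
        "red sneaker ad", "check brand fit", fake_generate, fake_review
    )
    print(attempts, result.quality_score, image)
```

The `quality_threshold` and `max_iterations` parameters mirror the **Quality Threshold** and **Max Iterations** settings listed in the README's configuration section; how the application actually feeds suggestions back into the prompt is not shown in this commit.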

  ### Technology Stack

+ - **AI Models**: Google Imagen3 (via MCP), Gemini Vision
  - **Framework**: Gradio (Web Interface)
+ - **Orchestration**: Custom agent handover system
  - **Deployment**: Hugging Face Spaces
+ - **Authentication**: Google Cloud API Keys
+ - **Protocol**: MCP (Model Context Protocol) for Imagen3 integration
+
+ ### Why A2A Was Not Applied
+
+ The system was designed with a **custom handover mechanism** instead of the A2A (Agent-to-Agent) protocol for the following reasons:
+
+ 1. **Simplified Architecture**: The current two-agent system (generator + reviewer) doesn't require the complexity of full A2A orchestration
+ 2. **Direct Integration**: MCP server provides direct access to Imagen3 without needing agent-to-agent communication protocols
+ 3. **Performance Optimization**: Direct handover between agents reduces latency and eliminates protocol overhead
+ 4. **Deployment Simplicity**: Hugging Face Spaces deployment is more straightforward without A2A dependencies
+ 5. **Resource Efficiency**: Fewer moving parts means better resource utilization in the cloud environment
+
+ The system maintains the benefits of multi-agent collaboration while using a more efficient, purpose-built handover system.

  ## Usage

 
@@ -102,8 +169,7 @@

  ### Environment Variables

+ - `GOOGLE_API_KEY_1` through `GOOGLE_API_KEY_6`: Your Google AI API keys (set as Hugging Face secrets)
  - `LOG_LEVEL`: Logging level (DEBUG, INFO, WARNING, ERROR)
  - `PORT`: Web server port (default: 8000)
  - `STREAMLIT_PORT`: Streamlit port (default: 8501)
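
As an illustration of the numbered key variables above, here is a small Python sketch that collects whichever of `GOOGLE_API_KEY_1` through `GOOGLE_API_KEY_6` are set. The helper name `load_google_api_keys` is hypothetical, and the commit does not say how the application uses the six keys, so this only shows reading them from the environment (where Hugging Face Spaces exposes secrets).

```python
import os
from typing import List, Mapping, Optional


def load_google_api_keys(
    env: Optional[Mapping[str, str]] = None, count: int = 6
) -> List[str]:
    """Return the non-empty values of GOOGLE_API_KEY_1..GOOGLE_API_KEY_<count>.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test. Missing or blank variables are skipped.
    """
    source = os.environ if env is None else env
    keys = []
    for index in range(1, count + 1):
        value = source.get(f"GOOGLE_API_KEY_{index}", "").strip()
        if value:
            keys.append(value)
    return keys


if __name__ == "__main__":
    found = load_google_api_keys()
    # Report only the count so that no secret is written to logs.
    print(f"{len(found)} Google API key(s) configured")
```

An empty result at startup is a quick signal for the "API Key Errors" case in the troubleshooting list.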
 
@@ -113,6 +179,7 @@
  - **Quality Threshold**: Minimum quality score for auto-approval
  - **Max Iterations**: Maximum refinement attempts
  - **Review Settings**: Customize review criteria
+ - **MCP Configuration**: Imagen3 server settings

  ## Development

 
@@ -120,18 +187,17 @@

  ```
  MarketingImageGenerator/
+ ├── README.md # Project documentation
+ ├── app.py # Main Gradio application
+ ├── requirements.txt # Python dependencies
+ ├── agents/ # AI agents (if needed for local development)
+ ├── tools/ # Utility tools (if needed)
+ ├── tests/ # Test suite (if needed)
+ └── docs/ # Documentation (if needed)
  ```

+ **Note**: The Hugging Face Spaces deployment uses a simplified structure with just the essential files (`README.md`, `app.py`, `requirements.txt`) for optimal deployment performance.
+
  ### Running Tests

  ```bash
 
@@ -139,8 +205,8 @@
  pytest

  # Run specific test suite
+ pytest tests/test_image_generator.py
+ pytest tests/test_mcp_integration.py
  ```

  ### Contributing
 
@@ -153,6 +219,15 @@

  ## Deployment

+ ### Hugging Face Spaces
+
+ The application is deployed on Hugging Face Spaces with the following configuration:
+
+ - **SDK**: Gradio 5.38.2
+ - **Python Version**: 3.9+
+ - **Secrets**: Google API keys configured as HF secrets
+ - **Auto-deploy**: Enabled for main branch
+
  ### Docker

  ```bash
 
@@ -160,7 +235,7 @@
  docker build -t marketing-image-generator .

  # Run the container
+ docker run -p 7860:7860 marketing-image-generator
  ```

  ### Kubernetes
 
@@ -178,7 +253,7 @@
  The system includes comprehensive monitoring:

  - **Health Checks**: Automatic service health monitoring
+ - **Metrics**: Performance and usage metrics via Prometheus
  - **Logging**: Structured logging for debugging
  - **Alerts**: Automated alerting for issues

 
@@ -190,9 +265,10 @@

  ### Common Issues

+ 1. **API Key Errors**: Ensure your Google API keys are valid and configured as HF secrets
  2. **Image Generation Fails**: Check your internet connection and API quotas
+ 3. **Review Not Working**: Verify the Gemini agent is running and configured correctly
+ 4. **MCP Connection Issues**: Check Imagen3 server connectivity and configuration

  ### Debug Mode

 
@@ -212,6 +288,6 @@
  ## Acknowledgments

  - Google AI for Imagen3 and Gemini technologies
+ - Hugging Face for the deployment platform
+ - Gradio for the web interface framework
+ - The open-source community for various dependencies