---
title: Marketing Image Generator with AI Review
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: false
license: mit
short_description: AI marketing image generator with GCP Imagen4 + Gemini 2.5
---

# Marketing Image Generator with Agent Review

An image generation system that creates marketing images from text prompts, then reviews and refines them automatically. Google's Imagen4 generates the images, Gemini 2.5 Pro reviews them, and a custom orchestrator hands work between the two agents.

## Features

- **AI-Powered Image Generation**: Create stunning marketing images from text prompts using Google's Imagen4 via MCP server
- **Automated Quality Review**: Intelligent Gemini agent automatically reviews and refines generated images
- **Marketing-Focused**: Optimized for marketing materials, social media, and promotional content
- **Real-time Feedback**: Get instant quality scores and improvement suggestions
- **Professional Workflow**: Streamlined process from concept to final image
- **Download & Share**: Easy export of generated images in multiple formats

## Quick Start

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd MarketingImageGenerator
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up Google Cloud authentication**
   ```bash
   # For Hugging Face deployment, set these as Space secrets:
   #   GOOGLE_API_KEY_1 through GOOGLE_API_KEY_6
   # For local development, define the same variables in a .env file
   ```

4. **Run the Gradio app**
   ```bash
   python app.py
   ```

5. **Access the web interface**
   ```
   http://localhost:7860
   ```

## System Architecture

### Core Components

- **Agent 1 (Image Generator)**: Creates images using Google's Imagen4 via MCP server integration
- **Agent 2 (Marketing Reviewer)**: Analyzes image quality and provides marketing-focused feedback using Gemini Vision
- **Orchestrator**: Manages workflow between agents and handles handover
- **Web Interface**: Gradio-based user interface optimized for Hugging Face
- **MCP Server Integration**: Model Context Protocol for seamless Imagen4 access

### System Architecture and Workflow

```
┌─────────────┐    ┌─────────────┐    ┌─────────────────────────────┐
│    User     │    │  Gradio UI  │    │      AI Agents & Models     │
│             │    │             │    │                             │
│ Image Prompt│───▶│             │───▶│  Agent 1 (Gemini) Drafter   │
│             │    │             │    │                             │
│Reviewer     │───▶│             │───▶│  Agent 2 (Gemini) Marketing │
│Prompt       │    │             │    │  Reviewer                   │
│             │    │             │    │                             │
│             │    │             │    │  ┌────────────────────────┐ │
│             │    │             │    │  │   Imagen4 (via MCP)    │ │
│             │    │             │    │  │                        │ │
│             │    │             │    │  │  Draft Image Creation  │ │
│             │    │             │    │  └────────────────────────┘ │
│             │    │             │    │                             │
│             │    │             │    │  ┌────────────────────────┐ │
│             │    │             │    │  │  Draft Image Reviewed  │ │
│             │    │             │    │  │  & Changes Suggested   │ │
│             │    │             │    │  └────────────────────────┘ │
│             │    │             │    │                             │
│ Image       │◀───│             │◀───│  Final Image Response       │
│ Response    │    │             │    │                             │
└─────────────┘    └─────────────┘    └─────────────────────────────┘
```

### Detailed Workflow:

1. **User Interaction (Left)**:
   - User sends **Image Prompt** (textual description for desired marketing image)
   - User sends **Reviewer Prompt** (instructions/criteria for marketing review)
   - User receives final **Image Response** (generated and reviewed image)

2. **Gradio UI (Center)**:
   - Acts as central interface receiving prompts from user
   - Forwards **Image Prompt** to **Agent 1 (Gemini) Drafter**
   - Forwards **Reviewer Prompt** to **Agent 2 (Gemini) Marketing Reviewer**
   - Receives final **Image Response** from Agent 2 and presents to user

3. **Image Generation and Drafting (Top Right)**:
   - **Agent 1 (Gemini) Drafter**: Receives Image Prompt, orchestrates image generation
   - **Imagen4 (via MCP)**: Agent 1 interacts with Imagen4 through MCP server to create initial image draft

4. **Marketing Review and Refinement (Bottom Right)**:
   - **Agent 2 (Gemini) Marketing Reviewer**: Receives Reviewer Prompt, evaluates generated image against marketing criteria
   - **Draft Image Reviewed and Changes Suggested**: Agent 2's review process output
   - **Iterative Refinement Loop**: Bidirectional feedback between Agent 2 and Imagen4 (via Agent 1) to refine image until it meets marketing standards
   - Final **Image Response** sent back to Gradio UI

### Summary of Flow:
User provides prompts → Gradio UI → Agent 1 drafts image with Imagen4 → Agent 2 reviews and suggests refinements → Iterative refinement loop → Final reviewed image → User receives result
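
The loop in this flow can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the names `Review` and `generate_with_review`, the 0.0 to 1.0 score scale, the default threshold and iteration count, and the way feedback is appended to the prompt are all assumptions. The two agents are passed in as plain callables so the sketch runs without any Google API access.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    quality_score: float  # assumed scale: 0.0 (poor) to 1.0 (excellent)
    feedback: str         # reviewer's suggested changes


def generate_with_review(
    image_prompt: str,
    reviewer_prompt: str,
    generate: Callable[[str], bytes],
    review: Callable[[bytes, str], Review],
    quality_threshold: float = 0.8,  # hypothetical default
    max_iterations: int = 3,         # hypothetical default
) -> tuple[bytes, Review]:
    """Draft an image, review it, and refine until it passes or attempts run out."""
    prompt = image_prompt
    best: Optional[tuple[bytes, Review]] = None
    for _ in range(max_iterations):
        image = generate(prompt)                 # Agent 1: draft via Imagen4
        result = review(image, reviewer_prompt)  # Agent 2: marketing review
        # Keep the highest-scoring draft seen so far.
        if best is None or result.quality_score > best[1].quality_score:
            best = (image, result)
        if result.quality_score >= quality_threshold:
            break
        # Feed the reviewer's suggestions into the next draft.
        prompt = f"{image_prompt}\n\nRevise according to: {result.feedback}"
    if best is None:
        raise ValueError("max_iterations must be at least 1")
    return best
```

Returning the best draft rather than the last one means a refinement pass that scores lower than an earlier attempt does not degrade the final result.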

### Technology Stack

- **AI Models**: Google Imagen4 (via MCP), Gemini 2.5 Pro (used for image review)
- **Framework**: Gradio (Web Interface)
- **Orchestration**: Custom agent handover system
- **Deployment**: Hugging Face Spaces
- **Authentication**: Google Cloud API Keys
- **Protocol**: MCP (Model Context Protocol) for Imagen4 integration

### Why A2A Was Not Applied

The system was designed with a **custom handover mechanism** instead of the A2A (Agent-to-Agent) protocol for the following reasons:

1. **Simplified Architecture**: The current two-agent system (generator + reviewer) doesn't require the complexity of full A2A orchestration
2. **Direct Integration**: MCP server provides direct access to Imagen4 without needing agent-to-agent communication protocols
3. **Performance Optimization**: Direct handover between agents reduces latency and eliminates protocol overhead
4. **Deployment Simplicity**: Hugging Face Spaces deployment is more straightforward without A2A dependencies
5. **Resource Efficiency**: Fewer moving parts means better resource utilization in the cloud environment

The system maintains the benefits of multi-agent collaboration while using a more efficient, purpose-built handover system.

## Usage

### Web Interface (Gradio)

1. Access the app on Hugging Face Spaces
2. Enter your marketing image description in the prompt field
3. Select your preferred art style (realistic, artistic, etc.)
4. Configure quality threshold and advanced settings
5. Click "Generate & Review Marketing Image"
6. View the generated image with AI quality analysis and download

### API Usage

```python
import requests

# Generate an image
response = requests.post("http://localhost:8000/generate", json={
    "prompt": "A modern office space with natural lighting",
    "style": "realistic",
    "enable_review": True
})

# Get the generated image and review results
result = response.json()
image_data = result["data"]["image"]["data"]
quality_score = result["data"]["review"]["quality_score"]
```
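
To write the returned image to disk, something like the sketch below may work. It assumes that `result["data"]["image"]["data"]` is a base64 string, possibly prefixed with a data URL header; the README does not specify the encoding, so check an actual response first. `save_image` is a hypothetical helper, not part of the project.

```python
import base64
from pathlib import Path


def save_image(result: dict, path: str) -> int:
    """Decode the image payload from a /generate response and write it to disk.

    Assumes result["data"]["image"]["data"] is a base64 string, optionally
    prefixed with a data URL header such as "data:image/png;base64,".
    Returns the number of bytes written.
    """
    payload = result["data"]["image"]["data"]
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]  # drop the data URL header
    raw = base64.b64decode(payload)
    Path(path).write_bytes(raw)
    return len(raw)
```

With the `result` from the request above, `save_image(result, "office.png")` would write the decoded bytes to `office.png`.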

## Configuration

### Environment Variables

- `GOOGLE_API_KEY_1` through `GOOGLE_API_KEY_6`: Your Google AI API keys (set as Hugging Face secrets)
- `LOG_LEVEL`: Logging level (DEBUG, INFO, WARNING, ERROR)
- `PORT`: Web server port (default: 8000, the port used in the API Usage example; the Gradio UI itself is served on 7860)
- `STREAMLIT_PORT`: Streamlit port (default: 8501)
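
One way to read the numbered keys at startup is sketched below. The helper names `load_api_keys` and `key_rotation` are hypothetical, and round-robin rotation is only one plausible reason for having several keys; the README does not say how the application uses them.

```python
import os
from itertools import cycle
from typing import Iterator, Mapping


def load_api_keys(env: Mapping[str, str] = os.environ, max_keys: int = 6) -> list[str]:
    """Collect GOOGLE_API_KEY_1..GOOGLE_API_KEY_<max_keys>, skipping unset ones."""
    keys = []
    for i in range(1, max_keys + 1):
        value = env.get(f"GOOGLE_API_KEY_{i}", "").strip()
        if value:
            keys.append(value)
    if not keys:
        raise RuntimeError("No GOOGLE_API_KEY_<n> variables are set")
    return keys


def key_rotation(keys: list[str]) -> Iterator[str]:
    """Yield keys round-robin (assumed use: spreading requests across quotas)."""
    return cycle(keys)
```

Failing fast when no key is set surfaces a misconfigured Space at startup instead of at the first generation request.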

### Advanced Settings

- **Quality Threshold**: Minimum quality score for auto-approval
- **Max Iterations**: Maximum refinement attempts
- **Review Settings**: Customize review criteria
- **MCP Configuration**: Imagen4 server settings

## Development

### Project Structure

```
MarketingImageGenerator/
├── README.md              # Project documentation
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── agents/                # AI agents (if needed for local development)
├── tools/                 # Utility tools (if needed)
├── tests/                 # Test suite (if needed)
└── docs/                  # Documentation (if needed)
```

**Note**: The Hugging Face Spaces deployment uses a simplified structure with just the essential files (`README.md`, `app.py`, `requirements.txt`) for optimal deployment performance.

### Running Tests

```bash
# Run all tests
pytest

# Run specific test suite
pytest tests/test_image_generator.py
pytest tests/test_mcp_integration.py
```

### Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

## Deployment

### Hugging Face Spaces

The application is deployed on Hugging Face Spaces with the following configuration:

- **SDK**: Gradio 5.39.0 (matches `sdk_version` in the Space metadata)
- **Python Version**: 3.9+
- **Secrets**: Google API keys configured as HF secrets
- **Auto-deploy**: Enabled for main branch

### Docker

```bash
# Build the image
docker build -t marketing-image-generator .

# Run the container
docker run -p 7860:7860 marketing-image-generator
```

### Kubernetes

```bash
# Deploy to Kubernetes
kubectl apply -f k8s/

# Check deployment status
kubectl get pods -n marketing-image-generator
```

## Monitoring

The system includes comprehensive monitoring:

- **Health Checks**: Automatic service health monitoring
- **Metrics**: Performance and usage metrics via Prometheus
- **Logging**: Structured logging for debugging
- **Alerts**: Automated alerting for issues

Access monitoring dashboards:
- Prometheus: `http://localhost:9090`
- Grafana: `http://localhost:3000`

## Troubleshooting

### Common Issues

1. **API Key Errors**: Ensure your Google API keys are valid and configured as HF secrets
2. **Image Generation Fails**: Check your internet connection and API quotas
3. **Review Not Working**: Verify the Gemini agent is running and configured correctly
4. **MCP Connection Issues**: Check Imagen4 server connectivity and configuration

### Content Policy & Brand Restrictions

Google's AI models have built-in safety guardrails that may cause timeouts or rejections for certain content types:

#### 🚫 **Highly Restricted Content** (Likely to cause stalls/timeouts):
- **Political Figures**: Named world leaders, politicians (e.g., "Putin", "Zelensky", "Biden")
- **Political Buildings**: Government buildings like "10 Downing Street", "White House"
- **Geopolitical Content**: War, conflict, or sensitive international relations
- **Financial Institution Brands**: Major banks like "HSBC", "Bank of America", "JPMorgan"

#### ⚠️ **Moderately Restricted Content** (May cause delays):
- **Regulated Industries**: Healthcare, pharmaceutical, financial services
- **Some Corporate Brands**: Varies by sector and brand sensitivity

#### ✅ **Generally Permitted Content**:
- **Technology Brands**: "Cognizant", "Microsoft", "IBM", "Accenture"
- **Generic Business**: "Professional office", "corporate environment"
- **Non-branded Content**: Generic descriptions without specific brand names

#### 🔧 **Workarounds for Restricted Content**:

**Instead of**: `"Professional boardroom with HSBC signage"`  
**Use**: `"Professional boardroom with international banking corporation signage in red and white colors"`

**Instead of**: `"Meeting with political leaders"`  
**Use**: `"Meeting with business executives in government-style building"`

**Strategy**: Move brand-specific requirements to **Review Guidelines** instead of the main prompt:
- **Main Prompt**: `"Professional corporate environment"`
- **Review Guidelines**: `"Ensure branding reflects HSBC corporate colors (red and white)"`

This approach bypasses content filters while still providing guidance for review.

### Debug Mode

Enable debug logging by setting `LOG_LEVEL=DEBUG` in your environment variables.
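
A minimal sketch of how an application could honor `LOG_LEVEL` with the standard `logging` module is shown below. `configure_logging` is a hypothetical helper, and the log format string is an assumption; the actual setup in `app.py` may differ.

```python
import logging
import os


def configure_logging(default: str = "INFO") -> int:
    """Configure root logging from LOG_LEVEL, falling back to `default`."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        # Unknown level name: fall back rather than crash at startup.
        level = getattr(logging, default.upper(), logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    return level
```

For a local run, `LOG_LEVEL=DEBUG python app.py` would then enable debug output, provided the app calls a function like this at startup.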

### Content Policy Testing

Use the included diagnostic scripts to test content restrictions:
- `debug_hsbc_prompt.py` - Test financial brand restrictions
- `test_cognizant_brand.py` - Test tech brand accessibility
- `test_brand_workaround.py` - Test workaround strategies

### Support

For issues and questions:
- Check the documentation in `/docs`
- Review the troubleshooting guide
- Open an issue on GitHub

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- Google AI for Imagen4 and Gemini 2.5 Pro technologies
- Hugging Face for the deployment platform
- Gradio for the web interface framework
- The open-source community for various dependencies