---
title: Marketing Image Generator with AI Review
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: false
license: mit
short_description: AI marketing image generator with GCP Imagen4 + Gemini 2.5
---

Marketing Image Generator with Agent Review

An AI-powered image generation system that creates marketing images and then automatically reviews and refines them. It uses Google's Imagen4 for image generation and Gemini 2.5 Pro for quality review, with a custom orchestrator handing work between the two agents.

Features

AI-Powered Image Generation: Create stunning marketing images from text prompts using Google's Imagen4 via MCP server
Automated Quality Review: Intelligent Gemini agent automatically reviews and refines generated images
Marketing-Focused: Optimized for marketing materials, social media, and promotional content
Real-time Feedback: Get instant quality scores and improvement suggestions
Professional Workflow: Streamlined process from concept to final image
Download & Share: Easy export of generated images in multiple formats

Quick Start

Clone the repository

```
git clone <repository-url>
cd MarketingImageGenerator
```

Install dependencies
```
pip install -r requirements.txt
```

Set up Google API keys
```
# For Hugging Face deployment, set these as secrets:
# GOOGLE_API_KEY_1 through GOOGLE_API_KEY_6
# For local development, use a .env file
```
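The README does not say how the app uses six separate keys. One common pattern, shown here as a hypothetical sketch rather than the app's actual code, is to read every `GOOGLE_API_KEY_<n>` variable that is set and hand them out round-robin so requests are spread across per-key quotas. The names `load_api_keys` and `KeyRotator` are illustrative only.

```python
import itertools
import os


def load_api_keys(max_keys: int = 6, env=None) -> list:
    """Collect GOOGLE_API_KEY_1..GOOGLE_API_KEY_<max_keys>, skipping unset or empty values.

    Hypothetical helper: the real app may load its keys differently.
    """
    env = os.environ if env is None else env
    keys = []
    for i in range(1, max_keys + 1):
        value = env.get(f"GOOGLE_API_KEY_{i}", "").strip()
        if value:
            keys.append(value)
    return keys


class KeyRotator:
    """Hand out keys in round-robin order (an assumed strategy for spreading quota)."""

    def __init__(self, keys: list):
        if not keys:
            raise ValueError("No GOOGLE_API_KEY_<n> variables are set")
        self._cycle = itertools.cycle(keys)

    def next_key(self) -> str:
        return next(self._cycle)
```

For local development, `python-dotenv`'s `load_dotenv()` can populate `os.environ` from the `.env` file before `load_api_keys()` is called.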

Run the Gradio app
```
python app.py
```
Access the web interface
```
http://localhost:7860
```

System Architecture

Core Components

Agent 1 (Image Generator): Creates images using Google's Imagen4 via MCP server integration
Agent 2 (Marketing Reviewer): Analyzes image quality and provides marketing-focused feedback using Gemini Vision
Orchestrator: Manages workflow between agents and handles handover
Web Interface: Gradio-based user interface optimized for Hugging Face
MCP Server Integration: Model Context Protocol for seamless Imagen4 access

System Architecture and Workflow

```
┌─────────────┐    ┌─────────────┐    ┌─────────────────────────────┐
│    User     │    │  Gradio UI  │    │      AI Agents & Models     │
│             │    │             │    │                             │
│ Image Prompt│───▶│             │───▶│  Agent 1 (Gemini) Drafter   │
│             │    │             │    │                             │
│Reviewer     │───▶│             │───▶│  Agent 2 (Gemini) Marketing │
│Prompt       │    │             │    │  Reviewer                   │
│             │    │             │    │                             │
│             │    │             │    │  ┌───────────────────────┐  │
│             │    │             │    │  │   Imagen4 (via MCP)   │  │
│             │    │             │    │  │                       │  │
│             │    │             │    │  │  Draft Image Creation │  │
│             │    │             │    │  └───────────────────────┘  │
│             │    │             │    │                             │
│             │    │             │    │  ┌───────────────────────┐  │
│             │    │             │    │  │  Draft Image Reviewed │  │
│             │    │             │    │  │  & Changes Suggested  │  │
│             │    │             │    │  └───────────────────────┘  │
│             │    │             │    │                             │
│ Image       │◀───│             │◀───│  Final Image Response       │
│ Response    │    │             │    │                             │
└─────────────┘    └─────────────┘    └─────────────────────────────┘
```

Detailed Workflow:

User Interaction (Left):
- User sends Image Prompt (textual description for desired marketing image)
- User sends Reviewer Prompt (instructions/criteria for marketing review)
- User receives final Image Response (generated and reviewed image)
Gradio UI (Center):
- Acts as central interface receiving prompts from user
- Forwards Image Prompt to Agent 1 (Gemini) Drafter
- Forwards Reviewer Prompt to Agent 2 (Gemini) Marketing Reviewer
- Receives final Image Response from Agent 2 and presents to user
Image Generation and Drafting (Top Right):
- Agent 1 (Gemini) Drafter: Receives Image Prompt, orchestrates image generation
- Imagen4 (via MCP): Agent 1 interacts with Imagen4 through the MCP server to create the initial image draft
Marketing Review and Refinement (Bottom Right):
- Agent 2 (Gemini) Marketing Reviewer: Receives Reviewer Prompt, evaluates the generated image against marketing criteria
- Draft Image Reviewed and Changes Suggested: Output of Agent 2's review process
- Iterative Refinement Loop: Bidirectional feedback between Agent 2 and Imagen4 (via Agent 1) refines the image until it meets marketing standards
- Final Image Response sent back to Gradio UI

Summary of Flow:

User provides prompts → Gradio UI → Agent 1 drafts image with Imagen4 → Agent 2 reviews and suggests refinements → Iterative refinement loop → Final reviewed image → User receives result
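The draft-review loop above can be sketched as follows. This is a hypothetical outline, not the app's implementation: `generate` and `review` stand in for the Agent 1 (Imagen4 via MCP) and Agent 2 (Gemini) calls, and the 0-10 score scale, the default threshold of 8.0, and the way reviewer suggestions are appended to the prompt are all assumptions. `quality_threshold` and `max_iterations` mirror the Quality Threshold and Max Iterations settings listed under Advanced Settings.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Review:
    quality_score: float  # assumed 0-10 scale
    suggestions: list = field(default_factory=list)


@dataclass
class RefinementResult:
    image: bytes
    review: Review
    iterations: int
    approved: bool


def refine_until_approved(
    prompt: str,
    reviewer_prompt: str,
    generate: Callable[[str], bytes],
    review: Callable[[bytes, str], Review],
    quality_threshold: float = 8.0,
    max_iterations: int = 3,
) -> RefinementResult:
    """Agent 1 drafts, Agent 2 reviews; feed suggestions back until approved or out of attempts."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    current_prompt = prompt
    for iteration in range(1, max_iterations + 1):
        image = generate(current_prompt)         # Agent 1 -> Imagen4 (via MCP)
        result = review(image, reviewer_prompt)  # Agent 2 -> Gemini review
        if result.quality_score >= quality_threshold:
            return RefinementResult(image, result, iteration, approved=True)
        # Fold the reviewer's suggestions into the next generation prompt.
        if result.suggestions:
            current_prompt = f"{prompt}. Refinements: " + "; ".join(result.suggestions)
    # Out of attempts: return the last draft, flagged as not approved.
    return RefinementResult(image, result, max_iterations, approved=False)
```

Passing the agents in as callables keeps the loop independent of the MCP and Gemini clients, which also makes it easy to test with stubs.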

Technology Stack

AI Models: Google Imagen4 (via MCP), Gemini 2.5 Pro (multimodal, used for image review)
Framework: Gradio (Web Interface)
Orchestration: Custom agent handover system
Deployment: Hugging Face Spaces
Authentication: Google Cloud API Keys
Protocol: MCP (Model Context Protocol) for Imagen4 integration

Why A2A Was Not Used

The system was designed with a custom handover mechanism instead of the A2A (Agent-to-Agent) protocol for the following reasons:

Simplified Architecture: The current two-agent system (generator + reviewer) doesn't require the complexity of full A2A orchestration
Direct Integration: MCP server provides direct access to Imagen4 without needing agent-to-agent communication protocols
Performance Optimization: Direct handover between agents reduces latency and eliminates protocol overhead
Deployment Simplicity: Hugging Face Spaces deployment is more straightforward without A2A dependencies
Resource Efficiency: Fewer moving parts means better resource utilization in the cloud environment

The system maintains the benefits of multi-agent collaboration while using a more efficient, purpose-built handover system.

Usage

Web Interface (Gradio)

Access the app on Hugging Face Spaces
Enter your marketing image description in the prompt field
Select your preferred art style (realistic, artistic, etc.)
Configure quality threshold and advanced settings
Click "Generate & Review Marketing Image"
View the generated image with AI quality analysis and download

API Usage

```python
import requests

# Generate an image
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "A modern office space with natural lighting",
        "style": "realistic",
        "enable_review": True,
    },
    timeout=300,  # generation plus review can take a while
)
response.raise_for_status()  # fail early on HTTP errors

# Get the generated image and review results
result = response.json()
image_data = result["data"]["image"]["data"]
quality_score = result["data"]["review"]["quality_score"]
```
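The response format beyond the fields shown above is not documented. Assuming `data.image.data` holds a base64-encoded image (an assumption to check against the actual response), a small hypothetical helper can decode it, write it to disk, and return the review score when one is present:

```python
import base64
from pathlib import Path
from typing import Optional


def save_generated_image(result: dict, path: str) -> Optional[float]:
    """Decode the image from a /generate response, save it, and return the quality score.

    Assumes result["data"]["image"]["data"] is a base64 string (not confirmed by the README).
    """
    data = result["data"]
    image_bytes = base64.b64decode(data["image"]["data"])
    Path(path).write_bytes(image_bytes)
    # Assumption: the "review" block may be absent when enable_review is False.
    review = data.get("review") or {}
    return review.get("quality_score")
```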

Configuration

Environment Variables

GOOGLE_API_KEY_1 through GOOGLE_API_KEY_6: Your Google AI API keys (set as Hugging Face secrets)
LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR)
PORT: Web server port (default: 8000)
STREAMLIT_PORT: Streamlit port (default: 8501)
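A hypothetical helper for reading `LOG_LEVEL` and `PORT` might look like this. `PORT` falls back to 8000 as documented; the `INFO` default for `LOG_LEVEL` is an assumption, since the README does not give one. `STREAMLIT_PORT` is not handled here.

```python
import logging
import os

VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def read_settings(env=None) -> dict:
    """Read LOG_LEVEL and PORT from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    level = env.get("LOG_LEVEL", "INFO").upper()  # INFO default is an assumption
    if level not in VALID_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {sorted(VALID_LEVELS)}, got {level!r}")
    port = int(env.get("PORT", "8000"))  # documented default
    return {"log_level": level, "port": port}


def configure_logging(level_name: str) -> None:
    # logging.basicConfig accepts level names such as "DEBUG" directly.
    logging.basicConfig(level=level_name, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
```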

Advanced Settings

Quality Threshold: Minimum quality score for auto-approval
Max Iterations: Maximum refinement attempts
Review Settings: Customize review criteria
MCP Configuration: Imagen4 server settings

Development

Project Structure

```
MarketingImageGenerator/
├── README.md              # Project documentation
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── agents/                # AI agents (if needed for local development)
├── tools/                 # Utility tools (if needed)
├── tests/                 # Test suite (if needed)
└── docs/                  # Documentation (if needed)
```

Note: The Hugging Face Spaces deployment uses a simplified structure with just the essential files (README.md, app.py, requirements.txt) for optimal deployment performance.

Running Tests

```
# Run all tests
pytest

# Run specific test suite
pytest tests/test_image_generator.py
pytest tests/test_mcp_integration.py
```

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request

Deployment

Hugging Face Spaces

The application is deployed on Hugging Face Spaces with the following configuration:

SDK: Gradio 5.39.0
Python Version: 3.10+
Secrets: Google API keys configured as HF secrets
Auto-deploy: Enabled for main branch

Docker

```
# Build the image
docker build -t marketing-image-generator .

# Run the container
docker run -p 7860:7860 marketing-image-generator
```

Kubernetes

```
# Deploy to Kubernetes
kubectl apply -f k8s/

# Check deployment status
kubectl get pods -n marketing-image-generator
```

Monitoring

The system includes comprehensive monitoring:

Health Checks: Automatic service health monitoring
Metrics: Performance and usage metrics via Prometheus
Logging: Structured logging for debugging
Alerts: Automated alerting for issues

Access monitoring dashboards:

Prometheus: http://localhost:9090
Grafana: http://localhost:3000

Troubleshooting

Common Issues

API Key Errors: Ensure your Google API keys are valid and configured as HF secrets
Image Generation Fails: Check your internet connection and API quotas
Review Not Working: Verify the Gemini agent is running and configured correctly
MCP Connection Issues: Check Imagen4 server connectivity and configuration

Content Policy & Brand Restrictions

Google's AI models have built-in safety guardrails that may cause timeouts or rejections for certain content types:

🚫 Highly Restricted Content (Likely to cause stalls/timeouts):

Political Figures: Named world leaders, politicians (e.g., "Putin", "Zelensky", "Biden")
Political Buildings: Government buildings like "10 Downing Street", "White House"
Geopolitical Content: War, conflict, or sensitive international relations
Financial Institution Brands: Major banks like "HSBC", "Bank of America", "JPMorgan"

⚠️ Moderately Restricted Content (May cause delays):

Regulated Industries: Healthcare, pharmaceutical, financial services
Some Corporate Brands: Varies by sector and brand sensitivity

✅ Generally Permitted Content:

Technology Brands: "Cognizant", "Microsoft", "IBM", "Accenture"
Generic Business: "Professional office", "corporate environment"
Non-branded Content: Generic descriptions without specific brand names

🔧 Workarounds for Restricted Content:

Instead of: "Professional boardroom with HSBC signage"
Use: "Professional boardroom with international banking corporation signage in red and white colors"

Instead of: "Meeting with political leaders"
Use: "Meeting with business executives in government-style building"

Strategy: Move brand-specific requirements to Review Guidelines instead of the main prompt:

Main Prompt: "Professional corporate environment"
Review Guidelines: "Ensure branding reflects HSBC corporate colors (red and white)"

This approach keeps restricted brand names out of the Imagen4 generation prompt, while the reviewer still checks the image against the brand guidance.
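The prompt-splitting strategy above can be automated with a small helper. This is a hypothetical sketch, not part of the app: `BRAND_SUBSTITUTIONS` is an illustrative mapping built from the HSBC example, and the wording of the generated review guideline is an assumption.

```python
import re

# Hypothetical mapping from brand names to neutral descriptions,
# based on the example above. Extend it for your own brands.
BRAND_SUBSTITUTIONS = {
    "HSBC": "international banking corporation",
}


def split_brand_prompt(prompt: str, substitutions=None):
    """Return (main_prompt, review_guidelines).

    Brand names are replaced with neutral descriptions in the main prompt
    and moved into the review guidelines instead.
    """
    substitutions = BRAND_SUBSTITUTIONS if substitutions is None else substitutions
    main_prompt = prompt
    guidelines = []
    for brand, neutral in substitutions.items():
        pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
        if pattern.search(main_prompt):
            main_prompt = pattern.sub(neutral, main_prompt)
            guidelines.append(f"Ensure branding reflects {brand} corporate identity")
    return main_prompt, "; ".join(guidelines)
```

For example, `split_brand_prompt("Professional boardroom with HSBC signage")` returns a main prompt that mentions only an "international banking corporation" and a review guideline that names HSBC.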

Debug Mode

Enable debug logging by setting LOG_LEVEL=DEBUG in your environment variables.

Content Policy Testing

Use the included diagnostic scripts to test content restrictions:

debug_hsbc_prompt.py - Test financial brand restrictions
test_cognizant_brand.py - Test tech brand accessibility
test_brand_workaround.py - Test workaround strategies

Support

For issues and questions:

Check the documentation in /docs
Review the troubleshooting guide
Open an issue on GitHub

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Google AI for Imagen4 and Gemini 2.5 Pro technologies
Hugging Face for the deployment platform
Gradio for the web interface framework
The open-source community for various dependencies