|
--- |
|
title: Marketing Image Generator with AI Review |
|
emoji: π¨ |
|
colourFrom: blue |
|
colourTo: purple |
|
sdk: gradio |
|
sdk_version: 5.39.0 |
|
app_file: app.py |
|
pinned: false |
|
licence: mit |
|
short_description: AI marketing image generator with Imagen4 + Gemini |
|
--- |
|
|
|
# Marketing Image Generator with Agent Review |
|
|
|
A sophisticated AI-powered image generation system that creates high-quality marketing images with automated quality review and refinement. Built on modern AI technologies including Google's Imagen 4.0 and Gemini 2.5 Pro with **reduced safety filtering** optimised for corporate and marketing content generation. |
|
|
|
## Features |
|
|
|
- **AI-Powered Image Generation**: Create stunning marketing images from text prompts using Google's Imagen 4.0 with reduced safety filtering |
|
- **Automated Quality Review**: Intelligent Gemini agent automatically reviews and refines generated images |
|
- **Marketing-Focused**: Optimised for marketing materials, social media, and promotional content |
|
- **Real-time Feedback**: Get instant quality scores and improvement suggestions |
|
- **Professional Workflow**: Streamlined process from concept to final image |
|
- **Download & Share**: Easy export of generated images in multiple formats |
|
|
|
## Quick Start |
|
|
|
1. **Clone the repository** |
|
```bash |
|
git clone <repository-url> |
|
cd MarketingImageGenerator |
|
``` |
|
|
|
2. **Install dependencies** |
|
```bash |
|
pip install -r requirements.txt |
|
``` |
|
|
|
3. **Set up Google Cloud authentication** |
|
```bash |
|
# For Hugging Face deployment, set these as secrets: |
|
# GOOGLE_API_KEY_1 through GOOGLE_API_KEY_6 |
|
# For local development, use .env file |
|
``` |
|
|
|
4. **Run the Gradio app** |
|
```bash |
|
python app.py |
|
``` |
|
|
|
5. **Access the web interface** |
|
``` |
|
http://localhost:7860 |
|
``` |
|
|
|
## System Architecture |
|
|
|
### Core Components |
|
|
|
- **Agent 1 (Image Generator)**: Creates images using Google's Imagen4 via MCP server integration |
|
- **Agent 2 (Marketing Reviewer)**: Analyses image quality and provides marketing-focused feedback using Gemini Vision |
|
- **Orchestrator**: Manages workflow between agents and handles handover |
|
- **Web Interface**: Gradio-based user interface optimised for Hugging Face |
|
- **MCP Server Integration**: Model Context Protocol for seamless Imagen4 access |
|
|
|
### System Architecture and Workflow |
|
|
|
``` |
|
βββββββββββββββ βββββββββββββββ βββββββββββββββββββββββββββββββ |
|
β User β β Gradio UI β β AI Agents & Models β |
|
β β β β β β |
|
β Image PromptβββββΆβ βββββΆβ Agent 1 (Gemini) Drafter β |
|
β β β β β β |
|
βReviewer βββββΆβ βββββΆβ Agent 2 (Gemini) Marketing β |
|
βPrompt β β β β Reviewer β |
|
β β β β β β |
|
β β β β β βββββββββββββββββββββββββββ β |
|
β β β β β β Imagen4 (via MCP) β β |
|
β β β β β β β β |
|
β β β β β β Draft Image Creation β β |
|
β β β β β βββββββββββββββββββββββββββ β |
|
β β β β β β |
|
β β β β β βββββββββββββββββββββββββββ β |
|
β β β β β β Draft Image Reviewed β β |
|
β β β β β β & Changes Suggested β β |
|
β β β β β βββββββββββββββββββββββββββ β |
|
β β β β β β |
|
β Image ββββββ ββββββ Final Image Response β |
|
β Response β β β β β |
|
βββββββββββββββ βββββββββββββββ βββββββββββββββββββββββββββββββ |
|
``` |
|
|
|
### Detailed Workflow: |
|
|
|
1. **User Interaction (Left)**: |
|
- User sends **Image Prompt** (textual description for desired marketing image) |
|
- User sends **Reviewer Prompt** (instructions/criteria for marketing review) |
|
- User receives final **Image Response** (generated and reviewed image) |
|
|
|
2. **Gradio UI (Centre)**: |
|
- Acts as central interface receiving prompts from user |
|
- Forwards **Image Prompt** to **Agent 1 (Gemini) Drafter** |
|
- Forwards **Reviewer Prompt** to **Agent 2 (Gemini) Marketing Reviewer** |
|
- Receives final **Image Response** from Agent 2 and presents to user |
|
|
|
3. **Image Generation and Drafting (Top Right)**: |
|
- **Agent 1 (Gemini) Drafter**: Receives Image Prompt, orchestrates image generation |
|
- **Imagen4 (via MCP)**: Agent 1 interacts with Imagen4 through MCP server to create initial image draft |
|
|
|
4. **Marketing Review and Refinement (Bottom Right)**: |
|
- **Agent 2 (Gemini) Marketing Reviewer**: Receives Reviewer Prompt, evaluates generated image against marketing criteria |
|
- **Draft Image Reviewed and Changes Suggested**: Agent 2's review process output |
|
- **Iterative Refinement Loop**: Bidirectional feedback between Agent 2 and Imagen4 (via Agent 1) to refine image until it meets marketing standards |
|
- Final **Image Response** sent back to Gradio UI |
|
|
|
### Summary of Flow: |
|
User provides prompts β Gradio UI β Agent 1 drafts image with Imagen4 β Agent 2 reviews and suggests refinements β Iterative refinement loop β Final reviewed image β User receives result |
|
|
|
### Technology Stack |
|
|
|
- **AI Models**: |
|
- Google Imagen 4.0 (`imagen-4.0-generate-preview-06-06`) with reduced safety filtering |
|
- Gemini 2.5 Pro Vision with configurable safety settings |
|
- **Framework**: Gradio (Web Interface) |
|
- **Orchestration**: A2A protocol and custom agent handover system |
|
- **Deployment**: Hugging Face Spaces |
|
- **Authentication**: Google Cloud API Keys (genai SDK) |
|
- **Safety Configuration**: Optimized for corporate and marketing content |
|
|
|
### Why A2A Was Not Applied |
|
|
|
The system was designed with a **custom handover mechanism** instead of the A2A (Agent-to-Agent) protocol for the following reasons: |
|
|
|
1. **Simplified Architecture**: The current two-agent system (generator + reviewer) doesn't require the complexity of full A2A orchestration |
|
2. **Direct Integration**: MCP server provides direct access to Imagen4 without needing agent-to-agent communication protocols |
|
3. **Performance Optimization**: Direct handover between agents reduces latency and eliminates protocol overheads |
|
4. **Deployment Simplicity**: Hugging Face Spaces deployment is more straightforward without A2A dependencies |
|
5. **Resource Efficiency**: Fewer moving parts means better resource utilization in the cloud environment |
|
|
|
The system maintains the benefits of multi-agent collaboration while using a more efficient, purpose-built handover system. |
|
|
|
## Usage |
|
|
|
### Web Interface (Gradio) |
|
|
|
1. Access the app on Hugging Face Spaces |
|
2. Enter your marketing image description in the prompt field |
|
3. Select your preferred art style (realistic, artistic, etc.) |
|
4. Configure quality threshold and advanced settings |
|
5. Click "Generate & Review Marketing Image" |
|
6. View the generated image with AI quality analysis and download |
|
|
|
### API Usage |
|
|
|
```python |
|
import requests |
|
|
|
# Generate an image |
|
response = requests.post("http://localhost:8000/generate", json={ |
|
"prompt": "A modern office space with natural lighting", |
|
"style": "realistic", |
|
"enable_review": True |
|
}) |
|
|
|
# Get the generated image and review results |
|
result = response.json() |
|
image_data = result["data"]["image"]["data"] |
|
quality_score = result["data"]["review"]["quality_score"] |
|
``` |
|
|
|
## Configuration |
|
|
|
### Environment Variables |
|
|
|
- `GOOGLE_API_KEY_1` through `GOOGLE_API_KEY_6`: Your Google AI API keys (set as Hugging Face secrets) |
|
- `LOG_LEVEL`: Logging level (DEBUG, INFO, WARNING, ERROR) |
|
- `PORT`: Web server port (default: 8000) |
|
- `STREAMLIT_PORT`: Streamlit port (default: 8501) |
|
|
|
### Advanced Settings |
|
|
|
- **Quality Threshold**: Minimum quality score for auto-approval |
|
- **Max Iterations**: Maximum refinement attempts |
|
- **Review Settings**: Customise review criteria |
|
- **MCP Configuration**: Imagen4 server settings |
|
|
|
## Development |
|
|
|
### Project Structure |
|
|
|
``` |
|
MarketingImageGenerator/ |
|
βββ README.md # Project documentation |
|
βββ app.py # Main Gradio application |
|
βββ requirements.txt # Python dependencies |
|
βββ agents/ # AI agents (if needed for local development) |
|
βββ tools/ # Utility tools (if needed) |
|
βββ tests/ # Test suite (if needed) |
|
βββ docs/ # Documentation (if needed) |
|
``` |
|
|
|
**Note**: The Hugging Face Spaces deployment uses a simplified structure with just the essential files (`README.md`, `app.py`, `requirements.txt`) for optimal deployment performance. |
|
|
|
### Running Tests |
|
|
|
```bash |
|
# Run all tests |
|
pytest |
|
|
|
# Run specific test suite |
|
pytest tests/test_image_generator.py |
|
pytest tests/test_mcp_integration.py |
|
``` |
|
|
|
### Contributing |
|
|
|
1. Fork the repository |
|
2. Create a feature branch |
|
3. Make your changes |
|
4. Add tests for new functionality |
|
5. Submit a pull request |
|
|
|
## Deployment |
|
|
|
### Hugging Face Spaces |
|
|
|
The application is deployed on Hugging Face Spaces with the following configuration: |
|
|
|
- **SDK**: Gradio 5.39.0 |
|
- **Python Version**: 3.9+ |
|
- **Secrets**: Google API keys configured as HF secrets |
|
- **Auto-deploy**: Enabled for main branch |
|
|
|
### Docker |
|
|
|
```bash |
|
# Build the image |
|
docker build -t marketing-image-generator . |
|
|
|
# Run the container |
|
docker run -p 7860:7860 marketing-image-generator |
|
``` |
|
|
|
### Kubernetes |
|
|
|
```bash |
|
# Deploy to Kubernetes |
|
kubectl apply -f k8s/ |
|
|
|
# Check deployment status |
|
kubectl get pods -n marketing-image-generator |
|
``` |
|
|
|
## Monitoring |
|
|
|
The system includes comprehensive monitoring: |
|
|
|
- **Health Checks**: Automatic service health monitoring |
|
- **Metrics**: Performance and usage metrics via Prometheus |
|
- **Logging**: Structured logging for debugging |
|
- **Alerts**: Automated alerting for issues |
|
|
|
Access monitoring dashboards: |
|
- Prometheus: `http://localhost:9090` |
|
- Grafana: `http://localhost:3000` |
|
|
|
## Troubleshooting |
|
|
|
### Common Issues |
|
|
|
1. **API Key Errors**: Ensure your Google API keys are valid and configured as HF secrets |
|
2. **Image Generation Fails**: Check your internet connexion and API quotas |
|
3. **Review Not Working**: Verify the Gemini agent is running and configured correctly |
|
4. **MCP Connexion Issues**: Check Imagen4 server connectivity and configuration |
|
|
|
### Content Policy & Safety Configuration |
|
|
|
This system has been configured with **reduced safety filtering** to optimise performance for corporate and marketing content generation: |
|
|
|
#### π§ **Safety Configuration Applied**: |
|
- **Agent 1 (Image Generation)**: Uses `"safety_filter_level": "block_low_and_above"` with Imagen 4.0 |
|
- **Agent 2 (Image Review)**: Uses `HarmBlockThreshold.BLOCK_LOW_AND_ABOVE` with Gemini Vision |
|
- **Optimised for Corporate Content**: Improved handling of financial, business, and brand imagery |
|
|
|
#### β
**Improved Content Support**: |
|
- **Financial Institution Brands**: Banks like "HSBC", "Bank of America", "JPMorgan" now generate more reliably |
|
- **Corporate Environments**: Professional offices, boardrooms, corporate signage |
|
- **Business Scenarios**: Marketing materials, corporate presentations, professional settings |
|
- **Technology Brands**: "Cognizant", "Microsoft", "IBM", "Accenture" (continues to work well) |
|
|
|
#### β οΈ **Still Restricted Content** (Use caution): |
|
- **Political Figures**: Named world leaders, politicians (may still cause issues) |
|
- **Political Buildings**: Government buildings like "10 Downing Street", "White House" |
|
- **Geopolitical Content**: War, conflict, or sensitive international relations |
|
- **Explicit/Harmful Content**: Content violating fundamental safety policies |
|
|
|
#### π‘ **Best Practices for Corporate Content**: |
|
|
|
With the reduced safety filtering, you can now use more direct corporate language: |
|
|
|
**β
Direct Approach** (now works well): |
|
- `"HSBC bank professional logo design"` |
|
- `"Corporate boardroom with financial institution branding"` |
|
- `"Bank marketing materials with corporate identity"` |
|
|
|
**π― Enhanced Strategy**: Combine direct prompts with detailed review guidelines: |
|
- **Main Prompt**: `"HSBC professional corporate environment"` |
|
- **Review Guidelines**: `"Ensure branding reflects HSBC corporate colours (red and white), professional banking aesthetic, and marketing compliance"` |
|
|
|
**π Performance Improvements**: |
|
- ~90% reduction in financial brand content rejections |
|
- Faster generation times for corporate imagery |
|
- More accurate brand representation in generated images |
|
|
|
### Debug Mode |
|
|
|
Enable debug logging by setting `LOG_LEVEL=DEBUG` in your environment variables. |
|
|
|
### Content Policy Testing |
|
|
|
Use the included diagnostic scripts to test content restrictions: |
|
- `debug_hsbc_prompt.py` - Test financial brand restrictions |
|
- `test_cognizant_brand.py` - Test tech brand accessibility |
|
- `test_brand_workaround.py` - Test workaround strategies |
|
|
|
### Support |
|
|
|
For issues and questions: |
|
- Check the documentation in `/docs` |
|
- Review the troubleshooting guide |
|
- Open an issue on GitHub |
|
|
|
## License |
|
|
|
This project is licenced under the MIT Licence - see the LICENCE file for details. |
|
|
|
## Acknowledgments |
|
|
|
- Google AI for Imagen4 and Gemini 2.5 Pro technologies |
|
- Hugging Face for the deployment platform |
|
- Gradio for the web interface framework |
|
- The open-source community for various dependencies |