---
title: Inference Providers MCP Server
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---

# 🤖 Inference Providers MCP Server

A streamlined **Model Context Protocol (MCP) Server** that provides LLMs with access to Hugging Face Inference Providers through a single, focused tool.

## ✨ What is this?

This MCP server exposes a `chat_completion` tool that allows LLMs and AI assistants to chat with language models across 14+ inference providers, including Cerebras, Cohere, Fireworks, Groq, and more.

**Why use this?** Instead of manually switching between different AI providers, your LLM can automatically access the best model for each task through a unified interface.

## 🚀 Supported Providers

| Provider | Chat | Vision | Provider | Chat | Vision |
|----------|------|--------|----------|------|--------|
| Cerebras | ✅ | ❌ | Nebius | ✅ | ✅ |
| Cohere | ✅ | ✅ | Novita | ✅ | ✅ |
| Fal AI | ✅ | ✅ | Nscale | ✅ | ✅ |
| Featherless AI | ✅ | ✅ | Replicate | ✅ | ✅ |
| Fireworks | ✅ | ✅ | SambaNova | ✅ | ✅ |
| Groq | ✅ | ❌ | Together | ✅ | ✅ |
| HF Inference | ✅ | ✅ | Hyperbolic | ✅ | ✅ |

## 🛠️ Quick Setup

### 1. Get an HF Token

1. Visit [HF Settings](https://huggingface.co/settings/tokens)
2. Create a token with the **Inference Providers** scope
3. Copy the token (it starts with `hf_`)

### 2. Configure Your MCP Client

#### Cursor IDE

Add to `.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "inference-providers": {
      "url": "YOUR_URL/gradio_api/mcp/sse"
    }
  }
}
```

#### Claude Desktop

Add to your MCP settings:

```json
{
  "mcpServers": {
    "inference-providers": {
      "command": "npx",
      "args": ["mcp-remote", "YOUR_URL/gradio_api/mcp/sse", "--transport", "sse-only"]
    }
  }
}
```

### 3. Server URLs

**HF Spaces:** `https://username-spacename.hf.space/gradio_api/mcp/sse`

**Local:** `http://localhost:7860/gradio_api/mcp/sse`

## 🎯 How to Use

Once configured, your LLM can use the tool:

> "Use chat completion with Groq and Llama to explain Python best practices"

> "Chat with DeepSeek V3 via Novita about machine learning concepts"

## 🛠️ Available Tool

**`chat_completion`** - Generate responses using multiple AI providers

**Parameters:**

- `provider`: Provider name (`novita`, `groq`, `cerebras`, etc.)
- `model`: Model ID (e.g., `deepseek-ai/DeepSeek-V3-0324`)
- `messages`: Input text or a JSON messages array
- `temperature`: Response randomness (0.0-2.0, default: 0.7)
- `max_tokens`: Max response length (1-4096, default: 512)

**Environment:** Requires the `HF_TOKEN` environment variable.
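For local testing outside an MCP client, the same function can be exercised over Gradio's regular API with `gradio_client`. The snippet below is a minimal sketch, not the canonical client: the endpoint name `/chat_completion` and the positional argument order are assumptions based on the parameter list above; run `client.view_api()` against your deployment to confirm them.

```python
# Minimal sketch: call the underlying Gradio endpoint directly.
# Endpoint name and argument order are assumptions -- verify with
# client.view_api() before relying on them.
from gradio_client import Client

client = Client("http://localhost:7860")  # or your Space URL

result = client.predict(
    "groq",                               # provider
    "meta-llama/Llama-3.1-70B-Instruct",  # model ID
    "Explain Python best practices.",     # messages (plain text)
    0.7,                                  # temperature
    512,                                  # max_tokens
    api_name="/chat_completion",          # assumed endpoint name
)
print(result)
```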
## 🎯 Popular Models

**Text Models:**

- `deepseek-ai/DeepSeek-V3-0324` (Novita)
- `meta-llama/Llama-3.1-70B-Instruct` (Groq)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` (Together)

**Vision Models:**

- `meta-llama/Llama-3.2-11B-Vision-Instruct` (Together)
- `microsoft/Phi-3.5-vision-instruct` (HF Inference)

## 💻 Local Development

```bash
# Clone the repository and install dependencies
git clone <repo-url>
cd inference-providers-mcp
pip install -r requirements.txt

# Set your token and run the server
export HF_TOKEN=hf_your_token_here
python app.py
```

## 🔧 Technical Details

- **Built with:** Gradio with MCP support (`gradio[mcp]`)
- **Protocol:** Model Context Protocol (MCP) via Server-Sent Events
- **Security:** Environment-based token management
- **Compatibility:** Works with Cursor, Claude Desktop, and other MCP clients

## 🔗 Resources

- [Cursor MCP Docs](https://docs.cursor.com/context/model-context-protocol)
- [Gradio MCP Guide](https://huggingface.co/blog/gradio-mcp)
- [Inference Providers Docs](https://huggingface.co/docs/inference-providers)
- [Get HF Token](https://huggingface.co/settings/tokens)

## 📝 License

MIT License - see the code for details.
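## 🧪 Appendix: Endpoint Smoke Test

If an MCP client cannot connect, it can help to confirm the SSE endpoint is reachable before debugging the client config. This is a minimal sketch, not part of the server code; it assumes the default local port from the Server URLs section above.

```python
# Quick connectivity check for the MCP SSE endpoint.
# The path matches the "Server URLs" section; swap in your Space URL
# (https://username-spacename.hf.space/...) for a deployed server.
import requests

URL = "http://localhost:7860/gradio_api/mcp/sse"

with requests.get(URL, stream=True, timeout=10) as resp:
    print(resp.status_code)                  # 200 means the server is up
    print(resp.headers.get("content-type"))  # SSE endpoints report text/event-stream
```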