---
title: Inference Providers MCP Server
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---
# πŸ€– Inference Providers MCP Server
A streamlined **Model Context Protocol (MCP) Server** that provides LLMs with access to Hugging Face Inference Providers through a single, focused tool.
## ✨ What is this?
This MCP server exposes a `chat_completion` tool that allows LLMs and AI assistants to chat with language models across 14+ inference providers including Cerebras, Cohere, Fireworks, Groq, and more.
**Why use this?** Instead of manually switching between different AI providers, your LLM can automatically access the best model for each task through a unified interface.
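Under the hood, each tool call is a standard chat-completion request routed through the selected provider. As a minimal sketch of the equivalent direct call, assuming `huggingface_hub>=0.28` (the provider and model below are just examples drawn from the tables in this README):
```python
import os

from huggingface_hub import InferenceClient

# Route the request through a specific Inference Provider (Groq here)
client = InferenceClient(provider="groq", api_key=os.environ["HF_TOKEN"])

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Explain Python best practices"}],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```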
## πŸš€ Supported Providers
| Provider | Chat | Vision | Provider | Chat | Vision |
|----------|------|--------|----------|------|--------|
| Cerebras | βœ… | ❌ | Nebius | βœ… | βœ… |
| Cohere | βœ… | βœ… | Novita | βœ… | βœ… |
| Fal AI | βœ… | βœ… | Nscale | βœ… | βœ… |
| Featherless AI | βœ… | βœ… | Replicate | βœ… | βœ… |
| Fireworks | βœ… | βœ… | SambaNova | βœ… | βœ… |
| Groq | βœ… | ❌ | Together | βœ… | βœ… |
| HF Inference | βœ… | βœ… | Hyperbolic | βœ… | βœ… |
## πŸ› οΈ Quick Setup
### 1. Get HF Token
1. Visit [HF Settings](https://huggingface.co/settings/tokens)
2. Create token with **Inference Providers** scope
3. Copy the token (starts with `hf_`)
### 2. Configure Your MCP Client
#### Cursor IDE
Add to `.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "inference-providers": {
      "url": "YOUR_URL/gradio_api/mcp/sse"
    }
  }
}
```
#### Claude Desktop
Add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "inference-providers": {
      "command": "npx",
      "args": ["mcp-remote", "YOUR_URL/gradio_api/mcp/sse", "--transport", "sse-only"]
    }
  }
}
```
### 3. Server URLs
**HF Spaces:** `https://username-spacename.hf.space/gradio_api/mcp/sse`
**Local:** `http://localhost:7860/gradio_api/mcp/sse`
## 🎯 How to Use
Once configured, your LLM can use the tool:
> "Use chat completion with Groq and Llama to explain Python best practices"
> "Chat with DeepSeek V3 via Novita about machine learning concepts"
## πŸ› οΈ Available Tool
**`chat_completion`** - Generate a chat response from any supported provider and model
**Parameters:**
- `provider`: Provider name (novita, groq, cerebras, etc.)
- `model`: Model ID (e.g., `deepseek-ai/DeepSeek-V3-0324`)
- `messages`: Input text or JSON messages array
- `temperature`: Response randomness (0.0-2.0, default: 0.7)
- `max_tokens`: Max response length (1-4096, default: 512)
**Environment:** Requires the `HF_TOKEN` environment variable to be set on the server
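Because `messages` accepts either plain text or a JSON-encoded array, the server must normalize it before calling the provider. A hedged sketch of that normalization (the helper name `normalize_messages` is hypothetical, not taken from app.py):
```python
import json

def normalize_messages(messages: str) -> list[dict]:
    """Accept plain text or a JSON messages array; return OpenAI-style messages."""
    try:
        parsed = json.loads(messages)
        if isinstance(parsed, list):
            return parsed  # already [{"role": ..., "content": ...}, ...]
    except json.JSONDecodeError:
        pass
    # Plain text: wrap it as a single user turn
    return [{"role": "user", "content": messages}]
```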
## 🎯 Popular Models
**Text Models:**
- `deepseek-ai/DeepSeek-V3-0324` (Novita)
- `meta-llama/Llama-3.1-70B-Instruct` (Groq)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` (Together)
**Vision Models:**
- `meta-llama/Llama-3.2-11B-Vision-Instruct` (Together)
- `microsoft/Phi-3.5-vision-instruct` (HF Inference)
## πŸ’» Local Development
```bash
# Clone and setup
git clone <repository-url>
cd inference-providers-mcp
pip install -r requirements.txt
# Set token and run
export HF_TOKEN=hf_your_token_here
python app.py
```
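To verify the MCP endpoint is reachable after startup, you can probe the SSE route. A small sketch using `requests` (install it separately if it is not already in requirements.txt):
```python
import requests

# The MCP endpoint speaks Server-Sent Events; a 200 response with a
# text/event-stream content type means the server is up.
resp = requests.get("http://localhost:7860/gradio_api/mcp/sse", stream=True, timeout=10)
print(resp.status_code, resp.headers.get("content-type"))
resp.close()
```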
## πŸ”§ Technical Details
- **Built with:** Gradio + MCP support (`gradio[mcp]`)
- **Protocol:** Model Context Protocol (MCP) via Server-Sent Events
- **Security:** Environment-based token management
- **Compatibility:** Works with Cursor, Claude Desktop, and other MCP clients
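For orientation, here is a minimal sketch of how a Gradio app exposes a typed function as an MCP tool; the signatures are simplified and the real app.py may differ:
```python
import gradio as gr

def chat_completion(provider: str, model: str, messages: str,
                    temperature: float = 0.7, max_tokens: int = 512) -> str:
    """Docstring and type hints become the MCP tool schema; body elided here."""
    ...

demo = gr.Interface(
    fn=chat_completion,
    inputs=[
        "text", "text", "text",
        gr.Slider(0.0, 2.0, value=0.7, label="temperature"),
        gr.Slider(1, 4096, value=512, step=1, label="max_tokens"),
    ],
    outputs="text",
)

# mcp_server=True serves the tool at /gradio_api/mcp/sse (requires gradio[mcp])
demo.launch(mcp_server=True)
```
Gradio derives the tool name and parameter schema from the function signature, which is why the docstring and type hints matter.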
## πŸ”— Resources
- [Cursor MCP Docs](https://docs.cursor.com/context/model-context-protocol)
- [Gradio MCP Guide](https://huggingface.co/blog/gradio-mcp)
- [Inference Providers Docs](https://huggingface.co/docs/inference-providers)
- [Get HF Token](https://huggingface.co/settings/tokens)
## πŸ“ License
MIT License - see the code for details.