---
title: Inference Providers MCP Server
emoji: 🤗
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---

# 🤗 Inference Providers MCP Server

A streamlined **Model Context Protocol (MCP) Server** that provides LLMs with access to Hugging Face Inference Providers through a single, focused tool.

## ✨ What is this?

This MCP server exposes a `chat_completion` tool that allows LLMs and AI assistants to chat with language models across 14+ inference providers, including Cerebras, Cohere, Fireworks, Groq, and more.

**Why use this?** Instead of manually switching between different AI providers, your LLM can automatically access the best model for each task through a unified interface.

## 🚀 Supported Providers

| Provider | Chat | Vision | Provider | Chat | Vision |
|----------|------|--------|----------|------|--------|
| Cerebras | ✅ | ❌ | Nebius | ✅ | ✅ |
| Cohere | ✅ | ✅ | Novita | ✅ | ✅ |
| Fal AI | ✅ | ✅ | Nscale | ✅ | ✅ |
| Featherless AI | ✅ | ✅ | Replicate | ✅ | ✅ |
| Fireworks | ✅ | ✅ | SambaNova | ✅ | ✅ |
| Groq | ✅ | ❌ | Together | ✅ | ✅ |
| HF Inference | ✅ | ✅ | Hyperbolic | ✅ | ✅ |

## 🛠️ Quick Setup

### 1. Get an HF Token

1. Visit [HF Settings](https://huggingface.co/settings/tokens)
2. Create a token with the **Inference Providers** scope
3. Copy the token (it starts with `hf_`)

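To confirm the token works before wiring it into a client, one option is a quick check with the `huggingface_hub` library (an assumption here — install it separately if you don't already have it):

```python
# Optional sanity check for the token; assumes `pip install huggingface_hub`.
from huggingface_hub import whoami

info = whoami(token="hf_your_token_here")  # replace with your actual token
print(info["name"])  # prints your HF username if the token is valid
```
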
### 2. Configure Your MCP Client

#### Cursor IDE

Add to `.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "inference-providers": {
      "url": "YOUR_URL/gradio_api/mcp/sse"
    }
  }
}
```

#### Claude Desktop

Add to MCP settings:

```json
{
  "mcpServers": {
    "inference-providers": {
      "command": "npx",
      "args": ["mcp-remote", "YOUR_URL/gradio_api/mcp/sse", "--transport", "sse-only"]
    }
  }
}
```

### 3. Server URLs

**HF Spaces:** `https://username-spacename.hf.space/gradio_api/mcp/sse`

**Local:** `http://localhost:7860/gradio_api/mcp/sse`

## 🎯 How to Use

Once configured, your LLM can use the tool:

> "Use chat completion with Groq and Llama to explain Python best practices"

> "Chat with DeepSeek V3 via Novita about machine learning concepts"

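Under the hood, the MCP client translates a prompt like these into a `tools/call` request. A sketch of what that request looks like for the first prompt, written as a Python dict (the argument values are illustrative, not fixed):

```python
# Illustrative MCP "tools/call" request for the first prompt above;
# the client serializes this dict to JSON-RPC before sending it over SSE.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "chat_completion",
        "arguments": {
            "provider": "groq",
            "model": "meta-llama/Llama-3.1-70B-Instruct",
            "messages": "Explain Python best practices",
            "temperature": 0.7,
            "max_tokens": 512,
        },
    },
}
```

The argument names match the tool schema described in the next section.
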
## 🛠️ Available Tool

**`chat_completion`** - Generate a chat response from any supported provider and model

**Parameters:**

- `provider`: Provider name (novita, groq, cerebras, etc.)
- `model`: Model ID (e.g., `deepseek-ai/DeepSeek-V3-0324`)
- `messages`: Plain input text, or a JSON array of chat messages
- `temperature`: Response randomness (0.0-2.0, default: 0.7)
- `max_tokens`: Maximum response length in tokens (1-4096, default: 512)

**Environment:** Requires the `HF_TOKEN` environment variable

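This README doesn't show the server's internals, but a call with these parameters maps onto `huggingface_hub` roughly as follows — a sketch assuming the tool forwards to `InferenceClient`, which also shows the JSON messages-array form that the `messages` parameter accepts:

```python
import os

from huggingface_hub import InferenceClient

# Rough equivalent of one chat_completion tool call;
# the parameter names mirror the tool's parameters above.
client = InferenceClient(
    provider="novita",  # any chat-capable provider from the table above
    api_key=os.environ["HF_TOKEN"],
)
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Hello!"}],  # the messages-array form
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```
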
## 🎯 Popular Models

**Text Models:**

- `deepseek-ai/DeepSeek-V3-0324` (Novita)
- `meta-llama/Llama-3.1-70B-Instruct` (Groq)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` (Together)

**Vision Models:**

- `meta-llama/Llama-3.2-11B-Vision-Instruct` (Together)
- `microsoft/Phi-3.5-vision-instruct` (HF Inference)

## 💻 Local Development

```bash
# Clone and set up
git clone <repository-url>
cd inference-providers-mcp
pip install -r requirements.txt

# Set token and run
export HF_TOKEN=hf_your_token_here
python app.py
```

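To verify the MCP endpoint is live after `python app.py`, one option is the official `mcp` Python SDK — an extra dependency this project doesn't require, so treat this as a sketch:

```python
# Endpoint smoke test; assumes `pip install mcp` (the official MCP Python SDK).
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main() -> None:
    url = "http://localhost:7860/gradio_api/mcp/sse"
    async with sse_client(url) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # The chat_completion tool should appear in this list.
            print([tool.name for tool in tools.tools])


asyncio.run(main())
```
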
## 🔧 Technical Details

- **Built with:** Gradio + MCP support (`gradio[mcp]`)
- **Protocol:** Model Context Protocol (MCP) via Server-Sent Events
- **Security:** Environment-based token management
- **Compatibility:** Works with Cursor, Claude Desktop, and other MCP clients

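For reference, exposing a Python function as an MCP tool in Gradio comes down to `demo.launch(mcp_server=True)`. A stripped-down sketch of the pattern (not the actual `app.py`):

```python
# Minimal Gradio MCP server shape; illustrative only — see app.py for the real tool.
import gradio as gr


def chat_completion(provider: str, model: str, messages: str,
                    temperature: float = 0.7, max_tokens: int = 512) -> str:
    """Generate a chat completion via a Hugging Face Inference Provider.

    Gradio derives the MCP tool schema from this docstring and the type hints.
    """
    # A real implementation would forward the request to the chosen provider.
    return f"[{provider}/{model}] stub response to: {messages!r}"


demo = gr.Interface(
    fn=chat_completion,
    inputs=["text", "text", "text", "number", "number"],
    outputs="text",
)
demo.launch(mcp_server=True)  # MCP endpoint served at /gradio_api/mcp/sse
```
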
## 📚 Resources

- [Cursor MCP Docs](https://docs.cursor.com/context/model-context-protocol)
- [Gradio MCP Guide](https://huggingface.co/blog/gradio-mcp)
- [Inference Providers Docs](https://huggingface.co/docs/inference-providers)
- [Get HF Token](https://huggingface.co/settings/tokens)

## 📄 License

MIT License - see the code for details.