---
title: Inference Providers MCP Server
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---

# 🤖 Inference Providers MCP Server

A streamlined **Model Context Protocol (MCP) Server** that provides LLMs with access to Hugging Face Inference Providers through a single, focused tool.

## ✨ What is this?

This MCP server exposes a `chat_completion` tool that allows LLMs and AI assistants to chat with language models across 14+ inference providers, including Cerebras, Cohere, Fireworks, Groq, and more.

**Why use this?** Instead of manually switching between different AI providers, your LLM can automatically access the best model for each task through a unified interface.

## 🚀 Supported Providers

| Provider | Chat | Vision | Provider | Chat | Vision |
|----------|------|--------|----------|------|--------|
| Cerebras | ✅ | ❌ | Nebius | ✅ | ✅ |
| Cohere | ✅ | ✅ | Novita | ✅ | ✅ |
| Fal AI | ✅ | ✅ | Nscale | ✅ | ✅ |
| Featherless AI | ✅ | ✅ | Replicate | ✅ | ✅ |
| Fireworks | ✅ | ✅ | SambaNova | ✅ | ✅ |
| Groq | ✅ | ❌ | Together | ✅ | ✅ |
| HF Inference | ✅ | ✅ | Hyperbolic | ✅ | ✅ |

## 🛠️ Quick Setup

### 1. Get an HF Token

1. Visit [HF Settings](https://huggingface.co/settings/tokens)
2. Create a token with the **Inference Providers** scope
3. Copy the token (it starts with `hf_`)

### 2. Configure Your MCP Client

#### Cursor IDE

Add to `.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "inference-providers": {
      "url": "YOUR_URL/gradio_api/mcp/sse"
    }
  }
}
```

#### Claude Desktop

Add to your MCP settings:

```json
{
  "mcpServers": {
    "inference-providers": {
      "command": "npx",
      "args": ["mcp-remote", "YOUR_URL/gradio_api/mcp/sse", "--transport", "sse-only"]
    }
  }
}
```

### 3. Server URLs

**HF Spaces:** `https://username-spacename.hf.space/gradio_api/mcp/sse`

**Local:** `http://localhost:7860/gradio_api/mcp/sse`

## 🎯 How to Use

Once configured, your LLM can use the tool:

> "Use chat completion with Groq and Llama to explain Python best practices"

> "Chat with DeepSeek V3 via Novita about machine learning concepts"

## 🛠️ Available Tool

**`chat_completion`** - Generate responses using multiple AI providers

**Parameters:**

- `provider`: Provider name (`novita`, `groq`, `cerebras`, etc.)
- `model`: Model ID (e.g., `deepseek-ai/DeepSeek-V3-0324`)
- `messages`: Input text or a JSON messages array
- `temperature`: Response randomness (0.0-2.0, default: 0.7)
- `max_tokens`: Max response length (1-4096, default: 512)

**Environment:** Requires the `HF_TOKEN` environment variable.
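For local testing outside an MCP client, the same function can be exercised over Gradio's regular API with `gradio_client`. The snippet below is a minimal sketch, not the canonical client: the endpoint name `/chat_completion` and the positional argument order are assumptions based on the parameter list above; run `client.view_api()` against your deployment to confirm them.

```python
# Minimal sketch: call the underlying Gradio endpoint directly.
# Endpoint name and argument order are assumptions -- verify with
# client.view_api() before relying on them.
from gradio_client import Client

client = Client("http://localhost:7860")  # or your Space URL

result = client.predict(
    "groq",                               # provider
    "meta-llama/Llama-3.1-70B-Instruct",  # model ID
    "Explain Python best practices.",     # messages (plain text)
    0.7,                                  # temperature
    512,                                  # max_tokens
    api_name="/chat_completion",          # assumed endpoint name
)
print(result)
```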
## 🎯 Popular Models

**Text Models:**

- `deepseek-ai/DeepSeek-V3-0324` (Novita)
- `meta-llama/Llama-3.1-70B-Instruct` (Groq)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` (Together)

**Vision Models:**

- `meta-llama/Llama-3.2-11B-Vision-Instruct` (Together)
- `microsoft/Phi-3.5-vision-instruct` (HF Inference)

## 💻 Local Development

```bash
# Clone the repository and install dependencies
git clone <repo-url>
cd inference-providers-mcp
pip install -r requirements.txt

# Set your token and run the server
export HF_TOKEN=hf_your_token_here
python app.py
```

## 🔧 Technical Details

- **Built with:** Gradio with MCP support (`gradio[mcp]`)
- **Protocol:** Model Context Protocol (MCP) via Server-Sent Events
- **Security:** Environment-based token management
- **Compatibility:** Works with Cursor, Claude Desktop, and other MCP clients

## 🔗 Resources

- [Cursor MCP Docs](https://docs.cursor.com/context/model-context-protocol)
- [Gradio MCP Guide](https://huggingface.co/blog/gradio-mcp)
- [Inference Providers Docs](https://huggingface.co/docs/inference-providers)
- [Get HF Token](https://huggingface.co/settings/tokens)

## 📝 License

MIT License - see the code for details.
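## 🧪 Appendix: Endpoint Smoke Test

If an MCP client cannot connect, it can help to confirm the SSE endpoint is reachable before debugging the client config. This is a minimal sketch, not part of the server code; it assumes the default local port from the Server URLs section above.

```python
# Quick connectivity check for the MCP SSE endpoint.
# The path matches the "Server URLs" section; swap in your Space URL
# (https://username-spacename.hf.space/...) for a deployed server.
import requests

URL = "http://localhost:7860/gradio_api/mcp/sse"

with requests.get(URL, stream=True, timeout=10) as resp:
    print(resp.status_code)                  # 200 means the server is up
    print(resp.headers.get("content-type"))  # SSE endpoints report text/event-stream
```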