---
title: Inference Providers MCP Server
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---
# πŸ€– Inference Providers MCP Server
A streamlined **Model Context Protocol (MCP) Server** that provides LLMs with access to Hugging Face Inference Providers through a single, focused tool.
## ✨ What is this?
This MCP server exposes a `chat_completion` tool that allows LLMs and AI assistants to chat with language models across 14+ inference providers including Cerebras, Cohere, Fireworks, Groq, and more.
**Why use this?** Instead of manually switching between different AI providers, your LLM can automatically access the best model for each task through a unified interface.
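Under the hood, each tool call is a standard chat-completion request routed through the selected provider. As a minimal sketch of the equivalent direct call, assuming `huggingface_hub>=0.28` (the provider and model below are just examples drawn from the tables in this README):
```python
import os

from huggingface_hub import InferenceClient

# Route the request through a specific Inference Provider (Groq here)
client = InferenceClient(provider="groq", api_key=os.environ["HF_TOKEN"])

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Explain Python best practices"}],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```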
## πŸš€ Supported Providers
| Provider | Chat | Vision | Provider | Chat | Vision |
|----------|------|--------|----------|------|--------|
| Cerebras | βœ… | ❌ | Nebius | βœ… | βœ… |
| Cohere | βœ… | βœ… | Novita | βœ… | βœ… |
| Fal AI | βœ… | βœ… | Nscale | βœ… | βœ… |
| Featherless AI | βœ… | βœ… | Replicate | βœ… | βœ… |
| Fireworks | βœ… | βœ… | SambaNova | βœ… | βœ… |
| Groq | βœ… | ❌ | Together | βœ… | βœ… |
| HF Inference | βœ… | βœ… | Hyperbolic | βœ… | βœ… |
## πŸ› οΈ Quick Setup
### 1. Get HF Token
1. Visit [HF Settings](https://huggingface.co/settings/tokens)
2. Create token with **Inference Providers** scope
3. Copy the token (starts with `hf_`)
### 2. Configure Your MCP Client
#### Cursor IDE
Add to `.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "inference-providers": {
      "url": "YOUR_URL/gradio_api/mcp/sse"
    }
  }
}
```
#### Claude Desktop
Add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "inference-providers": {
      "command": "npx",
      "args": ["mcp-remote", "YOUR_URL/gradio_api/mcp/sse", "--transport", "sse-only"]
    }
  }
}
```
### 3. Server URLs
**HF Spaces:** `https://username-spacename.hf.space/gradio_api/mcp/sse`
**Local:** `http://localhost:7860/gradio_api/mcp/sse`
## 🎯 How to Use
Once configured, your LLM can use the tool:
> "Use chat completion with Groq and Llama to explain Python best practices"
> "Chat with DeepSeek V3 via Novita about machine learning concepts"
## πŸ› οΈ Available Tool
**`chat_completion`** - Generate a chat response from any supported provider and model
**Parameters:**
- `provider`: Provider name (novita, groq, cerebras, etc.)
- `model`: Model ID (e.g., `deepseek-ai/DeepSeek-V3-0324`)
- `messages`: Input text or JSON messages array
- `temperature`: Response randomness (0.0-2.0, default: 0.7)
- `max_tokens`: Max response length (1-4096, default: 512)
**Environment:** Requires the `HF_TOKEN` environment variable to be set on the server
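Because `messages` accepts either plain text or a JSON-encoded array, the server must normalize it before calling the provider. A hedged sketch of that normalization (the helper name `normalize_messages` is hypothetical, not taken from app.py):
```python
import json

def normalize_messages(messages: str) -> list[dict]:
    """Accept plain text or a JSON messages array; return OpenAI-style messages."""
    try:
        parsed = json.loads(messages)
        if isinstance(parsed, list):
            return parsed  # already [{"role": ..., "content": ...}, ...]
    except json.JSONDecodeError:
        pass
    # Plain text: wrap it as a single user turn
    return [{"role": "user", "content": messages}]
```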
## 🎯 Popular Models
**Text Models:**
- `deepseek-ai/DeepSeek-V3-0324` (Novita)
- `meta-llama/Llama-3.1-70B-Instruct` (Groq)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` (Together)
**Vision Models:**
- `meta-llama/Llama-3.2-11B-Vision-Instruct` (Together)
- `microsoft/Phi-3.5-vision-instruct` (HF Inference)
## πŸ’» Local Development
```bash
# Clone and setup
git clone <repository-url>
cd inference-providers-mcp
pip install -r requirements.txt
# Set token and run
export HF_TOKEN=hf_your_token_here
python app.py
```
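To verify the MCP endpoint is reachable after startup, you can probe the SSE route. A small sketch using `requests` (install it separately if it is not already in requirements.txt):
```python
import requests

# The MCP endpoint speaks Server-Sent Events; a 200 response with a
# text/event-stream content type means the server is up.
resp = requests.get("http://localhost:7860/gradio_api/mcp/sse", stream=True, timeout=10)
print(resp.status_code, resp.headers.get("content-type"))
resp.close()
```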
## πŸ”§ Technical Details
- **Built with:** Gradio + MCP support (`gradio[mcp]`)
- **Protocol:** Model Context Protocol (MCP) via Server-Sent Events
- **Security:** Environment-based token management
- **Compatibility:** Works with Cursor, Claude Desktop, and other MCP clients
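For orientation, here is a minimal sketch of how a Gradio app exposes a typed function as an MCP tool; the signatures are simplified and the real app.py may differ:
```python
import gradio as gr

def chat_completion(provider: str, model: str, messages: str,
                    temperature: float = 0.7, max_tokens: int = 512) -> str:
    """Docstring and type hints become the MCP tool schema; body elided here."""
    ...

demo = gr.Interface(
    fn=chat_completion,
    inputs=[
        "text", "text", "text",
        gr.Slider(0.0, 2.0, value=0.7, label="temperature"),
        gr.Slider(1, 4096, value=512, step=1, label="max_tokens"),
    ],
    outputs="text",
)

# mcp_server=True serves the tool at /gradio_api/mcp/sse (requires gradio[mcp])
demo.launch(mcp_server=True)
```
Gradio derives the tool name and parameter schema from the function signature, which is why the docstring and type hints matter.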
## πŸ”— Resources
- [Cursor MCP Docs](https://docs.cursor.com/context/model-context-protocol)
- [Gradio MCP Guide](https://huggingface.co/blog/gradio-mcp)
- [Inference Providers Docs](https://huggingface.co/docs/inference-providers)
- [Get HF Token](https://huggingface.co/settings/tokens)
## πŸ“ License
MIT License - see the code for details.