---
title: Inference Providers MCP Server
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---

# πŸ€– Inference Providers MCP Server

A streamlined Model Context Protocol (MCP) server that gives LLMs access to Hugging Face Inference Providers through a single, focused tool.

## ✨ What is this?

This MCP server exposes a `chat_completion` tool that lets LLMs and AI assistants chat with language models across 14+ inference providers, including Cerebras, Cohere, Fireworks, Groq, and more.

**Why use this?** Instead of manually switching between different AI providers, your LLM can automatically access the best model for each task through a unified interface.
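
Conceptually, each tool call reduces to one provider-routed chat completion. Below is a minimal sketch of that call using `huggingface_hub`'s `InferenceClient` with provider routing; it illustrates the idea rather than reproducing app.py:

```python
# A sketch of the kind of call this server wraps, assuming
# huggingface_hub >= 0.28 with provider routing support.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(provider="groq", api_key=os.environ["HF_TOKEN"])

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Explain Python best practices."}],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```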

## πŸš€ Supported Providers

| Provider | Chat | Vision | Provider | Chat | Vision |
|----------|:----:|:------:|----------|:----:|:------:|
| Cerebras | βœ… | ❌ | Nebius | βœ… | βœ… |
| Cohere | βœ… | βœ… | Novita | βœ… | βœ… |
| Fal AI | βœ… | βœ… | Nscale | βœ… | βœ… |
| Featherless AI | βœ… | βœ… | Replicate | βœ… | βœ… |
| Fireworks | βœ… | βœ… | SambaNova | βœ… | βœ… |
| Groq | βœ… | ❌ | Together | βœ… | βœ… |
| HF Inference | βœ… | βœ… | Hyperbolic | βœ… | βœ… |

πŸ› οΈ Quick Setup

### 1. Get HF Token

1. Visit [HF Settings](https://huggingface.co/settings/tokens)
2. Create a token with the **Inference Providers** scope
3. Copy the token (it starts with `hf_`)
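
To sanity-check the token before configuring a client, a quick check along these lines works (assuming `huggingface_hub` is installed):

```python
# Hypothetical sanity check, not part of this repo: confirm the token
# authenticates against the Hugging Face Hub.
import os
from huggingface_hub import whoami

info = whoami(token=os.environ["HF_TOKEN"])
print(f"Authenticated as: {info['name']}")
```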

### 2. Configure Your MCP Client

#### Cursor IDE

Add to `.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "inference-providers": {
      "url": "YOUR_URL/gradio_api/mcp/sse"
    }
  }
}
```

#### Claude Desktop

Add to MCP settings:

```json
{
  "mcpServers": {
    "inference-providers": {
      "command": "npx",
      "args": ["mcp-remote", "YOUR_URL/gradio_api/mcp/sse", "--transport", "sse-only"]
    }
  }
}
```

### 3. Server URLs

- **HF Spaces:** `https://username-spacename.hf.space/gradio_api/mcp/sse`
- **Local:** `http://localhost:7860/gradio_api/mcp/sse`
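
To confirm an endpoint is reachable before pointing a client at it, a rough connectivity check looks like this (illustrative only; a real MCP client performs the actual handshake):

```python
# Expect HTTP 200 and a text/event-stream content type from the SSE endpoint.
import requests

url = "http://localhost:7860/gradio_api/mcp/sse"  # or your Space URL
with requests.get(url, stream=True, timeout=10) as resp:
    print(resp.status_code)
    print(resp.headers.get("content-type"))
```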

## 🎯 How to Use

Once configured, your LLM can use the tool:

"Use chat completion with Groq and Llama to explain Python best practices"

"Chat with DeepSeek V3 via Novita about machine learning concepts"

πŸ› οΈ Available Tool

`chat_completion` - Generate responses using multiple AI providers

Parameters:

  • provider: Provider name (novita, groq, cerebras, etc.)
  • model: Model ID (e.g., deepseek-ai/DeepSeek-V3-0324)
  • messages: Input text or JSON messages array
  • temperature: Response randomness (0.0-2.0, default: 0.7)
  • max_tokens: Max response length (1-4096, default: 512)

**Environment:** requires the `HF_TOKEN` environment variable
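
Since `messages` accepts either plain text or a JSON array, the server has to normalize the input. A sketch of that normalization, as an assumption about the behavior rather than the exact code in app.py:

```python
# Plain text becomes a single user turn; a JSON string is parsed as a
# full messages array (assumed behavior, for illustration).
import json

def normalize_messages(messages: str) -> list[dict]:
    try:
        parsed = json.loads(messages)
        if isinstance(parsed, list):
            return parsed  # e.g. '[{"role": "user", "content": "hi"}]'
    except json.JSONDecodeError:
        pass
    return [{"role": "user", "content": messages}]

print(normalize_messages("Explain MCP in one sentence."))
```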

## 🎯 Popular Models

**Text Models:**

- `deepseek-ai/DeepSeek-V3-0324` (Novita)
- `meta-llama/Llama-3.1-70B-Instruct` (Groq)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` (Together)

**Vision Models:**

- `meta-llama/Llama-3.2-11B-Vision-Instruct` (Together)
- `microsoft/Phi-3.5-vision-instruct` (HF Inference)

## πŸ’» Local Development

```bash
# Clone and set up
git clone <repository-url>
cd inference-providers-mcp
pip install -r requirements.txt

# Set the token and run
export HF_TOKEN=hf_your_token_here
python app.py
```
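
With the server running, you can smoke-test the tool through Gradio's client API. The positional arguments and the `/chat_completion` endpoint name below are assumptions based on the tool's documented parameters:

```python
# Hypothetical smoke test via gradio_client; adjust the argument order
# and api_name to match the actual app.py signature.
from gradio_client import Client

client = Client("http://localhost:7860")
result = client.predict(
    "groq",                               # provider
    "meta-llama/Llama-3.1-70B-Instruct",  # model
    "Say hello in one sentence.",         # messages
    0.7,                                  # temperature
    256,                                  # max_tokens
    api_name="/chat_completion",
)
print(result)
```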

## πŸ”§ Technical Details

  • Built with: Gradio + MCP support (gradio[mcp])
  • Protocol: Model Context Protocol (MCP) via Server-Sent Events
  • Security: Environment-based token management
  • Compatibility: Works with Cursor, Claude Desktop, and other MCP clients

## πŸ”— Resources

πŸ“ License

MIT License - see the code for details.