burtenshaw committed on
Commit 551ae1a · 1 Parent(s): fb723d2

switch back to gradio

Files changed (6)
  1. = +0 -0
  2. README.md +81 -453
  3. mcp_server.py → app.py +198 -182
  4. pyproject.toml +4 -3
  5. requirements.txt +2 -1
  6. uv.lock +0 -0
= ADDED
File without changes
README.md CHANGED
@@ -1,508 +1,136 @@
- # ⚡ Inference Providers FastMCP Server
-
- A **FastMCP Server** for Hugging Face Inference Providers, built with [FastMCP](https://github.com/jlowin/fastmcp) - the fast, Pythonic way to build MCP servers. This allows LLMs and AI assistants to access multiple AI providers and language models through the Model Context Protocol.
-
- ## Features
-
- - **⚡ FastMCP**: Built with FastMCP for optimal performance and simplicity
- - **🚀 UV-Powered**: Uses UV/UVX for fast, modern Python dependency management
- - **🤖 MCP Server**: Native MCP server with tools, resources, and prompts
- - **🎯 Multi-Provider Support**: Access 14+ inference providers including Cerebras, Cohere, Fal AI, Fireworks, Groq, and more
- - **💬 Chat Completion**: Interactive conversations with LLMs and Vision Language Models
- - **📊 Resources**: Access provider information and popular model recommendations
- - **🔍 Context Logging**: Rich logging and error handling through MCP context
- - **🔧 Easy Integration**: Simple configuration for Cursor, Claude Desktop, and other MCP clients
-
- ## 🚀 Supported Providers
-
- | Provider | Chat Completion | Vision Language Models |
- |----------|----------------|------------------------|
- | Cerebras | ✅ | ❌ |
- | Cohere | ✅ | ✅ |
- | Fal AI | ✅ | ✅ |
- | Featherless AI | ✅ | ✅ |
- | Fireworks | ✅ | ✅ |
- | Groq | ✅ | ❌ |
- | HF Inference | ✅ | ✅ |
- | Hyperbolic | ✅ | ✅ |
- | Nebius | ✅ | ✅ |
- | Novita | ✅ | ✅ |
- | Nscale | ✅ | ✅ |
- | Replicate | ✅ | ✅ |
- | SambaNova | ✅ | ✅ |
- | Together | ✅ | ✅ |
-
- ## 🛠️ Quick Start
-
- ### 1. Get a Hugging Face Token
-
- 1. Go to [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
- 2. Create a new token with **Inference Providers** scope
- 3. Copy the token (starts with `hf_`)
-
- ### 2. Install Dependencies
-
- ```bash
- # Clone the repository
- git clone <repository-url>
- cd inference-providers-mcp
-
- # Install dependencies
- pip install -r requirements.txt
- ```
-
- ### 3. Set Environment Variables
-
- Create a `.env` file in your project directory:
-
- ```bash
- # .env file
- HF_TOKEN=hf_your_actual_token_here
- ```
-
- Or set it globally:
-
- ```bash
- # Linux/macOS
- export HF_TOKEN=hf_your_actual_token_here
-
- # Windows
- set HF_TOKEN=hf_your_actual_token_here
- ```
-
- ### 4. Test the Server
-
- ```bash
- # Test the server works (using UV - recommended)
- uvx test_mcp.py
-
- # Or test with Python
- python test_mcp.py
-
- # Run the server manually (optional)
- uvx mcp_server.py
- # Or: python mcp_server.py
- ```
-
- ## 🎯 Cursor IDE Integration
-
- There are several ways to integrate this FastMCP server with Cursor IDE. Choose the method that works best for your setup.
-
- > **✅ Your Current Configuration is Already Optimal!**
- >
- > Looking at your `.cursor/mcp.json`, you're already using `uvx` which is the recommended approach. Your configuration with `uvx` + `mcp_server.py` is perfect for modern FastMCP development!
-
- ### Method 1: Cursor Settings UI (Recommended)
-
- This is the easiest method for beginners:
-
- 1. **Open Cursor Settings**:
-    - Go to `Settings → Cursor Settings → Features → Model Context Protocol`
-    - OR use `Cmd/Ctrl + ,` and search for "MCP"
-
- 2. **Add New MCP Server**:
-    - Click **"Add New MCP Server"**
-    - Fill in the configuration:
-
- ```
- Name: inference-providers
- Command: uvx
- Arguments: mcp_server.py
- Environment Variables:
-   HF_TOKEN: hf_your_actual_token_here
- ```
-
- **Why UV/UVX?** ✨
- - **Faster**: UV is significantly faster than pip for dependency management
- - **Auto-manages dependencies**: Automatically handles virtual environments and packages
- - **Modern**: The recommended approach for Python tooling in 2025
- - **No setup required**: Works without manual virtual environment creation
-
- 3. **Save and Test**:
-    - Click **"Add"** to save
-    - Restart Cursor
-    - Open a new chat and try: *"Use the chat completion tool to ask Groq about Python"*
-
- ### Method 2: Project-Specific Configuration (Recommended)
-
- Create a `.cursor/mcp.json` file in your project root:
-
- ```json
- {
-   "mcpServers": {
-     "inference-providers": {
-       "command": "uvx",
-       "args": ["mcp_server.py"],
-       "env": {
-         "HF_TOKEN": "hf_your_actual_token_here"
-       }
-     }
-   }
- }
- ```
-
- **Advantages**:
- - ✅ Project-specific (only available in this project)
- - ✅ Can be version controlled (but **don't commit tokens!**)
- - ✅ Automatic activation when opening the project
- - ✅ UV automatically handles dependencies from `pyproject.toml`
-
- ### Method 3: Global Configuration
-
- Create a global configuration file:
-
- **Linux/macOS**: `~/.cursor/mcp.json`
- **Windows**: `%USERPROFILE%\.cursor\mcp.json`

  ```json
  {
    "mcpServers": {
      "inference-providers": {
-       "command": "uvx",
-       "args": ["/full/path/to/your/project/mcp_server.py"],
-       "env": {
-         "HF_TOKEN": "hf_your_actual_token_here"
-       }
      }
    }
  }
  ```

- **Advantages**:
- - ✅ Available across all Cursor projects
- - ✅ Set once, use everywhere
-
- ### Method 4: Environment Variables (Most Secure)
-
- If you have `HF_TOKEN` set as a system environment variable, you can use:
-
  ```json
  {
    "mcpServers": {
      "inference-providers": {
-       "command": "uvx",
-       "args": ["mcp_server.py"]
      }
    }
  }
  ```

- The server will automatically pick up `HF_TOKEN` from your environment.
-
- ## 🔄 UV vs Python: When to Use Which?
-
- | Approach | Best For | Pros | Cons |
- |----------|----------|------|------|
- | **`uvx` (Recommended)** | Most users, development | ⚡ Fast, auto-manages deps, modern | Requires UV installation |
- | **`python`** | System restrictions, debugging | 🔧 Universal, explicit control | Manual venv management |
- | **`uv run`** | Local development | 🎯 Project-aware, consistent | Must be in project directory |
-
- ### UV Installation
-
- If you don't have UV installed:
-
- ```bash
- # macOS/Linux
- curl -LsSf https://astral.sh/uv/install.sh | sh
-
- # Windows
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
-
- # Alternative: pip install
- pip install uv
- ```
-
- ## 🎮 Using the Server in Cursor
-
- Once configured, you can use the server in several ways:
-
- ### 1. Let Cursor Auto-Select Tools
-
- Simply describe what you want:
-
- ```
- "Help me compare language models for code generation"
- "Get recommendations for the best chat models"
- "I need to chat with a model about Python best practices"
- ```
-
- Cursor will automatically detect and use the appropriate tools.
-
- ### 2. Explicitly Request Tools
-
- Be more specific about which tool to use:
-
- ```
- "Use the chat_completion tool with DeepSeek V3 via Novita to explain machine learning"
- "Call the inference providers chat tool to ask Groq about async programming"
- ```
-
- ### 3. Access Resources
-
- Get information about providers and models:
-
- ```
- "Show me the available inference providers"
- "What are the popular models I can use?"
- "Get the provider capabilities information"
- ```
-
- ### 4. Generate Prompts
-
- Use the prompt generation feature:
-
- ```
- "Generate a prompt to compare chat providers"
- "Create a comparison prompt for vision language models"
- ```
-
- ## 🎪 Example Conversations
-
- ### Basic Chat Example
-
- ```
- You: "Use the chat completion tool with Groq and Llama 3.1 70B to explain async/await in Python"
-
- Cursor: [Calls chat_completion tool]
- - Provider: groq
- - Model: meta-llama/Llama-3.1-70B-Instruct
- - Message: "Explain async/await in Python with examples"
-
- Response: [Detailed explanation of async/await...]
- ```
-
- ### Provider Comparison Example
-
- ```
- You: "Help me choose between Groq and Together AI for coding tasks"
-
- Cursor: [Uses provider comparison prompt and chat completion]
- Response: [Detailed comparison of providers with recommendations...]
- ```
-
- ### Model Recommendations
-
- ```
- You: "What are good models for vision tasks?"
-
- Cursor: [Accesses models/popular resource]
- Response: Here are the recommended vision models:
- - meta-llama/Llama-3.2-11B-Vision-Instruct (Together)
- - microsoft/Phi-3.5-vision-instruct (HF Inference)
- - command-r-plus-vision (Cohere)
- ```
-
- ## 🔧 Advanced Configuration
-
- ### Running as Remote Server
-
- For team usage or remote development:

  ```bash
- # Option 1: Using UV (Recommended)
- uvx mcp_server.py --transport=sse --host=0.0.0.0 --port=8000
-
- # Option 2: Using Python directly
- python -c "
- from mcp_server import mcp
- mcp.run(transport='sse', host='0.0.0.0', port=8000)
- "
- ```
-
- Then configure Cursor to connect remotely:
-
- ```json
- {
-   "mcpServers": {
-     "inference-providers": {
-       "command": "npx",
-       "args": [
-         "-y",
-         "@modelcontextprotocol/client-remote",
-         "http://your-server:8000/sse"
-       ],
-       "env": {
-         "HF_TOKEN": "hf_your_token_here"
-       }
-     }
-   }
- }
- ```
-
- ### Alternative UV Commands
-
- Different ways to run with UV:
-
- ```json
- {
-   "mcpServers": {
-     "inference-providers-uvx": {
-       "command": "uvx",
-       "args": ["mcp_server.py"],
-       "env": {"HF_TOKEN": "hf_your_token_here"}
-     },
-     "inference-providers-uv-run": {
-       "command": "uv",
-       "args": ["run", "mcp_server.py"],
-       "env": {"HF_TOKEN": "hf_your_token_here"}
-     },
-     "inference-providers-uv-tool": {
-       "command": "uv",
-       "args": ["tool", "run", "mcp_server.py"],
-       "env": {"HF_TOKEN": "hf_your_token_here"}
-     }
-   }
- }
  ```

- **Differences:**
- - **`uvx`**: Installs and runs in isolated environment (recommended)
- - **`uv run`**: Runs using project's pyproject.toml (project-aware)
- - **`uv tool run`**: Explicit tool execution (most explicit)
-
- ## 🚨 Troubleshooting
-
- ### Server Not Appearing in Cursor
-
- 1. **Check Configuration Syntax**:
-    ```bash
-    # Validate JSON syntax
-    python -c "import json; print(json.load(open('.cursor/mcp.json')))"
-    ```
-
- 2. **Verify Command Works**:
-    ```bash
-    # Test with UV (recommended)
-    uvx mcp_server.py
-
-    # Or test with Python
-    python mcp_server.py
-    ```
-
- 3. **Check UV Installation**:
-    ```bash
-    # Verify UV is installed
-    uv --version
-    uvx --version
-    ```
-
- 4. **Check Token Format**:
-    - Token should start with `hf_`
-    - No quotes in environment variables
-    - Token has "Inference Providers" scope
-
- ### Tool Not Working
-
- 1. **Check Cursor Logs**:
-    - Go to `Help → Show Logs`
-    - Look for MCP-related errors
-
- 2. **Test Server Manually**:
-    ```bash
-    # Test with UV
-    uvx test_mcp.py
-
-    # Or with Python
-    python test_mcp.py
-    ```
-
- 3. **Verify Dependencies**:
-    ```bash
-    # UV automatically handles dependencies, but you can check:
-    uv pip list
-    ```
-
- 4. **Verify Token Permissions**:
-    - Go to [HF Settings](https://huggingface.co/settings/tokens)
-    - Ensure token has "Inference Providers" access
-
- ### Common Error Messages
-
- | Error | Solution |
- |-------|----------|
- | `HF_TOKEN is required` | Set HF_TOKEN environment variable |
- | `Unknown provider: xyz` | Check provider name spelling |
- | `Import "fastmcp" could not be resolved` | Run `uv add fastmcp` or `pip install fastmcp` |
- | `Server failed to start` | Check UV/Python path and permissions |
- | `uvx: command not found` | Install UV: `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
- | `Permission denied` | Check file permissions: `chmod +x mcp_server.py` |
-
- ### Getting Help
-
- If you're still having issues:
-
- 1. **Check our test script**: `python test_mcp.py`
- 2. **Review Cursor MCP docs**: [https://docs.cursor.com/context/model-context-protocol](https://docs.cursor.com/context/model-context-protocol)
- 3. **Check FastMCP docs**: [https://github.com/jlowin/fastmcp](https://github.com/jlowin/fastmcp)
- 4. **Cursor Community**: [https://forum.cursor.com](https://forum.cursor.com)
-
- ## 🤖 Available MCP Capabilities
-
- ### 🛠️ Tools
-
- **`chat_completion`** - Generate chat completions using Hugging Face Inference Providers
-
- Parameters:
- - `provider`: Inference provider (cerebras, cohere, groq, novita, etc.)
- - `model`: Model ID from Hugging Face Hub
- - `messages`: Chat messages (JSON array or plain text)
- - `temperature`: Response randomness (0.0-2.0, default 0.7)
- - `max_tokens`: Maximum response length (1-4096, default 512)
- - `top_p`: Nucleus sampling (0.0-1.0, default 0.9)
- - `stream`: Stream response (boolean, default False)
- - `stop_sequences`: Stop sequences (comma-separated)
- - `frequency_penalty`: Frequency penalty (-2.0 to 2.0)
- - `presence_penalty`: Presence penalty (-2.0 to 2.0)
- - `hf_token`: Your Hugging Face token (optional, uses env var)
-
- ### 📊 Resources
-
- **`providers`** - Get list of available inference providers and capabilities
- **`models/popular`** - Get curated recommendations for popular models
-
- ### 💭 Prompts
-
- **`generate_provider_comparison_prompt`** - Generate prompts for comparing providers
-
- ## 🚀 FastMCP Features Used
-
- - **@mcp.tool**: Exposes the chat completion function as an MCP tool
- - **@mcp.resource**: Provides access to provider and model information
- - **@mcp.prompt**: Generates helpful prompts for provider comparison
- - **Context**: Rich logging, error handling, and progress reporting
- - **Multiple Transports**: Supports stdio, SSE, and HTTP transports
-
- ## 🎯 Popular Models to Try
-
- **Chat Models:**
- - `deepseek-ai/DeepSeek-V3-0324` (Novita)
- - `meta-llama/Llama-3.1-70B-Instruct` (Groq)
- - `mistralai/Mixtral-8x7B-Instruct-v0.1` (Together)
- - `google/gemma-2-27b-it` (HF Inference)
-
- **Vision Language Models:**
- - `meta-llama/Llama-3.2-11B-Vision-Instruct` (Together)
- - `microsoft/Phi-3.5-vision-instruct` (HF Inference)
-
- ## 📖 Technical Details
-
- This MCP server is built using:
- - **FastMCP v2+** - The fast, Pythonic way to build MCP servers
- - **Model Context Protocol (MCP)** - For standardized tool exposure
- - **Hugging Face Inference Providers** - For model access across providers
- - **Async/Await** - For efficient request handling
- - **Rich Context Logging** - For detailed operation tracking
-
- ## 🔗 Links
-
- - [FastMCP GitHub](https://github.com/jlowin/fastmcp)
- - [FastMCP Documentation](https://gofastmcp.com)
  - [Cursor MCP Docs](https://docs.cursor.com/context/model-context-protocol)
- - [Model Context Protocol](https://modelcontextprotocol.io/)
- - [Inference Providers Documentation](https://huggingface.co/docs/inference-providers)
  - [Get HF Token](https://huggingface.co/settings/tokens)

  ## 📝 License

- This project is open source and available under the MIT License.
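Note: the removed README leans on FastMCP's decorator API (`@mcp.tool`, `@mcp.resource`, `@mcp.prompt`, `mcp.run()`), which is easiest to see in a bare skeleton. The sketch below only mirrors the pattern used by the old `mcp_server.py` shown further down in this diff; the `echo` tool and `greeting` resource are illustrative placeholders, not project code.

```python
# Minimal FastMCP skeleton in the style of the removed mcp_server.py.
# The tool and resource here are placeholders for illustration only.
from fastmcp import FastMCP

mcp = FastMCP("Example FastMCP Server")


@mcp.tool()
async def echo(text: str) -> str:
    """Return the input text unchanged."""
    return text


@mcp.resource("file://greeting")
async def greeting() -> str:
    """A static resource exposed to MCP clients."""
    return "Hello from FastMCP"


if __name__ == "__main__":
    # stdio transport by default; the removed README also mentions SSE/HTTP.
    mcp.run()
```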
+ ---
+ title: Inference Providers MCP Server
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 5.34.2
+ app_file: app.py
+ pinned: false
+ ---
+
+ # 🤖 Inference Providers MCP Server
+
+ A streamlined **Model Context Protocol (MCP) Server** that provides LLMs with access to Hugging Face Inference Providers through a single, focused tool.
+
+ ## What is this?
+
+ This MCP server exposes a `chat_completion` tool that allows LLMs and AI assistants to chat with language models across 14+ inference providers including Cerebras, Cohere, Fireworks, Groq, and more.
+
+ **Why use this?** Instead of manually switching between different AI providers, your LLM can automatically access the best model for each task through a unified interface.
+
+ ## 🚀 Supported Providers
+
+ | Provider | Chat | Vision | Provider | Chat | Vision |
+ |----------|------|--------|----------|------|--------|
+ | Cerebras | ✅ | ❌ | Nebius | ✅ | ✅ |
+ | Cohere | ✅ | ✅ | Novita | ✅ | ✅ |
+ | Fal AI | ✅ | ✅ | Nscale | ✅ | ✅ |
+ | Featherless AI | ✅ | ✅ | Replicate | ✅ | ✅ |
+ | Fireworks | ✅ | ✅ | SambaNova | ✅ | ✅ |
+ | Groq | ✅ | ❌ | Together | ✅ | ✅ |
+ | HF Inference | ✅ | ✅ | Hyperbolic | ✅ | ✅ |
+
+ ## 🛠️ Quick Setup
+
+ ### 1. Get HF Token
+ 1. Visit [HF Settings](https://huggingface.co/settings/tokens)
+ 2. Create token with **Inference Providers** scope
+ 3. Copy the token (starts with `hf_`)
+
+ ### 2. Configure Your MCP Client
+
+ #### Cursor IDE
+ Add to `.cursor/mcp.json`:
  ```json
  {
    "mcpServers": {
      "inference-providers": {
+       "url": "YOUR_URL/gradio_api/mcp/sse"
      }
    }
  }
  ```
+
+ #### Claude Desktop
+ Add to MCP settings:
  ```json
  {
    "mcpServers": {
      "inference-providers": {
+       "command": "npx",
+       "args": ["mcp-remote", "YOUR_URL/gradio_api/mcp/sse", "--transport", "sse-only"]
      }
    }
  }
  ```
+
+ ### 3. Server URLs
+
+ **HF Spaces:** `https://username-spacename.hf.space/gradio_api/mcp/sse`
+
+ **Local:** `http://localhost:7860/gradio_api/mcp/sse`
+
+ ## 🎯 How to Use
+
+ Once configured, your LLM can use the tool:
+
+ > "Use chat completion with Groq and Llama to explain Python best practices"
+
+ > "Chat with DeepSeek V3 via Novita about machine learning concepts"
+
+ ## 🛠️ Available Tool
+
+ **`chat_completion`** - Generate responses using multiple AI providers
+
+ **Parameters:**
+ - `provider`: Provider name (novita, groq, cerebras, etc.)
+ - `model`: Model ID (e.g., `deepseek-ai/DeepSeek-V3-0324`)
+ - `messages`: Input text or JSON messages array
+ - `temperature`: Response randomness (0.0-2.0, default: 0.7)
+ - `max_tokens`: Max response length (1-4096, default: 512)
+
+ **Environment:** Requires `HF_TOKEN` environment variable
+
+ ## 🎯 Popular Models
+
+ **Text Models:**
+ - `deepseek-ai/DeepSeek-V3-0324` (Novita)
+ - `meta-llama/Llama-3.1-70B-Instruct` (Groq)
+ - `mistralai/Mixtral-8x7B-Instruct-v0.1` (Together)
+
+ **Vision Models:**
+ - `meta-llama/Llama-3.2-11B-Vision-Instruct` (Together)
+ - `microsoft/Phi-3.5-vision-instruct` (HF Inference)
+
+ ## 💻 Local Development
+
  ```bash
+ # Clone and setup
+ git clone <repository-url>
+ cd inference-providers-mcp
+ pip install -r requirements.txt
+
+ # Set token and run
+ export HF_TOKEN=hf_your_token_here
+ python app.py
  ```
+
+ ## 🔧 Technical Details
+
+ - **Built with:** Gradio + MCP support (`gradio[mcp]`)
+ - **Protocol:** Model Context Protocol (MCP) via Server-Sent Events
+ - **Security:** Environment-based token management
+ - **Compatibility:** Works with Cursor, Claude Desktop, and other MCP clients
+
+ ## 🔗 Resources
+
  - [Cursor MCP Docs](https://docs.cursor.com/context/model-context-protocol)
+ - [Gradio MCP Guide](https://huggingface.co/blog/gradio-mcp)
+ - [Inference Providers Docs](https://huggingface.co/docs/inference-providers)
  - [Get HF Token](https://huggingface.co/settings/tokens)

  ## 📝 License

+ MIT License - see the code for details.
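Note: to make the new README's tool description concrete, the request that `chat_completion` sends for a single provider looks roughly like the sketch below. `BASE_URL` is a placeholder: the real per-provider endpoints live in the `PROVIDERS` mapping in `app.py`, whose entries are elided from the diff hunk that follows.

```python
# Rough shape of the HTTP call behind the chat_completion tool.
# BASE_URL is a placeholder; app.py reads it from its PROVIDERS config.
import os

import requests

BASE_URL = "https://example-provider-endpoint"  # placeholder, see PROVIDERS in app.py

payload = {
    "model": "deepseek-ai/DeepSeek-V3-0324",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 512,
}
headers = {
    "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
    "Content-Type": "application/json",
}

response = requests.post(
    f"{BASE_URL}/v1/chat/completions", headers=headers, json=payload, timeout=60
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```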
mcp_server.py → app.py RENAMED
@@ -1,11 +1,8 @@
  import os
- import json
  import requests
- from typing import Dict, Any, Optional
- from fastmcp import FastMCP, Context
-
- # Initialize FastMCP server
- mcp = FastMCP("Inference Providers MCP Server")

  # Inference Providers configuration
  PROVIDERS = {
@@ -82,228 +79,247 @@ PROVIDERS = {
  }


- async def make_request(
-     provider: str,
-     endpoint: str,
-     payload: Dict[str, Any],
-     hf_token: str,
-     ctx: Optional[Context] = None,
- ) -> Dict[str, Any]:
-     """Make a request to the inference provider"""
-     if not hf_token:
-         error_msg = (
-             "HF_TOKEN is required. Please set it in the environment or provide it."
-         )
-         if ctx:
-             await ctx.error(error_msg)
-         return {"error": error_msg}
-
-     provider_config = PROVIDERS.get(provider)
-     if not provider_config:
-         error_msg = f"Unknown provider: {provider}"
-         if ctx:
-             await ctx.error(error_msg)
-         return {"error": error_msg}
-
-     url = f"{provider_config['base_url']}/{endpoint}"
-     headers = {
-         "Authorization": f"Bearer {hf_token}",
-         "Content-Type": "application/json",
-     }
-
-     if ctx:
-         await ctx.info(f"Making request to {provider} ({url})")
-
-     try:
-         response = requests.post(url, headers=headers, json=payload, timeout=60)
-         response.raise_for_status()
-
-         if ctx:
-             await ctx.info(f"Request successful to {provider}")
-
-         return response.json()
-     except requests.exceptions.RequestException as e:
-         error_msg = f"Request failed: {str(e)}"
-         if ctx:
-             await ctx.error(error_msg)
-         return {"error": error_msg}
-
-
- @mcp.tool()
- async def chat_completion(
      provider: str,
      model: str,
      messages: str,
-     ctx: Context,
      temperature: float = 0.7,
      max_tokens: int = 512,
-     top_p: float = 0.9,
-     stream: bool = False,
-     stop_sequences: str = "",
-     frequency_penalty: float = 0.0,
-     presence_penalty: float = 0.0,
-     hf_token: Optional[str] = None,
- ) -> str:
      """Generate chat completions using Hugging Face Inference Providers.

-     This tool allows you to chat with various language models through
-     different inference providers including Cerebras, Cohere, Fireworks,
-     Groq, and others.

      Args:
-         provider: The inference provider to use (cerebras, cohere, fal-ai,
-                   featherless-ai, fireworks-ai, groq, hf-inference,
-                   hyperbolic, nebius, novita, nscale, replicate, sambanova,
-                   together)
          model: The model ID from Hugging Face Hub
              (e.g., 'deepseek-ai/DeepSeek-V3-0324')
          messages: Either a JSON array of messages in OpenAI format or
              plain text for simple queries
          temperature: Controls response randomness (0.0-2.0, default 0.7)
          max_tokens: Maximum tokens in response (1-4096, default 512)
-         top_p: Nucleus sampling parameter (0.0-1.0, default 0.9)
-         stream: Whether to stream the response (default False)
-         stop_sequences: Comma-separated stop sequences (optional)
-         frequency_penalty: Penalize frequent tokens (-2.0 to 2.0)
-         presence_penalty: Penalize present tokens (-2.0 to 2.0)
-         hf_token: Your Hugging Face token with Inference Providers access
-             (falls back to HF_TOKEN environment variable)

      Returns:
          The generated text response from the language model
      """
-     # Get HF token from parameter or environment
-     token = hf_token or os.getenv("HF_TOKEN")
-     if not token:
-         await ctx.error("HF_TOKEN not provided and not found in environment")
-         return "Error: HF_TOKEN is required but not provided"

-     await ctx.info(f"Starting chat completion with {provider} provider")
-     await ctx.info(f"Model: {model}")

      try:
          # Parse messages
          if messages.strip().startswith("["):
              parsed_messages = json.loads(messages)
-             await ctx.info(f"Parsed {len(parsed_messages)} messages from JSON")
          else:
              parsed_messages = [{"role": "user", "content": messages}]
-             await ctx.info("Created single user message")

          payload = {
              "model": model,
              "messages": parsed_messages,
              "temperature": temperature,
              "max_tokens": max_tokens,
-             "top_p": top_p,
-             "stream": stream,
          }

-         # Add optional parameters
-         if stop_sequences.strip():
-             payload["stop"] = [s.strip() for s in stop_sequences.split(",")]
-             await ctx.info(f"Added stop sequences: {payload['stop']}")
-
-         if frequency_penalty != 0:
-             payload["frequency_penalty"] = frequency_penalty
-
-         if presence_penalty != 0:
-             payload["presence_penalty"] = presence_penalty
-
-         result = await make_request(
-             provider, "v1/chat/completions", payload, token, ctx
-         )

-         if "error" in result:
-             await ctx.error(f"API Error: {result['error']}")
-             return f"Error: {result['error']}"

          if "choices" in result and len(result["choices"]) > 0:
-             response_text = result["choices"][0]["message"]["content"]
-             await ctx.info(f"Generated response with {len(response_text)} characters")
-             return response_text
          else:
-             await ctx.warning("Unexpected response format")
-             return json.dumps(result, indent=2)

-     except json.JSONDecodeError as e:
-         error_msg = f"Invalid JSON format for messages: {str(e)}"
-         await ctx.error(error_msg)
-         return f"Error: {error_msg}"
      except Exception as e:
-         error_msg = f"Unexpected error: {str(e)}"
-         await ctx.error(error_msg)
-         return f"Error: {error_msg}"
-
-
- @mcp.resource("file://providers")
- async def get_providers() -> str:
-     """Get the list of available inference providers and their capabilities.
-
-     Returns JSON information about all supported providers including their
-     supported tasks and base URLs.
-     """
-     return json.dumps(PROVIDERS, indent=2)
-
-
- @mcp.resource("file://models/popular")
- async def get_popular_models() -> str:
-     """Get a list of popular models for each provider.
-
-     Returns curated recommendations for models to try with each provider.
-     """
-     popular_models = {
-         "chat_models": {
-             "cerebras": ["llama3.1-70b"],
-             "cohere": ["command-r-plus"],
-             "groq": ["meta-llama/Llama-3.1-70B-Instruct"],
-             "novita": ["deepseek-ai/DeepSeek-V3-0324"],
-             "together": ["mistralai/Mixtral-8x7B-Instruct-v0.1"],
-             "hf-inference": ["google/gemma-2-27b-it"],
-         },
-         "vision_models": {
-             "cohere": ["command-r-plus-vision"],
-             "together": ["meta-llama/Llama-3.2-11B-Vision-Instruct"],
-             "hf-inference": ["microsoft/Phi-3.5-vision-instruct"],
-         },
-     }
-     return json.dumps(popular_models, indent=2)
-
-
- @mcp.prompt()
- def generate_provider_comparison_prompt(task: str = "chat") -> str:
-     """Generate a prompt to help compare different inference providers.
-
-     Args:
-         task: The type of task to compare providers for (default: "chat")
-
-     Returns:
-         A prompt that can be used to get comparative analysis of providers
-     """
-     available_providers = [
-         name
-         for name, config in PROVIDERS.items()
-         if f"{task}-completion" in config["tasks"]
      ]

-     providers_list = ", ".join(available_providers)
-
-     return f"""Please compare the following inference providers for {task} tasks:
-
- Providers: {providers_list}
-
- Consider factors like:
- - Model selection and capabilities
- - Performance and speed
- - Pricing (if known)
- - Special features or limitations
- - Use case recommendations
-
- Provide a balanced comparison that helps choose the right provider."""


  if __name__ == "__main__":
-     # Run the MCP server
-     # Default: stdio transport for local development
-     # For production, use: mcp.run(transport="sse", host="0.0.0.0", port=8000)
-     mcp.run()
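Note: one behavior the rewrite keeps unchanged is the `messages` convention (an OpenAI-style JSON array, or plain text treated as a single user turn). The standalone snippet below reproduces that parsing logic; the `parse_messages` name is ours, not the repository's.

```python
# Reproduces the messages-parsing convention shared by both versions.
# parse_messages is a hypothetical helper name used only for this sketch.
import json


def parse_messages(messages: str) -> list[dict]:
    if messages.strip().startswith("["):
        return json.loads(messages)  # already an OpenAI-style message array
    return [{"role": "user", "content": messages}]  # plain text -> one user turn


print(parse_messages("Hello!"))
print(parse_messages('[{"role": "system", "content": "Be brief"}, {"role": "user", "content": "Hi"}]'))
```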
+ import gradio as gr
  import os
  import requests
+ import json
+ from typing import List

  # Inference Providers configuration
  PROVIDERS = {
  }


+ def chat_completion(
      provider: str,
      model: str,
      messages: str,
      temperature: float = 0.7,
      max_tokens: int = 512,
+ ):
      """Generate chat completions using Hugging Face Inference Providers.

+     This tool provides access to multiple AI providers and language models
+     through Hugging Face's unified Inference Providers API.

      Args:
+         provider: The inference provider to use. Available providers:
+             cerebras, cohere, fal-ai, featherless-ai, fireworks-ai,
+             groq, hf-inference, hyperbolic, nebius, novita, nscale,
+             replicate, sambanova, together
          model: The model ID from Hugging Face Hub
              (e.g., 'deepseek-ai/DeepSeek-V3-0324')
          messages: Either a JSON array of messages in OpenAI format or
              plain text for simple queries
          temperature: Controls response randomness (0.0-2.0, default 0.7)
          max_tokens: Maximum tokens in response (1-4096, default 512)

      Returns:
          The generated text response from the language model
      """
+     # Get HF token from environment
+     hf_token = os.getenv("HF_TOKEN")
+     if not hf_token:
+         return (
+             "Error: HF_TOKEN environment variable is required. "
+             "Please set your Hugging Face token."
+         )

+     # Validate provider
+     if provider not in PROVIDERS:
+         available = ", ".join(PROVIDERS.keys())
+         return f"Error: Unknown provider '{provider}'. Available providers: {available}"

      try:
          # Parse messages
          if messages.strip().startswith("["):
              parsed_messages = json.loads(messages)
          else:
              parsed_messages = [{"role": "user", "content": messages}]

+         # Build request payload
          payload = {
              "model": model,
              "messages": parsed_messages,
              "temperature": temperature,
              "max_tokens": max_tokens,
          }

+         # Make request to provider
+         provider_config = PROVIDERS[provider]
+         url = f"{provider_config['base_url']}/v1/chat/completions"
+         headers = {
+             "Authorization": f"Bearer {hf_token}",
+             "Content-Type": "application/json",
+         }

+         response = requests.post(url, headers=headers, json=payload, timeout=60)
+         response.raise_for_status()
+         result = response.json()

+         # Extract response
          if "choices" in result and len(result["choices"]) > 0:
+             return result["choices"][0]["message"]["content"]
          else:
+             return f"Error: Unexpected response format: {json.dumps(result, indent=2)}"

+     except json.JSONDecodeError:
+         return (
+             "Error: Invalid JSON format for messages. "
+             "Use either plain text or valid JSON array."
+         )
+     except requests.exceptions.RequestException as e:
+         return f"Error: Request failed: {str(e)}"
      except Exception as e:
+         return f"Error: {str(e)}"


+ def get_providers_for_task(task: str) -> List[str]:
+     """Get available providers for a specific task"""
+     return [
+         provider for provider, config in PROVIDERS.items() if task in config["tasks"]
      ]


+ # Create Gradio interface
+ with gr.Blocks(title="Inference Providers MCP Server", theme=gr.themes.Soft()) as app:
+     gr.Markdown("""
+     # 🤖 Inference Providers MCP Server
+
+     A streamlined Model Context Protocol (MCP) server for Hugging Face
+     Inference Providers, providing LLMs with access to multiple AI
+     providers through a simple, focused interface.
+
+     **Supported Providers:** Cerebras, Cohere, Fal AI, Featherless AI,
+     Fireworks, Groq, HF Inference, Hyperbolic, Nebius, Novita, Nscale,
+     Replicate, SambaNova, Together
+
+     **Required:** Set HF_TOKEN environment variable with your Hugging Face
+     token that has Inference Providers access.
+     """)
+
+     # Environment status
+     hf_token_status = "✅ Set" if os.getenv("HF_TOKEN") else "❌ Not Set"
+     gr.Markdown(f"**HF_TOKEN Status:** {hf_token_status}")
+
+     if not os.getenv("HF_TOKEN"):
+         gr.Markdown("""
+         **⚠️ Setup Required:**
+         1. Get token: [HF Settings](https://huggingface.co/settings/tokens)
+         2. Set environment: `export HF_TOKEN=hf_your_token_here`
+         3. Restart application
+         """)
+
+     with gr.Tabs():
+         # Chat Completion Tab
+         with gr.Tab("💬 Chat Completion", id="chat"):
+             with gr.Row():
+                 with gr.Column(scale=1):
+                     chat_provider = gr.Dropdown(
+                         choices=get_providers_for_task("chat-completion"),
+                         label="Provider",
+                         value="novita",
+                         info="Select inference provider",
+                     )
+                     chat_model = gr.Textbox(
+                         label="Model",
+                         value="deepseek-ai/DeepSeek-V3-0324",
+                         placeholder="e.g., deepseek-ai/DeepSeek-V3-0324",
+                         info="Model ID from Hugging Face Hub",
+                     )
+
+                 with gr.Column(scale=2):
+                     chat_messages = gr.Textbox(
+                         label="Messages",
+                         lines=8,
+                         placeholder=(
+                             '[{"role": "user", "content": "Hello!"}]'
+                             "\n\nOr just type directly"
+                         ),
+                         info="JSON array of messages or plain text",
+                     )
+
+             with gr.Accordion("⚙️ Parameters", open=False):
+                 with gr.Row():
+                     chat_temperature = gr.Slider(0.0, 2.0, 0.7, label="Temperature")
+                     chat_max_tokens = gr.Slider(1, 4096, 512, label="Max Tokens")
+
+             chat_submit = gr.Button("🚀 Generate", variant="primary")
+             chat_output = gr.Textbox(label="Response", lines=10)
+
+             chat_submit.click(
+                 chat_completion,
+                 inputs=[
+                     chat_provider,
+                     chat_model,
+                     chat_messages,
+                     chat_temperature,
+                     chat_max_tokens,
+                 ],
+                 outputs=chat_output,
+             )
+
+         # MCP Documentation Tab
+         with gr.Tab("🔧 MCP Setup", id="mcp"):
+             gr.Markdown("""
+             ## 🤖 MCP Server Setup
+
+             This MCP server exposes `chat_completion` tool for LLMs to access
+             Hugging Face Inference Providers.
+
+             ### 📡 Server URL
+
+             **Local:** `http://localhost:7860/gradio_api/mcp/sse`
+
+             **HF Spaces:** `https://username-spacename.hf.space/gradio_api/mcp/sse`
+
+             ### ⚙️ Client Configuration
+
+             #### Cursor IDE
+
+             Add to `.cursor/mcp.json`:
+             ```json
+             {
+               "mcpServers": {
+                 "inference-providers": {
+                   "url": "YOUR_URL/gradio_api/mcp/sse"
+                 }
+               }
+             }
+             ```
+
+             #### Claude Desktop
+
+             Add to MCP settings:
+             ```json
+             {
+               "mcpServers": {
+                 "inference-providers": {
+                   "command": "npx",
+                   "args": [
+                     "mcp-remote",
+                     "YOUR_URL/gradio_api/mcp/sse",
+                     "--transport", "sse-only"
+                   ]
+                 }
+               }
+             }
+             ```
+
+             ### 🛠️ Tool Details
+
+             **`chat_completion`** - Generate chat responses
+
+             **Parameters:**
+             - `provider`: Provider name (novita, groq, etc.)
+             - `model`: Model ID (deepseek-ai/DeepSeek-V3-0324)
+             - `messages`: Input text or JSON messages
+             - `temperature`: Randomness (0.0-2.0, default: 0.7)
+             - `max_tokens`: Max length (1-4096, default: 512)
+
+             **Environment:** Requires HF_TOKEN
+
+             ### 🎯 Usage
+
+             > "Use chat completion with Groq and Llama to explain Python"
+
+             ### 🔗 Links
+
+             - [Cursor MCP](https://docs.cursor.com/context/model-context-protocol)
+             - [Gradio MCP Guide](https://huggingface.co/blog/gradio-mcp)
+             - [Get HF Token](https://huggingface.co/settings/tokens)
+             """)


  if __name__ == "__main__":
+     # Enable MCP server functionality
+     app.launch(mcp_server=True)
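Note: because the new `chat_completion` is a plain function (the Gradio UI and the MCP endpoint both wrap it), it can be smoke-tested directly from Python. A sketch, assuming `HF_TOKEN` is exported and the command is run from the repository root:

```python
# Direct call to the tool function from the new app.py.
# Assumes HF_TOKEN is set in the environment; importing app builds the Gradio
# Blocks UI but does not launch the server (launch() is under __main__).
from app import chat_completion

reply = chat_completion(
    provider="novita",
    model="deepseek-ai/DeepSeek-V3-0324",
    messages="Explain async/await in Python in two sentences.",
    temperature=0.7,
    max_tokens=256,
)
print(reply)
```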
pyproject.toml CHANGED
@@ -1,11 +1,12 @@
  [project]
  name = "inference-providers-mcp"
  version = "0.1.0"
- description = "FastMCP Server for Hugging Face Inference Providers"
  readme = "README.md"
  requires-python = ">=3.11"
  dependencies = [
-     "fastmcp>=2.0.0",
      "requests>=2.31.0",
-     "python-dotenv>=1.0.0"
  ]

  [project]
  name = "inference-providers-mcp"
  version = "0.1.0"
+ description = "MCP Server for Hugging Face Inference Providers Chat Completion"
  readme = "README.md"
  requires-python = ">=3.11"
  dependencies = [
+     "gradio[mcp]>=5.34.0",
+     "huggingface_hub>=0.20.0",
      "requests>=2.31.0",
+     "python-dotenv>=1.0.0",
  ]
requirements.txt CHANGED
@@ -1,3 +1,4 @@
- fastmcp>=2.0.0
  requests>=2.31.0
  python-dotenv>=1.0.0

+ gradio[mcp]>=4.0.0
+ huggingface_hub>=0.20.0
  requests>=2.31.0
  python-dotenv>=1.0.0
uv.lock CHANGED
The diff for this file is too large to render. See raw diff