<!-- Use this file to provide workspace-specific custom instructions to Copilot. For more details, visit https://code.visualstudio.com/docs/copilot/copilot-customization#_use-a-githubcopilotinstructionsmd-file -->

# Web Scraper Project Instructions

This is a Python Gradio application for web scraping that:

- Scrapes text content from websites
- Formats content as markdown
- Generates sitemaps from page links
- Provides MCP (Model Context Protocol) server functionality
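
The sitemap step can be sketched with the standard library alone. This is a hypothetical illustration rather than the project's code. `LinkCollector` and `build_sitemap` are invented names, and `html.parser.HTMLParser` stands in for beautifulsoup4 so the example has no third-party dependencies. Relative links are resolved with `urllib.parse.urljoin`, `#fragment` suffixes are dropped, and only http(s) links on the page's own host are kept.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def build_sitemap(html: str, base_url: str) -> str:
    """Return a markdown list of unique same-host links found in ``html``.

    Args:
        html: Raw HTML of the page.
        base_url: URL the page was fetched from, used to resolve relative links.

    Returns:
        One "- <url>" line per unique link, in first-seen order.
    """
    collector = LinkCollector()
    collector.feed(html)
    collector.close()  # flush any buffered input

    base_host = urlparse(base_url).netloc
    seen: set[str] = set()
    lines: list[str] = []
    for href in collector.hrefs:
        # Resolve relative links, then strip "#fragment" so in-page anchors don't duplicate pages.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        # Skip mailto:, javascript:, and links to other hosts.
        if parsed.scheme not in ("http", "https") or parsed.netloc != base_host:
            continue
        if absolute not in seen:
            seen.add(absolute)
            lines.append(f"- {absolute}")
    return "\n".join(lines)
```

For example, `build_sitemap('<a href="/about">About</a>', "https://example.com/")` returns `- https://example.com/about`.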

## Key Libraries

- `gradio[mcp]`: For the web interface and MCP server capabilities
- `requests`: For HTTP requests
- `beautifulsoup4`: For HTML parsing
- `markdownify`: For converting HTML to markdown
- `urllib.parse` (standard library): For URL handling

## Project Structure

- `app.py`: Main web interface application
- `mcp_server.py`: MCP server that exposes tools for AI integration

## MCP Tools

The MCP server exposes three main tools:

- `scrape_content`: Extract website content as markdown
- `generate_sitemap`: Create a sitemap from page links
- `analyze_website`: Run a complete analysis that combines content and sitemap

## Code Style

- Use type hints where appropriate
- Handle web request errors explicitly (timeouts, connection failures, HTTP error status codes)
- Follow PEP 8 style guidelines
- Add docstrings to functions with clear parameter descriptions
- Give MCP functions descriptive docstrings, because Gradio uses them as the tool descriptions
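
A function following these conventions might look like the sketch below. `fetch_page` is a hypothetical helper. The 10-second timeout and the User-Agent string are assumed values. The `Args:`/`Returns:` docstring layout follows the Google style used in Gradio's MCP examples, where an app launched with `demo.launch(mcp_server=True)` exposes its functions as tools. The sketch returns an error message instead of raising, so an MCP client receives readable output.

```python
from urllib.parse import urlparse

import requests

DEFAULT_TIMEOUT = 10  # seconds; assumed value
HEADERS = {"User-Agent": "web-scraper-sketch/0.1"}  # hypothetical User-Agent string


def fetch_page(url: str, timeout: float = DEFAULT_TIMEOUT) -> str:
    """Download a web page and return its HTML.

    Args:
        url: Absolute http or https URL of the page to fetch.
        timeout: Seconds to wait for the server before giving up.

    Returns:
        The page HTML on success, or a message starting with "Error:" describing the failure.
    """
    # Reject relative URLs and non-web schemes before making any request.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return f"Error: invalid URL {url!r}; expected an absolute http(s) URL."

    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx status codes into HTTPError
    except requests.Timeout:
        return f"Error: request to {url} timed out after {timeout} seconds."
    except requests.HTTPError as exc:
        return f"Error: {url} returned HTTP {exc.response.status_code}."
    except requests.RequestException as exc:
        # Covers connection failures, too many redirects, and other request errors.
        return f"Error: could not fetch {url}: {exc}"
    return response.text
```

Returning a message keeps tool responses readable for the calling model. A helper used only from Python code would more naturally let the exception propagate.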