---
title: Crawl4AI Web Content Extractor
emoji: 🕷️
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# Crawl4AI Demo - Docker Deployment
This is a Docker-ready version of the Crawl4AI demo application, specifically designed for deployment on Hugging Face Spaces.
## Features
- Web interface built with Gradio
- Support for multiple crawler types (Basic, LLM, Cosine, JSON/CSS)
- Configurable word count threshold
- Markdown output with metadata
- Sub-page crawling capabilities
- Lazy loading support
- Docker-optimized configuration
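One way to picture the word count threshold is as a filter that drops text blocks too short to carry real content, such as navigation labels. The sketch below is illustrative only: `filter_blocks` and its parameters are hypothetical names chosen for this example and are not taken from `app.py` or the Crawl4AI API.

```python
def filter_blocks(blocks, word_count_threshold=10):
    """Keep only text blocks with at least `word_count_threshold` words.

    Hypothetical helper for illustration; not part of Crawl4AI.
    """
    return [b for b in blocks if len(b.split()) >= word_count_threshold]


blocks = [
    "Home",
    "Subscribe to our newsletter",
    "Crawl4AI extracts the main content of a page and returns it as Markdown.",
]

# With a threshold of 5 words, only the full sentence is kept.
kept = filter_blocks(blocks, word_count_threshold=5)
print(kept)
```

A higher threshold removes more boilerplate but may also drop short, legitimate content such as captions.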
## Deployment Instructions
1. Create a new Space on Hugging Face:
   - Go to huggingface.co/spaces
   - Click "Create new Space"
   - Choose "Docker" as the SDK
   - Set the hardware requirements (recommended: CPU + 16GB RAM)
2. Upload the files:
   - Upload all files from this directory to your Space
   - Make sure to include:
     - `Dockerfile`
     - `app.py`
     - `requirements.txt`
     - `README.md`
3. The Space will automatically build and deploy the application.
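Before uploading, it can help to confirm that the files listed in step 2 are all present. The following is a minimal sketch, assuming it is run from the directory you intend to upload; `missing_files` is a hypothetical helper written for this example, not something shipped with the project.

```python
from pathlib import Path

# File names taken from the upload step above.
REQUIRED_FILES = ["Dockerfile", "app.py", "requirements.txt", "README.md"]


def missing_files(directory="."):
    """Return the required files that are not present in `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```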
## Environment Variables
No environment variables are required for basic functionality. The application is configured to run out of the box.
## Hardware Requirements
- CPU: 2+ cores recommended
- RAM: 16GB recommended
- Disk: 5GB minimum
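For local runs, the figures above can be compared against the host with a short script. This is a rough sketch using only the Python standard library; `check_host` is a hypothetical helper, and the RAM lookup relies on `os.sysconf`, which is assumed to be available (it is on Linux) and is reported as `None` otherwise.

```python
import os
import shutil

# Thresholds taken from the list above.
MIN_CORES = 2
RECOMMENDED_RAM_GB = 16
MIN_FREE_DISK_GB = 5


def total_ram_gb():
    """Return total physical RAM in GB, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    return pages * page_size / 1024**3


def check_host(path="."):
    """Compare CPU, RAM, and free disk space with the README's figures."""
    cores = os.cpu_count() or 0
    ram_gb = total_ram_gb()
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "cpu_cores": cores,
        "cpu_ok": cores >= MIN_CORES,
        "ram_gb": None if ram_gb is None else round(ram_gb, 1),
        "ram_ok": None if ram_gb is None else ram_gb >= RECOMMENDED_RAM_GB,
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= MIN_FREE_DISK_GB,
    }


if __name__ == "__main__":
    for key, value in check_host().items():
        print(f"{key}: {value}")
```

Because 16GB of RAM is a recommendation rather than a hard minimum, a `ram_ok` value of `False` is a warning, not necessarily a blocker.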
## Browser Support
The application uses Chrome in headless mode for web crawling. The Dockerfile includes all necessary dependencies.
## Limitations
- Memory usage increases with the number of pages crawled
- Some websites may block automated crawling
- JavaScript-heavy sites may require additional configuration
## Troubleshooting
If you encounter issues:
1. Check the Space logs for error messages
2. Confirm that headless Chrome starts correctly inside the container
3. Verify network connectivity
4. Check memory usage
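The network connectivity check can be approximated with a plain TCP connection test. The sketch below uses only the standard library; `can_connect` is a hypothetical helper for this example, and a successful TCP connection only shows that the port is reachable, not that the service behind it is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_connect("localhost", 7860)` checks whether anything is listening on the port used for local runs, and `can_connect("huggingface.co", 443)` checks outbound HTTPS reachability.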
## Development
To run locally with Docker:
```bash
docker build -t crawl4ai-demo .
docker run -p 7860:7860 crawl4ai-demo
```
Visit http://localhost:7860 to access the application.
## License
This project is licensed under the MIT License - see the LICENSE file for details.