metadata
title: Crawl4AI Web Content Extractor
emoji: 🕷️
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false

Crawl4AI Demo - Docker Deployment

This is a Docker-ready version of the Crawl4AI demo application, specifically designed for deployment on Hugging Face Spaces.

Features

  • Web interface built with Gradio
  • Support for multiple crawler types (Basic, LLM, Cosine, JSON/CSS)
  • Configurable word count threshold
  • Markdown output with metadata
  • Sub-page crawling capabilities
  • Lazy loading support
  • Docker-optimized configuration
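
The word count threshold listed above can be pictured as a simple filter over extracted text blocks. The following is a minimal sketch of that idea only, not Crawl4AI's actual implementation: the function name `filter_blocks`, the default of 10, and the choice of `>=` as the comparison are all assumptions made for illustration.

```python
def filter_blocks(blocks, word_count_threshold=10):
    """Keep only text blocks with at least `word_count_threshold` words.

    Hypothetical helper: the real crawler may count words or compare
    against the threshold differently.
    """
    return [b for b in blocks if len(b.split()) >= word_count_threshold]


blocks = [
    "Home",                                   # 1 word: navigation residue
    "Subscribe to our newsletter",            # 4 words: boilerplate
    "This paragraph carries the actual content of the page being crawled.",
]

# With a threshold of 5, only the long paragraph survives.
for block in filter_blocks(blocks, word_count_threshold=5):
    print(block)
```

A higher threshold drops more short fragments such as menus and buttons, at the cost of also dropping short but meaningful lines such as headings.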

Deployment Instructions

  1. Create a new Space on Hugging Face:

    • Go to huggingface.co/spaces
    • Click "Create new Space"
    • Choose "Docker" as the SDK
    • Select the Space hardware (recommended: a CPU tier with 16GB RAM)
  2. Upload the files:

    • Upload all files from this directory to your Space
    • Make sure to include:
      • Dockerfile
      • app.py
      • requirements.txt
      • README.md
  3. The Space will automatically build and deploy the application.
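
Before uploading, it can help to confirm that the four files named in step 2 are present. The script below is a small sketch for that check; the helper name `missing_files` is hypothetical and is not part of this repository.

```python
from pathlib import Path

# The files step 2 says must be uploaded to the Space.
REQUIRED_FILES = ("Dockerfile", "app.py", "requirements.txt", "README.md")


def missing_files(directory="."):
    """Return the required files that are absent from `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing before upload:", ", ".join(missing))
    else:
        print("All required files are present.")
```

Run it from the directory you intend to upload; an empty result means the Space has everything the list above asks for.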

Environment Variables

No environment variables are required for basic functionality. The application is configured to run out of the box.

Hardware Requirements

  • CPU: 2+ cores recommended
  • RAM: 16GB recommended
  • Disk: 5GB minimum
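
For a local machine, the CPU and disk figures above can be checked with the Python standard library. This is a rough sketch under two assumptions: "5GB minimum" is read as free disk space, and RAM is left out because the standard library has no portable way to query it. The function names are hypothetical.

```python
import os
import shutil

RECOMMENDED_CORES = 2      # "CPU: 2+ cores recommended"
MIN_FREE_DISK_GB = 5       # "Disk: 5GB minimum", read here as free space


def check_resources(cores, free_disk_gb):
    """Return a list of warnings for values below the stated figures."""
    warnings = []
    if cores < RECOMMENDED_CORES:
        warnings.append(f"{cores} CPU core(s); {RECOMMENDED_CORES}+ recommended")
    if free_disk_gb < MIN_FREE_DISK_GB:
        warnings.append(
            f"{free_disk_gb:.1f} GB free disk; {MIN_FREE_DISK_GB} GB minimum"
        )
    return warnings


def current_resources(path="."):
    """Read the core count and free disk space (in GB) for `path`."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    free_disk_gb = shutil.disk_usage(path).free / 1024**3
    return cores, free_disk_gb


if __name__ == "__main__":
    problems = check_resources(*current_resources())
    print("\n".join(problems) if problems else "CPU and disk look sufficient.")
```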

Browser Support

The application uses Chrome in headless mode for web crawling. The Dockerfile includes all necessary dependencies.

Limitations

  • Memory usage increases with the number of pages crawled
  • Some websites may block automated crawling
  • JavaScript-heavy sites may require additional configuration

Troubleshooting

If you encounter issues:

  1. Check the Space logs for error messages
  2. Confirm in the logs that headless Chrome launched without errors
  3. Verify network connectivity
  4. Check memory usage, which grows with the number of pages crawled
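
Steps 3 and 4 can be approximated from inside the container with the standard library. The sketch below is illustrative only: `can_connect` and `peak_memory_mb` are hypothetical helpers, the `resource` module is available only on Unix-like systems, and the kilobyte unit of `ru_maxrss` assumes Linux.

```python
import resource  # Unix only; assumes the container runs Linux
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def peak_memory_mb():
    """Peak resident memory of this process in MB (ru_maxrss is KB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


if __name__ == "__main__":
    print("Outbound HTTPS reachable:", can_connect("huggingface.co", 443))
    print(f"Peak memory so far: {peak_memory_mb():.1f} MB")
```

Note that `peak_memory_mb` reports only the Python process that calls it; memory used by the separate browser process does not show up in this figure.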

Development

To run locally with Docker:

docker build -t crawl4ai-demo .
docker run -p 7860:7860 crawl4ai-demo

Visit http://localhost:7860 to access the application.
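
The container may take a moment to start serving after `docker run`. A small polling helper, sketched here with only the standard library, waits until the URL above answers with HTTP 200. The name `wait_until_up` and the timeout values are assumptions chosen for illustration.

```python
import http.client
import time
import urllib.request

# Ignore any configured proxy, since the target is a local address.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_until_up(url="http://localhost:7860", timeout=60.0, interval=1.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _opener.open(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not accepting connections yet; retry after a pause
        time.sleep(interval)
    return False
```

Calling `wait_until_up()` right after starting the container returns `True` once the interface is reachable and `False` if it never comes up within the timeout, in which case the container logs are the next place to look.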

License

This project is licensed under the MIT License - see the LICENSE file for details.