title: Arabic Book Analysis AI
emoji: 📚
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.31.0
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Arabic Book Analysis AI

This Hugging Face Space hosts an AI system that analyzes Arabic books and answers questions based solely on their content. It accepts books in .docx and .pdf formats, fine-tunes an AraBERT model on the extracted text, and answers in Arabic.

Features

  • Upload multiple books (.docx, .pdf).
  • Visual training progress with detailed logs: "Loading book", "Extracting ideas", "Training in progress", "Training completed".
  • Question-answering interface in Arabic, restricted to book content.
  • Separate interface for question-answering only, shareable with friends.
  • Formal answer style, suitable for scholarly texts.
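The upload feature depends on pulling plain text out of each book. As a minimal sketch of how that could work for .docx files using only the standard library, the helper below reads word/document.xml from the archive and joins the text runs of each paragraph. The name extract_docx_text is hypothetical and is not taken from app.py; .pdf extraction needs a third-party library and is not covered here.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source):
    """Return the non-empty paragraphs of a .docx file as a list of strings.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration; app.py may do this differently.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is spread over one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Calling extract_docx_text("book.docx") returns one string per paragraph, which can then be cleaned and passed to training.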

Configuration

For the app to deploy correctly on Hugging Face Spaces, configure the Space with the following settings:

  • SDK: Gradio (specified in the YAML front matter above)
  • SDK Version: 4.31.0 (matches requirements.txt)
  • App File: app.py (entry point for the Gradio app)
  • Visibility: Public (required for sharing the question-answering link)
  • Hardware: CPU (default free tier is sufficient; upgrade to GPU for faster training if needed)
  • File Structure:
    • Place app.py, requirements.txt, and README.md in the root directory (/).
    • No additional folders are required; the app expects all files at the root level.
  • Environment Variables: None required; all dependencies are listed in requirements.txt.
  • Persistence: The fine-tuned model is saved to ./fine_tuned_model within the Space’s storage for reuse.
  • Space name: Use arabic-book-analysis so that the links in this README resolve.
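The file-structure and persistence settings above can be checked with a few lines of Python. This is a rough sketch rather than part of the app: check_space_layout is a hypothetical helper, and treating a non-empty fine_tuned_model directory as "model saved" is an assumption.

```python
from pathlib import Path

# File names and model directory as listed in this README.
REQUIRED_FILES = ("app.py", "requirements.txt", "README.md")
MODEL_DIR = "fine_tuned_model"


def check_space_layout(root="."):
    """Report missing root-level files and whether a saved model exists."""
    root = Path(root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    model_dir = root / MODEL_DIR
    # Assumption: a non-empty directory means training has saved a model.
    has_model = model_dir.is_dir() and any(model_dir.iterdir())
    return {"missing_files": missing, "model_saved": has_model}
```

Running check_space_layout() from the Space root returns a dictionary such as {"missing_files": [], "model_saved": False} before the first training run.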

How to Use

Training Interface

  1. Upload Books:

    • Access the main interface at /.
    • Click the "رفع الكتب" (Upload books) field to select .docx or .pdf files.
    • Press "رفع وتدريب" (Upload and train) to start processing and training.
    • View uploaded files, training logs, and status.
    • After training, see the message: "Training process finished: Enter your question".
  2. Ask Questions:

    • Enter an Arabic question in the "أدخل سؤالك بالعربية" (Enter your question in Arabic) field.
    • Click "طرح السؤال" (Ask the question) to get an answer based on the book’s content.

Question-Answering Only Interface

  • Share this link with friends: https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis/answer
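  • Users can enter an Arabic question and receive answers drawn from the trained model; no training is required on their side.
  • Example:
    • Question: "ما هو قانون الإيمان وفقًا للكتاب؟" (What is the Creed according to the book?)
    • Answer: "قانون الإيمان هو أساس العقيدة المسيحية، ويؤمن به كل الكنائس المسيحية في العالم..." (The Creed is the foundation of Christian doctrine, and every Christian church in the world believes in it...)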

Requirements

  • Python 3.8+
  • Dependencies listed in requirements.txt

Deployment

  1. Create or update the Hugging Face Space:

    • Log in to Hugging Face; wherever your_huggingface_username appears in this README, substitute your own username.
    • Create a new Space named "arabic-book-analysis" or update an existing one.
    • Select "Gradio" as the SDK and set visibility to "Public".
  2. Upload Zipped Folder:

    • Run the provided create_zip.py script to generate arabic_book_analysis.zip.
    • In the Hugging Face Space, go to the "Files" tab.
    • Upload arabic_book_analysis.zip and extract it.
    • Move app.py, requirements.txt, and README.md from /arabic_book_analysis/ to the root directory (/), and confirm that all three files are at the root.
  3. Build and Launch:

    • Hugging Face Spaces automatically installs the dependencies from requirements.txt and launches the app.
    • Monitor build logs for errors.
  4. Access Links:

    • Main interface: https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis
    • Question-answering interface: https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis/answer
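Step 2 relies on create_zip.py, which is not reproduced in this README. As a hedged sketch of what such a script might do, the function below bundles the three project files under an arabic_book_analysis/ folder, matching the path mentioned in the deployment steps. The function name, its parameters, and the file list are assumptions, not the provided script.

```python
import zipfile
from pathlib import Path

# Assumed file list, taken from the files this README asks you to move to the root.
FILES = ("app.py", "requirements.txt", "README.md")


def create_zip(source_dir=".", zip_name="arabic_book_analysis.zip"):
    """Bundle the project files under an arabic_book_analysis/ folder."""
    source_dir = Path(source_dir)
    zip_path = source_dir / zip_name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in FILES:
            # arcname places each file inside the top-level folder.
            archive.write(source_dir / name, arcname=f"arabic_book_analysis/{name}")
    return zip_path
```

Because every file lands inside arabic_book_analysis/, the files still have to be moved to the root after extraction, as step 2 describes.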

Troubleshooting Configuration and Runtime Errors

  • Error: "Invalid SDK":
    • Ensure the Space is configured to use the Gradio SDK in the Space settings and the YAML front matter specifies sdk: gradio.
  • Error: "Files not found":
    • Verify that app.py, requirements.txt, and README.md are in the root directory (/), not in subfolders.
    • Re-upload arabic_book_analysis.zip, extract it, and move the files to the root using the Hugging Face UI.
  • Error: "Dependency installation failed":
    • Check requirements.txt for correct package versions.
    • Review the Space’s build logs for the specific error message.
  • Error: "Private Space access denied":
    • Set the Space visibility to "Public" to enable the question-answering link for friends.
  • Error: "Model not found":
    • Ensure training is completed at least once to save the fine-tuned model to ./fine_tuned_model.
  • Error: "FileNotFoundError: No such file or directory: 'java'":
    • The arabert library’s Farasa dependency requires Java, which is not installed by default. This project uses a custom preprocess_arabic_text function in app.py to avoid Farasa and Java requirements. If this error occurs, ensure app.py uses preprocess_arabic_text instead of ArabertPreprocessor.
  • Error: "TypeError: ArabertPreprocessor.__init__() got an unexpected keyword argument 'use_farasapy'":
    • This occurs because arabert==1.0.1 does not support the use_farasapy parameter. The current app.py avoids ArabertPreprocessor entirely and uses a custom preprocessing function instead.
  • Persistent Configuration Error:
    • If the "Missing configuration in README" error persists, double-check that README.md is in the root directory and contains the exact YAML front matter shown above. Ensure the zip file is extracted correctly, move files to the root, and restart the Space build.
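When the "Missing configuration in README" error appears, it can help to check the front matter programmatically before rebuilding. The sketch below assumes README.md opens with a block delimited by --- lines and holding simple key: value pairs. REQUIRED_KEYS is an assumption chosen from the keys this README uses, not an official list, and both function names are hypothetical.

```python
# Assumed set of keys to look for; taken from the front matter of this README.
REQUIRED_KEYS = ("title", "sdk", "sdk_version", "app_file")


def read_front_matter(readme_text):
    """Parse simple `key: value` pairs from a leading `---` block.

    Returns None when the text does not start with a complete block.
    """
    lines = readme_text.lstrip("\ufeff").splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return None  # closing delimiter never found


def missing_keys(readme_text):
    """List the assumed-required keys that the front matter lacks."""
    fields = read_front_matter(readme_text)
    if fields is None:
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if key not in fields]
```

An empty list from missing_keys(open("README.md", encoding="utf-8").read()) suggests the front matter is present and the problem lies elsewhere, for example in the file's location.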

Notes

  • Ensure books are in Arabic for accurate processing.
  • The system is tuned for the Hugging Face Spaces free tier; training may take a few minutes.
  • Replace your_huggingface_username with your actual Hugging Face username.
  • Training logs are displayed in the interface for transparency.
  • The zipped folder simplifies uploads; ensure files are moved to the root directory after extraction.
  • The arabert library’s Farasa dependency is bypassed by a custom preprocess_arabic_text function, which removes the Java requirement. This may slightly reduce preprocessing capabilities but keeps the app compatible with Hugging Face Spaces.
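The body of preprocess_arabic_text is not shown in this README. As an illustration only, a Java-free preprocessing function could look like the sketch below, which strips diacritics and tatweel, normalizes alef variants, and collapses whitespace. Only the function name comes from this README; every step in the body is an assumption, and the real function in app.py may differ.

```python
import re

# Arabic diacritics (tashkeel) U+064B to U+0652, plus superscript alef U+0670.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # elongation character (kashida)
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
_WHITESPACE = re.compile(r"\s+")


def preprocess_arabic_text(text):
    """Lightweight normalization that needs neither Farasa nor Java.

    Assumed behavior for illustration; not copied from app.py.
    """
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = _ALEF_VARIANTS.sub("\u0627", text)  # map variants to bare alef
    return _WHITESPACE.sub(" ", text).strip()
```

Unlike Farasa, this approach performs no morphological segmentation, which is the reduction in preprocessing capability mentioned above.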

License

MIT License