title: Arabic Book Analysis AI
emoji: 📚
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.31.0
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Arabic Book Analysis AI
This Huggingface Space hosts an AI system that analyzes Arabic books and answers questions based solely on their content. The system supports uploading books in .docx and .pdf formats, fine-tunes an AraBERT model on the extracted text, and provides answers in Arabic.
Features
- Upload multiple books (.docx, .pdf).
- Visual training progress with detailed logs: "Loading book", "Extracting ideas", "Training in progress", "Training completed".
- Question-answering interface in Arabic, restricted to book content.
- Separate interface for question-answering only, shareable with friends.
- Formal answer style, suitable for scholarly texts.
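The extraction code in app.py is not reproduced in this README. As a rough sketch of how text could be pulled from an uploaded .docx file using only the Python standard library, the hypothetical helper below (extract_docx_text is an illustrative name, not taken from app.py) reads word/document.xml from the archive and joins the text runs of each paragraph. PDF extraction would need a third-party library and is not shown.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source):
    """Return the non-empty paragraphs of a .docx file as a list of strings.

    `source` is a path or a binary file-like object.
    Hypothetical helper for illustration; not the function used in app.py.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Because a .docx file is a zip archive, this approach needs no extra dependency, at the cost of ignoring headers, footnotes, and other parts stored outside word/document.xml.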
Configuration
To ensure proper deployment on Huggingface Spaces, configure the Space with the following settings:
- SDK: Gradio (specified in the YAML front matter above)
- SDK Version: 4.31.0 (matches requirements.txt)
- App File: app.py (entry point for the Gradio app)
- Visibility: Public (required for sharing the question-answering link)
- Hardware: CPU (default free tier is sufficient; upgrade to GPU for faster training if needed)
- File Structure:
- Place app.py, requirements.txt, and README.md in the root directory (/).
- No additional folders are required; the app expects all files at the root level.
- Environment Variables: None required; all dependencies are listed in requirements.txt.
- Persistence: The fine-tuned model is saved to ./fine_tuned_model within the Space’s storage for reuse.
- Huggingface Space Name: Use arabic-book-analysis for consistency with the provided links.
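The file layout and persistence settings above can be checked with a short script before or after deployment. This is a minimal sketch: the function names are hypothetical, and the model check assumes the fine-tuned model was written with save_pretrained(), which stores a config.json file in the target directory.

```python
from pathlib import Path

# Files this README says must sit at the Space root.
REQUIRED_ROOT_FILES = ("app.py", "requirements.txt", "README.md")


def missing_root_files(root="."):
    """Return the required files that are absent from the given root directory."""
    root = Path(root)
    return [name for name in REQUIRED_ROOT_FILES if not (root / name).is_file()]


def has_saved_model(model_dir="./fine_tuned_model"):
    """Rough check that a fine-tuned model was saved earlier.

    Assumption: the model directory contains a config.json file,
    as produced by save_pretrained().
    """
    return (Path(model_dir) / "config.json").is_file()
```

An empty list from missing_root_files() means the layout matches the settings above; a False result from has_saved_model() suggests training has not yet completed in this Space.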
How to Use
Training Interface
Upload Books:
- Access the main interface at /.
- Click the "رفع الكتب" ("Upload books") field to select .docx or .pdf files.
- Press "رفع وتدريب" ("Upload and train") to start processing and training.
- View uploaded files, training logs, and status.
- After training, see the message: "Training process finished: Enter your question".
Ask Questions:
- Enter an Arabic question in the "أدخل سؤالك بالعربية" ("Enter your question in Arabic") field.
- Click "طرح السؤال" ("Ask the question") to get an answer based on the book’s content.
Question-Answering Only Interface
- Share this link with friends: https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis/answer
- Users can:
- Enter an Arabic question.
- Receive answers based on the trained model’s knowledge.
- No training required.
- Example:
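- Question: "ما هو قانون الإيمان وفقًا للكتاب؟" ("What is the Creed according to the book?")
- Answer: "قانون الإيمان هو أساس العقيدة المسيحية، ويؤمن به كل الكنائس المسيحية في العالم..." ("The Creed is the foundation of Christian doctrine, and all Christian churches in the world believe in it...")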
Requirements
- Python 3.8+
- Dependencies listed in requirements.txt
Deployment
Create/Update Huggingface Space:
- Log in to Huggingface with your username (replace your_huggingface_username).
- Create a new Space named "arabic-book-analysis" or update an existing one.
- Select "Gradio" as the SDK and set visibility to "Public".
Upload Zipped Folder:
- Run the provided create_zip.py script to generate arabic_book_analysis.zip.
- In the Huggingface Space, go to the "Files" tab.
- Upload arabic_book_analysis.zip and extract it.
- Move app.py, requirements.txt, and README.md from /arabic_book_analysis/ to the root directory (/).
- Ensure these files are in the root.
Build and Launch:
- Huggingface Spaces will automatically install dependencies from requirements.txt and launch the app.
- Monitor build logs for errors.
Access Links:
- Main interface: https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis
- Question-answering interface: https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis/answer
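The provided create_zip.py script is not reproduced in this README. For readers who need to rebuild the archive, the sketch below shows one way such a script could work; the function signature and the choice to nest files under an arabic_book_analysis/ folder are assumptions inferred from the deployment steps above, not the actual script.

```python
import zipfile
from pathlib import Path

# Files the deployment steps expect inside the archive.
FILES = ("app.py", "requirements.txt", "README.md")


def create_zip(source_dir=".", zip_path="arabic_book_analysis.zip",
               folder="arabic_book_analysis"):
    """Zip the project files under one top-level folder and return the archive path.

    Hypothetical stand-in for create_zip.py; raises FileNotFoundError
    if any of the expected files is missing from source_dir.
    """
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in FILES:
            archive.write(source_dir / name, arcname=f"{folder}/{name}")
    return Path(zip_path)
```

Calling create_zip() from the project directory writes arabic_book_analysis.zip next to the source files. Writing the files at the archive root instead (arcname=name) would remove the need to move them after extraction.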
Troubleshooting Configuration and Runtime Errors
- Error: "Invalid SDK":
- Ensure the Space is configured to use the Gradio SDK in the Space settings and that the YAML front matter specifies sdk: gradio.
- Error: "Files not found":
- Verify that app.py, requirements.txt, and README.md are in the root directory (/), not in subfolders.
- Re-upload arabic_book_analysis.zip, extract it, and move files to the root using the Huggingface UI.
- Error: "Dependency installation failed":
- Check requirements.txt for correct package versions.
- Review build logs in the Space’s "Settings" tab for specific errors.
- Error: "Private Space access denied":
- Set the Space visibility to "Public" to enable the question-answering link for friends.
- Error: "Model not found":
- Ensure training is completed at least once to save the fine-tuned model to ./fine_tuned_model.
- Error: "FileNotFoundError: No such file or directory: 'java'":
- The arabert library’s Farasa dependency requires Java, which is not installed by default. This project uses a custom preprocess_arabic_text function in app.py to avoid Farasa and Java requirements. If this error occurs, ensure app.py uses preprocess_arabic_text instead of ArabertPreprocessor.
- Error: "TypeError: ArabertPreprocessor.__init__() got an unexpected keyword argument 'use_farasapy'":
- This occurred because arabert==1.0.1 does not support the use_farasapy parameter. The current app.py avoids ArabertPreprocessor entirely, using a custom preprocessing function.
- Persistent Configuration Error:
- If the "Missing configuration in README" error persists, double-check that README.md is in the root directory and contains the exact YAML front matter shown above. Ensure the zip file is extracted correctly, move files to the root, and restart the Space build.
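When the configuration error persists, it can help to check the README's front matter locally before uploading. The sketch below is a deliberately simple parser for flat key: value pairs in a block delimited by --- lines at the top of README.md. The function names are hypothetical, and the list of required keys is an assumption based on the fields this README sets, not an official list.

```python
# Assumed key list, taken from the fields used in this README's front matter.
REQUIRED_KEYS = ("title", "sdk", "sdk_version", "app_file")


def read_front_matter(readme_text):
    """Parse flat `key: value` pairs from a leading block delimited by `---` lines.

    Returns None when the text does not start with such a block or the
    closing delimiter is missing. This is not a full YAML parser.
    """
    lines = readme_text.lstrip("\ufeff").splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return None  # closing delimiter never found


def missing_keys(readme_text):
    """Return the assumed-required keys that the front matter does not define."""
    fields = read_front_matter(readme_text)
    if fields is None:
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if key not in fields]
```

A common cause of the error is a README whose settings appear as plain lines without the surrounding --- delimiters; in that case read_front_matter() returns None and every key is reported as missing.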
Notes
- Ensure books are in Arabic for accurate processing.
- The system is optimized for Huggingface Spaces’ free tier; training may take a few minutes.
- Replace your_huggingface_username with your actual Huggingface username.
- Training logs are displayed in the interface for transparency.
- The zipped folder simplifies uploads; ensure files are moved to the root directory after extraction.
- The arabert library’s Farasa dependency is bypassed using a custom preprocess_arabic_text function to avoid Java requirements. This may slightly reduce preprocessing capabilities but ensures compatibility with Huggingface Spaces.
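The body of preprocess_arabic_text is not shown in this README. To illustrate what a Java-free preprocessing step might look like, the sketch below uses only the re module to strip diacritics and tatweel and to normalize whitespace. The name simple_arabic_preprocess and the choice of normalization steps are assumptions for illustration, not the function in app.py.

```python
import re

# Arabic diacritics (tashkeel): U+064B (fathatan) through U+0652 (sukun),
# plus U+0670 (superscript alef).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character used for justification


def simple_arabic_preprocess(text):
    """Strip diacritics and tatweel, then collapse runs of whitespace.

    Illustrative only; the real preprocess_arabic_text may do more or less.
    """
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return re.sub(r"\s+", " ", text).strip()
```

A pure-regex approach like this performs no morphological segmentation, which matches the note above that bypassing Farasa trades some preprocessing capability for compatibility.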
License
MIT License