Delete README.md
README.md
DELETED
@@ -1,123 +0,0 @@
---
title: Arabic Book Analysis AI
emoji: 📚
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.27.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Arabic Book Analysis AI

This Hugging Face Space hosts an AI system that analyzes Arabic books and answers questions based solely on their content. The system accepts books in .docx and .pdf formats, fine-tunes an AraBERT model on the extracted text, and answers in Arabic.

## Features
- Upload multiple books (.docx, .pdf).
- Visual training progress with detailed logs: "Loading book", "Extracting ideas", "Training in progress", "Training completed".
- Question-answering interface in Arabic, restricted to book content.
- Separate tab for question-answering only, shareable with friends.
- Formal answer style, suitable for scholarly texts.

## Configuration
To ensure proper deployment on Hugging Face Spaces, configure the Space with the following settings:
- **SDK**: Gradio (specified in the YAML front matter above)
- **SDK Version**: 4.31.0 (matches `requirements.txt`). The YAML front matter above sets `sdk_version: 5.27.1`; the two values should agree.
- **App File**: `app.py` (entry point for the Gradio app)
- **Visibility**: Public (required for sharing the question-answering link)
- **Hardware**: CPU (default free tier is sufficient; upgrade to GPU for faster training if needed)
- **File Structure**:
  - Place `app.py`, `requirements.txt`, and `README.md` in the root directory (`/`).
  - No additional folders are required; the app expects all files at the root level.
- **Environment Variables**: None required; all dependencies are listed in `requirements.txt`.
- **Persistence**: The fine-tuned model is saved to `./fine_tuned_model` within the Space’s storage for reuse. The default Space disk is ephemeral, so the saved model is lost when the Space restarts unless persistent storage is enabled.
- **Hugging Face Space Name**: Use `arabic-book-analysis` for consistency with the provided links.
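
The settings above can be checked locally before uploading. The following is a minimal sketch, not part of the project: the helper names `read_front_matter`, `pinned_gradio_version`, and `check_space` are hypothetical, and it assumes `requirements.txt` pins Gradio with a line of the form `gradio==X.Y.Z`. It only reads simple `key: value` pairs, so it is not a general YAML parser.

```python
import re
from pathlib import Path

# Files the README says must sit in the Space root.
REQUIRED_FILES = ("app.py", "requirements.txt", "README.md")


def read_front_matter(readme_text):
    """Return the simple `key: value` pairs between the leading '---' fences."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # closing fence never found


def pinned_gradio_version(requirements_text):
    """Return the version from a `gradio==X.Y.Z` line, or None if not pinned."""
    for line in requirements_text.splitlines():
        match = re.match(r"\s*gradio\s*==\s*([\w.]+)", line, flags=re.IGNORECASE)
        if match:
            return match.group(1)
    return None


def check_space(root):
    """Collect human-readable problems with the Space layout and SDK settings."""
    root = Path(root)
    problems = [f"missing {name} in root" for name in REQUIRED_FILES
                if not (root / name).is_file()]
    if problems:
        return problems
    meta = read_front_matter((root / "README.md").read_text(encoding="utf-8"))
    pinned = pinned_gradio_version(
        (root / "requirements.txt").read_text(encoding="utf-8"))
    if meta.get("sdk") != "gradio":
        problems.append("front matter does not set sdk: gradio")
    if pinned and meta.get("sdk_version") != pinned:
        problems.append(f"sdk_version {meta.get('sdk_version')} != gradio=={pinned}")
    return problems
```

Calling `check_space(".")` from the project folder returns an empty list when the three files are present, the SDK is Gradio, and the `sdk_version` in the front matter agrees with the Gradio pin.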

## How to Use
### Training and Question Interface
1. **Upload Books**:
   - Access the main interface at `/` and select the "التدريب والسؤال" (Training and Question) tab.
   - Click the "رفع الكتب" (Upload Books) field to select .docx or .pdf files.
   - Press "رفع وتدريب" (Upload and Train) to start processing and training.
   - View uploaded files, training logs, and status.
   - After training, see the message: "Training process finished: Enter your question".

2. **Ask Questions**:
   - In the same tab, enter an Arabic question in the "أدخل سؤالك بالعربية" (Enter your question in Arabic) field.
   - Click "طرح السؤال" (Ask the Question) to get an answer based on the book’s content.
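
Questions can only be answered once training has saved a model to `./fine_tuned_model`. The sketch below shows one way an app could check for that before answering. It is an assumption, not the project's code: `model_is_ready` is a hypothetical helper, and the file names are the ones Transformers' `save_pretrained` commonly writes.

```python
from pathlib import Path

# Weight file names commonly written by `save_pretrained` (assumed here).
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def model_is_ready(model_dir="./fine_tuned_model"):
    """Return True if the directory looks like a saved Transformers model."""
    path = Path(model_dir)
    has_config = (path / "config.json").is_file()
    has_weights = any((path / name).is_file() for name in WEIGHT_FILES)
    return has_config and has_weights
```

A question handler could call `model_is_ready()` first and ask the user to run training when it returns `False`.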

### Question-Answering Only Interface
- Share the main link with friends: `https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis`
- Users can:
  - Select the "طرح الأسئلة فقط" (Ask Questions Only) tab.
  - Enter an Arabic question in the provided field.
  - Receive answers based on the trained model’s knowledge.
- No training required.

### Example
- Question: "ما هو قانون الإيمان وفقًا للكتاب؟" (What is the Creed according to the book?)
- Answer: "قانون الإيمان هو أساس العقيدة المسيحية، ويؤمن به كل الكنائس المسيحية في العالم..." (The Creed is the foundation of Christian doctrine, and all Christian churches in the world believe in it...)
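
The troubleshooting section mentions `BertForQuestionAnswering`, an extractive model that scores each token as a possible answer start or end. That is why answers stay within the book's text. The sketch below illustrates how such scores are typically turned into an answer span. It is not taken from `app.py`: the function name, the `max_answer_len` parameter, and the toy scores are assumptions.

```python
def best_answer_span(start_logits, end_logits, max_answer_len=30):
    """Pick (start, end) token indices maximising start + end score, start <= end."""
    best_score = float("-inf")
    best = (0, 0)
    for start, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score, best = score, (start, end)
    return best


# Toy scores for a five-token passage (made-up numbers for illustration).
start_scores = [0.1, 2.0, 0.3, 0.2, 0.1]
end_scores = [0.0, 0.5, 0.1, 3.0, 0.2]
span = best_answer_span(start_scores, end_scores)  # (1, 3)
```

The answer text is then the passage tokens from `span[0]` to `span[1]` inclusive, so it is always a contiguous excerpt of the uploaded book.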

## Requirements
- Python 3.8+
- Dependencies listed in `requirements.txt`

## Deployment
1. **Create/Update Hugging Face Space**:
   - Log in to Hugging Face with your username (replace `your_huggingface_username`).
   - Create a new Space named "arabic-book-analysis" or update an existing one.
   - Select "Gradio" as the SDK and set visibility to "Public".

2. **Upload Zipped Folder**:
   - Run the provided `create_zip.py` script to generate `arabic_book_analysis.zip`.
   - In the Hugging Face Space, go to the "Files" tab.
   - Upload `arabic_book_analysis.zip` and extract it.
   - Move `app.py`, `requirements.txt`, and `README.md` from `/arabic_book_analysis/` to the root directory (`/`), and confirm that all three files are in the root.

3. **Build and Launch**:
   - Hugging Face Spaces will automatically install dependencies from `requirements.txt` and launch the app.
   - Monitor build logs for errors.

4. **Access Links**:
   - Main interface (both tabs): `https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis`
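
The `create_zip.py` script itself is not shown in this README. As a rough idea of what such a script might do, here is a minimal sketch using only the standard library. The function name `create_zip` and its parameters are assumptions. It places the three files under an `arabic_book_analysis/` folder inside the archive, matching the paths in step 2.

```python
import zipfile
from pathlib import Path

# Files the deployment steps expect inside the archive.
FILES = ("app.py", "requirements.txt", "README.md")


def create_zip(source_dir=".", zip_name="arabic_book_analysis.zip",
               folder="arabic_book_analysis"):
    """Write the project files into `zip_name` under `folder/` and return its path."""
    source = Path(source_dir)
    zip_path = source / zip_name
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in FILES:
            # `arcname` sets the path stored in the archive, not the path on disk.
            archive.write(source / name, arcname=f"{folder}/{name}")
    return zip_path
```

Running `create_zip()` in the project folder produces `arabic_book_analysis.zip` next to the source files. It raises `FileNotFoundError` if any of the three files is missing.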

## Troubleshooting Configuration and Runtime Errors
- **Error: "Invalid SDK"**:
  - Ensure the Space is configured to use the Gradio SDK in the Space settings and the YAML front matter specifies `sdk: gradio`.
- **Error: "Files not found"**:
  - Verify that `app.py`, `requirements.txt`, and `README.md` are in the root directory (`/`), not in subfolders.
  - Re-upload `arabic_book_analysis.zip`, extract it, and move files to the root using the Hugging Face UI.
- **Error: "Dependency installation failed"**:
  - Check `requirements.txt` for correct package versions.
  - Review build logs in the Space’s "Settings" tab for specific errors.
- **Error: "Private Space access denied"**:
  - Set the Space visibility to "Public" to enable the question-answering tab for friends.
- **Error: "Model not found"**:
  - Ensure training is completed at least once to save the fine-tuned model to `./fine_tuned_model`.
- **Error: "FileNotFoundError: No such file or directory: 'java'"**:
  - The `arabert` library’s Farasa dependency requires Java, which is not installed by default. This project uses a custom `preprocess_arabic_text` function in `app.py` to avoid Farasa and Java requirements. If this error occurs, ensure `app.py` uses `preprocess_arabic_text` instead of `ArabertPreprocessor`.
- **Error: "TypeError: ArabertPreprocessor.__init__() got an unexpected keyword argument 'use_farasapy'"**:
  - This occurred because `arabert==1.0.1` does not support the `use_farasapy` parameter. The current `app.py` avoids `ArabertPreprocessor` entirely, using a custom preprocessing function.
- **Error: "FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0"**:
  - This warning from `huggingface_hub` (used by `transformers`) may cause the container to exit if warnings are treated as errors. The current `app.py` suppresses this warning using a `warnings.filterwarnings` directive. If the error persists, check build logs for additional issues during model/tokenizer loading.
- **Error: "TypeError: App.create_app() missing 1 required positional argument: 'blocks'"**:
  - This occurred due to incorrect Gradio app setup in `app.py`. The current `app.py` uses a single `gr.Blocks` app with tabs, launched directly with `demo.launch()`, avoiding manual app mounting.
- **Warning: "Some weights of BertForQuestionAnswering were not initialized"**:
  - This warning appears when loading `aubmindlab/bert-base-arabertv2` for question-answering, because the checkpoint has not been fine-tuned for this task. It is expected and harmless: the missing question-answering weights are learned during training. Ensure training is completed before asking questions.
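
The body of `preprocess_arabic_text` is not shown in this README. The sketch below is an assumption about what a Farasa-free, Java-free replacement might do, using only the standard library: it strips Arabic diacritics and the tatweel elongation mark, then collapses whitespace. The actual function in `app.py` may differ.

```python
import re

# Arabic diacritics (tashkeel, U+064B to U+0652) and the tatweel mark (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652]")
_TATWEEL = "\u0640"


def preprocess_arabic_text(text):
    """Remove diacritics and tatweel, then collapse runs of whitespace."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    return re.sub(r"\s+", " ", text).strip()
```

Unlike Farasa, this does no morphological segmentation, which is consistent with the note that bypassing Farasa may slightly reduce preprocessing capabilities.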

## Notes
- Ensure books are in Arabic for accurate processing.
- The system is optimized for Hugging Face Spaces’ free tier; training may take a few minutes.
- Replace `your_huggingface_username` with your actual Hugging Face username.
- Training logs are displayed in the interface for transparency.
- The zipped folder simplifies uploads; ensure files are moved to the root directory after extraction.
- The `arabert` library’s Farasa dependency is bypassed using a custom `preprocess_arabic_text` function to avoid Java requirements. This may slightly reduce preprocessing capabilities but ensures compatibility with Hugging Face Spaces.
- The `huggingface_hub` warning is suppressed in `app.py`. Future updates to `transformers` may resolve this when `huggingface_hub>=1.0.0` is released.
- The model weights warning is normal and does not affect functionality after training.
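
For reference, suppressing the `resume_download` warning with `warnings.filterwarnings` can look like the sketch below. The wrapper function name is hypothetical and the exact filter used in `app.py` is not shown here. This version ignores only `FutureWarning` messages that mention `resume_download` and leaves other warnings visible.

```python
import warnings


def suppress_resume_download_warning():
    """Ignore only the FutureWarning whose text mentions `resume_download`."""
    warnings.filterwarnings(
        "ignore",
        message=r".*resume_download.*",  # regex matched from the message start
        category=FutureWarning,
    )


# Call before importing `transformers`, so the filter is active when models load.
suppress_resume_download_warning()
```

Filtering by message rather than ignoring every `FutureWarning` keeps unrelated deprecation notices visible in the build logs.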

## License
MIT License