Commit a91e421 by ramysaidagieb (verified) · Parent: a2527fa

Delete README.md

Files changed (1)
  1. README.md +0 -123
README.md DELETED
@@ -1,123 +0,0 @@

---
title: Arabic Book Analysis AI
emoji: 📚
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.27.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Arabic Book Analysis AI

This Hugging Face Space hosts an AI system that analyzes Arabic books and answers questions based solely on their content. The system accepts books in .docx and .pdf format, fine-tunes an AraBERT model on the extracted text, and answers in Arabic.
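The README does not say which libraries `app.py` uses to read the uploaded books. A minimal sketch of the extraction step, assuming `python-docx` for .docx files and `pypdf` for .pdf files (both are assumptions; they are imported lazily so the dispatch logic runs without them), with the hypothetical helper name `extract_book_text`:

```python
from pathlib import Path


def extract_book_text(path: str) -> str:
    """Return the plain text of a .docx or .pdf book (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".docx":
        import docx  # assumption: the python-docx package

        document = docx.Document(path)
        # Keep non-empty paragraphs, one per line.
        return "\n".join(p.text for p in document.paragraphs if p.text.strip())
    if suffix == ".pdf":
        from pypdf import PdfReader  # assumption: the pypdf package

        reader = PdfReader(path)
        # extract_text() can return an empty string for image-only pages.
        pages = (page.extract_text() or "" for page in reader.pages)
        return "\n".join(text for text in pages if text.strip())
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Rejecting other extensions early keeps the upload handler from passing unreadable files to training.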

## Features
- Upload multiple books (.docx, .pdf).
- Visual training progress with detailed logs: "Loading book", "Extracting ideas", "Training in progress", "Training completed".
- Arabic question-answering interface whose answers are restricted to the books' content.
- A separate question-answering-only tab that can be shared with friends.
- Formal answer style, suited to scholarly texts.
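Gradio can stream intermediate results from a generator function passed as an event handler, which is one way to show staged log messages like those listed above. The sketch below is hypothetical: the stage names come from this README, but `train_with_progress` and its `train_fn` callback are illustrative and not taken from `app.py`.

```python
from typing import Callable, Iterator, List


def train_with_progress(
    book_names: List[str], train_fn: Callable[[List[str]], None]
) -> Iterator[str]:
    """Yield the cumulative training log after each stage (hypothetical handler)."""
    log: List[str] = []
    for name in book_names:
        log.append(f"Loading book: {name}")
        yield "\n".join(log)
    log.append("Extracting ideas")
    yield "\n".join(log)
    log.append("Training in progress")
    yield "\n".join(log)
    train_fn(book_names)  # placeholder for the real fine-tuning step
    log.append("Training completed")
    yield "\n".join(log)
```

When such a generator is wired to a button with a `gr.Textbox` output, each yielded string replaces the textbox contents, so the log appears to grow as training proceeds.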

## Configuration
To deploy correctly on Hugging Face Spaces, configure the Space as follows:
- **SDK**: Gradio (specified in the YAML front matter above)
- **SDK Version**: 4.31.0 (matches `requirements.txt`). The YAML front matter above sets `sdk_version: 5.27.1`; these must agree, so update whichever one is out of date.
- **App File**: `app.py` (entry point for the Gradio app)
- **Visibility**: Public (required for sharing the question-answering link)
- **Hardware**: CPU (the default free tier is sufficient; upgrade to a GPU for faster training if needed)
- **File Structure**:
  - Place `app.py`, `requirements.txt`, and `README.md` in the root directory (`/`).
  - No additional folders are required; the app expects all files at the root level.
- **Environment Variables**: None required; all dependencies are listed in `requirements.txt`.
- **Persistence**: The fine-tuned model is saved to `./fine_tuned_model` in the Space's storage for reuse. On the default ephemeral disk this directory is lost when the Space restarts, so retrain after a restart or attach persistent storage.
- **Hugging Face Space Name**: Use `arabic-book-analysis` for consistency with the provided links.
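Because the front matter and `requirements.txt` can drift apart (this README lists both 5.27.1 and 4.31.0), a small pre-deploy check can catch the mismatch. This is a hypothetical helper using only the standard library; it assumes `requirements.txt` pins Gradio as `gradio==X.Y.Z` and reads the front matter line by line instead of doing full YAML parsing.

```python
import re
from typing import Optional


def front_matter_sdk_version(readme_text: str) -> Optional[str]:
    """Return sdk_version from the front matter between the first two '---' lines."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter
        key, _, value = line.partition(":")
        if key.strip() == "sdk_version":
            return value.strip()
    return None


def pinned_gradio_version(requirements_text: str) -> Optional[str]:
    """Return X.Y.Z from a 'gradio==X.Y.Z' line, ignoring trailing comments."""
    for line in requirements_text.splitlines():
        match = re.match(r"^\s*gradio\s*==\s*([\w.]+)", line.split("#", 1)[0])
        if match:
            return match.group(1)
    return None


def versions_agree(readme_text: str, requirements_text: str) -> bool:
    sdk = front_matter_sdk_version(readme_text)
    pin = pinned_gradio_version(requirements_text)
    # If Gradio is not pinned, there is nothing to conflict with.
    return pin is None or sdk == pin
```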

## How to Use
### Training and Question Interface
1. **Upload Books**:
   - Open the main interface at `/` and select the "التدريب والسؤال" (Training and Questions) tab.
   - Click the "رفع الكتب" (Upload books) field to select .docx or .pdf files.
   - Press "رفع وتدريب" (Upload and train) to start processing and training.
   - View the uploaded files, training logs, and status.
   - When training finishes, the message "Training process finished: Enter your question" appears.

2. **Ask Questions**:
   - In the same tab, enter an Arabic question in the "أدخل سؤالك بالعربية" (Enter your question in Arabic) field.
   - Click "طرح السؤال" (Ask the question) to get an answer based on the book's content.
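An extractive model such as `BertForQuestionAnswering` needs a context passage, so answering from the book's content implies a step that picks the relevant part of the book. The README does not describe that step; the sketch below is one hypothetical option (plain word-overlap ranking over paragraphs), not the method `app.py` uses.

```python
import re
from typing import List, Set

# \w matches Unicode letters and digits, including Arabic letters.
_WORD = re.compile(r"\w+")


def words(text: str) -> Set[str]:
    return set(_WORD.findall(text))


def best_passages(question: str, paragraphs: List[str], k: int = 3) -> List[str]:
    """Return up to k paragraphs sharing the most words with the question."""
    question_words = words(question)
    scored = [(len(question_words & words(p)), i) for i, p in enumerate(paragraphs)]
    # Highest overlap first; ties keep the book's original paragraph order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [paragraphs[i] for score, i in scored[:k] if score > 0]
```

Because `\w` does not match Arabic diacritic marks, removing diacritics before matching would make the overlap count more robust.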

### Question-Answering Only Interface
- Share the main link with friends: `https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis`
- Users can:
  - Select the "طرح الأسئلة فقط" (Ask questions only) tab.
  - Enter an Arabic question in the provided field.
  - Receive answers based on the trained model's knowledge.
- No training is required in this tab.
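The Troubleshooting section notes that `app.py` uses a single `gr.Blocks` app with tabs, launched with `demo.launch()`. The layout below is a hypothetical sketch of that two-tab structure, assuming Gradio 4 or later (for `type="filepath"`); the handlers are placeholders, and Gradio is imported inside `build_demo` so the handlers can be exercised without it.

```python
from typing import List, Optional


def upload_and_train(files: Optional[List[str]]) -> str:
    """Placeholder training handler: reports which files would be processed."""
    if not files:
        return "No books uploaded."
    return "Training completed: " + ", ".join(files)


def answer_question(question: str) -> str:
    """Placeholder QA handler; the real app would query the fine-tuned model."""
    if not question.strip():
        return "Please enter a question in Arabic."
    return "(answer from the fine-tuned model)"


def build_demo():
    import gradio as gr  # imported here so the handlers above work without Gradio

    with gr.Blocks() as demo:
        with gr.Tab("التدريب والسؤال"):  # Training and Questions
            books = gr.File(
                label="رفع الكتب",  # Upload books
                file_count="multiple",
                file_types=[".docx", ".pdf"],
                type="filepath",
            )
            train_button = gr.Button("رفع وتدريب")  # Upload and train
            status = gr.Textbox(label="Training log")
            train_button.click(upload_and_train, inputs=books, outputs=status)

            question = gr.Textbox(label="أدخل سؤالك بالعربية")  # Enter your question in Arabic
            ask_button = gr.Button("طرح السؤال")  # Ask the question
            answer = gr.Textbox(label="Answer")
            ask_button.click(answer_question, inputs=question, outputs=answer)

        with gr.Tab("طرح الأسئلة فقط"):  # Ask questions only
            question_only = gr.Textbox(label="أدخل سؤالك بالعربية")
            ask_only_button = gr.Button("طرح السؤال")
            answer_only = gr.Textbox(label="Answer")
            ask_only_button.click(answer_question, inputs=question_only, outputs=answer_only)
    return demo
```

In `app.py`, `build_demo().launch()` would start the app; both tabs share one URL, which is why a single link serves training users and friends alike.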
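
### Example
- Question: "ما هو قانون الإيمان وفقًا للكتاب؟" (What is the Creed according to the book?)
- Answer: "قانون الإيمان هو أساس العقيدة المسيحية، ويؤمن به كل الكنائس المسيحية في العالم..." (The Creed is the foundation of Christian doctrine, and all Christian churches in the world believe in it...)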
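Behind an answer like the one above, an extractive `BertForQuestionAnswering` model outputs `start_logits` and `end_logits`, one score per token for being the answer's first and last token. A minimal sketch of picking the best valid span from those scores (plain lists stand in for model tensors; `best_span` and `max_answer_len` are illustrative names):

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float], end_logits: List[float], max_answer_len: int = 30
) -> Tuple[int, int]:
    """Return (start, end) token indices maximizing start + end score, with start <= end."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best
```

In practice (for example in the `transformers` question-answering pipeline) the search also skips tokens that belong to the question and maps token indices back to character offsets in the book text.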

## Requirements
- Python 3.8+
- Dependencies listed in `requirements.txt`

## Deployment
1. **Create or Update the Hugging Face Space**:
   - Log in to Hugging Face, and replace `your_huggingface_username` in the links with your username.
   - Create a new Space named "arabic-book-analysis" or update an existing one.
   - Select "Gradio" as the SDK and set visibility to "Public".

2. **Upload the Files**:
   - Run the provided `create_zip.py` script to generate `arabic_book_analysis.zip`.
   - Extract the archive locally; its files are inside an `arabic_book_analysis/` folder. The Hugging Face web interface stores an uploaded .zip as a single file and does not extract it.
   - In the Space, open the "Files" tab and upload `app.py`, `requirements.txt`, and `README.md` to the root directory (`/`), not to a subfolder.

3. **Build and Launch**:
   - Hugging Face Spaces automatically installs the dependencies from `requirements.txt` and launches the app.
   - Monitor the build logs for errors.

4. **Access Links**:
   - Main interface (both tabs): `https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis`
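The `create_zip.py` script itself is not reproduced in this README. A hypothetical version that packs the three root files under an `arabic_book_analysis/` folder (the folder name the deployment steps mention) could use the standard `zipfile` module:

```python
import zipfile
from pathlib import Path

FILES = ["app.py", "requirements.txt", "README.md"]


def create_zip(src_dir: str = ".", zip_path: str = "arabic_book_analysis.zip") -> Path:
    """Zip the Space files as arabic_book_analysis/<name> entries (hypothetical create_zip.py)."""
    src = Path(src_dir)
    missing = [name for name in FILES if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files: {', '.join(missing)}")
    out = Path(zip_path)
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in FILES:
            # arcname sets the path stored inside the archive.
            archive.write(src / name, arcname=f"arabic_book_analysis/{name}")
    return out
```

Failing early on a missing file avoids producing an archive that later triggers the "Files not found" error on the Space.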

## Troubleshooting Configuration and Runtime Errors
- **Error: "Invalid SDK"**:
  - Ensure the Space uses the Gradio SDK in its settings and that the YAML front matter specifies `sdk: gradio`.
- **Error: "Files not found"**:
  - Verify that `app.py`, `requirements.txt`, and `README.md` are in the root directory (`/`), not in subfolders.
  - If they are not, re-upload them to the root. Extract `arabic_book_analysis.zip` locally first, since the Hugging Face web interface does not extract archives.
- **Error: "Dependency installation failed"**:
  - Check `requirements.txt` for correct package versions.
  - Review the Space's build logs for the specific error.
- **Error: "Private Space access denied"**:
  - Set the Space visibility to "Public" so friends can use the question-answering tab.
- **Error: "Model not found"**:
  - Complete training at least once so the fine-tuned model is saved to `./fine_tuned_model`.
- **Error: "FileNotFoundError: No such file or directory: 'java'"**:
  - The `arabert` library's Farasa dependency requires Java, which is not installed by default. This project uses a custom `preprocess_arabic_text` function in `app.py` to avoid the Farasa and Java requirements. If this error occurs, make sure `app.py` calls `preprocess_arabic_text` instead of `ArabertPreprocessor`.
- **Error: "TypeError: ArabertPreprocessor.__init__() got an unexpected keyword argument 'use_farasapy'"**:
  - This occurred because `arabert==1.0.1` does not support the `use_farasapy` parameter. The current `app.py` avoids `ArabertPreprocessor` entirely and uses a custom preprocessing function instead.
- **Error: "FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0"**:
  - This warning from `huggingface_hub` (used by `transformers`) may cause the container to exit if warnings are treated as errors. The current `app.py` suppresses it with a `warnings.filterwarnings` call. If the problem persists, check the build logs for other errors during model or tokenizer loading.
- **Error: "TypeError: App.create_app() missing 1 required positional argument: 'blocks'"**:
  - This occurred because the Gradio app in `app.py` was set up incorrectly. The current `app.py` uses a single `gr.Blocks` app with tabs, launched directly with `demo.launch()`, and does not mount the app manually.
- **Warning: "Some weights of BertForQuestionAnswering were not initialized"**:
  - This warning appears when loading `aubmindlab/bert-base-arabertv2` for question answering, because the base checkpoint has no trained question-answering head; those weights start out random. It is expected and harmless: the head is learned during fine-tuning, so complete training before asking questions.
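The README says `app.py` suppresses the `resume_download` warning with `warnings.filterwarnings` but does not show the call. A minimal sketch of such a filter, matching only that deprecation message so other `FutureWarning`s stay visible (the exact call in `app.py` may differ):

```python
import warnings


def suppress_resume_download_warning() -> None:
    """Ignore only huggingface_hub's `resume_download` FutureWarning."""
    # `message` is a regex matched against the start of the warning text,
    # so the leading ".*" lets it match the name anywhere in the message.
    warnings.filterwarnings(
        "ignore",
        message=r".*resume_download",
        category=FutureWarning,
    )


# Install the filter before loading the model or tokenizer.
suppress_resume_download_warning()
```

`filterwarnings` inserts the new rule at the front of the filter list, so it takes precedence over an earlier "error" rule for this message only.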

## Notes
- Make sure the books are in Arabic for accurate processing.
- The system is designed to run on the Hugging Face Spaces free tier; training may take a few minutes.
- Replace `your_huggingface_username` with your actual Hugging Face username.
- Training logs are displayed in the interface for transparency.
- The zipped folder simplifies uploads; make sure the files end up in the root directory after extraction.
- The `arabert` library's Farasa dependency is bypassed with a custom `preprocess_arabic_text` function to avoid the Java requirement. Because `aubmindlab/bert-base-arabertv2` was pretrained on Farasa-segmented text, skipping segmentation may reduce accuracy somewhat, but it keeps the app compatible with Hugging Face Spaces.
- The `huggingface_hub` warning is suppressed in `app.py`. Future updates to `transformers` may resolve it once `huggingface_hub>=1.0.0` is released.
- The model weights warning is normal and does not affect functionality after training.
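The custom `preprocess_arabic_text` function is not shown in this README. As a rough idea of what a Java-free preprocessor can do, the hypothetical `simple_arabic_preprocess` below strips Arabic diacritics (U+064B to U+0652) and the tatweel character (U+0640) and collapses whitespace; it performs no Farasa-style segmentation, and the real function in `app.py` may do more or less.

```python
import re

# Fathatan through sukun (U+064B-U+0652), which includes shadda.
_DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Tatweel (kashida), the elongation character.
_TATWEEL = "\u0640"


def simple_arabic_preprocess(text: str) -> str:
    """Remove diacritics and tatweel, then collapse runs of whitespace."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    return " ".join(text.split())
```

Applying the same function to both the book text and the questions keeps training and inference inputs consistent.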

## License
MIT License