ramysaidagieb committed · Commit 767177a · verified · 1 Parent(s): 27e9342

Delete README.md

Files changed (1)
  1. README.md +0 -118
README.md DELETED
@@ -1,118 +0,0 @@
- ---
- title: Arabic Book Analysis AI
- emoji: 📚
- colorFrom: blue
- colorTo: green
- sdk: gradio
- sdk_version: 5.27.1
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-
- # Arabic Book Analysis AI
-
- This Hugging Face Space hosts an AI system that analyzes Arabic books and answers questions based solely on their content. It accepts books in `.docx` and `.pdf` formats, fine-tunes an AraBERT model on the extracted text, and answers in Arabic.
-
- ## Features
- - Upload multiple books (`.docx`, `.pdf`).
- - Visual training progress with detailed logs: "Loading book", "Extracting ideas", "Training in progress", "Training completed".
- - Question-answering interface in Arabic, restricted to book content.
- - Separate interface for question-answering only, shareable with friends.
- - Formal answer style, suitable for scholarly texts.
-
- ## Configuration
- To deploy correctly on Hugging Face Spaces, configure the Space with the following settings:
- - **SDK**: Gradio (specified in the YAML front matter above)
- - **SDK Version**: 4.31.0 (matches `requirements.txt`). Note that the YAML front matter above sets `sdk_version: 5.27.1`; the two values should agree.
- - **App File**: `app.py` (entry point for the Gradio app)
- - **Visibility**: Public (required for sharing the question-answering link)
- - **Hardware**: CPU (the default free tier is sufficient; upgrade to GPU for faster training if needed)
- - **File Structure**:
-   - Place `app.py`, `requirements.txt`, and `README.md` in the root directory (`/`).
-   - No additional folders are required; the app expects all files at the root level.
- - **Environment Variables**: None required; all dependencies are listed in `requirements.txt`.
- - **Persistence**: The fine-tuned model is saved to `./fine_tuned_model` within the Space’s storage for reuse.
- - **Hugging Face Space Name**: Use `arabic-book-analysis` for consistency with the provided links.
-
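The file-layout and persistence settings above can be checked with a short script before or after deployment. This is a minimal sketch, not part of the original project: the helper names `missing_root_files` and `has_saved_model` are hypothetical, and only the file names and the `fine_tuned_model` directory come from the README.

```python
from pathlib import Path

# Files the README says must sit in the Space root.
REQUIRED_FILES = ("app.py", "requirements.txt", "README.md")


def missing_root_files(root="."):
    """Return the required files that are absent from `root` (hypothetical helper)."""
    root_path = Path(root)
    return [name for name in REQUIRED_FILES if not (root_path / name).is_file()]


def has_saved_model(root=".", model_dir="fine_tuned_model"):
    """Report whether the fine-tuned model directory exists and is non-empty."""
    path = Path(root) / model_dir
    return path.is_dir() and any(path.iterdir())
```

Calling `missing_root_files()` from the Space root should return an empty list when the layout is correct, and `has_saved_model()` should become true only after training has run at least once.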
- ## How to Use
- ### Training Interface
- 1. **Upload Books**:
-    - Access the main interface at `/`.
-    - Click the "رفع الكتب" ("Upload books") field to select `.docx` or `.pdf` files.
-    - Press "رفع وتدريب" ("Upload and train") to start processing and training.
-    - View uploaded files, training logs, and status.
-    - After training, see the message: "Training process finished: Enter your question".
-
- 2. **Ask Questions**:
-    - Enter an Arabic question in the "أدخل سؤالك بالعربية" ("Enter your question in Arabic") field.
-    - Click "طرح السؤال" ("Ask the question") to get an answer based on the book’s content.
-
- ### Question-Answering Only Interface
- - Share this link with friends: `https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis/answer`
- - Users can:
-   - Enter an Arabic question.
-   - Receive answers based on the trained model’s knowledge.
- - No training is required.
-
- ### Example
- - Question: "ما هو قانون الإيمان وفقًا للكتاب؟" ("What is the Creed according to the book?")
- - Answer: "قانون الإيمان هو أساس العقيدة المسيحية، ويؤمن به كل الكنائس المسيحية في العالم..." ("The Creed is the foundation of Christian doctrine, and all Christian churches in the world believe in it...")
-
- ## Requirements
- - Python 3.8+
- - Dependencies listed in `requirements.txt`
-
- ## Deployment
- 1. **Create/Update Hugging Face Space**:
-    - Log in to Hugging Face. Wherever `your_huggingface_username` appears below, substitute your own username.
-    - Create a new Space named "arabic-book-analysis" or update an existing one.
-    - Select "Gradio" as the SDK and set visibility to "Public".
-
- 2. **Upload Zipped Folder**:
-    - Run the provided `create_zip.py` script to generate `arabic_book_analysis.zip`.
-    - In the Hugging Face Space, go to the "Files" tab.
-    - Upload `arabic_book_analysis.zip` and extract it.
-    - Move `app.py`, `requirements.txt`, and `README.md` from `/arabic_book_analysis/` to the root directory (`/`).
-    - Confirm that all three files now sit in the root.
-
- 3. **Build and Launch**:
-    - Hugging Face Spaces will automatically install dependencies from `requirements.txt` and launch the app.
-    - Monitor build logs for errors.
-
- 4. **Access Links**:
-    - Main interface: `https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis`
-    - Question-answering interface: `https://huggingface.co/spaces/your_huggingface_username/arabic-book-analysis/answer`
-
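The README refers to a `create_zip.py` script but does not show it. The following is a hedged sketch of what such a script might look like, assuming it bundles the three root files under an `arabic_book_analysis/` folder inside the archive (consistent with the "move files from `/arabic_book_analysis/`" step). The function name `create_zip` and its parameters are assumptions, not the project's actual code.

```python
import zipfile
from pathlib import Path

# Files named in the deployment steps above.
FILES = ("app.py", "requirements.txt", "README.md")


def create_zip(source_dir=".", archive="arabic_book_analysis.zip",
               folder="arabic_book_analysis"):
    """Bundle the project files under `folder/` inside `archive`; return the archive path."""
    source = Path(source_dir)
    archive_path = Path(archive)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in FILES:
            file_path = source / name
            if not file_path.is_file():
                # Fail early instead of producing an incomplete archive.
                raise FileNotFoundError(f"missing required file: {file_path}")
            zf.write(file_path, arcname=f"{folder}/{name}")
    return archive_path
```

Calling `create_zip()` from the project directory would write `arabic_book_analysis.zip` next to the source files. Failing early on a missing file is a design choice of this sketch, so that an incomplete archive never reaches the Space.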
- ## Troubleshooting Configuration and Runtime Errors
- - **Error: "Invalid SDK"**:
-   - Ensure the Space is configured to use the Gradio SDK in the Space settings and that the YAML front matter specifies `sdk: gradio`.
- - **Error: "Files not found"**:
-   - Verify that `app.py`, `requirements.txt`, and `README.md` are in the root directory (`/`), not in subfolders.
-   - Re-upload `arabic_book_analysis.zip`, extract it, and move the files to the root using the Hugging Face UI.
- - **Error: "Dependency installation failed"**:
-   - Check `requirements.txt` for correct package versions.
-   - Review build logs in the Space’s "Settings" tab for specific errors.
- - **Error: "Private Space access denied"**:
-   - Set the Space visibility to "Public" to enable the question-answering link for friends.
- - **Error: "Model not found"**:
-   - Ensure training has completed at least once so that the fine-tuned model is saved to `./fine_tuned_model`.
- - **Error: "FileNotFoundError: No such file or directory: 'java'"**:
-   - The `arabert` library’s Farasa dependency requires Java, which is not installed by default. This project uses a custom `preprocess_arabic_text` function in `app.py` to avoid the Farasa and Java requirements. If this error occurs, ensure `app.py` uses `preprocess_arabic_text` instead of `ArabertPreprocessor`.
- - **Error: "TypeError: ArabertPreprocessor.__init__() got an unexpected keyword argument 'use_farasapy'"**:
-   - This occurs because `arabert==1.0.1` does not support the `use_farasapy` parameter. The current `app.py` avoids `ArabertPreprocessor` entirely and uses a custom preprocessing function.
- - **Error: "FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0"**:
-   - This warning from `huggingface_hub` (used by `transformers`) may cause the container to exit if warnings are treated as errors. The current `app.py` suppresses it with a `warnings.filterwarnings` directive. If the error persists, check the build logs for additional issues during model/tokenizer loading.
-
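The README states that `app.py` suppresses the `resume_download` warning with `warnings.filterwarnings` but does not show the call. A minimal sketch of one way to do it follows. The wrapper name `suppress_resume_download_warning` and the exact message pattern are assumptions; only the use of `warnings.filterwarnings` comes from the README.

```python
import warnings


def suppress_resume_download_warning():
    """Ignore only the `resume_download` FutureWarning; other warnings stay visible."""
    warnings.filterwarnings(
        "ignore",
        # Regex matched against the start of the warning text (assumed wording).
        message=r".*resume_download.*deprecated.*",
        category=FutureWarning,
    )
```

For the filter to take effect, it would need to be installed before `transformers` loads any model or tokenizer. Matching on the message, rather than ignoring every `FutureWarning`, keeps unrelated deprecations visible in the logs.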
- ## Notes
- - Ensure books are in Arabic for accurate processing.
- - The system is optimized for the Hugging Face Spaces free tier; training may take a few minutes.
- - Replace `your_huggingface_username` with your actual Hugging Face username.
- - Training logs are displayed in the interface for transparency.
- - The zipped folder simplifies uploads; ensure files are moved to the root directory after extraction.
- - The `arabert` library’s Farasa dependency is bypassed using a custom `preprocess_arabic_text` function to avoid Java requirements. This may slightly reduce preprocessing capabilities but ensures compatibility with Hugging Face Spaces.
- - The `huggingface_hub` warning is non-critical but may trigger container failure in environments that treat warnings as errors. `app.py` suppresses it; future `transformers` releases may make the suppression unnecessary once `huggingface_hub>=1.0.0` is released.
-
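The README names a custom `preprocess_arabic_text` function but does not include its body. The sketch below shows what a Java-free cleanup step could look like, assuming it strips diacritics and tatweel and collapses whitespace. The actual function in `app.py` may do more, less, or something different.

```python
import re

# Arabic diacritics (tashkeel) U+064B-U+0652, plus superscript alef U+0670.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida), the elongation character.
_TATWEEL = re.compile(r"\u0640")
_WHITESPACE = re.compile(r"\s+")


def preprocess_arabic_text(text):
    """Illustrative cleanup: strip diacritics and tatweel, then collapse whitespace."""
    text = _DIACRITICS.sub("", text)
    text = _TATWEEL.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()
```

Unlike Farasa, this sketch performs no morphological segmentation, which illustrates the reduced preprocessing capability mentioned in the notes.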
- ## License
- MIT License