# Troubleshooting Guide

This document provides solutions for common issues encountered when running the Toxic Eye application.

## Gradio Version Compatibility

Ensure that you're using Gradio version 5.23.2 as specified in the project's `README.md` file:

```bash
pip install gradio==5.23.2
```

You can check your current Gradio version with:

```bash
pip show gradio
```

If you're running on Hugging Face Spaces, check that the `sdk_version` in the `README.md` frontmatter is set to 5.23.2:

```yaml
sdk: gradio
sdk_version: 5.23.2
```

Using older or newer versions might cause unexpected behavior with the Spaces GPU integration.

## GPU Acceleration Issues

### spaces.GPU Decorator Issues

We've observed that the `spaces.GPU` decorator may not work correctly when used with methods inside a class. This can lead to errors like:

```
HTTP Request: POST http://device-api.zero/release?allowToken=... "HTTP/1.1 404 Not Found"
Error in text generation: 'GPU task aborted'
```

### Solution

1. The `spaces.GPU` decorator can be applied either with or without parentheses; both forms should work:

   ```python
   @spaces.GPU
   def generate_text(model_path, text):
       # ...
   ```

   ```python
   @spaces.GPU()
   def generate_text(model_path, text):
       # ...
   ```

   If you need to specify a duration for longer GPU operations, use parentheses:
   
   ```python
   @spaces.GPU(duration=120)  # Set 120-second duration
   def generate_long_text(model_path, text):
       # ...
   ```

2. Use standalone functions instead of class methods with `spaces.GPU`:

   **Problematic:**
   ```python
   class ModelManager:
       @spaces.GPU
       def generate_text(self, model_path, text):  # Class method doesn't work well
           # ...
   ```

   **Recommended:**
   ```python
   @spaces.GPU
   def generate_text_local(model_path, text):  # Standalone function
       # ...
   ```

3. Create the pipeline directly from the model ID/path instead of loading the model object separately and passing it in:

   **Recommended:**
   ```python
   import torch
   from transformers import AutoTokenizer, pipeline

   tokenizer = AutoTokenizer.from_pretrained(model_path)
   pipe = pipeline(
       "text-generation",
       model=model_path,  # Pass the model ID/path directly
       tokenizer=tokenizer,
       torch_dtype=torch.bfloat16,
       device_map="auto"
   )
   ```

4. Use synchronous `InferenceClient` instead of `AsyncInferenceClient` for API calls:

   **Recommended:**
   ```python
   from huggingface_hub import InferenceClient
   client = InferenceClient(model_id)
   response = client.text_generation(text)  # Synchronous call
   ```

5. Implement appropriate error handling to gracefully recover from GPU task aborts:

   ```python
   try:
       result = pipe(text)  # `pipe` is the pipeline created above
       return result
   except Exception as e:
       logger.error(f"Error: {str(e)}")  # assumes a configured logging.Logger
       return f"Error: {str(e)}"  # Return error message instead of raising
   ```

## Other Common Issues

### Multiple Models Loading Timeout

When preloading multiple large models, the application might time out or crash due to memory constraints.

**Solution:**
- Use `torch.bfloat16` or `torch.float16` precision to reduce memory usage
- Add the `trust_remote_code=True` parameter when loading models
- Use `do_sample=False` to make text generation deterministic
- Keep token generation limits reasonable (`max_new_tokens=40` or less); see the sketch below
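
A minimal sketch combining these settings (the model path below is a placeholder, not a value from this project):

```python
import torch
from transformers import pipeline

model_path = "your-org/your-model"  # placeholder model ID

# bfloat16 plus device_map="auto" keeps memory usage down when several models are preloaded
pipe = pipeline(
    "text-generation",
    model=model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

# Deterministic, short generations keep runtime and memory pressure low
output = pipe("Example prompt", do_sample=False, max_new_tokens=40)
```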

### API vs Local Model Performance

When mixing API and local models, you might encounter inconsistent behavior.

**Solution:**
- Keep separate functions for API and local model execution
- Handle errors distinctly for each type
- Use non-async code for simpler execution flow, as sketched below
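
A minimal sketch of this separation (the function names are illustrative, and on Spaces the local function would also carry the `spaces.GPU` decorator described above):

```python
import torch
from huggingface_hub import InferenceClient
from transformers import pipeline

def generate_via_api(model_id, text):
    # API path: synchronous client, errors returned as strings rather than raised
    try:
        client = InferenceClient(model_id)
        return client.text_generation(text, max_new_tokens=40)
    except Exception as e:
        return f"API error: {str(e)}"

def generate_text_local(model_path, text):
    # Local path: kept separate so GPU/pipeline errors are handled on their own
    try:
        pipe = pipeline(
            "text-generation",
            model=model_path,
            torch_dtype=torch.bfloat16,
            device_map="auto"
        )
        return pipe(text, do_sample=False, max_new_tokens=40)[0]["generated_text"]
    except Exception as e:
        return f"Local model error: {str(e)}"
```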

## Reporting Issues

If you encounter issues not covered in this guide, please report them by creating an issue in the repository with:
- A detailed description of the problem
- Relevant error messages
- Steps to reproduce the issue
- Your environment information (OS, Python version, GPU, etc.)