hideosnes committed on
Commit 9192e24 · verified · 1 Parent(s): 6cb6a24

Update app.py

31-40

### Handle File Uploads Correctly in Gradio Spaces

This commit fixes how `summarize` handles file uploads in a Gradio app running on Hugging Face Spaces:

- **PDF files**: When a user uploads a PDF, the file object is a `NamedBytesIO`, which supports `.read()`. The bytes are passed to `fitz` (PyMuPDF), and the text of every page is extracted and joined.
- **TXT files**: When a user uploads a TXT file, the file object is a `NamedString`, which acts like a string and does **not** support `.read()`. It is converted with `str(file)`.
- **Other file types**: These are ignored and leave the input empty, so the function falls through to its early return for empty text.

The distinction matters because Hugging Face Spaces passes a different object depending on the file type. Calling `.read()` on a `NamedString` (TXT) raises an `AttributeError`, which is the bug in the removed line `file.read().decode("utf-8")`.
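The failure mode can be reproduced without Gradio. The sketch below is illustrative only: `FakeNamedString` is a hypothetical stand-in for a string-like upload object (not Gradio's real class), and `read_or_str` is a hypothetical helper showing feature detection with `hasattr` as an alternative to branching on the file extension.

```python
import io


class FakeNamedString(str):
    """Hypothetical stand-in: a str subclass that also carries a .name attribute."""

    def __new__(cls, value, name):
        obj = super().__new__(cls, value)
        obj.name = name
        return obj


def read_or_str(file):
    """Call .read() when the object provides it, otherwise fall back to str()."""
    if hasattr(file, "read"):
        return file.read()
    return str(file)


txt_upload = FakeNamedString("example value", name="notes.txt")
pdf_upload = io.BytesIO(b"%PDF-1.4")  # binary, file-like object

# A plain string subclass has no .read(), so the old code path fails here.
try:
    txt_upload.read()
    read_failed = False
except AttributeError:
    read_failed = True
```

Checking for the method rather than the extension keeps the code working if the object type behind a given extension ever changes.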

**Summary:**
- Use `.read()` only for PDFs.
- Use `str(file)` for TXT files.
- This avoids the `AttributeError` and lets the summarizer accept both file types.
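A way to sidestep the object-type difference altogether is to read the upload from its path on disk. This is a minimal sketch, not the code in this commit. It assumes the uploaded object either is a path string or exposes the path through `.name`. As far as I can tell, that holds for Gradio 4's `NamedString`, whose string value is the file path rather than the file contents, so it is worth checking what `str(file)` returns on the Gradio version the Space runs. `extract_text` is a hypothetical helper name.

```python
import os


def extract_text(file) -> str:
    """Return the text of an uploaded .pdf or .txt file, or "" for anything else.

    Assumption: `file` is a path string, or an object whose .name is a path on disk.
    """
    if file is None:
        return ""
    # Prefer .name when present; otherwise treat the value itself as the path.
    path = getattr(file, "name", None) or str(file)
    suffix = os.path.splitext(path)[1].lower()
    if suffix == ".pdf":
        # PyMuPDF is imported lazily so TXT handling works without it installed.
        import fitz
        with fitz.open(path) as doc:
            return " ".join(page.get_text() for page in doc)
    if suffix == ".txt":
        # errors="replace" keeps the app alive on files that are not valid UTF-8.
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
    return ""
```

Inside `summarize`, this would be called as `text_input = extract_text(file)` when `file is not None`, with the pasted `text` used otherwise.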

Files changed (1)
  1. app.py +6 -9
app.py CHANGED
@@ -28,19 +28,16 @@ def summarize(file, text, style, length):
     text_input = ""
     if file is not None:
         if file.name.endswith(".pdf"):
+            # Only PDFs have .read()
             with fitz.open(stream=file.read(), filetype="pdf") as doc:
                 text_input = " ".join([page.get_text() for page in doc])
         elif file.name.endswith(".txt"):
+            # TXT files are passed as NamedString -> use str(file)
             text_input = str(file)
-            # text_input = file.read().decode("utf-8") <-- Bug
-            # in Hugging Face Spaces, when you use gr.File, text files are passed as
-            # a NamedString (which acts like a string with a .name attribute),
-            # while binary files (like PDFs) are passed as a NamedBytesIO
-            # (which has a .read() method).
-            # ### #
-
-    elif text:
-        text_input = text
+        else:
+            text_input = ""
+    elif text:
+        text_input = text
     # If the input text is empty or contains only whitespace,
     # return early with a user message and placeholder values.
     if not text_input.strip():