Mustafiz996 committed on Jun 15

Commit cf8051d (verified) · 1 Parent(s): 044ec3a

Upload folder using huggingface_hub


Files changed (8)

.env +1 -0
README.md +113 -113
requirements.txt +6 -6
src/README.md +2 -1
src/backend/gradio_spreadsheetcomponent/spreadsheetcomponent.py +113 -14
src/demo/.env +1 -0
src/demo/requirements.txt +1 -1
src/pyproject.toml +1 -1

.env ADDED

@@ -0,0 +1 @@
+HF_TOKEN="************"
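
The component passes this value to the inference client via `os.getenv("HF_TOKEN")`. A minimal sketch of a guard that fails early when the variable is missing might look like the following; `get_hf_token` is a hypothetical helper, not part of this commit, and the optional `env` mapping exists only so the function can be exercised without touching the real environment.

```python
import os


def get_hf_token(env=None):
    """Return the HF_TOKEN value, or raise a clear error when it is unset.

    `env` is a hypothetical parameter: any mapping with the same shape as
    os.environ. It defaults to the real process environment.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it or add it to a local .env file"
        )
    return token


# Usage with an explicit mapping, so no real token is needed here.
print(get_hf_token({"HF_TOKEN": "hf_example"}))
```

Raising at lookup time surfaces a missing token before any request is made, rather than as an authentication error from the inference call.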

README.md CHANGED

@@ -1,113 +1,113 @@
----
-tags: [gradio-custom-component, custom-component-track, gradio-spreadsheet-custom-component]
-title: gradio_spreadsheetcomponent
-short_description: This component answers questions about spreadsheets.
-colorFrom: blue
-colorTo: yellow
-sdk: gradio
-pinned: false
-app_file: space.py
-app_link: https://huggingface.co/spaces/Mustafiz996/gradio_spreadsheetcomponent
----
-# `gradio_spreadsheetcomponent`
-<a href="https://pypi.org/project/gradio_spreadsheetcomponent/" target="_blank"><img alt="PyPI - Version" src="https://img.shields.io/pypi/v/gradio_spreadsheetcomponent"></a>
-This component is used to answer questions about spreadsheets.
-## Installation
-```bash
-pip install gradio_spreadsheetcomponent
-```
-## Usage
-```python
-import gradio as gr
-from gradio_spreadsheetcomponent import SpreadsheetComponent
-from dotenv import load_dotenv
-import os
-import pandas as pd
-def answer_question(file, question):
-    if not file or not question:
-        return "Please upload a file and enter a question."
-    # Load the spreadsheet data
-    df = pd.read_excel(file.name)
-    # Create a SpreadsheetComponent instance
-    spreadsheet = SpreadsheetComponent(value=df)
-    # Use the component to answer the question
-    return spreadsheet.answer_question(question)
-with gr.Blocks() as demo:
-    gr.Markdown("# Spreadsheet Question Answering")
-    with gr.Row():
-        file_input = gr.File(label="Upload Spreadsheet", file_types=[".xlsx"])
-        question_input = gr.Textbox(label="Ask a Question")
-    answer_output = gr.Textbox(label="Answer", interactive=False, lines=4)
-    submit_button = gr.Button("Submit")
-    submit_button.click(answer_question, inputs=[file_input, question_input], outputs=answer_output)
-if __name__ == "__main__":
-    demo.launch()
-```
-## `SpreadsheetComponent`
-### Initialization
-<table>
-<thead>
-<tr>
-<th align="left">name</th>
-<th align="left" style="width: 25%;">type</th>
-<th align="left">default</th>
-<th align="left">description</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td align="left"><code>value</code></td>
-<td align="left" style="width: 25%;">
-```python
-pandas.core.frame.DataFrame | list | dict | None
-```
-</td>
-<td align="left"><code>None</code></td>
-<td align="left">Default value to show in spreadsheet. Can be a pandas DataFrame, list of lists, or dictionary</td>
-</tr>
-</tbody></table>
-### User function
-The impact on the users predict function varies depending on whether the component is used as an input or output for an event (or both).
-- When used as an Input, the component only impacts the input signature of the user function.
-- When used as an output, the component only impacts the return signature of the user function.
-The code snippet below is accurate in cases where the component is used as both an input and an output.
-- **As output:** Is passed, the preprocessed input data sent to the user's function in the backend.
- ```python
- def predict(
-     value: typing.Any
- ) -> Unknown:
-     return value
- ```

+---
+tags: [gradio-custom-component, custom-component-track, gradio-spreadsheet-custom-component]
+title: gradio_spreadsheetcomponent
+short_description: This component answers questions about spreadsheets.
+colorFrom: blue
+colorTo: yellow
+sdk: gradio
+pinned: false
+app_file: space.py
+app_link: https://huggingface.co/spaces/Mustafiz996/gradio_spreadsheetcomponent
+---
+# `gradio_spreadsheetcomponent`
+<a href="https://pypi.org/project/gradio_spreadsheetcomponent/" target="_blank"><img alt="PyPI - Version" src="https://img.shields.io/pypi/v/gradio_spreadsheetcomponent"></a>
+This component is used to answer questions about spreadsheets.
+## Installation
+```bash
+pip install gradio_spreadsheetcomponent
+```
+## Usage
+```python
+import gradio as gr
+from gradio_spreadsheetcomponent import SpreadsheetComponent
+from dotenv import load_dotenv
+import os
+import pandas as pd
+def answer_question(file, question):
+    if not file or not question:
+        return "Please upload a file and enter a question."
+    # Load the spreadsheet data
+    df = pd.read_excel(file.name)
+    # Create a SpreadsheetComponent instance
+    spreadsheet = SpreadsheetComponent(value=df)
+    # Use the component to answer the question
+    return spreadsheet.answer_question(question)
+with gr.Blocks() as demo:
+    gr.Markdown("# Spreadsheet Question Answering")
+    with gr.Row():
+        file_input = gr.File(label="Upload Spreadsheet", file_types=[".xlsx"])
+        question_input = gr.Textbox(label="Ask a Question")
+    answer_output = gr.Textbox(label="Answer", interactive=False, lines=4)
+    submit_button = gr.Button("Submit")
+    submit_button.click(answer_question, inputs=[file_input, question_input], outputs=answer_output)
+if __name__ == "__main__":
+    demo.launch()
+```
+## `SpreadsheetComponent`
+### Initialization
+<table>
+<thead>
+<tr>
+<th align="left">name</th>
+<th align="left" style="width: 25%;">type</th>
+<th align="left">default</th>
+<th align="left">description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td align="left"><code>value</code></td>
+<td align="left" style="width: 25%;">
+```python
+pandas.core.frame.DataFrame | list | dict | None
+```
+</td>
+<td align="left"><code>None</code></td>
+<td align="left">Default value to show in spreadsheet. Can be a pandas DataFrame, list of lists, or dictionary</td>
+</tr>
+</tbody></table>
+### User function
+The impact on the users predict function varies depending on whether the component is used as an input or output for an event (or both).
+- When used as an Input, the component only impacts the input signature of the user function.
+- When used as an output, the component only impacts the return signature of the user function.
+The code snippet below is accurate in cases where the component is used as both an input and an output.
+- **As output:** Is passed, the preprocessed input data sent to the user's function in the backend.
+ ```python
+ def predict(
+     value: typing.Any
+ ) -> Unknown:
+     return value
+ ```

requirements.txt CHANGED

@@ -1,7 +1,7 @@
-gradio==5.32.1
-pandas
-git+https://github.com/huggingface/huggingface_hub.git
-openpyxl
-python-dotenv
-numpy
+gradio==5.32.1
+pandas
+git+https://github.com/huggingface/huggingface_hub.git
+openpyxl
+python-dotenv
+numpy
 gradio_spreadsheetcomponent

src/README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-tags: [gradio-custom-component, SimpleTextbox, gradio-spreadsheet-custom-component]
+tags: [gradio-custom-component, custom-component-track, gradio-spreadsheet-custom-component]
 title: gradio_spreadsheetcomponent
 short_description: This component answers questions about spreadsheets.
 colorFrom: blue
@@ -7,6 +7,7 @@ colorTo: yellow
 sdk: gradio
 pinned: false
 app_file: space.py
+app_link: https://huggingface.co/spaces/Mustafiz996/gradio_spreadsheetcomponent
 ---
 # `gradio_spreadsheetcomponent`

src/backend/gradio_spreadsheetcomponent/spreadsheetcomponent.py CHANGED

@@ -45,6 +45,116 @@ class SpreadsheetComponent(FormComponent):
         self.hf_client = InferenceClient(provider="hf-inference", api_key=os.getenv("HF_TOKEN"))
     def answer_question(self, question: str) -> str:
         """Ask a question about the current spreadsheet data"""
         if self.hf_client is None:
@@ -57,6 +167,7 @@ class SpreadsheetComponent(FormComponent):
             # Convert DataFrame to table format
             table = {col: [str(val) if pd.notna(val) else "" for val in self.value[col]]
                     for col in self.value.columns}
             # Get answer using table question answering
             result = self.hf_client.table_question_answering(
                 table=table,
@@ -64,20 +175,8 @@ class SpreadsheetComponent(FormComponent):
                 model="google/tapas-large-finetuned-wtq"
             )
-            # Format the answer with more context
-            parts = []
-            parts.append(f"Answer: {result.answer}")
-            if hasattr(result, 'cells') and result.cells:
-                parts.append(f"Relevant cell values: {', '.join(result.cells)}")
-            if hasattr(result, 'coordinates') and result.coordinates:
-                parts.append("Location of relevant information:")
-                for coords in result.coordinates:
-                    row, col = coords
-                    parts.append(f"- Row {row}, Column '{col}'")
-            return "\n".join(parts)
         except Exception as e:
             return f"Error processing question: {str(e)}\nPlease try rephrasing your question or verify the data format."

         self.hf_client = InferenceClient(provider="hf-inference", api_key=os.getenv("HF_TOKEN"))
+    def postprocess_answer(self, result) -> str:
+        """Process and verify the model's answer, especially for aggregation operations."""
+        try:
+            # Extract answer and check if it's a number (potential aggregation)
+            answer = getattr(result, 'answer', None)
+            if not answer or str(answer).lower() in ['none', 'null', 'nan', '']:
+                return "No answer found"
+            # Detect aggregation keywords in the answer
+            agg_keywords = {
+                'sum': 'sum',
+                'average': 'mean',
+                'mean': 'mean',
+                'maximum': 'max',
+                'max': 'max',
+                'minimum': 'min',
+                'min': 'min',
+                'count': 'count'
+            }
+            # Check if we need to verify any aggregation
+            operation = None
+            for fun_name in agg_keywords.keys():
+                if fun_name in str(result.aggregator.lower()):
+                    operation = fun_name
+                    break
+            coordinates = getattr(result, 'coordinates', None)
+            if operation and coordinates and len(coordinates) > 0:
+                col_name = None
+                try:
+                    # Group coordinates by column to ensure we're working with consistent data
+                    col_groups = {}
+                    for row_idx, col_idx in coordinates:
+                        if col_name is None:
+                            col_name = self.value.columns[col_idx]
+                        elif col_name != self.value.columns[col_idx]:
+                            continue  # Skip if value is from a different column
+                        value = self.value.iloc[row_idx, col_idx]
+                        if pd.notna(value):  # Only include non-NA values
+                            col_groups.setdefault(col_name, []).append(value)
+                    if col_name and col_groups:
+                        # Convert collected values to numeric, handling non-numeric values
+                        numeric_values = pd.to_numeric(col_groups[col_name], errors='coerce')
+                        if len(numeric_values) > 0:
+                            # Perform the aggregation on the specific values
+                            if operation == 'sum':
+                                computed_value = numeric_values.sum()
+                            elif operation in ['mean', 'average']:
+                                computed_value = numeric_values.mean()
+                            elif operation in ['max', 'maximum']:
+                                computed_value = numeric_values.max()
+                            elif operation in ['min', 'minimum']:
+                                computed_value = numeric_values.min()
+                            elif operation == 'count':
+                                computed_value = len(numeric_values)
+                            else:
+                                computed_value = None
+                            # Format the computed value
+                            if pd.notna(computed_value):
+                                # Round floating point numbers to 2 decimal places
+                                if isinstance(computed_value, float):
+                                    computed_value = round(computed_value, 2)
+                                # Add verification to the answer
+                                parts = []
+                                parts.append(f"Answer: {computed_value}")
+                                # Add information about the cells used
+                                cells = getattr(result, 'cells', None)
+                                if cells:
+                                    parts.append(f"Values used: {', '.join(str(x) for x in cells)}")
+                                parts.append(f"Column used: '{col_name}'")
+                                parts.append(f"Number of values considered: {len(numeric_values)}")
+                                return "\n".join(parts)
+                except Exception as calc_error:
+                    # If calculation fails, return original answer with error info
+                    parts = []
+                    parts.append(f"Answer: {answer}")
+                    parts.append(f"Note: Could not verify {operation} calculation: {str(calc_error)}")
+                    return "\n".join(parts)
+            # If no aggregation needed or verification failed, return the original formatted answer
+            parts = []
+            parts.append(f"Answer: {answer}")
+            cells = getattr(result, 'cells', None)
+            if cells:
+                parts.append(f"Relevant cell values: {', '.join(str(x) for x in cells)}")
+            coordinates = getattr(result, 'coordinates', None)
+            if coordinates:
+                parts.append("Location of relevant information:")
+                for coords in coordinates:
+                    row_idx, col_idx = coords
+                    col_name = self.value.columns[col_idx]
+                    parts.append(f"- Row {row_idx}, Column '{col_name}'")
+            return "\n".join(parts)
+        except Exception as e:
+            return f"Error processing answer: {str(e)}"
     def answer_question(self, question: str) -> str:
         """Ask a question about the current spreadsheet data"""
         if self.hf_client is None:
             # Convert DataFrame to table format
             table = {col: [str(val) if pd.notna(val) else "" for val in self.value[col]]
                     for col in self.value.columns}
             # Get answer using table question answering
             result = self.hf_client.table_question_answering(
                 table=table,
                 model="google/tapas-large-finetuned-wtq"
             )
+            # Use postprocess_answer to handle the result
+            return self.postprocess_answer(result)
         except Exception as e:
             return f"Error processing question: {str(e)}\nPlease try rephrasing your question or verify the data format."
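
The verification step in `postprocess_answer` can be sketched in isolation. The version below is a simplified illustration under stated assumptions, not the committed implementation: it uses a plain list of rows instead of a pandas DataFrame, a `SimpleNamespace` stand-in for the client's result object, and hypothetical helper names (`REDUCERS`, `detect_operation`, `recompute`). It also reads `aggregator` defensively, so a missing or `None` value yields no operation instead of raising.

```python
from statistics import mean
from types import SimpleNamespace

# Hypothetical mapping from aggregator keywords to plain-Python reducers.
REDUCERS = {
    "sum": sum,
    "average": mean,
    "mean": mean,
    "max": max,
    "min": min,
    "count": len,
}


def detect_operation(result):
    """Return the REDUCERS key named in result.aggregator, or None."""
    # getattr plus `or ""` tolerates a missing or None aggregator.
    label = str(getattr(result, "aggregator", None) or "").lower()
    for name in REDUCERS:
        if name in label:
            return name
    return None


def recompute(rows, result):
    """Recompute the aggregate over the cells listed in result.coordinates.

    `rows` is a list of row lists. Only cells from the first referenced
    column are used, mirroring the single-column rule in the diff.
    """
    operation = detect_operation(result)
    coordinates = getattr(result, "coordinates", None) or []
    if operation is None or not coordinates:
        return None
    first_col = coordinates[0][1]
    values = []
    for row_idx, col_idx in coordinates:
        if col_idx != first_col:
            continue  # ignore cells from other columns
        try:
            values.append(float(rows[row_idx][col_idx]))
        except (TypeError, ValueError):
            continue  # skip cells that are not numeric
    if not values:
        return None
    return round(REDUCERS[operation](values), 2)


rows = [["apples", "3"], ["pears", "4.5"], ["plums", "n/a"]]
fake = SimpleNamespace(aggregator="SUM", coordinates=[[0, 1], [1, 1], [2, 1]])
print(recompute(rows, fake))  # prints 7.5
```

Unlike the committed code, this sketch drops non-numeric cells before counting, so a `count` reflects only the values that parsed as numbers.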

src/demo/.env ADDED

@@ -0,0 +1 @@
+HF_TOKEN="************"

src/demo/requirements.txt CHANGED

@@ -1,6 +1,6 @@
 gradio==5.32.1
 pandas
-huggingface-hub
+git+https://github.com/huggingface/huggingface_hub.git
 openpyxl
 python-dotenv
 numpy

src/pyproject.toml CHANGED

@@ -8,7 +8,7 @@ build-backend = "hatchling.build"
 [project]
 name = "gradio_spreadsheetcomponent"
-version = "0.0.2"
+version = "0.0.3"
 description = "This component is used to answer questions about spreadsheets."
 readme = "README.md"
 license = "apache-2.0"