Spaces · Build error
Ryan committed · Commit 6cebf06 · Parent(s): 5925dce
update
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
README.md CHANGED
@@ -57,6 +57,159 @@ The summary tab provides a summary of two of the prompts: the Trump and Harris p
# Documentation

## Datasets

Built-in Dataset Structure

The application includes several pre-built datasets for analysis.

Format: simple text files with the following structure:

\prompt= [prompt text]
\response1= [first model response]
\model1= [first model name]
\response2= [second model response]
\model2= [second model name]
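A minimal sketch of how a file in this format could be parsed is shown below; the helper name `parse_dataset_file` and its handling of multi-line responses are illustrative assumptions, not the app's actual loader.

```python
import re

def parse_dataset_file(path: str) -> dict:
    """Parse a \\prompt= / \\responseN= / \\modelN= formatted text file (illustrative sketch)."""
    fields = {}
    current_key = None
    with open(path, encoding="utf-8") as f:
        for line in f:
            # A new field starts with a backslash-prefixed key, e.g. \prompt= or \response1=
            match = re.match(r"\\(prompt|response\d|model\d)=\s*(.*)", line)
            if match:
                current_key = match.group(1)
                fields[current_key] = match.group(2).strip()
            elif current_key:
                # Continuation lines belong to the most recent field (responses span many lines)
                fields[current_key] += "\n" + line.rstrip()
    return fields

# fields would then contain keys like "prompt", "response1", "model1", "response2", "model2"
```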
Included Datasets:

Political Figures Responses: Comparisons of how different LLMs discuss political figures

- person-harris.txt: Responses about Kamala Harris
- person-trump.txt: Responses about Donald Trump

Political Topics Responses: Comparisons on general political topics

- topic-foreign_policy.txt: Responses about foreign policy views
- topic-the_economy.txt: Responses about economic views

Dataset Collection Process:

- Prompts were designed to elicit substantive responses on political topics
- Identical prompts were submitted to different commercial LLMs
- Responses were collected verbatim without modification
- Model identifiers were preserved for attribution
- Responses were formatted into the standardized text format

Dataset Size and Characteristics:

- Each dataset contains one prompt and two model responses
- Response length ranges from approximately 300-600 words
- Models represented include ExaOne3.5, Granite3.2, and others
- Topics were selected to span typical political discussion areas
## Frameworks

- Gradio is the main framework used to build the app. It provides a simple interface for creating web applications with Python.
- Matplotlib is used for some basic plotting in the Visuals tab.
- NLTK is used mainly for the VADER sentiment analysis classifier, which serves both the basic classifier and bias detection.
- Hugging Face Transformers is used for the RoBERTa transformer model.
- Scikit-learn is used for the Bag of Words and N-grams analysis.
- Pandas is used for data manipulation and analysis.
- NumPy is used for numerical computations.
- The json and os modules are used for file handling in relation to the datasets.
- re (regular expressions) is used for text processing and cleaning.
## App Flow

The app starts with the Dataset Input tab, where the user either enters their own dataset or loads a built-in one. The Analysis tab then offers four analysis options (Bag of Words, N-gram Analysis, Classifier, and Bias Detection). Next comes the RoBERTa classifier, a transformer model that can be contrasted with the non-transformer classifier used in the Analysis tab. A Summary tab follows, and finally the Visuals tab provides some basic plots.
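The tab sequence described above can be sketched in Gradio roughly as follows; this is a simplified skeleton only, and the real `create_app()` in app.py wires up states, buttons, and outputs on top of it.

```python
import gradio as gr

def create_demo_app():
    """Illustrative skeleton of the tab layout; not the app's actual create_app()."""
    with gr.Blocks() as app:
        dataset_state = gr.State({})   # shared dataset passed between tabs
        with gr.Tab("Dataset Input"):
            gr.Markdown("Load a built-in dataset or enter your own.")
        with gr.Tab("Analysis"):
            gr.Radio(["Bag of Words", "N-gram Analysis", "Classifier", "Bias Detection"],
                     label="Analysis type")
        with gr.Tab("RoBERTa Classifier"):
            gr.Markdown("Transformer-based sentiment comparison.")
        with gr.Tab("Summary"):
            gr.Markdown("*No summary loaded*")
        with gr.Tab("Visuals"):
            gr.Markdown("Basic plots of the analysis results.")
    return app

if __name__ == "__main__":
    create_demo_app().launch()
```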
## Bag of Words

Basic preprocessing is done to the text data, including the steps below; a rough code sketch follows the list.

- Lowercasing
- Removing punctuation
- Removing stop words
- Tokenization
- Lemmatization
- Removing special characters
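A rough sketch of that preprocessing with NLTK is shown below; the function name and exact steps are illustrative assumptions rather than the app's actual code.

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# These resources are needed once; the app presumably downloads or caches them.
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation/special characters, tokenize, drop stop words, lemmatize."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)          # remove punctuation and special characters
    tokens = word_tokenize(text)
    stop_words = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(tok) for tok in tokens
            if tok not in stop_words and len(tok) > 1]

# Example: preprocess("Harris's policies were praised.") -> ['harris', 'policy', 'praised'] (roughly)
```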
Here is an example of the results from the Harris text file:

Top Words Used by ExaOne3.5

harris (8), policy (8), justice (5), attorney (4), issue (4), measure (4), political (4), aimed (3), approach (3), general (3)

Top Words Used by Granite3.2

harris (7), support (6), view (6), issue (5), right (5), policy (4), party (3), political (3), president (3), progressive (3)

Similarity Metrics

- Cosine Similarity: 0.67 (higher means more similar word frequency patterns)
- Jaccard Similarity: 0.22 (higher means more word overlap)
- Semantic Similarity: 0.53 (higher means more similar meaning)
- Common Words: 71 words appear in both responses

The main points of comparison are the top words used by each model, the similarity metrics, and the common words. The top words are the most frequently used words in each response; the similarity metrics are computed with cosine similarity, Jaccard similarity, and semantic similarity; and the common words are the words that appear in both responses.
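To make the metrics concrete, here is a rough sketch of computing the cosine and Jaccard numbers from two responses with scikit-learn; the app's own implementation may differ in preprocessing and vocabulary handling.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def bow_similarity(response_a: str, response_b: str) -> dict:
    """Cosine similarity over word counts and Jaccard similarity over word sets (sketch)."""
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform([response_a, response_b])   # 2 x vocab sparse matrix
    cosine = cosine_similarity(counts[0], counts[1])[0, 0]

    words_a = set(vectorizer.inverse_transform(counts[0])[0])
    words_b = set(vectorizer.inverse_transform(counts[1])[0])
    common = words_a & words_b
    union = words_a | words_b
    jaccard = len(common) / len(union) if union else 0.0

    return {"cosine_similarity": cosine,
            "jaccard_similarity": jaccard,
            "common_word_count": len(common)}
```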
## N-grams

N-gram analysis repeats the Bag of Words comparison over sequences of n words rather than single words. The n-gram size is configurable in the Analysis tab, and the results report the top n-grams used by each model along with the n-grams common to both responses.
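A small illustrative sketch of extracting top n-grams with scikit-learn's CountVectorizer follows; names and defaults here are assumptions, not the app's exact code.

```python
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer

def top_ngrams(text: str, n: int = 2, k: int = 10) -> list[tuple[str, int]]:
    """Return the k most frequent n-grams in a single response (illustrative sketch)."""
    vectorizer = CountVectorizer(ngram_range=(n, n), stop_words="english")
    counts = vectorizer.fit_transform([text])
    freqs = Counter(dict(zip(vectorizer.get_feature_names_out(), counts.toarray()[0])))
    return freqs.most_common(k)

# Hypothetical usage: top_ngrams(response1, n=2) -> [("foreign policy", 4), ("economic growth", 3), ...]
```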
## The Classifiers

The app includes two sentiment classifiers: a RoBERTa transformer-based classifier and one built on NLTK's VADER sentiment analyzer. RoBERTa is a transformer model trained on a large text corpus and designed to capture the context and meaning of words in a sentence; VADER is a rule-based model that scores sentiment from a lexicon of words with associated sentiment values. Both are used to analyze the sentiment of the LLM responses. VADER is simpler and faster, while RoBERTa is generally more accurate but more computationally expensive and slower to run.
### RoBERTa

Architecture: RoBERTa (Robustly Optimized BERT Pretraining Approach) is a transformer-based language model that improves upon BERT through modifications to the pretraining process.

Training Procedure:

- Trained on a massive dataset of 160GB of text
- Uses dynamic masking pattern for masked language modeling
- Trained with larger batches and learning rates than BERT
- Eliminates BERT's next-sentence prediction objective

Implementation Details:

- Uses the transformers library from Hugging Face
- Specifically uses RobertaForSequenceClassification for sentiment analysis
- Model loaded: roberta-large-mnli for natural language inference tasks
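As an illustration of the implementation details above, loading roberta-large-mnli with RobertaForSequenceClassification might look roughly like this. How the app maps the NLI outputs onto a sentiment score is not documented here, so the premise/hypothesis framing below is only illustrative.

```python
import torch
from transformers import AutoTokenizer, RobertaForSequenceClassification

# roberta-large-mnli is an NLI model (~355M parameters) with labels
# CONTRADICTION / NEUTRAL / ENTAILMENT.
tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = RobertaForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

premise = "The response praises the candidate's economic record."
hypothesis = "The response is positive about the candidate."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]
print({label: round(float(p), 3)
       for label, p in zip(["contradiction", "neutral", "entailment"], probs)})
```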
Compute Requirements:

- Inference requires moderate GPU resources or CPU with sufficient memory
- Model size: ~355M parameters
- Typical memory usage: ~1.3GB when loaded

Training Data:

- BookCorpus (800M words)
- English Wikipedia (2,500M words)
- CC-News (63M articles, 76GB)
- OpenWebText (38GB)
- Stories (31GB)

Known Limitations:

- May struggle with highly domain-specific language
- Limited context window (512 tokens)
- Performance can degrade on very short texts
- Has potential biases from training data
### NLTK VADER

Components Used:

- NLTK's SentimentIntensityAnalyzer (VADER lexicon-based model)
- WordNet Lemmatizer
- Tokenizers (word, sentence)
- Stopword filters
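For reference, a minimal use of the SentimentIntensityAnalyzer listed above looks like this; the thresholds for labeling a response positive or negative are an assumption, not necessarily what the app uses.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("The senator's plan was widely praised as bold and effective.")
# scores is a dict like {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...},
# where compound is a normalized score in [-1, 1].

# Assumed cutoff of +/-0.05 (the commonly suggested VADER convention).
label = ("positive" if scores["compound"] >= 0.05
         else "negative" if scores["compound"] <= -0.05
         else "neutral")
print(label, scores["compound"])
```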
Training Data:

- The VADER sentiment analyzer was trained on social media content, movie reviews, and product reviews
- NLTK word tokenizers were trained on standard English corpora

Limitations:

- Rule-based classifiers have lower accuracy than deep learning models
- Limited ability to understand context and nuance
- The VADER sentiment analyzer works best on short, social media-like texts

## Bias Detection

Bias detection builds on the lexicon-based (NLTK/VADER) analysis: each response is scanned for partisan vocabulary, and the summary reports the apparent partisan leaning of each model, the liberal and conservative terms detected in its response, and an overall bias-difference score between the two responses.
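A toy sketch of lexicon-based partisan term matching is shown below; the term lists and scoring are purely illustrative and much smaller than whatever the app actually uses.

```python
import re

# Purely illustrative term lists; the app's real lexicons are not shown in this repo snippet.
LIBERAL_TERMS = {"climate justice", "social equity", "universal healthcare"}
CONSERVATIVE_TERMS = {"small government", "tax cuts", "border security"}

def partisan_terms(response: str) -> dict:
    """Count which partisan phrases appear in a response and compute a crude leaning score."""
    text = re.sub(r"\s+", " ", response.lower())
    lib_hits = [t for t in LIBERAL_TERMS if t in text]
    con_hits = [t for t in CONSERVATIVE_TERMS if t in text]
    total = len(lib_hits) + len(con_hits)
    lean = (len(lib_hits) - len(con_hits)) / total if total else 0.0  # -1 conservative ... +1 liberal
    return {"liberal_terms": lib_hits, "conservative_terms": con_hits, "leaning_score": lean}
```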

# Contributions
app.py CHANGED
@@ -67,6 +67,9 @@ def create_app():
@@ -131,11 +134,12 @@ def create_app():
- def run_analysis(dataset, selected_analysis, ngram_n, topic_count):
@@ -164,10 +168,44 @@ def create_app():
@@ -212,6 +250,7 @@ def create_app():
@@ -535,6 +574,7 @@ def create_app():
@@ -545,6 +585,7 @@ def create_app():
@@ -552,6 +593,7 @@ def create_app():
@@ -574,6 +616,7 @@ def create_app():
@@ -601,12 +644,13 @@ def create_app():
- def run_roberta_analysis(dataset):
@@ -620,10 +664,32 @@ def create_app():
@@ -674,6 +740,7 @@ def create_app():
@@ -687,6 +754,7 @@ def create_app():
@@ -696,9 +764,10 @@ def create_app():
- inputs=[dataset_state],
@@ -715,11 +784,12 @@ def create_app():
- choices=summary_files,
- value=
@@ -734,11 +804,173 @@ def create_app():
- # Function to load summary content from file
- def
@@ -749,18 +981,24 @@ def create_app():
- fn=
- inputs=[summary_dropdown],
- fn=
- inputs=[summary_dropdown],
@@ -946,9 +1184,10 @@ def create_app():
- inputs=[dataset_state, analysis_options, ngram_n, topic_count],
@@ -965,6 +1204,15 @@ def create_app():
analysis_results_state = gr.State({})
roberta_results_state = gr.State({})

+ # NEW: Add a state for storing user dataset analysis results
+ user_analysis_log = gr.State({})
+
# Dataset Input Tab
with gr.Tab("Dataset Input"):
# Filter out files that start with 'summary' for the Dataset Input tab

status_message = gr.Markdown(visible=False)

# Define a helper function to extract parameter values and run the analysis
+ def run_analysis(dataset, selected_analysis, ngram_n, topic_count, existing_log):
try:
if not dataset or "entries" not in dataset or not dataset["entries"]:
return (
{}, # analysis_results_state
+ existing_log, # no changes to user_analysis_log
False, # analysis_output visibility
False, # visualization_area_visible
gr.update(visible=False), # analysis_title

# Process the analysis request - passing selected_analysis as a string
analysis_results, _ = process_analysis_request(dataset, selected_analysis, parameters)

+ # NEW: Store the results in the user_analysis_log
+ updated_log = existing_log.copy() if existing_log else {}
+
+ # Get the prompt text for identifying this analysis
+ prompt_text = None
+ if analysis_results and "analyses" in analysis_results:
+ prompt_text = list(analysis_results["analyses"].keys())[0] if analysis_results["analyses"] else None
+
+ if prompt_text:
+ # Initialize this prompt in the log if it doesn't exist
+ if prompt_text not in updated_log:
+ updated_log[prompt_text] = {}
+
+ # Store the results for this analysis type
+ if selected_analysis in ["Bag of Words", "N-gram Analysis", "Bias Detection", "Classifier"]:
+ # Only store if the analysis was actually performed and has results
+ analyses = analysis_results["analyses"][prompt_text]
+
+ # Map the selected analysis to its key in the analyses dict
+ analysis_key_map = {
+ "Bag of Words": "bag_of_words",
+ "N-gram Analysis": "ngram_analysis",
+ "Bias Detection": "bias_detection",
+ "Classifier": "classifier"
+ }
+
+ if analysis_key_map[selected_analysis] in analyses:
+ # Store the specific analysis result
+ updated_log[prompt_text][selected_analysis] = {
+ "timestamp": gr.utils.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
+ "result": analyses[analysis_key_map[selected_analysis]]
+ }
+
# If there's an error or no results
if not analysis_results or "analyses" not in analysis_results or not analysis_results["analyses"]:
return (
analysis_results,
+ updated_log, # Return the updated log
False,
False,
gr.update(visible=False),

if "message" in analyses:
return (
analysis_results,
+ updated_log, # Return the updated log
False,
False,
gr.update(visible=False),

if not visualization_area_visible:
return (
analysis_results,
+ updated_log, # Return the updated log
False,
False,
gr.update(visible=False),

gr.update(visible=False),
gr.update(visible=False),
gr.update(visible=False),
+ gr.update(visible=False),
True, # status_message_visible
gr.update(visible=True, value="❌ **No visualization data found.** Make sure to select a valid analysis option.")
)

# Return all updated component values
return (
analysis_results, # analysis_results_state
+ updated_log, # Return the updated log
False, # analysis_output visibility
True, # visualization_area_visible
gr.update(visible=True), # analysis_title

return (
{"error": error_msg}, # analysis_results_state
+ existing_log, # Return unchanged log
True, # analysis_output visibility (show raw JSON for debugging)
False, # visualization_area_visible
gr.update(visible=False),

roberta_viz_content = gr.HTML("", visible=False)

# Function to run RoBERTa sentiment analysis (FIXED)
+ def run_roberta_analysis(dataset, existing_log):
try:
print("Starting run_roberta_analysis function")
if not dataset or "entries" not in dataset or not dataset["entries"]:
return (
{}, # roberta_results_state
+ existing_log, # no change to user_analysis_log
gr.update(visible=True, value="❌ **Error:** No dataset loaded. Please create or load a dataset first."), # roberta_status
gr.update(visible=False), # roberta_output
gr.update(visible=False), # roberta_viz_title

print(f"RoBERTa results obtained. Size: {len(str(roberta_results))} characters")

+ # NEW: Update the user analysis log with RoBERTa results
+ updated_log = existing_log.copy() if existing_log else {}
+
+ # Get the prompt text
+ prompt_text = None
+ if "analyses" in roberta_results:
+ prompt_text = list(roberta_results["analyses"].keys())[0] if roberta_results["analyses"] else None
+
+ if prompt_text:
+ # Initialize this prompt in the log if it doesn't exist
+ if prompt_text not in updated_log:
+ updated_log[prompt_text] = {}
+
+ # Store the RoBERTa results
+ if "analyses" in roberta_results and prompt_text in roberta_results["analyses"]:
+ if "roberta_sentiment" in roberta_results["analyses"][prompt_text]:
+ updated_log[prompt_text]["RoBERTa Sentiment"] = {
+ "timestamp": gr.utils.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
+ "result": roberta_results["analyses"][prompt_text]["roberta_sentiment"]
+ }
+
# Check if we have results
if "error" in roberta_results:
return (
roberta_results, # Store in state anyway for debugging
+ updated_log, # Return updated log
gr.update(visible=True, value=f"❌ **Error:** {roberta_results['error']}"), # roberta_status
gr.update(visible=False), # Hide raw output
gr.update(visible=False), # roberta_viz_title

# Return updated values
return (
roberta_results, # roberta_results_state
+ updated_log, # Return updated log
gr.update(visible=False), # roberta_status (hide status message)
gr.update(visible=False), # roberta_output (hide raw output)
gr.update(visible=True), # roberta_viz_title (show title)

return (
{"error": error_msg}, # roberta_results_state
+ existing_log, # Return unchanged log
gr.update(visible=True, value=f"❌ **Error during RoBERTa analysis:**\n\n```\n{str(e)}\n```"), # roberta_status
gr.update(visible=False), # Hide raw output
gr.update(visible=False), # roberta_viz_title

# Connect the run button to the analysis function (FIXED)
run_roberta_btn.click(
fn=run_roberta_analysis,
+ inputs=[dataset_state, user_analysis_log],
outputs=[
roberta_results_state,
+ user_analysis_log,
roberta_status,
roberta_output,
roberta_viz_title,

# Get summary files from dataset directory
summary_files = [f for f in os.listdir("dataset") if f.startswith("summary-") and f.endswith(".txt")]

+ # Add "YOUR DATASET RESULTS" to dropdown choices if we have user analysis
summary_dropdown = gr.Dropdown(
+ choices=["YOUR DATASET RESULTS"] + summary_files,
label="Select Summary",
info="Choose a summary to display",
+ value="YOUR DATASET RESULTS"
)

load_summary_btn = gr.Button("Load Summary", variant="primary")

summary_status = gr.Markdown("*No summary loaded*")

+ # Function to load summary content from file or user analysis
+ def load_summary_content(file_name, user_log):
if not file_name:
return "", "*No summary selected*"
+
+ # Handle the special "YOUR DATASET RESULTS" option
+ if file_name == "YOUR DATASET RESULTS":
+ if not user_log or not any(user_log.values()):
+ return "", "❌ **No analysis results available.** Run some analyses in the Analysis tab first."
+
+ # Format the user analysis log as text
+ content = "# YOUR DATASET ANALYSIS RESULTS\n\n"
+
+ for prompt, analyses in user_log.items():
+ content += f"## Analysis of Prompt: \"{prompt[:100]}{'...' if len(prompt) > 100 else ''}\"\n\n"
+
+ if not analyses:
+ content += "_No analyses run for this prompt._\n\n"
+ continue
+
+ # Order the analyses in a specific sequence
+ analysis_order = ["Bag of Words", "N-gram Analysis", "Classifier", "Bias Detection", "RoBERTa Sentiment"]
+
+ for analysis_type in analysis_order:
+ if analysis_type in analyses:
+ analysis_data = analyses[analysis_type]
+ timestamp = analysis_data.get("timestamp", "")
+ result = analysis_data.get("result", {})
+
+ content += f"### {analysis_type} ({timestamp})\n\n"
+
+ # Format based on analysis type
+ if analysis_type == "Bag of Words":
+ models = result.get("models", [])
+ if len(models) >= 2:
+ content += f"Comparing responses from {models[0]} and {models[1]}\n\n"
+
+ # Add important words for each model
+ important_words = result.get("important_words", {})
+ for model_name in models:
+ if model_name in important_words:
+ content += f"Top Words Used by {model_name}\n"
+ word_list = [f"{item['word']} ({item['count']})" for item in important_words[model_name][:10]]
+ content += ", ".join(word_list) + "\n\n"
+
+ # Add similarity metrics
+ comparisons = result.get("comparisons", {})
+ comparison_key = f"{models[0]} vs {models[1]}"
+ if comparison_key in comparisons:
+ metrics = comparisons[comparison_key]
+ content += "Similarity Metrics\n"
+ content += f"Cosine Similarity: {metrics.get('cosine_similarity', 0):.2f} (higher means more similar word frequency patterns)\n"
+ content += f"Jaccard Similarity: {metrics.get('jaccard_similarity', 0):.2f} (higher means more word overlap)\n"
+ content += f"Semantic Similarity: {metrics.get('semantic_similarity', 0):.2f} (higher means more similar meaning)\n"
+ content += f"Common Words: {metrics.get('common_word_count', 0)} words appear in both responses\n\n"
+
+ elif analysis_type == "N-gram Analysis":
+ models = result.get("models", [])
+ ngram_size = result.get("ngram_size", 2)
+ size_name = "Unigrams" if ngram_size == 1 else f"{ngram_size}-grams"
+
+ if len(models) >= 2:
+ content += f"{size_name} Analysis: Comparing responses from {models[0]} and {models[1]}\n\n"
+
+ # Add important n-grams for each model
+ important_ngrams = result.get("important_ngrams", {})
+ for model_name in models:
+ if model_name in important_ngrams:
+ content += f"Top {size_name} Used by {model_name}\n"
+ ngram_list = [f"{item['ngram']} ({item['count']})" for item in important_ngrams[model_name][:10]]
+ content += ", ".join(ngram_list) + "\n\n"
+
+ # Add similarity metrics
+ if "comparisons" in result:
+ comparison_key = f"{models[0]} vs {models[1]}"
+ if comparison_key in result["comparisons"]:
+ metrics = result["comparisons"][comparison_key]
+ content += "Similarity Metrics\n"
+ content += f"Common {size_name}: {metrics.get('common_ngram_count', 0)} {size_name.lower()} appear in both responses\n\n"
+
+ elif analysis_type == "Classifier":
+ models = result.get("models", [])
+ if len(models) >= 2:
+ content += f"Classifier Analysis for {models[0]} and {models[1]}\n\n"
+
+ # Add classification results
+ classifications = result.get("classifications", {})
+ if classifications:
+ content += "Classification Results\n"
+ for model_name in models:
+ if model_name in classifications:
+ model_results = classifications[model_name]
+ content += f"{model_name}:\n"
+ content += f"- Formality: {model_results.get('formality', 'N/A')}\n"
+ content += f"- Sentiment: {model_results.get('sentiment', 'N/A')}\n"
+ content += f"- Complexity: {model_results.get('complexity', 'N/A')}\n\n"
+
+ # Add differences
+ differences = result.get("differences", {})
+ if differences:
+ content += "Classification Comparison\n"
+ for category, diff in differences.items():
+ content += f"- {category}: {diff}\n"
+ content += "\n"
+
+ elif analysis_type == "Bias Detection":
+ models = result.get("models", [])
+ if len(models) >= 2:
+ content += f"Bias Analysis: Comparing responses from {models[0]} and {models[1]}\n\n"
+
+ # Add comparative results
+ if "comparative" in result:
+ comparative = result["comparative"]
+ content += "Bias Detection Summary\n"
+
+ if "partisan" in comparative:
+ part = comparative["partisan"]
+ is_significant = part.get("significant", False)
+ content += f"Partisan Leaning: {models[0]} appears {part.get(models[0], 'N/A')}, "
+ content += f"while {models[1]} appears {part.get(models[1], 'N/A')}. "
+ content += f"({'Significant' if is_significant else 'Minor'} difference)\n\n"
+
+ if "overall" in comparative:
+ overall = comparative["overall"]
+ significant = overall.get("significant_bias_difference", False)
+ content += f"Overall Assessment: "
+ content += f"Analysis shows a {overall.get('difference', 0):.2f}/1.0 difference in bias patterns. "
+ content += f"({'Significant' if significant else 'Minor'} overall bias difference)\n\n"
+
+ # Add partisan terms
+ content += "Partisan Term Analysis\n"
+ for model_name in models:
+ if model_name in result and "partisan" in result[model_name]:
+ partisan = result[model_name]["partisan"]
+ content += f"{model_name}:\n"
+
+ lib_terms = partisan.get("liberal_terms", [])
+ con_terms = partisan.get("conservative_terms", [])
+
+ content += f"- Liberal terms: {', '.join(lib_terms) if lib_terms else 'None detected'}\n"
+ content += f"- Conservative terms: {', '.join(con_terms) if con_terms else 'None detected'}\n\n"
+
+ elif analysis_type == "RoBERTa Sentiment":
+ models = result.get("models", [])
+ if len(models) >= 2:
+ content += "Sentiment Analysis Results\n"
+
+ # Add comparison info
+ if "comparison" in result:
+ comparison = result["comparison"]
+ if "difference_direction" in comparison:
+ content += f"{comparison['difference_direction']}\n\n"
+
+ # Add individual model results
+ sentiment_analysis = result.get("sentiment_analysis", {})
+ for model_name in models:
+ if model_name in sentiment_analysis:
+ model_result = sentiment_analysis[model_name]
+ score = model_result.get("sentiment_score", 0)
+ label = model_result.get("label", "neutral")
+
+ content += f"{model_name}\n"
+ content += f"Sentiment: {label} (Score: {score:.2f})\n\n"
+
+ return content, f"✅ **Loaded user analysis results**"

+ # Regular file loading for built-in summaries
file_path = os.path.join("dataset", file_name)
if os.path.exists(file_path):
try:

return "", f"❌ **Error loading summary**: {str(e)}"
else:
return "", f"❌ **File not found**: {file_path}"
+
+ def update_summary_dropdown(user_log):
+ """Update summary dropdown options based on user log state"""
+ choices = ["YOUR DATASET RESULTS"]
+ choices.extend([f for f in os.listdir("dataset") if f.startswith("summary-") and f.endswith(".txt")])
+ return gr.Dropdown.update(choices=choices, value="YOUR DATASET RESULTS")

# Connect the load button to the function
load_summary_btn.click(
+ fn=load_summary_content,
+ inputs=[summary_dropdown, user_analysis_log],
outputs=[summary_content, summary_status]
)

# Also load summary when dropdown changes
summary_dropdown.change(
+ fn=load_summary_content,
+ inputs=[summary_dropdown, user_analysis_log],
outputs=[summary_content, summary_status]
)
# Add a Visuals tab for plotting graphs

# Run analysis with proper parameters
run_analysis_btn.click(
fn=run_analysis,
+ inputs=[dataset_state, analysis_options, ngram_n, topic_count, user_analysis_log],
outputs=[
analysis_results_state,
+ user_analysis_log,
analysis_output,
visualization_area_visible,
analysis_title,

]
)

+ app.load(
+ fn=lambda log: (
+ update_summary_dropdown(log),
+ load_summary_content("YOUR DATASET RESULTS", log)
+ ),
+ inputs=[user_analysis_log],
+ outputs=[summary_dropdown, summary_content, summary_status]
+ )
+
return app

if __name__ == "__main__":