Ryan committed on
Commit adf119a · 1 Parent(s): 39135b5
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
README.md CHANGED
@@ -12,3 +12,32 @@ short_description: LLM Response Comparator
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+# Gradio App User Guide
+
+This is the User Guide for my Gradio App homework assignment.
+
+Here is also a link to the video demo:
+
+# Introduction
+
+This Gradio app lets you compare the responses of two different LLMs (Large Language Models) to the same input prompt. It provides a simple interface: enter a prompt, select two LLMs from a dropdown menu, and submit; the responses from both models are then displayed side by side for easy comparison.
+
+The app is built with the Gradio library, which provides a user-friendly way to create web applications in Python. It uses the `gr.Interface` class to create the interface, with `gr.inputs.Textbox` and `gr.outputs.Textbox` defining the input and output components. A `gr.Button` component submits the prompt, and the `gr.update` method fills the output components with the responses from the selected LLMs.
+
+The app is designed to be easy to use and offers a simple way to compare how different LLMs respond to the same prompt, which can be useful for researchers and developers evaluating model behavior on a shared task.
+
+# Usage
+
+# Documentation
+
+# Contributions
+
+# Limitations
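The Introduction above describes a prompt-in, two-responses-out comparator. A minimal sketch of that pattern follows; this is not the actual app code, and the model names and the stubbed `query_model` helper are placeholders. Note that current Gradio releases use `gr.Textbox`/`gr.Dropdown` directly rather than the older `gr.inputs`/`gr.outputs` namespaces the README mentions:

```python
# Sketch of the comparator described in the README (stubs, not the real app).
def query_model(model_name: str, prompt: str) -> str:
    # In the real app this would call out to the selected LLM; stubbed here.
    return f"[{model_name}] response to: {prompt}"

def compare(prompt: str, model_a: str, model_b: str) -> tuple[str, str]:
    # Returns the two responses that the app shows side by side.
    return query_model(model_a, prompt), query_model(model_b, prompt)

# The Gradio wiring (requires `pip install gradio`) might look like:
#
# import gradio as gr
# demo = gr.Interface(
#     fn=compare,
#     inputs=[gr.Textbox(label="Prompt"),
#             gr.Dropdown(["ExaOne3.5", "Granite3.2"], label="Model A"),
#             gr.Dropdown(["ExaOne3.5", "Granite3.2"], label="Model B")],
#     outputs=[gr.Textbox(label="Model A response"),
#              gr.Textbox(label="Model B response")],
# )
# demo.launch()
```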
dataset/.DS_Store ADDED
Binary file (6.15 kB)
 
dataset/summary-econ.txt DELETED
@@ -1 +0,0 @@
-Test 1 2 3

dataset/summary-fp.txt DELETED
@@ -1 +0,0 @@
-Test 1 2 3
 
dataset/summary-harris.txt CHANGED
@@ -1 +1,120 @@
-Test 1 2 3
+# Analysis of LLM Responses Comparing ExaOne3.5 and Granite3.2
+
+Thanks for sharing these comparison results analyzing how two LLMs (ExaOne3.5 by LG and Granite3.2 by IBM) responded to the prompt about Kamala Harris's political views. Let me interpret the key differences for you.
+
+## Content and Focus Differences
+
+Looking at the top words and 2-grams:
+- **ExaOne3.5** emphasized Harris's legal background and policy implementation, using terms like "attorney general," "social justice," and "centrist approach" more frequently.
+- **Granite3.2** focused more on her political positions and party affiliation, using terms like "support," "political views," "vice president," and "democratic party."
+
+ExaOne appears to have framed Harris more through her professional background and specific policy areas, while Granite focused more directly on her political identity and positions.
+
+## Similarity Metrics
+
+The models show moderate similarity:
+- **Cosine similarity (0.67)**: Their word frequency patterns overlap somewhat but aren't identical.
+- **Jaccard similarity (0.22)**: Only about a fifth of the unique words appeared in both responses.
+- **Semantic similarity (0.53)**: The overall meaning was moderately similar.
+
+This suggests the models presented somewhat different portraits despite covering the same person.
+
+## Political Framing and Bias Analysis
+
+Both models show a liberal-leaning framing:
+- **ExaOne3.5** used more liberal-associated terms (11 liberal vs. 2 conservative terms).
+- **Granite3.2** used exclusively liberal-associated terms (7 liberal, 0 conservative).
+
+However, the overall bias difference was minor (0.15/1.0), suggesting neither model took a dramatically different political stance than the other.
+
+## Stylistic Differences
+
+The models differed significantly in communication style:
+- **ExaOne3.5**: More informal and complex language.
+- **Granite3.2**: More neutral tone with average complexity.
+
+This could affect how authoritative or approachable the responses feel to readers.
+
+## Overall Interpretation
+
+These LLMs presented moderately different portraits of Harris's political views despite addressing the same prompt. ExaOne3.5 created a more detailed, nuanced picture with higher linguistic complexity and focused more on Harris's background and specific policy areas. Granite3.2 took a more straightforward, neutral approach that centered on her political identity and party positions.
+
+Neither model showed dramatic political bias relative to the other, though both framed Harris through terms more commonly associated with liberal perspectives.
+
+The differences highlight how LLMs can present varied portraits of the same political figure based on their training data, internal architecture, and potential alignment methods.
+
+ACTUAL ANALYSIS RESULTS
+
+Analysis Results
+Analysis of Prompt: "Tell me about the political views of Kamala Harris...."
+Comparing responses from ExaOne3.5 and Granite3.2
+Top Words Used by ExaOne3.5
+harris (8), policy (8), justice (5), attorney (4), issue (4), measure (4), political (4), aimed (3), approach (3), general (3)
+
+Top Words Used by Granite3.2
+harris (7), support (6), view (6), issue (5), right (5), policy (4), party (3), political (3), president (3), progressive (3)
+
+Similarity Metrics
+Cosine Similarity: 0.67 (higher means more similar word frequency patterns)
+Jaccard Similarity: 0.22 (higher means more word overlap)
+Semantic Similarity: 0.53 (higher means more similar meaning)
+Common Words: 71 words appear in both responses
+
+Analysis Results
+Analysis of Prompt: "Tell me about the political views of Kamala Harris...."
+2-grams Analysis: Comparing responses from ExaOne3.5 and Granite3.2
+Top 2-grams Used by ExaOne3.5
+attorney general (3), social justice (3), centrist approach (2), climate change (2), criminal justice (2), gun control (2), human rights (2), justice issues (2), measures like (2), middle class (2)
+
+Top 2-grams Used by Granite3.2
+political views (3), vice president (3), criminal justice (2), democratic party (2), foreign policy (2), harris advocated (2), lgbtq rights (2), president harris (2), social issues (2), 2019 proposed (1)
+
+Similarity Metrics
+Common 2-grams: 24 2-grams appear in both responses
+
+Analysis Results
+Analysis of Prompt: "Tell me about the political views of Kamala Harris...."
+Bias Analysis: Comparing responses from ExaOne3.5 and Granite3.2
+Bias Detection Summary
+Partisan Leaning: ExaOne3.5 appears liberal, while Granite3.2 appears liberal. (Minor difference)
+
+Overall Assessment: Analysis shows a 0.15/1.0 difference in bias patterns. (Minor overall bias difference)
+
+Partisan Term Analysis
+ExaOne3.5:
+
+Liberal terms: progressive, progressive, progressive, climate, climate, reform, justice, justice, justice, justice, justice
+Conservative terms: values, security
+Granite3.2:
+
+Liberal terms: progressive, progressive, progressive, climate, reform, justice, justice
+Conservative terms: None detected
+
+Analysis Results
+Analysis of Prompt: "Tell me about the political views of Kamala Harris...."
+Classifier Analysis for ExaOne3.5 and Granite3.2
+Classification Results
+ExaOne3.5:
+
+Formality: Informal
+Sentiment: Positive
+Complexity: Complex
+Granite3.2:
+
+Formality: Neutral
+Sentiment: Positive
+Complexity: Average
+Classification Comparison
+Formality: Model 1 is informal, while Model 2 is neutral
+Complexity: Model 1 uses complex language, while Model 2 uses average language
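The cosine and Jaccard figures reported in the summaries above follow the standard bag-of-words definitions. A minimal sketch (not the app's actual implementation, which may tokenize and preprocess differently) is:

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    # Cosine over word-frequency (bag-of-words) vectors.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard_similarity(a: str, b: str) -> float:
    # Ratio of shared unique words to all unique words.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

print(jaccard_similarity("harris policy justice", "harris support view"))  # 0.2
```

A Jaccard score of 0.22, as above, therefore means roughly a fifth of all unique words appeared in both responses.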
dataset/summary-trump.txt CHANGED
@@ -1 +1,97 @@
-Test 1 2 3
+I'll analyze the differences between LG ExaOne and IBM Granite in their responses to the prompt about Donald Trump's political views.
+
+## Key Differences Between LG ExaOne and IBM Granite
+
+### Content Focus
+- **ExaOne** tends to emphasize policy-oriented aspects (more mentions of "policy," "trade," "agreement") and frequently uses qualifiers like "often".
+- **Granite** places more focus on Trump himself (more mentions of "Trump") and his administration.
+
+### Language Style
+- **ExaOne** uses more complex language, according to the classifier analysis.
+- **Granite** uses language of more average, accessible complexity.
+
+### Political Framing
+- **ExaOne** appears to have a slightly conservative-leaning framing (more conservative terms than liberal terms were detected).
+- **Granite** maintains a more balanced approach with fewer ideologically charged terms.
+
+### Topical Coverage
+- **ExaOne** emphasizes phrases like "tax cuts," "climate change," "executive orders," "free speech," and "mainstream media".
+- **Granite** focuses more on "administration," "foreign policy," "political stance," and "United States".
+
+### Similarity
+- The responses show moderate similarity (0.58 cosine similarity, 0.45 semantic similarity).
+- Only 16% word overlap (Jaccard similarity of 0.16).
+- They share 72 common words and 26 common two-word phrases.
+
+### Overall Assessment
+The results suggest that while both models provide factual information about Trump's political views with a positive sentiment and informal tone, ExaOne presents this information with more complex language and a slightly more conservative framing, while Granite offers a more balanced perspective in more accessible language. ExaOne focuses more on specific policy positions and ideological frameworks, while Granite presents a more administration-focused overview of Trump's political stances.
+
+ACTUAL ANALYSIS RESULTS
+
+Analysis Results
+Analysis of Prompt: "Tell me about the political views of Donald Trump...."
+Comparing responses from ExaOne3.5 and Granite3.2
+Top Words Used by ExaOne3.5
+policy (8), trade (8), often (7), agreement (6), like (5), free (4), immigration (4), issue (4), medium (4), order (4)
+
+Top Words Used by Granite3.2
+trump (7), administration (4), agreement (4), policy (4), political (4), trade (4), ban (3), certain (3), stance (3), view (3)
+
+Similarity Metrics
+Cosine Similarity: 0.58 (higher means more similar word frequency patterns)
+Jaccard Similarity: 0.16 (higher means more word overlap)
+Semantic Similarity: 0.45 (higher means more similar meaning)
+Common Words: 72 words appear in both responses
+
+Analysis Results
+Analysis of Prompt: "Tell me about the political views of Donald Trump...."
+2-grams Analysis: Comparing responses from ExaOne3.5 and Granite3.2
+Top 2-grams Used by ExaOne3.5
+tax cuts (3), climate change (2), executive orders (2), free speech (2), free trade (2), law order (2), legal immigration (2), mainstream media (2), political views (2), skepticism climate (2)
+
+Top 2-grams Used by Granite3.2
+administration took (2), foreign policy (2), political stance (2), political views (2), social issues (2), trump generally (2), united states (2), 2017 2021 (1), 2021 known (1), 45th president (1)
+
+Similarity Metrics
+Common 2-grams: 26 2-grams appear in both responses
+
+Analysis Results
+Analysis of Prompt: "Tell me about the political views of Donald Trump...."
+Bias Analysis: Comparing responses from ExaOne3.5 and Granite3.2
+Bias Detection Summary
+Partisan Leaning: ExaOne3.5 appears conservative, while Granite3.2 appears balanced. (Minor difference)
+
+Overall Assessment: Analysis shows a 0.20/1.0 difference in bias patterns. (Minor overall bias difference)
+
+Partisan Term Analysis
+ExaOne3.5:
+
+Liberal terms: climate, climate, justice
+Conservative terms: traditional, traditional, freedom, individual, deregulation, deregulation, security
+Granite3.2:
+
+Liberal terms: climate
+Conservative terms: deregulation
+
+Analysis Results
+Analysis of Prompt: "Tell me about the political views of Donald Trump...."
+Classifier Analysis for ExaOne3.5 and Granite3.2
+Classification Results
+ExaOne3.5:
+
+Formality: Informal
+Sentiment: Positive
+Complexity: Complex
+Granite3.2:
+
+Formality: Informal
+Sentiment: Positive
+Complexity: Average
+Classification Comparison
+Complexity: Model 1 uses complex language, while Model 2 uses average language
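The "2-grams" in these tables are adjacent word pairs. A minimal counting sketch (the app's own tokenization, lowercasing, and stop-word handling may differ) is:

```python
from collections import Counter

def top_bigrams(text: str, n: int = 10):
    # Count adjacent lowercase word pairs, most frequent first.
    words = [w.strip('.,"') for w in text.lower().split()]
    pairs = Counter(zip(words, words[1:]))
    return [(" ".join(pair), count) for pair, count in pairs.most_common(n)]

print(top_bigrams("free trade and free trade deals", 2))
```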
processors/topic_modeling.py CHANGED
@@ -289,11 +289,9 @@ def extract_topics(texts, n_topics=3, n_top_words=10, method="lda"):
     # Create document-term matrix
     if method == "nmf":
         # For NMF, use TF-IDF vectorization
-        # FIXED: Modified min_df and max_df for small document sets
         vectorizer = TfidfVectorizer(max_features=1000, min_df=1, max_df=1.0)
     else:
         # For LDA, use CountVectorizer
-        # FIXED: Modified min_df and max_df for small document sets
         vectorizer = CountVectorizer(max_features=1000, min_df=1, max_df=1.0)
 
     X = vectorizer.fit_transform(preprocessed_texts)
ui/main_screen.py CHANGED
@@ -23,19 +23,15 @@ def create_main_screen():
     This application allows you to compare how different Large Language Models (LLMs) respond
     to the same political prompts or questions. Using various NLP techniques, the tool analyzes:
 
-    - **Topic Modeling**: What key topics do different LLMs emphasize?
     - **N-gram Analysis**: What phrases and word patterns are characteristic of each LLM?
     - **Bias Detection**: Are there detectable biases in how LLMs approach political topics?
     - **Text Classification**: How do responses cluster or differentiate?
-    - **Key Differences**: What specific content varies between models?
 
     ### How to Use
 
     1. Navigate to the **Dataset Input** tab
     2. Enter prompts and corresponding LLM responses, or load an example dataset
-    3. Run various analyses to see how the responses compare
-    4. Explore visualizations of the differences
-    5. Generate a comprehensive report of findings
+    3. Run various analyses to see how the responses compare
 
     This tool is for educational and research purposes to better understand how LLMs handle
     politically sensitive topics.
visualization/bow_visualizer.py CHANGED
@@ -156,7 +156,7 @@ def process_and_visualize_analysis(analysis_results):
     print("Processing Bag of Words visualization")
     components.append(gr.Markdown("### Bag of Words Analysis"))
     bow_results = analyses["bag_of_words"]
-
+
     # Display models compared
     if "models" in bow_results:
         models = bow_results["models"]
@@ -170,32 +170,16 @@ def process_and_visualize_analysis(analysis_results):
     print(f"Creating word list for model {model}")
     word_list = [f"{item['word']} ({item['count']})" for item in words[:10]]
     components.append(gr.Markdown(f"**{model}**: {', '.join(word_list)}"))
-
-    # Add visualizations for word frequency differences
-    if "differential_words" in bow_results and "word_count_matrix" in bow_results and len(
-            bow_results["models"]) >= 2:
-        diff_words = bow_results["differential_words"]
-        word_matrix = bow_results["word_count_matrix"]
-        models = bow_results["models"]
-
-        if diff_words and word_matrix and len(diff_words) > 0:
-            components.append(gr.Markdown("### Words with Biggest Frequency Differences"))
-
-            # Create dataframe for plotting
-            model1, model2 = models[0], models[1]
-            diff_data = []
-
-            for word in diff_words[:10]:  # Limit to top 10 for readability
-                if word in word_matrix:
-                    counts = word_matrix[word]
-                    model1_count = counts.get(model1, 0)
-                    model2_count = counts.get(model2, 0)
-
-                    # Only include if there's a meaningful difference
-                    if abs(model1_count - model2_count) > 0:
-                        components.append(gr.Markdown(
-                            f"- **{word}**: {model1}: {model1_count}, {model2}: {model2_count}"
-                        ))
+
+    # Add the detailed BOW visualization using the create_bow_visualization function
+    print("Adding detailed BOW visualization components")
+    bow_visualization_components = create_bow_visualization(
+        {"analyses": {prompt: {"bag_of_words": bow_results}}}
+    )
+
+    # Skip the first component since it's a duplicate header
+    if len(bow_visualization_components) > 1:
+        components.extend(bow_visualization_components[1:])
 
     # Check for N-gram analysis
     if "ngram_analysis" in analyses:
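The word-list formatting retained in the hunk above is a plain comprehension over word/count dicts. In isolation it behaves like this (the sample data here is invented for illustration):

```python
# Same expression as in the hunk: "word (count)" strings joined with
# commas, capped at the ten most frequent entries.
words = [{"word": "harris", "count": 8}, {"word": "policy", "count": 8}]
word_list = [f"{item['word']} ({item['count']})" for item in words[:10]]
line = ", ".join(word_list)
print(line)  # harris (8), policy (8)
```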