Ryan committed on
Commit 286cf6b · 1 Parent(s): e59609b
Files changed (3)
  1. .DS_Store +0 -0
  2. dataset/summary-harris.txt +39 -41
  3. dataset/summary-trump.txt +39 -21
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
dataset/summary-harris.txt CHANGED
@@ -1,52 +1,40 @@
- # Analysis of LLM Responses Comparing ExaOne3.5 and Granite3.2
-
- Thanks for sharing these comparison results analyzing how two LLMs (ExaOne3.5 by LG and Granite3.2 by IBM) responded to the prompt about Kamala Harris's political views. Let me interpret the key differences for you.
-
- ## Content and Focus Differences
-
- Looking at the top words and 2-grams:
- - **ExaOne3.5** emphasized Harris's legal background and policy implementation, using terms like "attorney general," "social justice," and "centrist approach" more frequently.
- - **Granite3.2** focused more on her political positions and party affiliation, using terms like "support," "political views," "vice president," and "democratic party."
-
- ExaOne appears to have framed Harris more through her professional background and specific policy areas, while Granite focused more directly on her political identity and positions.
-
- ## Similarity Metrics
-
- The models show moderate similarity:
- - **Cosine similarity (0.67)**: Their word frequency patterns overlap somewhat but aren't identical
- - **Jaccard similarity (0.22)**: Only about a fifth of unique words appeared in both responses
- - **Semantic similarity (0.53)**: The overall meaning was moderately similar
-
- This suggests the models presented somewhat different portraits despite covering the same person.
-
- ## Political Framing and Bias Analysis
-
- Both models show a liberal-leaning framing:
- - **ExaOne3.5** used more liberal-associated terms (11 liberal vs. 2 conservative terms)
- - **Granite3.2** used exclusively liberal-associated terms (7 liberal, 0 conservative)
-
- However, the overall bias difference was minor (0.15/1.0), suggesting neither model took a dramatically different political stance than the other.
-
- ## Stylistic Differences
-
- The models differed significantly in communication style:
- - **ExaOne3.5**: More informal and complex language
- - **Granite3.2**: More neutral tone with average complexity
-
- This could impact how authoritative or approachable the responses feel to readers.
-
- ## Overall Interpretation
-
- These LLMs presented moderately different portraits of Harris's political views despite addressing the same prompt. ExaOne3.5 created a more detailed, nuanced picture with higher linguistic complexity and focused more on Harris's background and specific policy areas. Granite3.2 took a more straightforward, neutral approach that centered on her political identity and party positions.
-
- Neither model showed dramatic political bias relative to the other, though both framed Harris through terms more commonly associated with liberal perspectives.
-
- The differences highlight how LLMs can present varied portraits of the same political figure based on their training data, internal architecture, and potential alignment methods.
@@ -118,3 +106,13 @@ Complexity: Average
  Classification Comparison
  Formality: Model 1 is informal, while Model 2 is neutral
  Complexity: Model 1 uses complex language, while Model 2 uses average language
+ Response by ChatGPT on Kamala Harris' political views.
+
+ These metrics paint a clear picture of how the two models diverge in style, content focus, and subtle bias—even when answering the exact same question. Here’s what stands out:
+
+ **1. Vocabulary and Focus**
+ - **ExaOne3.5** emphasizes concrete policy terms:
+   - *Top words*: “policy,” “trade,” “agreement,” “immigration,” “order”
+   - *Top 2-grams*: “tax cuts,” “climate change,” “executive orders,” “free trade,” “law order”
+   This suggests ExaOne zeroes in on Trump’s signature policy moves (tariffs, deregulation, climate skepticism), using fairly specialized language.
+ - **Granite3.2** leans more on context and framing:
+   - *Top words*: “trump,” “administration,” “agreement,” “ban,” “stance”
+   - *Top 2-grams*: “administration took,” “foreign policy,” “social issues,” “45th president”
+   Granite frames Trump within his role and major actions (“the administration did X”), giving slightly broader context rather than drilling immediately into specific policy terms.
+
+ **2. Overlap and Distinctiveness**
+ - **Cosine similarity (0.58)** indicates moderate overlap in word-frequency patterns—so you’ll see some of the same building blocks in both outputs.
+ - **Jaccard similarity (0.16)** is quite low, meaning there’s relatively little exact word overlap. In practice that means each model brings plenty of unique phrasing to the table.
+ - **Semantic similarity (0.45)** tells us they aren’t strongly aligned in the ideas they emphasize; you’ll get different “angles” on Trump’s politics.
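For reference, the two lexical metrics quoted here can be sketched directly from word counts: cosine similarity over word-frequency vectors and Jaccard similarity over word sets. The toy texts and whitespace tokenization below are illustrative assumptions, not the dataset's actual preprocessing:

```python
# Illustrative sketch (not this repo's actual pipeline): cosine similarity
# over word-count vectors and Jaccard similarity over vocabularies.
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over the shared vocabulary, normalized by vector lengths.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def jaccard_similarity(a: Counter, b: Counter) -> float:
    # |intersection| / |union| of the two word sets.
    return len(a.keys() & b.keys()) / len(a.keys() | b.keys())

# Toy stand-ins for the two model responses.
text1 = "trump policy trade agreement immigration order"
text2 = "trump administration agreement ban stance"
c1, c2 = Counter(text1.split()), Counter(text2.split())
print(round(cosine_similarity(c1, c2), 2))   # → 0.37
print(round(jaccard_similarity(c1, c2), 2))  # → 0.22
```

As the toy values show, cosine similarity tends to come out higher than Jaccard on the same pair of texts, which is consistent with the 0.58 vs. 0.16 gap reported above.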
+
+ **3. Bias Signals**
+ - ExaOne3.5 shows a slight conservative tilt: it uses more “conservative” buzzwords (e.g. freedom, deregulation, security) and even repeats “traditional” twice, whereas Granite3.2 is tagged as more balanced, with only one conservative term flagged and one liberal (“climate”).
+ - A **0.20/1.0 bias-difference score** is small but perceptible—so if you need a truly neutral rundown, Granite3.2 edges out ExaOne.
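A lexicon-based bias-difference score of the kind quoted above (0.20/1.0) can be sketched by counting hits against small ideological term lists; the lexicons, the lean formula, and the 0-to-1 normalization below are all hypothetical, not the pipeline that produced these numbers:

```python
# Hypothetical sketch of a lexicon-based bias-difference score. The term
# lists and normalization are assumptions, not the actual analysis here.
CONSERVATIVE = {"freedom", "deregulation", "security", "traditional"}
LIBERAL = {"climate", "equity", "diversity"}

def lean_score(text: str) -> float:
    # Positive -> conservative lean, negative -> liberal lean, in [-1, 1].
    words = text.lower().split()
    con = sum(w in CONSERVATIVE for w in words)
    lib = sum(w in LIBERAL for w in words)
    total = con + lib
    return (con - lib) / total if total else 0.0

def bias_difference(text_a: str, text_b: str) -> float:
    # Gap between the two leans, rescaled from [0, 2] down to [0, 1].
    return abs(lean_score(text_a) - lean_score(text_b)) / 2

a = "freedom deregulation security traditional traditional climate"
b = "climate policy stance"
print(round(bias_difference(a, b), 2))  # → 0.83 on these toy strings
```

The toy strings are deliberately polarized, so the score comes out much larger than the 0.20 reported for the real responses; the point is only the shape of the computation.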
+
+ **4. Tone & Complexity**
+ - Both answers are **informal** and carry a **positive** sentiment overall, but:
+   - **ExaOne3.5** is rated **complex**—you’ll find longer sentences, more technical terms, denser explanations.
+   - **Granite3.2** is **average** in complexity, favoring smoother readability.
+ - Sentiment scores reflect this: Granite3.2 (0.15) comes across slightly more upbeat than ExaOne3.5 (–0.12), even though both land in the “neutral” zone.
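The "complex" vs. "average" labels used throughout these summaries could come from a simple readability heuristic like the one below; the features (sentence length, long-word ratio) and thresholds are assumptions, since the actual classifier is not shown in this dataset:

```python
# Assumed readability heuristic, not the classifier actually used here:
# label text "complex" when sentences run long or many words are long.
import re

def complexity_label(text: str) -> str:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    avg_sentence_len = len(words) / len(sentences)
    long_word_ratio = sum(len(w) > 7 for w in words) / len(words)
    return "complex" if avg_sentence_len > 20 or long_word_ratio > 0.25 else "average"

dense = ("The administration's deregulatory initiatives, encompassing environmental "
         "rollbacks, immigration restrictions, and protectionist trade agreements, "
         "fundamentally restructured longstanding institutional commitments.")
plain = "Trump cut taxes. He changed trade deals. He left the climate accord."
print(complexity_label(dense))  # → complex
print(complexity_label(plain))  # → average
```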
+
+ ---
+
+ ### What This Means for You
+ - **If you want depth and policy minutiae**, ExaOne3.5 delivers more technical detail on things like tax cuts, trade agreements, and executive actions—albeit with a faint conservative slant and denser prose.
+ - **If you prefer clear, balanced context**, Granite3.2 frames Trump’s views around his presidential role and major initiatives in a more neutral tone and with easier readability.
+ - **Neither model is radically biased**, but choice matters if you’re aiming for absolutely even-handed coverage vs. a slightly more policy-driven sketch.
+
+ By weighing those trade-offs—complexity vs. simplicity, policy details vs. broader context, and slight bias differences—you can pick the LLM whose style best matches your audience and goals.
  Classification Comparison
  Formality: Model 1 is informal, while Model 2 is neutral
  Complexity: Model 1 uses complex language, while Model 2 uses average language
+
+ RoBERTa Sentiment Analysis Results
+ Sentiment Analysis Results
+ ExaOne3.5 is more positive than Granite3.2
+
+ ExaOne3.5
+ Sentiment: positive (Score: 0.52)
+
+ Granite3.2
+ Sentiment: neutral (Score: 0.37)
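The single signed score and label reported in these RoBERTa results (e.g. 0.52 "positive" and 0.37 "neutral" above) could plausibly be derived from a three-class negative/neutral/positive model by taking the argmax class as the label and the positive-minus-negative probability as the score. That scoring scheme, and the probabilities below, are assumptions, not the project's documented method:

```python
# Hedged sketch: one way a signed sentiment score and label might be
# derived from a three-class sentiment model's probabilities. The
# probability values here are made up for illustration.
def summarize_sentiment(probs: dict) -> tuple:
    label = max(probs, key=probs.get)               # argmax class
    score = probs["positive"] - probs["negative"]   # signed score in [-1, 1]
    return label, round(score, 2)

exaone = {"negative": 0.08, "neutral": 0.32, "positive": 0.60}
granite = {"negative": 0.05, "neutral": 0.53, "positive": 0.42}
print(summarize_sentiment(exaone))   # → ('positive', 0.52)
print(summarize_sentiment(granite))  # → ('neutral', 0.37)
```

Under this reading, a response can score well above zero yet still be labeled "neutral" (as with Granite3.2 here) whenever the neutral class wins the argmax.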
dataset/summary-trump.txt CHANGED
@@ -1,30 +1,38 @@
- I'll analyze the differences between LG ExaOne and IBM Granite in their responses to the prompt about Donald Trump's political views.
-
- ## Key Differences Between LG ExaOne and IBM Granite
-
- ### Content Focus
- - **ExaOne** tends to emphasize policy-oriented aspects (more mentions of "policy," "trade," "agreement") and frequently uses qualifiers like "often"
- - **Granite** places more focus on Trump himself (more mentions of "Trump") and his administration
-
- ### Language Style
- - **ExaOne** uses more complex language according to the classifier analysis
- - **Granite** uses more average/accessible language complexity
-
- ### Political Framing
- - **ExaOne** appears to have a slightly conservative-leaning framing (more conservative terms than liberal terms were detected)
- - **Granite** maintains a more balanced approach with fewer ideologically charged terms
-
- ### Topical Coverage
- - **ExaOne** emphasizes phrases like "tax cuts," "climate change," "executive orders," "free speech," and "mainstream media"
- - **Granite** focuses more on "administration," "foreign policy," "political stance," and "United States"
-
- ### Similarity
- - The responses show moderate similarity (0.58 cosine similarity, 0.45 semantic similarity)
- - Only 16% word overlap (Jaccard similarity of 0.16)
- - They share 72 common words and 26 common two-word phrases
-
- ### Overall Assessment
- The results suggest that while both models provide factual information about Trump's political views with a positive sentiment and informal tone, ExaOne presents this information with more complex language and a slightly more conservative framing, while Granite offers a more balanced perspective with more accessible language. ExaOne appears to focus more on specific policy positions and ideological frameworks, while Granite presents a more administratively-focused overview of Trump's political stances.
@@ -94,4 +102,14 @@ Formality: Informal
  Sentiment: Positive
  Complexity: Average
  Classification Comparison
- Complexity: Model 1 uses complex language, while Model 2 uses average language
+ Response by Claude 3.7
+
+ I'll interpret these NLP analysis results comparing how ExaOne3.5 and Granite3.2 responded to the prompt about Donald Trump's political views.
+
+ ## Word Frequency Analysis
+ The top words used by each model show some overlap but different emphasis:
+ - ExaOne3.5 focuses more on specific policy areas with words like "trade," "agreement," "free," and "immigration"
+ - Granite3.2 mentions "trump" more frequently and includes terms like "administration," "stance," and "view"
+
+ ## Similarity Metrics
+ - Cosine similarity of 0.58 indicates moderate similarity in word frequency patterns
+ - Jaccard similarity of 0.16 shows limited word overlap between responses
+ - Semantic similarity of 0.45 suggests the responses convey somewhat similar meanings but with notable differences
+ - 72 common words between both responses
+
+ ## N-gram Analysis
+ The 2-gram analysis reveals different contextual emphasis:
+ - ExaOne3.5 focuses more on specific policy positions: "tax cuts," "climate change," "free speech," "law order"
+ - Granite3.2 uses more general descriptive phrases: "foreign policy," "political stance," "united states"
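Counts like these 2-grams can be reproduced with a few lines of standard-library Python; the tokenization, the minimal stopword list, and the sample text below are assumptions for illustration, not the real model responses:

```python
# Sketch of a 2-gram count: lowercase, keep alphabetic tokens, drop a few
# assumed stopwords, then count adjacent word pairs.
import re
from collections import Counter

STOPWORDS = {"the", "and", "of", "on", "a", "to", "in"}  # minimal assumed list

def top_bigrams(text: str, n: int = 3):
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return Counter(zip(words, words[1:])).most_common(n)

sample = ("Tax cuts and climate change dominated; tax cuts were paired "
          "with executive orders, and climate change came up again.")
print(top_bigrams(sample))
```

Note that dropping stopwords before pairing (as here) can create bigrams like "law order" from "law and order", which is likely why that phrase appears in ExaOne3.5's list above.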
+
+ ## Bias Analysis
+ - ExaOne3.5 appears to lean slightly conservative in its response
+ - Granite3.2 is classified as more balanced
+ - ExaOne3.5 uses more partisan terms overall (both liberal and conservative)
+ - The overall bias difference is minor (0.20/1.0)
+
+ ## Style Classification
+ - Both models use informal language with positive sentiment
+ - ExaOne3.5 uses more complex language compared to Granite3.2's average complexity
+ - RoBERTa sentiment analysis shows both are technically neutral, but Granite3.2 (0.15) leans slightly more positive than ExaOne3.5 (-0.12)
31
 
32
+ ## Summary Interpretation
33
+ Based on these results, ExaOne3.5 appears to provide a more detailed, policy-focused response with slightly more complex language and a minor conservative lean. Granite3.2 offers a more balanced, general overview with simpler language and a slightly more positive tone.
34
+
35
+ The differences suggest that while both models are discussing the same subject, they frame and emphasize different aspects of Trump's political views, with moderate overall similarity in content but different stylistic approaches.
36
 
37
 
38
 
 
  Sentiment: Positive
  Complexity: Average
  Classification Comparison
+ Complexity: Model 1 uses complex language, while Model 2 uses average language
+
+ RoBERTa Sentiment Analysis Results
+ Sentiment Analysis Results
+ Granite3.2 is more positive than ExaOne3.5
+
+ ExaOne3.5
+ Sentiment: neutral (Score: -0.12)
+
+ Granite3.2
+ Sentiment: neutral (Score: 0.15)