Commit 286cf6b · committed by Ryan
1 Parent(s): e59609b

update
Browse files:
- .DS_Store +0 -0
- dataset/summary-harris.txt +39 -41
- dataset/summary-trump.txt +39 -21
.DS_Store
CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
dataset/summary-harris.txt
CHANGED
@@ -1,52 +1,40 @@
-ExaOne appears to have framed Harris more through her professional background and specific policy areas, while Granite focused more directly on her political identity and positions.
-
-## Similarity Metrics
-
-The models show moderate similarity:
-- **Cosine similarity (0.67)**: Their word frequency patterns overlap somewhat but aren't identical
-- **Jaccard similarity (0.22)**: Only about a fifth of unique words appeared in both responses
-- **Semantic similarity (0.53)**: The overall meaning was moderately similar
-
-This suggests the models presented somewhat different portraits despite covering the same person.
-
-## Political Framing and Bias Analysis
-
-Both models show a liberal-leaning framing:
-- **ExaOne3.5** used more liberal-associated terms (11 liberal vs. 2 conservative terms)
-- **Granite3.2** used exclusively liberal-associated terms (7 liberal, 0 conservative)
-
-However, the overall bias difference was minor (0.15/1.0), suggesting neither model took a dramatically different political stance than the other.
-
-## Stylistic Differences
-
-The models differed significantly in communication style:
-- **ExaOne3.5**: More informal and complex language
-- **Granite3.2**: More neutral tone with average complexity
-
-This could impact how authoritative or approachable the responses feel to readers.
-
-## Overall Interpretation
-
-These LLMs presented moderately different portraits of Harris's political views despite addressing the same prompt. ExaOne3.5 created a more detailed, nuanced picture with higher linguistic complexity and focused more on Harris's background and specific policy areas. Granite3.2 took a more straightforward, neutral approach that centered on her political identity and party positions.
-
-Neither model showed dramatic political bias relative to the other, though both framed Harris through terms more commonly associated with liberal perspectives.
-
-The differences highlight how LLMs can present varied portraits of the same political figure based on their training data, internal architecture, and potential alignment methods.
+Response by ChatGPT on Kamala Harris' political views.

+These metrics paint a clear picture of how the two models diverge in style, content focus, and subtle bias—even when answering the exact same question. Here’s what stands out:

+**1. Vocabulary and Focus**
+- **ExaOne3.5** emphasizes concrete policy terms:
+  - *Top words*: “policy,” “trade,” “agreement,” “immigration,” “order”
+  - *Top 2-grams*: “tax cuts,” “climate change,” “executive orders,” “free trade,” “law order”
+  This suggests ExaOne zeroes in on Trump’s signature policy moves (tariffs, deregulation, climate skepticism), using fairly specialized language.
+- **Granite3.2** leans more on context and framing:
+  - *Top words*: “trump,” “administration,” “agreement,” “ban,” “stance”
+  - *Top 2-grams*: “administration took,” “foreign policy,” “social issues,” “45th president”
+  Granite frames Trump within his role and major actions (“the administration did X”), giving slightly broader context rather than drilling immediately into specific policy terms.

+**2. Overlap and Distinctiveness**
+- **Cosine similarity (0.58)** indicates moderate overlap in word-frequency patterns—so you’ll see some of the same building blocks in both outputs.
+- **Jaccard similarity (0.16)** is quite low, meaning there’s relatively little exact word overlap. In practice that means each model brings plenty of unique phrasing to the table.
+- **Semantic similarity (0.45)** tells us they aren’t strongly aligned in the ideas they emphasize; you’ll get different “angles” on Trump’s politics.
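For readers who want to sanity-check figures like the cosine and Jaccard values above, here is a minimal sketch of how such lexical-overlap scores are commonly computed. It assumes scikit-learn is available and uses placeholder strings for the two model responses; it illustrates the general technique, not the exact script that produced these numbers. Semantic similarity is usually derived from sentence embeddings instead; a separate sketch under the summary-trump.txt metrics below covers that variant.

```python
# Lexical-overlap sketch: cosine similarity over word counts plus Jaccard over unique tokens.
# Assumes scikit-learn; the response strings are placeholders, not the real model outputs.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lexical_similarity(text_a: str, text_b: str) -> dict:
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform([text_a, text_b])

    # Cosine similarity compares the two word-count vectors by angle.
    cosine = float(cosine_similarity(counts[0], counts[1])[0, 0])

    # Jaccard similarity compares the sets of unique tokens each response uses.
    analyze = vectorizer.build_analyzer()
    tokens_a, tokens_b = set(analyze(text_a)), set(analyze(text_b))
    jaccard = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

    return {"cosine": round(cosine, 2), "jaccard": round(jaccard, 2)}

print(lexical_similarity("ExaOne3.5 response text here", "Granite3.2 response text here"))
```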

+**3. Bias Signals**
+- ExaOne3.5 shows a slight conservative tilt: it uses more “conservative” buzzwords (e.g. freedom, deregulation, security) and even repeats “traditional” twice, whereas Granite3.2 is tagged as more balanced, with only one conservative term flagged and one liberal (“climate”).
+- A **0.20/1.0 bias-difference score** is small but perceptible—so if you need a truly neutral rundown, Granite3.2 edges out ExaOne.
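Term counts like the liberal/conservative tallies behind these bias signals are typically produced by matching each response against small keyword lexicons. The sketch below illustrates that idea with made-up word lists; the actual lexicon and scoring behind the 0.20 figure are not shown in this commit.

```python
# Lexicon-count sketch for political-term usage.
# The two word lists are illustrative assumptions, not the lexicon behind the reported scores.
import re

LIBERAL_TERMS = {"climate", "equity", "diversity", "healthcare", "regulation"}
CONSERVATIVE_TERMS = {"freedom", "deregulation", "security", "traditional", "tariff"}

def bias_counts(text: str) -> dict:
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        "liberal": sum(tok in LIBERAL_TERMS for tok in tokens),
        "conservative": sum(tok in CONSERVATIVE_TERMS for tok in tokens),
    }

print(bias_counts("The administration emphasized deregulation, border security, and traditional values."))
```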
+**4. Tone & Complexity**
+- Both answers are **informal** and carry a **positive** sentiment overall, but:
+  - **ExaOne3.5** is rated **complex**—you’ll find longer sentences, more technical terms, denser explanations.
+  - **Granite3.2** is **average** in complexity, favoring smoother readability.
+- Sentiment scores reflect this: Granite3.2 (0.15) comes across slightly more upbeat than ExaOne3.5 (-0.12), even though both land in the “neutral” zone.

+---

+### What This Means for You
+- **If you want depth and policy minutiae**, ExaOne3.5 delivers more technical detail on things like tax cuts, trade agreements, and executive actions—albeit with a faint conservative slant and denser prose.
+- **If you prefer clear, balanced context**, Granite3.2 frames Trump’s views around his presidential role and major initiatives in a more neutral tone and with easier readability.
+- **Neither model is radically biased**, but choice matters if you’re aiming for absolutely even-handed coverage vs. a slightly more policy-driven sketch.

+By weighing those trade-offs—complexity vs. simplicity, policy details vs. broader context, and slight bias differences—you can pick the LLM whose style best matches your audience and goals.
@@ -118,3 +106,13 @@ Complexity: Average
 Classification Comparison
 Formality: Model 1 is informal, while Model 2 is neutral
 Complexity: Model 1 uses complex language, while Model 2 uses average language
+
+RoBERTa Sentiment Analysis Results
+Sentiment Analysis Results
+ExaOne3.5 is more positive than Granite3.2
+
+ExaOne3.5
+Sentiment: positive (Score: 0.52)
+
+Granite3.2
+Sentiment: neutral (Score: 0.37)
dataset/summary-trump.txt
CHANGED
@@ -1,30 +1,38 @@
+Response by Claude 3.7

+I'll interpret these NLP analysis results comparing how ExaOne3.5 and Granite3.2 responded to the prompt about Donald Trump's political views.

+## Word Frequency Analysis
+The top words used by each model show some overlap but different emphasis:
+- ExaOne3.5 focuses more on specific policy areas with words like "trade," "agreement," "free," and "immigration"
+- Granite3.2 mentions "trump" more frequently and includes terms like "administration," "stance," and "view"
+
## Similarity Metrics
|
11 |
+
- Cosine similarity of 0.58 indicates moderate similarity in word frequency patterns
|
12 |
+
- Jaccard similarity of 0.16 shows limited word overlap between responses
|
13 |
+
- Semantic similarity of 0.45 suggests the responses convey somewhat similar meanings but with notable differences
|
14 |
+
- 72 common words between both responses
|
15 |
|
16 |
+
## N-gram Analysis
|
17 |
+
The 2-gram analysis reveals different contextual emphasis:
|
18 |
+
- ExaOne3.5 focuses more on specific policy positions: "tax cuts," "climate change," "free speech," "law order"
|
19 |
+
- Granite3.2 uses more general descriptive phrases: "foreign policy," "political stance," "united states"
|
20 |
|
21 |
+
## Bias Analysis
|
22 |
+
- ExaOne3.5 appears to lean slightly conservative in its response
|
23 |
+
- Granite3.2 is classified as more balanced
|
24 |
+
- ExaOne3.5 uses more partisan terms overall (both liberal and conservative)
|
25 |
+
- The overall bias difference is minor (0.20/1.0)
|
26 |
|
27 |
+
## Style Classification
|
28 |
+
- Both models use informal language with positive sentiment
|
29 |
+
- ExaOne3.5 uses more complex language compared to Granite3.2's average complexity
|
30 |
+
- RoBERTa sentiment analysis shows both are technically neutral, but Granite3.2 leans slightly more positive (-0.12 vs 0.15)
|
31 |
|
32 |
+
## Summary Interpretation
|
33 |
+
Based on these results, ExaOne3.5 appears to provide a more detailed, policy-focused response with slightly more complex language and a minor conservative lean. Granite3.2 offers a more balanced, general overview with simpler language and a slightly more positive tone.
|
34 |
+
|
35 |
+
The differences suggest that while both models are discussing the same subject, they frame and emphasize different aspects of Trump's political views, with moderate overall similarity in content but different stylistic approaches.
|
36 |
|
37 |
|
38 |
|
|
|
@@ -94,4 +102,14 @@ Formality: Informal
 Sentiment: Positive
 Complexity: Average
 Classification Comparison
-Complexity: Model 1 uses complex language, while Model 2 uses average language
+Complexity: Model 1 uses complex language, while Model 2 uses average language
+
+RoBERTa Sentiment Analysis Results
+Sentiment Analysis Results
+Granite3.2 is more positive than ExaOne3.5
+
+ExaOne3.5
+Sentiment: neutral (Score: -0.12)
+
+Granite3.2
+Sentiment: neutral (Score: 0.15)