Ryan committed · Commit ac85b4b · 1 Parent(s): 953b2c8
update
README.md
CHANGED
@@ -26,17 +26,30 @@ This is a Gradio app that allows you to compare the responses of two different L
 There are three main tabs:
 
 - Dataset Input
-- Analysis
-- RoBERTa Sentiment
+- Analysis (four options, including a classifier that is NLTK VADER based)
+- RoBERTa Sentiment (Transformer-based classifier)
 - Summary
+- Visuals (a basic graphing display that is hard-coded, not dynamic based on the datasets)
 
+I will quote the assignment prompt and make a few comments:
 
+"You might think about it as a mini hackathon over the course of several weeks instead of a single day, aiming to build an engaging proof-of-concept demo of an idea you have – you are not expected to have a fully, production-ready application for this assignment."
+
+The app is fairly stable and has a decent set of features, although it is not a fully production-ready application. With more time I could refine it further, but a good amount of work went into getting it to an acceptable state for the assignment requirements. One recommendation: restart the Space if you want to load another dataset, rebuild the results through the analysis options, and have them displayed in the Summary tab to send to an LLM if you want. The Visuals tab is more of an example of some hard-coded results; the N-grams option of the Analysis tab has the dynamic graphing, and it will display graphs based on the dataset, including a created dataset. I could have added more graphs to other options, including the RoBERTa classifier, but felt the current feature set is acceptable.
 
 # Usage
 
 ## Dataset Input
 
-The dataset input tab allows you to select a dataset from the built-in dataset or enter your own prompt and responses. You can select a dataset from the dropdown menu, or enter your own prompt and responses in the text boxes.
+The dataset input tab allows you to select one of the built-in datasets or enter your own prompt and responses. You can select a dataset from the dropdown menu, or enter your own prompt and responses in the text boxes. The Load Dataset button fills in the text boxes with the selected dataset. With that or your own dataset, click the Create Dataset button and a message will display below stating whether it was successfully created. Then click on the Analysis tab.
+
+The built-in datasets are:
+- person-harris.txt: Responses about Kamala Harris
+- person-trump.txt: Responses about Donald Trump
+- topic-foreign_policy.txt: Responses about foreign policy views
+- topic-the_economy.txt: Responses about economic views
+
+There are two responses for each: one from LG's ExaOne 3.5 and the other from IBM's Granite 3.2. Both are approximately 5 GB models that are Ollama-compatible, which is how the results were obtained.
 
 ## Analysis
 
@@ -47,8 +60,18 @@ Once you have loaded a dataset, you now have four options:
 - Bias Detection
 - Classifier
 
+The N-grams option will produce a dynamic graph. These options also produce a text file with the results that can be accessed in the Summary tab; if you go through all four analysis options, each result is appended to the file.
+
 ### Bag of Words
 
+Bag of Words here is fairly basic. There are no parameter options. Click Run Analysis and you will see some comparison results.
+
+Similarity Metrics Terms:
+
+- Cosine Similarity: Measures the cosine of the angle between two non-zero vectors. A value of 1 means they are identical, while a value of 0 means they are orthogonal.
+- Jaccard Similarity: Measures the similarity between two sets. A value of 1 means they are identical, while a value of 0 means they have no overlap.
+- Semantic Similarity: Measures the similarity between two texts based on their meaning. A value of 1 means they are identical, while a value of 0 means they have no similarity.
+- Common Words: The number of words that appear in both responses.
 
 
 ### N-grams
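The similarity metrics defined in the README can be illustrated with a short sketch. This is not the app's actual implementation (which may tokenize or vectorize differently); it is a minimal, whitespace-tokenized bag-of-words version of cosine similarity, Jaccard similarity, and the common-words count:

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    # Bag-of-words count vectors; cosine of the angle between them
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def jaccard_similarity(a: str, b: str) -> float:
    # Ratio of shared unique words to all unique words
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def common_words(a: str, b: str) -> int:
    # Number of distinct words appearing in both responses
    return len(set(a.lower().split()) & set(b.lower().split()))
```

Identical texts score 1.0 on both similarity metrics; texts with no shared words score 0.0, matching the ranges described above.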
app.py
CHANGED
@@ -580,6 +580,105 @@ def create_app():
                 f"- **{category}**: {diff}"
                 for category, diff in differences.items()
             ])
+
+            # Create visualization using matplotlib
+            import matplotlib.pyplot as plt
+            import io
+            import base64
+            from PIL import Image
+
+            try:
+                # Define metrics and mappings
+                metrics = ['Formality', 'Sentiment', 'Complexity']
+                mapping = {
+                    'Formality': {'Informal': 1, 'Neutral': 2, 'Formal': 3},
+                    'Sentiment': {'Negative': 1, 'Neutral': 2, 'Positive': 3},
+                    'Complexity': {'Simple': 1, 'Average': 2, 'Complex': 3}
+                }
+
+                # Get values for each model
+                model1_vals = []
+                model2_vals = []
+
+                # Get formality value for model1
+                formality1 = model1_results.get('formality', 'Neutral')
+                if formality1 in mapping['Formality']:
+                    model1_vals.append(mapping['Formality'][formality1])
+                else:
+                    model1_vals.append(2)  # Default to neutral
+
+                # Get sentiment value for model1
+                sentiment1 = model1_results.get('sentiment', 'Neutral')
+                if sentiment1 in mapping['Sentiment']:
+                    model1_vals.append(mapping['Sentiment'][sentiment1])
+                else:
+                    model1_vals.append(2)  # Default to neutral
+
+                # Get complexity value for model1
+                complexity1 = model1_results.get('complexity', 'Average')
+                if complexity1 in mapping['Complexity']:
+                    model1_vals.append(mapping['Complexity'][complexity1])
+                else:
+                    model1_vals.append(2)  # Default to average
+
+                # Get formality value for model2
+                formality2 = model2_results.get('formality', 'Neutral')
+                if formality2 in mapping['Formality']:
+                    model2_vals.append(mapping['Formality'][formality2])
+                else:
+                    model2_vals.append(2)  # Default to neutral
+
+                # Get sentiment value for model2
+                sentiment2 = model2_results.get('sentiment', 'Neutral')
+                if sentiment2 in mapping['Sentiment']:
+                    model2_vals.append(mapping['Sentiment'][sentiment2])
+                else:
+                    model2_vals.append(2)  # Default to neutral
+
+                # Get complexity value for model2
+                complexity2 = model2_results.get('complexity', 'Average')
+                if complexity2 in mapping['Complexity']:
+                    model2_vals.append(mapping['Complexity'][complexity2])
+                else:
+                    model2_vals.append(2)  # Default to average
+
+                # Plot grouped bar chart
+                plt.figure(figsize=(10, 6))
+                x = range(len(metrics))
+                width = 0.35
+                plt.bar([p - width/2 for p in x], model1_vals, width=width, label=model1_name)
+                plt.bar([p + width/2 for p in x], model2_vals, width=width, label=model2_name)
+                plt.xticks(x, metrics)
+                plt.yticks([1, 2, 3], ['Low', 'Medium', 'High'])
+                plt.ylim(0, 3.5)
+                plt.ylabel('Level')
+                plt.title('Comparison of Model Characteristics')
+                plt.legend()
+                plt.tight_layout()
+
+                # Save the plot to a bytes buffer
+                buf = io.BytesIO()
+                plt.savefig(buf, format='png', dpi=100)
+                buf.seek(0)
+
+                # Convert to PIL Image
+                viz_image = Image.open(buf)
+
+                # Convert the image to a base64 string for embedding
+                buffered = io.BytesIO()
+                viz_image.save(buffered, format="PNG")
+                img_str = base64.b64encode(buffered.getvalue()).decode()
+
+                # Append the image to the metrics_value
+                similarity_title_visible = True
+                similarity_metrics_visible = True
+                similarity_metrics_value = f"""
+                <div style="margin-top: 20px;">
+                    <img src="data:image/png;base64,{img_str}" alt="Classifier visualization" style="max-width: 100%;">
+                </div>
+                """
+            except Exception as viz_error:
+                print(f"Classifier visualization error: {viz_error}")
 
         # Check for Bias Detection analysis
         elif selected_analysis == "Bias Detection" and "bias_detection" in analyses:
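The six label-to-number blocks added in this commit all follow one pattern (look up the label, fall back to the middle level). As a possible cleanup, not the committed code, they could collapse into a loop over a field/metric table; `characteristic_levels` is a hypothetical helper name, and the `mapping` dict is the one from the commit:

```python
def characteristic_levels(results: dict, mapping: dict) -> list:
    """Map a model's classifier labels to numeric levels (1-3), defaulting to 2."""
    fields = [('formality', 'Formality'),
              ('sentiment', 'Sentiment'),
              ('complexity', 'Complexity')]
    # Unknown or missing labels fall back to the middle level (2),
    # matching the else-branches in the committed code.
    return [mapping[metric].get(results.get(field, ''), 2)
            for field, metric in fields]

mapping = {
    'Formality': {'Informal': 1, 'Neutral': 2, 'Formal': 3},
    'Sentiment': {'Negative': 1, 'Neutral': 2, 'Positive': 3},
    'Complexity': {'Simple': 1, 'Average': 2, 'Complex': 3},
}

model1_vals = characteristic_levels({'formality': 'Formal', 'sentiment': 'Negative'}, mapping)
```

Separately, adding `plt.close()` after `plt.savefig(...)` would keep figures from accumulating in memory across repeated runs of this handler.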