update app
app.py CHANGED
```diff
@@ -5,13 +5,19 @@ st.set_page_config(layout="wide")
 
 with st.sidebar.expander("📍 Explanation", expanded=False):
     st.markdown("""
+    **About**
+
     This demo allows you to explore the data inside the [MRPC](https://www.microsoft.com/en-us/download/details.aspx?id=52398) dataset.
     It illustrates how **Word Position Deviation (WPD)** and **Lexical Deviation (LD)** can be used to find different types of [paraphrase pairs](https://direct.mit.edu/coli/article/39/3/463/1434/What-Is-a-Paraphrase) inside MRPC.
-    By using what we observe from the data, we can also correct numerous labelling errors inside MRPC, presenting the a revision of MRPC termed as **MRPC-R1**.
+    By using what we observe from the data, we can find and correct numerous labelling errors inside MRPC, thus we present a revision of MRPC termed as **MRPC-R1**.
+
+    **Data Display**
+
+    The paraphrase pairs are displayed as **S1** and **S2** from the original MRPC (columns 1,2) and MRPC-R1 (columns 3,4), along with their labels (column 5), showing if the label was changed or kept. **1->0** means that the pair was labelled as a paraphrase in MRPC, but corrected to non-paraphrase in MRPC-R1, meaning we rejected the paraphrase.
+
     By changing the **Display Types** option below, you can filter the displayed pairs to show pairs that were rejected (label changed from paraphrase to non-paraphrase) or corrected (inconsistencies corrected).
 
-    This demo accompanies the paper ["Towards Better Characterization of Paraphrases" (ACL 2022)](https://
+    This demo accompanies the paper ["Towards Better Characterization of Paraphrases" (ACL 2022)](https://openreview.net/forum?id=t2UJIFZVyz4), which describes in detail the methodologies used.""")
 
 with st.sidebar.expander("⚙️ Dataset Options", expanded=False):
     st.markdown("This allows you to switch between the MRPC train and test sets, as well as choose to display only the original paraphrase pairs (MRPC) and/or the corrected pairs (MRPC-R1).")
@@ -24,7 +30,7 @@ ptype = st.sidebar.radio("Display Types", ["All",
                                            "Rejected Paraphrases from MRPC",
                                            "Corrected Paraphrases from MRPC"])
 
-st.sidebar.markdown("**Score Filter Options**")
+st.sidebar.markdown("**WPD/LD Score Filter Options**")
 filter_by = st.sidebar.selectbox("Filter By Scores From", ["MRPC", "MRPC-R1"])
 display_range_wpd = st.sidebar.slider(
     "Filter by WPD Scores", min_value=0.0, max_value=1.0, value=(0.1, 0.7))
@@ -32,6 +38,16 @@ display_range_ld = st.sidebar.slider(
     "Filter by LD Scores", min_value=0.0, max_value=1.0, value=(0.1, 0.4))
 display_scores = st.sidebar.checkbox("Display scores", value=False)
 
+with st.sidebar.expander("📍 WPD/LD Score Explanation", expanded=False):
+    st.markdown("""
+    WPD and LD measure differences in the two sentences of a paraphrase pair:
+
+    * WPD measures difference in the sentence structure
+    * LD measures differences in the words used
+
+    By setting WPD to a high range (>0.4) and LD to a low range (e.g. <0.1), we can find paraphrases that do not change much in words used but have very different structures.
+    """)
+
 def load_df(split):
     if split == "train":
         df = pd.read_csv("./mrpc_train_scores.csv")
@@ -101,6 +117,7 @@ def filter_df(df, display, ptype, filter_by, display_scores):
 df = load_df(split)
 
 df_sel = filter_df(df, display, ptype, filter_by, display_scores)
+df_sel.rename(columns={"og_s1": "Original S1 (MRPC)", "og_s2": "Original S2 (MRPC)", "new_s1": "New S1 (MRPC-R1)", "new_s2": "New S2 (MRPC-R1)"}, inplace=True)
 
 st.markdown("**MRPC Paraphrase Data Explorer** (Displaying "+str(len(df_sel))+" items)")
 
```
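The body of `filter_df` falls outside the hunks shown, so how the two slider ranges act on the data is not visible in this diff. A minimal sketch of range-based score filtering with pandas — the `wpd`/`ld` column names and the simplified signature are assumptions for illustration, not taken from the app:

```python
import pandas as pd

def filter_by_scores(df, wpd_range, ld_range):
    """Keep rows whose WPD and LD scores fall inside both slider ranges (inclusive)."""
    lo_w, hi_w = wpd_range
    lo_l, hi_l = ld_range
    # Series.between is inclusive on both ends by default, matching slider endpoints.
    mask = df["wpd"].between(lo_w, hi_w) & df["ld"].between(lo_l, hi_l)
    return df[mask]

# Toy scores: only the middle row satisfies the app's default ranges (0.1-0.7, 0.1-0.4).
scores = pd.DataFrame({"wpd": [0.05, 0.30, 0.90], "ld": [0.20, 0.30, 0.10]})
filtered = filter_by_scores(scores, (0.1, 0.7), (0.1, 0.4))
print(len(filtered))  # 1
```

Returning the boolean-masked frame keeps the original index, so later `len(df_sel)` and row display work unchanged.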
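The `rename` call added in the last hunk maps the raw CSV column names to reader-facing headers before display. A self-contained sketch of that idiom, extended with a label column in the **1->0** style described in the sidebar text — the `og_label`/`new_label` source columns are hypothetical stand-ins, not names confirmed by the diff:

```python
import pandas as pd

df_sel = pd.DataFrame({
    "og_s1": ["a"], "og_s2": ["b"], "new_s1": ["a'"], "new_s2": ["b'"],
    "og_label": [1], "new_label": [0],
})

# Same idiom as the diff: raw CSV names -> display headers, modifying df_sel in place.
df_sel.rename(columns={"og_s1": "Original S1 (MRPC)",
                       "og_s2": "Original S2 (MRPC)",
                       "new_s1": "New S1 (MRPC-R1)",
                       "new_s2": "New S2 (MRPC-R1)"}, inplace=True)

# Column 5: "1->0" marks a pair labelled paraphrase in MRPC but rejected in MRPC-R1.
df_sel["Label"] = df_sel["og_label"].astype(str) + "->" + df_sel["new_label"].astype(str)
print(df_sel["Label"].iloc[0])  # 1->0
```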