tlkh committed · Commit eb12068 · 1 Parent(s): 5dd7df8

update app

Files changed (1): app.py (+20 −3)

app.py CHANGED
@@ -5,13 +5,19 @@ st.set_page_config(layout="wide")
 
 with st.sidebar.expander("📍 Explanation", expanded=False):
     st.markdown("""
+    **About**
+
     This demo allows you to explore the data inside the [MRPC](https://www.microsoft.com/en-us/download/details.aspx?id=52398) dataset.
     It illustrates how **Word Position Deviation (WPD)** and **Lexical Deviation (LD)** can be used to find different types of [paraphrase pairs](https://direct.mit.edu/coli/article/39/3/463/1434/What-Is-a-Paraphrase) inside MRPC.
-    By using what we observe from the data, we can also correct numerous labelling errors inside MRPC, presenting the a revision of MRPC termed as **MRPC-R1**.
+    By using what we observe from the data, we can find and correct numerous labelling errors inside MRPC, thus we present a revision of MRPC termed as **MRPC-R1**.
+
+    **Data Display**
+
+    The paraphrase pairs are displayed as **S1** and **S2** from the original MRPC (columns 1,2) and MRPC-R1 (columns 3,4), along with their labels (column 5), showing if the label was changed or kept. **1->0** means that the pair was labelled as a paraphrase in MRPC, but corrected to non-paraphrase in MRPC-R1, meaning we rejected the paraphrase.
 
     By changing the **Display Types** option below, you can filter the displayed pairs to show pairs that were rejected (label changed from paraphrase to non-paraphrase) or corrected (inconsistencies corrected).
 
-    This demo accompanies the paper ["Towards Better Characterization of Paraphrases" (ACL 2022)](https://github.com/tlkh/paraphrase-metrics), which describes in detail the methodologies used.""")
+    This demo accompanies the paper ["Towards Better Characterization of Paraphrases" (ACL 2022)](https://openreview.net/forum?id=t2UJIFZVyz4), which describes in detail the methodologies used.""")
 
 with st.sidebar.expander("⚙️ Dataset Options", expanded=False):
     st.markdown("This allows you to switch between the MRPC train and test sets, as well as choose to display only the original paraphrase pairs (MRPC) and/or the corrected pairs (MRPC-R1).")
@@ -24,7 +30,7 @@ ptype = st.sidebar.radio("Display Types", ["All",
                          "Rejected Paraphrases from MRPC",
                          "Corrected Paraphrases from MRPC"])
 
-st.sidebar.markdown("**Score Filter Options**")
+st.sidebar.markdown("**WPD/LD Score Filter Options**")
 filter_by = st.sidebar.selectbox("Filter By Scores From", ["MRPC", "MRPC-R1"])
 display_range_wpd = st.sidebar.slider(
     "Filter by WPD Scores", min_value=0.0, max_value=1.0, value=(0.1, 0.7))
@@ -32,6 +38,16 @@ display_range_ld = st.sidebar.slider(
     "Filter by LD Scores", min_value=0.0, max_value=1.0, value=(0.1, 0.4))
 display_scores = st.sidebar.checkbox("Display scores", value=False)
 
+with st.sidebar.expander("📍 WPD/LD Score Explanation", expanded=False):
+    st.markdown("""
+    WPD and LD measure differences in the two sentences of a paraphrase pair:
+
+    * WPD measures difference in the sentence structure
+    * LD measures differences in the words used
+
+    By setting WPD to a high range (>0.4) and LD to a low range (e.g. <0.1), we can find paraphrases that do not change much in words used but have very different structures.
+    """)
+
 def load_df(split):
     if split == "train":
         df = pd.read_csv("./mrpc_train_scores.csv")
@@ -101,6 +117,7 @@ def filter_df(df, display, ptype, filter_by, display_scores):
 df = load_df(split)
 
 df_sel = filter_df(df, display, ptype, filter_by, display_scores)
+df_sel.rename(columns={"og_s1": "Original S1 (MRPC)", "og_s2": "Original S2 (MRPC)", "new_s1": "New S1 (MRPC-R1)", "new_s2": "New S2 (MRPC-R1)"}, inplace=True)
 
 st.markdown("**MRPC Paraphrase Data Explorer** (Displaying "+str(len(df_sel))+" items)")
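The body of `filter_df` is outside this diff's hunks, so the exact filtering logic isn't shown. A minimal sketch of how the WPD/LD slider ranges could be applied to the score columns — the column names `wpd` and `ld` are assumptions, not taken from the commit:

```python
import pandas as pd

def filter_by_scores(df, wpd_range, ld_range, wpd_col="wpd", ld_col="ld"):
    """Keep only rows whose WPD and LD scores fall inside the slider ranges (inclusive)."""
    lo_w, hi_w = wpd_range
    lo_l, hi_l = ld_range
    mask = df[wpd_col].between(lo_w, hi_w) & df[ld_col].between(lo_l, hi_l)
    return df[mask]

# Example: high WPD (structural change) with low LD (mostly the same words),
# matching the sidebar's suggested WPD > 0.4, LD < 0.1 setting
df = pd.DataFrame({"wpd": [0.5, 0.1, 0.8], "ld": [0.05, 0.3, 0.08]})
print(filter_by_scores(df, (0.4, 1.0), (0.0, 0.1)))  # keeps rows 0 and 2
```

The slider widgets return `(low, high)` tuples because they are initialized with a tuple `value=`, so they can be unpacked directly as shown.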
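The new sidebar text describes a label column where **1->0** marks a rejected paraphrase. How that column is built is not shown in the diff; a sketch of one way to derive it from per-dataset labels — the column names `og_label` and `new_label` are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({"og_label": [1, 1, 0], "new_label": [0, 1, 0]})
# "1->0" marks a pair labelled paraphrase in MRPC but rejected in MRPC-R1;
# unchanged labels are shown as a single value
df["label"] = [
    f"{o}->{n}" if o != n else str(o)
    for o, n in zip(df["og_label"], df["new_label"])
]
print(df["label"].tolist())  # ['1->0', '1', '0']
```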