Joe Armani committed on
Commit · e0a307c
1 Parent(s): 9b268d0
Update README.md
README.md CHANGED
@@ -1,43 +1,18 @@
-# Retrieval
+# CSC525 Retrieval Chatbot
 
-
+This is a retrieval-based chatbot using Sentence Transformers and FAISS for efficient similarity search.
 
-##
-
-A Python tool to generate high-quality dialog variations.
-
-This package automatically downloads the following models during installation:
-
-- Universal Sentence Encoder v4 (TensorFlow Hub)
-- ChatGPT Paraphraser T5-base
-- Helsinki-NLP translation models (en-de, de-es, es-en)
-- GPT-2 (for perplexity scoring)
-- spaCy en_core_web_sm
-- nltk wordnet and averaged_perceptron_tagger_eng models
-
-## Install package
+## Description
 
-
+The chatbot uses a pre-trained Sentence Transformer model to encode queries and a FAISS index to retrieve relevant responses from a curated response pool (Taskmaster-1 dataset)
 
-##
+## Usage
 
-
-
-Two approaches are used for text augmentation: paraphrasing and back-translation. The pipeline also includes quality metrics for evaluating the augmented text.
-Special handling is implemented for very short text such as greetings and farewells, which are predefined and filtered for quality.
-The pipeline is designed to process a dataset of dialogues and generate multiple high-quality augmented versions of each dialogue.
-The pipeline ensures duplicate dialogues are not generated and that the output meets quality thresholds for semantic similarity, grammar, fluency, diversity, and content preservation.
+Simply type your question in the chat interface and the bot will retrieve the most relevant response from its knowledge base.
+Features
 
-##
+## Semantic search using Sentence Transformers
 
-
-
-
-Helsinki-NLP. (2024). Opus-MT [Computer software]. GitHub. <https://github.com/Helsinki-NLP/Opus-MT>
-Hugging Face. (n.d.). Transformers. Hugging Face. <https://huggingface.co/docs/transformers/en/index>
-Humarin. (2023). ChatGPT paraphraser on T5-base [Computer software]. Hugging Face. <https://huggingface.co/humarin/chatgpt_paraphraser_on_T5_base>
-Keita, Z. (2022). Data augmentation in NLP using back-translation with MarianMT. Towards Data Science. <https://towardsdatascience.com/data-augmentation-in-nlp-using-back-translation-with-marianmt-a8939dfea50a>
-Memgraph. (2023). Cosine similarity in Python with scikit-learn. Memgraph. <https://memgraph.com/blog/cosine-similarity-python-scikit-learn>
-Morris, J. (n.d.). language-tool-python (Version 2.8.1) [Computer software]. PyPI. <https://pypi.org/project/language-tool-python/>
-TensorFlow. (n.d.). Universal sentence encoder. TensorFlow Hub. <https://www.tensorflow.org/hub/tutorials/semantic_similarity_with_tf_hub_universal_encoder>
-Waheed, A. (2023). How to calculate ROUGE score in Python. Python Code. <https://thepythoncode.com/article/calculate-rouge-score-in-python>
+Efficient retrieval using FAISS indexing
+Context-aware responses
+Quality checking of responses
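The new README describes an encode-then-retrieve flow: a query is embedded, then matched against pre-embedded responses. The sketch below illustrates only that flow and is not the repository's code. It uses stand-ins so it runs with the standard library alone: a toy word-count embedding replaces the Sentence Transformer model, and a brute-force cosine search replaces the FAISS index. The response pool, `embed`, `cosine`, and `retrieve` are all hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical response pool; the README says the real pool comes from Taskmaster-1.
RESPONSES = [
    "I can book a table for two at 7 pm.",
    "Your pizza order has been placed.",
    "The movie starts at 9 pm tonight.",
]


def embed(text):
    """Toy stand-in for a sentence encoder: L2-normalised word counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


# "Index" step: embed every response once, up front (assumed stand-in for a FAISS index).
INDEX = [(embed(r), r) for r in RESPONSES]


def retrieve(query, k=1):
    """Return the k responses most similar to the query, using brute-force search."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[0]), reverse=True)
    return [response for _, response in ranked[:k]]


print(retrieve("When does the movie start?"))
```

In a real setup, the encoder would supply dense vectors and the index would handle nearest-neighbour search at scale; the control flow (embed the pool once, embed each query, rank by similarity) would stay the same.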