Joe Armani committed on
Commit e0a307c · 1 Parent(s): 9b268d0

Update README.md

Files changed (1)
  1. README.md +11 -36
README.md CHANGED
@@ -1,43 +1,18 @@
- # Retrieval-based learning chatbot

- CSC525 - Module 8 Option 2 - Retrieval-based Learning Chatbot - Joseph Armani

- ## TODO
-
- A Python tool to generate high-quality dialog variations.
-
- This package automatically downloads the following models during installation:
-
- - Universal Sentence Encoder v4 (TensorFlow Hub)
- - ChatGPT Paraphraser T5-base
- - Helsinki-NLP translation models (en-de, de-es, es-en)
- - GPT-2 (for perplexity scoring)
- - spaCy en_core_web_sm
- - nltk wordnet and averaged_perceptron_tagger_eng models
-
- ## Install package

- pip install -e .

- ## Description

- This Python script demonstrates a complete pipeline for dialogue augmentation, including validation, optimization, and data augmentation.
- It creates high-quality augmented versions of dialogues by applying various text augmentation techniques and quality control checks.
- Two approaches are used for text augmentation: paraphrasing and back-translation. The pipeline also includes quality metrics for evaluating the augmented text.
- Special handling is implemented for very short text such as greetings and farewells, which are predefined and filtered for quality.
- The pipeline is designed to process a dataset of dialogues and generate multiple high-quality augmented versions of each dialogue.
- The pipeline ensures duplicate dialogues are not generated and that the output meets quality thresholds for semantic similarity, grammar, fluency, diversity, and content preservation.

- ## References

- Accsany, P. (2024). Working with JSON data in Python. Real Python. <https://realpython.com/python-json/>
- Explosion AI Team. (n.d.). Spacy · industrial-strength natural language processing in python. <https://spacy.io/>
- GeeksforGeeks. (2024). Text augmentation techniques in NLP. GeeksforGeeks. <https://www.geeksforgeeks.org/text-augmentation-techniques-in-nlp/>
- Helsinki-NLP. (2024). Opus-MT [Computer software]. GitHub. <https://github.com/Helsinki-NLP/Opus-MT>
- Hugging Face. (n.d.). Transformers. Hugging Face. <https://huggingface.co/docs/transformers/en/index>
- Humarin. (2023). ChatGPT paraphraser on T5-base [Computer software]. Hugging Face. <https://huggingface.co/humarin/chatgpt_paraphraser_on_T5_base>
- Keita, Z. (2022). Data augmentation in NLP using back-translation with MarianMT. Towards Data Science. <https://towardsdatascience.com/data-augmentation-in-nlp-using-back-translation-with-marianmt-a8939dfea50a>
- Memgraph. (2023). Cosine similarity in Python with scikit-learn. Memgraph. <https://memgraph.com/blog/cosine-similarity-python-scikit-learn>
- Morris, J. (n.d.). language-tool-python (Version 2.8.1) [Computer software]. PyPI. <https://pypi.org/project/language-tool-python/>
- TensorFlow. (n.d.). Universal sentence encoder. TensorFlow Hub. <https://www.tensorflow.org/hub/tutorials/semantic_similarity_with_tf_hub_universal_encoder>
- Waheed, A. (2023). How to calculate ROUGE score in Python. Python Code. <https://thepythoncode.com/article/calculate-rouge-score-in-python>
 
+ # CSC525 Retrieval Chatbot

+ This is a retrieval-based chatbot using Sentence Transformers and FAISS for efficient similarity search.

+ ## Description

+ The chatbot uses a pre-trained Sentence Transformer model to encode queries and a FAISS index to retrieve relevant responses from a curated response pool (Taskmaster-1 dataset)

+ ## Usage

+ Simply type your question in the chat interface and the bot will retrieve the most relevant response from its knowledge base.
+ Features

+ ## Semantic search using Sentence Transformers

+ Efficient retrieval using FAISS indexing
+ Context-aware responses
+ Quality checking of responses
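
Note on the retrieval flow described in the new README: the encode-then-search idea can be illustrated with a minimal, standard-library-only sketch. This is not the repository's code. The `embed` function is a toy bag-of-words stand-in for the Sentence Transformer model, the brute-force cosine scan is a stand-in for the FAISS index, and `RESPONSES`, `embed`, `cosine`, and `retrieve` are hypothetical names and data invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical response pool; the README says the real pool is curated
# from the Taskmaster-1 dataset.
RESPONSES = [
    "I can book a table for two at seven tonight.",
    "Your pizza order has been placed.",
    "The movie starts at nine at the downtown theater.",
]


def embed(text):
    """Toy stand-in for a Sentence Transformer: L2-normalized word counts."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def cosine(a, b):
    """Dot product of two normalized sparse vectors, i.e. cosine similarity."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())


# Stand-in for the FAISS index: one precomputed vector per response.
INDEX = [embed(response) for response in RESPONSES]


def retrieve(query, top_k=1):
    """Return the top_k (response, score) pairs most similar to the query."""
    query_vec = embed(query)
    scored = sorted(
        ((cosine(query_vec, vec), i) for i, vec in enumerate(INDEX)),
        reverse=True,
    )
    return [(RESPONSES[i], score) for score, i in scored[:top_k]]


if __name__ == "__main__":
    for response, score in retrieve("Where is my pizza order?"):
        print(f"{score:.3f}  {response}")
```

In a setup like the one the README describes, `embed` would presumably be replaced by a pre-trained sentence embedding model and the linear scan by an index lookup, which is what keeps retrieval efficient as the response pool grows.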