Palbha Kulkarni (Nazwale)
committed on
Update README.md
# Airline FAQ RAG Project

Welcome to the **Airline FAQ RAG Project**! This repository is a pet project with two aims: using an AI agent to generate airline-related FAQ data, and building a Retrieval-Augmented Generation (RAG) application that answers questions from that data with accurate, context-aware responses. Whether you're an AI enthusiast, a developer, or simply curious about natural language processing (NLP) and information retrieval, the project offers insights into data generation, embedding strategies, and RAG system optimization.

## Project Overview

This project has two primary components:

1. **FAQ Data Generation Agent**: An AI-driven agent that generates airline-related FAQ data from predefined topics and is used to explore how prompt engineering affects output quality.
2. **RAG Application**: A Retrieval-Augmented Generation system that answers queries from the generated FAQ data, with experiments on different chunking strategies to optimize retrieval performance.

The goal is to build a foundation for an end-to-end conversational AI agent for airline customer support, starting with a robust RAG system.

## Features

- **Dynamic FAQ Generation**: Automatically creates comprehensive airline FAQs covering topics like booking, baggage, cancellations, and in-flight services.
- **RAG Implementation**: Combines retrieval and generation to provide accurate, context-aware answers to user queries.
- **Chunking Experiments**: Evaluates multiple chunking strategies (e.g., full FAQ file vs. question-answer pair chunking) to optimize embedding and retrieval performance.
- **Prompt Engineering Insights**: Explores how different prompts affect the quality and relevance of generated FAQ data.
- **Scalable Design**: Lays the groundwork for extending the system into a fully autonomous airline support agent.

## Repository Structure

- `data_generation/`: Scripts for generating airline FAQ data using an AI agent.
- `rag_application/`: Implementation of the RAG system, including embedding creation and retrieval logic.
- `data/`: Sample FAQ datasets and embeddings.
- `experiments/`: Notebooks and scripts comparing chunking strategies and their performance.
- `docs/`: Additional documentation and analysis of findings.

## Getting Started

### Prerequisites

- Python 3.8+
- Libraries: `transformers`, `faiss` (published on PyPI as `faiss-cpu`), `numpy`, `pandas`, `langchain` (or your preferred RAG framework)
- Optional: GPU for faster embedding generation

### Installation

1. Clone the repository (replace `your-username` with the actual account name):
   ```bash
   git clone https://github.com/your-username/airline-faq-rag.git
   cd airline-faq-rag
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Generate FAQ data:
   ```bash
   python data_generation/generate_faq.py --topics "booking,baggage,cancellations"
   ```
4. Run the RAG application:
   ```bash
   python rag_application/run_rag.py
   ```
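
The `--topics` flag in step 3 takes a comma-separated string. The source does not show how `generate_faq.py` parses it, so the following is only a minimal sketch of one way a script could do so with `argparse`. The names `parse_topics` and `build_parser` are hypothetical and are not taken from the project.

```python
import argparse
from typing import List


def parse_topics(raw: str) -> List[str]:
    """Split a comma-separated string into clean, de-duplicated topic names."""
    topics: List[str] = []
    for part in raw.split(","):
        topic = part.strip().lower()
        # Skip empty fragments (e.g. from ",,") and repeated topics.
        if topic and topic not in topics:
            topics.append(topic)
    return topics


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single --topics option (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Generate airline FAQ data.")
    parser.add_argument(
        "--topics",
        type=parse_topics,  # argparse calls this on the raw string value
        default=["booking", "baggage", "cancellations"],
        help='Comma-separated topics, e.g. "booking,baggage,cancellations".',
    )
    return parser


# Example: parse an explicit argument list instead of the real command line.
args = build_parser().parse_args(["--topics", "Booking, baggage,,booking"])
print(args.topics)  # ['booking', 'baggage']
```

Normalizing case and dropping duplicates up front keeps later steps from generating the same topic twice.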

## Key Components

### 1. FAQ Data Generation

The FAQ generation agent creates airline-related question-answer pairs based on user-specified topics (e.g., booking, baggage, cancellations). Key features:

- **Prompt Engineering**: Experimented with various prompts to control tone, detail, and accuracy. For example:
  - Prompt 1: "Generate 10 FAQs about airline baggage policies in a formal tone."
  - Prompt 2: "Create concise FAQs for airline cancellations with customer-friendly language."
- **Learnings**:
  - Specific prompts with clear instructions (e.g., "include examples") yield more relevant and detailed FAQs.
  - Iterative prompt refinement improves output consistency and reduces hallucinations.
  - Adding context (e.g., airline-specific policies) enhances realism but requires careful prompt design to avoid bias.
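
The prompt variations above can be produced from a single template. The sketch below is illustrative only: `build_faq_prompt` and its parameters are hypothetical names, and the wording simply mirrors Prompt 1 rather than reproducing the prompts used in the project's scripts.

```python
from typing import Optional


def build_faq_prompt(
    topic: str,
    num_faqs: int = 10,
    tone: str = "formal",
    context: Optional[str] = None,
    include_examples: bool = False,
) -> str:
    """Assemble a FAQ-generation prompt from a few explicit knobs."""
    lines = [f"Generate {num_faqs} FAQs about airline {topic} in a {tone} tone."]
    if include_examples:
        # Clear, specific instructions tend to give more detailed FAQs.
        lines.append("Include a short example in each answer.")
    if context:
        # Optional grounding text, e.g. airline-specific policy excerpts.
        lines.append(f"Base the answers only on this policy context:\n{context}")
    lines.append("Format each FAQ as a 'Q:' line followed by an 'A:' line.")
    return "\n".join(lines)


formal_prompt = build_faq_prompt("baggage policies")
friendly_prompt = build_faq_prompt(
    "cancellations", num_faqs=5, tone="customer-friendly", include_examples=True
)
print(formal_prompt)
print(friendly_prompt)
```

Keeping tone, count, and context as separate parameters makes it easier to change one factor at a time when comparing outputs.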

### 2. RAG Application

The RAG system retrieves relevant FAQ answers for user queries using embeddings and generates responses. Key experiments:

- **Embedding Strategies**:
  - **Full FAQ File Embedding**: Treated the entire FAQ dataset as a single document, creating one embedding per file.
  - **Question-Answer Pair Chunking**: Split FAQs into individual question-answer pairs, creating embeddings for each pair.
  - **Custom Chunking**: Grouped related FAQs (e.g., all baggage-related questions) into chunks to balance context and granularity.
- **Performance Evaluation**:
  - **Full FAQ Embedding**: Fast but less precise, as it struggles with fine-grained retrieval for specific questions.
  - **Question-Answer Pair Chunking**: Best performance for precise queries (e.g., "What is the baggage allowance?"), with higher relevance scores in retrieval.
  - **Custom Chunking**: Improved context for complex queries but increased retrieval latency.
- **Findings**: Question-answer pair chunking outperformed other methods in precision and recall, making it the preferred approach for this use case.
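
The three chunking strategies differ only in how FAQ records are grouped into text before embedding. The sketch below assumes each FAQ is a dict with `topic`, `question`, and `answer` keys; that record layout, the function names, and the placeholder answers are assumptions for illustration, not the project's actual data format.

```python
from collections import defaultdict
from typing import Dict, List

# Placeholder records; the answers are dummy text, not real airline policy.
SAMPLE_FAQS: List[Dict[str, str]] = [
    {"topic": "baggage", "question": "What is the baggage allowance?",
     "answer": "Placeholder answer about the allowance."},
    {"topic": "baggage", "question": "Can I bring a carry-on bag?",
     "answer": "Placeholder answer about carry-on bags."},
    {"topic": "cancellations", "question": "How do I cancel a booking?",
     "answer": "Placeholder answer about cancelling."},
]


def format_pair(faq: Dict[str, str]) -> str:
    """Render one FAQ record as a 'Q:/A:' text block."""
    return f"Q: {faq['question']}\nA: {faq['answer']}"


def chunk_full_file(faqs: List[Dict[str, str]]) -> List[str]:
    """Full-file strategy: the whole dataset becomes a single chunk."""
    return ["\n\n".join(format_pair(f) for f in faqs)]


def chunk_by_pair(faqs: List[Dict[str, str]]) -> List[str]:
    """Question-answer pair strategy: one chunk per FAQ."""
    return [format_pair(f) for f in faqs]


def chunk_by_topic(faqs: List[Dict[str, str]]) -> List[str]:
    """Custom strategy: one chunk per topic, grouping related FAQs."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for faq in faqs:
        grouped[faq["topic"]].append(format_pair(faq))
    return ["\n\n".join(pairs) for pairs in grouped.values()]


for name, chunker in [("full file", chunk_full_file),
                      ("pair", chunk_by_pair),
                      ("topic", chunk_by_topic)]:
    print(name, len(chunker(SAMPLE_FAQS)))
```

Each returned string would then be embedded separately, so the number of chunks is also the number of vectors the retriever has to search.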

## Key Learnings

- **Prompt Sensitivity**: Small changes in prompt wording significantly affect FAQ quality. For example, asking for a "customer-friendly" rather than a "formal" tone changed the readability of the output.
- **Chunking Matters**: Fine-grained chunking (question-answer pairs) improves retrieval accuracy but requires careful indexing to manage scale.
- **Embedding Trade-offs**: Dense embeddings (e.g., using BERT-based models) offer better semantic understanding but are computationally expensive compared to sparse methods.
- **Scalability Challenges**: Large FAQ datasets require efficient indexing (e.g., FAISS) to maintain low-latency retrieval.
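
To make the retrieval step concrete, here is a dependency-free sketch of top-k retrieval by cosine similarity. It uses sparse bag-of-words counts as a stand-in for dense embeddings and scans every chunk on each query, which is the brute-force search that an index such as FAISS is meant to replace at scale. All names here are hypothetical, and this is not the project's retrieval code.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def vectorize(text: str) -> Counter:
    """Sparse bag-of-words vector: lowercase token -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: List[str], k: int = 1) -> List[Tuple[float, str]]:
    """Score every chunk against the query and return the k best matches."""
    query_vec = vectorize(query)
    scored = [(cosine(query_vec, vectorize(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Placeholder chunks in question-answer pair form (dummy answers).
chunks = [
    "Q: What is the baggage allowance?\nA: Placeholder answer.",
    "Q: How do I cancel a booking?\nA: Placeholder answer.",
]
best_score, best_chunk = retrieve("baggage allowance", chunks, k=1)[0]
print(best_chunk)
```

Word-count vectors only match exact tokens, so a query phrased with synonyms would score zero here; that gap is what dense embeddings are meant to close, at a higher compute cost.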

## Next Steps

To evolve this project into an end-to-end airline support agent:

1. **Context-Aware Generation**: Integrate user context (e.g., booking details) into the RAG pipeline for personalized responses.
2. **Multi-Turn Conversations**: Enhance the agent to handle follow-up questions and maintain conversation history.
3. **Real-Time Data Integration**: Incorporate live airline data (e.g., flight status APIs) to provide dynamic answers.
4. **Model Fine-Tuning**: Fine-tune the language model on airline-specific data to improve response accuracy and domain knowledge.
5. **Evaluation Metrics**: Implement automated evaluation (e.g., BLEU, ROUGE, or human-in-the-loop feedback) to quantify response quality.
6. **Deployment**: Package the RAG system as a web or mobile app for real-world testing.
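
As a starting point for step 5, the sketch below computes a simplified ROUGE-1 score, that is, unigram overlap between a generated answer and a reference answer. It tokenizes on whitespace only and skips the stemming and other preprocessing that full ROUGE implementations apply, so treat it as an illustration of the idea rather than a replacement for an established evaluation library. The function name `rouge1` is hypothetical.

```python
from collections import Counter
from typing import Dict


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Simplified ROUGE-1: clipped unigram overlap with whitespace tokens."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1(
    candidate="you may cancel online",
    reference="you may cancel online or by phone",
)
print(scores)
```

In this example every candidate word appears in the reference, so precision is 1.0, while recall is 4/7 because the candidate covers four of the seven reference words.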

## Contributing

We welcome contributions! Feel free to:

- Submit pull requests for new features or bug fixes.
- Share ideas for improving FAQ generation or RAG performance.
- Add new chunking strategies or prompt templates to the experiments folder.

Please read `CONTRIBUTING.md` for guidelines.

## License

This project is licensed under the MIT License. See `LICENSE` for details.

## Contact

For questions or feedback, reach out via GitHub Issues or email at [[email protected]].

---

✈️ *Let's build the future of airline customer support together!*