krishnadhulipalla committed on
Commit
b11d28e
·
1 Parent(s): dbfd515

Upload 5 files

Browse files
Files changed (4)
  1. all_chunks.json +482 -0
  2. app.py +385 -64
  3. faiss_store/v30_600_150/index.faiss +0 -0
  4. requirements.txt +0 -0
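
This commit ships `all_chunks.json` alongside a rebuilt FAISS index, which suggests the chunks are loaded and split into texts and metadata before embedding. As a hedged sketch (an assumption for illustration, not code from this commit's `app.py`), loading the file might look like:

```python
import json

def split_chunks(chunks):
    """Return parallel lists of chunk texts and their metadata dicts.

    Each entry in all_chunks.json carries "text", "metadata", "summary",
    and "synthetic_queries" fields, as visible in the diff below.
    """
    texts = [c["text"] for c in chunks]
    metadatas = [c["metadata"] for c in chunks]
    return texts, metadatas

# Usage against the uploaded file (path from this commit):
# with open("all_chunks.json", encoding="utf-8") as f:
#     texts, metadatas = split_chunks(json.load(f))
```

The texts list would then be embedded and written into `faiss_store/v30_600_150/index.faiss`, with the metadata kept for filtering and citation.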
all_chunks.json ADDED
@@ -0,0 +1,482 @@
+ [
+ {
+ "text": "# 🧠 AI Notes and Ideas\n\n## Ideas for Personal Chatbot\n\n- Use LangChain agent with custom toolset: RAG retriever, calculator, search wrapper\n- Implement system memory using Redis or JSON-based long-term storage\n- Use NVIDIA's `mixtral-8x7b-instruct` via proxy in client_server.py\n- Break down documents into small markdown sections and enrich with metadata\n- Add feedback logging for failed queries or hallucinations\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla has ideas for building a personal chatbot using LangChain agent, system memory, and AI models. The notes outline technical details for implementing the chatbot's features.\n\n🔸 Related Questions:\n- What AI technologies is Krishna Vamsi Dhulipalla exploring for his personal chatbot project?\n- How does Krishna plan to implement memory and metadata enrichment for his chatbot?\n- What are some of the technical considerations Krishna is addressing in his chatbot development notes?",
+ "metadata": {
+ "source": "ai_notes.md",
+ "header": "# 🧠 AI Notes and Ideas",
+ "chunk_id": "ai_notes.md_#0_f86fcc2e",
+ "has_header": true,
+ "word_count": 62,
+ "summary": "Krishna Vamsi Dhulipalla has ideas for building a personal chatbot using LangChain agent, system memory, and AI models. The notes outline technical details for implementing the chatbot's features.",
+ "synthetic_queries": [
+ "What AI technologies is Krishna Vamsi Dhulipalla exploring for his personal chatbot project?",
+ "How does Krishna plan to implement memory and metadata enrichment for his chatbot?",
+ "What are some of the technical considerations Krishna is addressing in his chatbot development notes?"
+ ]
+ }
+ },
+ {
+ "text": "## Retrieval Strategy Notes\n\n- Combine vector + keyword retrieval (hybrid)\n- Chunk at paragraph-level with title + heading for anchors\n- Add personal tags: `goal`, `project`, `education`, `faq`, `qa`, `experience`, `task`\n- Leverage time metadata for recency-based prioritization\n\n## Model Setup\n\n- Embed with `bge-m3` (or `text-embedding-3-large`)\n- Route to OpenAI or NVIDIA NIMs based on availability\n- Multi-agent flow: retrieval → synthesis → validator (future plan)\n\n---\n🔹 Summary:\nThe document outlines a strategy for optimized document retrieval about Krishna Vamsi Dhulipalla, combining vector and keyword retrieval with paragraph-level chunking and personal tags. The model setup involves embedding with bge-m3 and routing to OpenAI or NVIDIA NIMs for processing.\n\n🔸 Related Questions:\n- What approach should be used for retrieving documents about Krishna Vamsi Dhulipalla's projects and goals?\n- How can a hybrid retrieval strategy be implemented for efficient document retrieval about Krishna's educational background?\n- What is the recommended model setup for embedding and routing documents related to Krishna Vamsi Dhulipalla's experience and tasks?",
+ "metadata": {
+ "source": "ai_notes.md",
+ "header": "# 🧠 AI Notes and Ideas",
+ "chunk_id": "ai_notes.md_#1_3fe4782e",
+ "has_header": true,
+ "word_count": 68,
+ "summary": "The document outlines a strategy for optimized document retrieval about Krishna Vamsi Dhulipalla, combining vector and keyword retrieval with paragraph-level chunking and personal tags. The model setup involves embedding with bge-m3 and routing to OpenAI or NVIDIA NIMs for processing.",
+ "synthetic_queries": [
+ "What approach should be used for retrieving documents about Krishna Vamsi Dhulipalla's projects and goals?",
+ "How can a hybrid retrieval strategy be implemented for efficient document retrieval about Krishna's educational background?",
+ "What is the recommended model setup for embedding and routing documents related to Krishna Vamsi Dhulipalla's experience and tasks?"
+ ]
+ }
+ },
+ {
+ "text": "## Agent Concept Examples\n\n- RetrievalAgent → fetches top documents from FAISS\n- ResponseSynthesizerAgent → synthesizes markdown summary\n- TaskPlannerAgent → returns structured plan or task list\n\n---\n🔹 Summary:\nThis document chunk describes different agent concepts, which are possibly used in a project or system related to Krishna Vamsi Dhulipalla, such as RetrievalAgent, ResponseSynthesizerAgent, and TaskPlannerAgent.\n\n🔸 Related Questions:\n- What are the different agents used in Krishna Vamsi Dhulipalla's project?\n- How does Krishna's system handle document retrieval and summarization?\n- What are the main components of Krishna Vamsi Dhulipalla's task planning system?",
+ "metadata": {
+ "source": "ai_notes.md",
+ "header": "# 🧠 AI Notes and Ideas",
+ "chunk_id": "ai_notes.md_#2_a8581af8",
+ "has_header": true,
+ "word_count": 27,
+ "summary": "This document chunk describes different agent concepts, which are possibly used in a project or system related to Krishna Vamsi Dhulipalla, such as RetrievalAgent, ResponseSynthesizerAgent, and TaskPlannerAgent.",
+ "synthetic_queries": [
+ "What are the different agents used in Krishna Vamsi Dhulipalla's project?",
+ "How does Krishna's system handle document retrieval and summarization?",
+ "What are the main components of Krishna Vamsi Dhulipalla's task planning system?"
+ ]
+ }
+ },
+ {
+ "text": "# 💬 Example Conversations for Personal Assistant Chatbot\n\n## Q: What interests you in data engineering?\n\nA: I’m passionate about architecting scalable data systems that drive actionable insights. From optimizing ETL workflows to deploying real-time pipelines with Kafka/Spark, I enjoy building user-centric products—like genomic data frameworks at Virginia Tech and analytics platforms at UJR Technologies.\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla is passionate about architecting scalable data systems and building user-centric products in data engineering. He enjoys optimizing ETL workflows and deploying real-time pipelines.\n\n🔸 Related Questions:\n- What does Krishna Vamsi Dhulipalla enjoy most about data engineering?\n- What are Krishna's interests in building scalable data systems?\n- Can you describe Krishna's experience in data engineering and its applications?",
+ "metadata": {
+ "source": "conversations.md",
+ "header": "# 💬 Example Conversations for Personal Assistant Chatbot",
+ "chunk_id": "conversations.md_#0_e383095e",
+ "has_header": true,
+ "word_count": 55,
+ "summary": "Krishna Vamsi Dhulipalla is passionate about architecting scalable data systems and building user-centric products in data engineering. He enjoys optimizing ETL workflows and deploying real-time pipelines.",
+ "synthetic_queries": [
+ "What does Krishna Vamsi Dhulipalla enjoy most about data engineering?",
+ "What are Krishna's interests in building scalable data systems?",
+ "Can you describe Krishna's experience in data engineering and its applications?"
+ ]
+ }
+ },
+ {
+ "text": "## Q: Describe a pipeline you've built.\n\nA: I created a real-time IoT temperature pipeline at Virginia Tech using Kafka, AWS Glue, Airflow, and Snowflake. It processed 10,000+ sensor readings and fed into GPT-4 forecasts with 91% accuracy, helping reduce energy costs by 15% and improve stakeholder decision-making by 30%.\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla built a real-time IoT temperature pipeline that processed over 10,000 sensor readings and achieved a 91% accuracy rate in forecasts, leading to significant energy cost savings and improved decision-making. He utilized technologies such as Kafka, AWS Glue, Airflow, and Snowflake in this pipeline.\n\n🔸 Related Questions:\n- What is an example of a successful data pipeline built by Krishna Vamsi Dhulipalla?\n- How has Krishna Vamsi Dhulipalla applied his skills in IoT and data processing to achieve business outcomes?\n- What technologies has Krishna Vamsi Dhulipalla used in his data pipeline projects?",
+ "metadata": {
+ "source": "conversations.md",
+ "header": "# 💬 Example Conversations for Personal Assistant Chatbot",
+ "chunk_id": "conversations.md_#1_98373bb5",
+ "has_header": true,
+ "word_count": 50,
+ "summary": "Krishna Vamsi Dhulipalla built a real-time IoT temperature pipeline that processed over 10,000 sensor readings and achieved a 91% accuracy rate in forecasts, leading to significant energy cost savings and improved decision-making. He utilized technologies such as Kafka, AWS Glue, Airflow, and Snowflake in this pipeline.",
+ "synthetic_queries": [
+ "What is an example of a successful data pipeline built by Krishna Vamsi Dhulipalla?",
+ "How has Krishna Vamsi Dhulipalla applied his skills in IoT and data processing to achieve business outcomes?",
+ "What technologies has Krishna Vamsi Dhulipalla used in his data pipeline projects?"
+ ]
+ }
+ },
+ {
+ "text": "## Q: What was your most challenging debugging experience?\n\nA: Resolving duplicate ingestion and latency issues in a Kafka/Spark pipeline at UJR Technologies. I traced misconfigurations across consumer groups, optimized Spark executor memory, and enforced idempotent logic—reducing latency by 30% and achieving 99.9% data accuracy.\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla shares his most challenging debugging experience, detailing how he resolved issues in a Kafka/Spark pipeline at UJR Technologies. He optimized Spark executor memory, enforced idempotent logic, and reduced latency by 30% with 99.9% data accuracy.\n\n🔸 Related Questions:\n- What was Krishna Vamsi Dhulipalla's most challenging debugging experience?\n- How did Krishna improve data accuracy and reduce latency in a Kafka/Spark pipeline?\n- What were some of the technical challenges Krishna faced while working at UJR Technologies?",
+ "metadata": {
+ "source": "conversations.md",
+ "header": "# 💬 Example Conversations for Personal Assistant Chatbot",
+ "chunk_id": "conversations.md_#2_b16dcaf3",
+ "has_header": true,
+ "word_count": 45,
+ "summary": "Krishna Vamsi Dhulipalla shares his most challenging debugging experience, detailing how he resolved issues in a Kafka/Spark pipeline at UJR Technologies. He optimized Spark executor memory, enforced idempotent logic, and reduced latency by 30% with 99.9% data accuracy.",
+ "synthetic_queries": [
+ "What was Krishna Vamsi Dhulipalla's most challenging debugging experience?",
+ "How did Krishna improve data accuracy and reduce latency in a Kafka/Spark pipeline?",
+ "What were some of the technical challenges Krishna faced while working at UJR Technologies?"
+ ]
+ }
+ },
+ {
+ "text": "## Q: Describe a collaboration experience.\n\nA: At Virginia Tech, I collaborated with engineers and scientists on cross-domain NER. I led ML model tuning while engineers handled EC2 deployment. We reduced latency by 30% and boosted F1-scores by 8%, enabling large-scale analysis across 10M+ records.\n\n## Q: How do you handle data cleaning?\n\nA: I usually check for missing values, duplicates, and outliers. I ensure schema consistency and apply transformations using Pandas or SQL. For large datasets, I use Airflow + dbt for efficient pipeline automation.\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla has experience collaborating on projects, including a cross-domain NER task at Virginia Tech where he led ML model tuning, and has expertise in data cleaning using tools like Pandas, SQL, Airflow, and dbt. He has achieved significant improvements in performance, such as reducing latency by 30% and boosting F1-scores by 8%.\n\n🔸 Related Questions:\n- What collaboration experiences does Krishna have in his background?\n- How does Krishna approach data cleaning and preprocessing?\n- What tools and techniques does Krishna use for efficient data pipeline automation?",
+ "metadata": {
+ "source": "conversations.md",
+ "header": "# 💬 Example Conversations for Personal Assistant Chatbot",
+ "chunk_id": "conversations.md_#3_81f368df",
+ "has_header": true,
+ "word_count": 86,
+ "summary": "Krishna Vamsi Dhulipalla has experience collaborating on projects, including a cross-domain NER task at Virginia Tech where he led ML model tuning, and has expertise in data cleaning using tools like Pandas, SQL, Airflow, and dbt. He has achieved significant improvements in performance, such as reducing latency by 30% and boosting F1-scores by 8%.",
+ "synthetic_queries": [
+ "What collaboration experiences does Krishna have in his background?",
+ "How does Krishna approach data cleaning and preprocessing?",
+ "What tools and techniques does Krishna use for efficient data pipeline automation?"
+ ]
+ }
+ },
+ {
+ "text": "## Q: What's your biggest strength and weakness?\n\nA: Strength – Breaking down complex data into insights and delivering reliable systems. \nWeakness – Spending too long polishing outputs, though I’ve learned to balance quality and speed.\n\n## Q: What tools have you used recently?\n\nA: Python, Airflow, dbt, SageMaker, Kafka, Spark, and Snowflake. Recently, I’ve also used Docker, CloudWatch, and Looker for visualization and monitoring.\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla's strengths lie in breaking down complex data into insights and delivering reliable systems, while his weakness is spending too long polishing outputs. He has recently worked with various tools including Python, Airflow, dbt, and data visualization platforms.\n\n🔸 Related Questions:\n- What are Krishna Vamsi Dhulipalla's technical strengths and weaknesses?\n- What tools and technologies does Krishna use in his data work?\n- Can you describe Krishna's data analysis and system delivery skills?",
+ "metadata": {
+ "source": "conversations.md",
+ "header": "# 💬 Example Conversations for Personal Assistant Chatbot",
+ "chunk_id": "conversations.md_#4_7d01c2cb",
+ "has_header": true,
+ "word_count": 65,
+ "summary": "Krishna Vamsi Dhulipalla's strengths lie in breaking down complex data into insights and delivering reliable systems, while his weakness is spending too long polishing outputs. He has recently worked with various tools including Python, Airflow, dbt, and data visualization platforms.",
+ "synthetic_queries": [
+ "What are Krishna Vamsi Dhulipalla's technical strengths and weaknesses?",
+ "What tools and technologies does Krishna use in his data work?",
+ "Can you describe Krishna's data analysis and system delivery skills?"
+ ]
+ }
+ },
+ {
+ "text": "## Q: What do you want to work on next?\n\nA: I want to work more in production ML or data infrastructure—especially on real-time systems and scalable platforms supporting cross-functional teams.\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla is interested in working on production machine learning, data infrastructure, and real-time systems. He also wants to work on scalable platforms that support cross-functional teams.\n\n🔸 Related Questions:\n- What are Krishna Vamsi Dhulipalla's career goals in the field of machine learning?\n- What type of projects does Krishna Vamsi Dhulipalla want to work on in the future?\n- What are Krishna's interests in terms of data infrastructure and real-time systems?",
+ "metadata": {
+ "source": "conversations.md",
+ "header": "# 💬 Example Conversations for Personal Assistant Chatbot",
+ "chunk_id": "conversations.md_#5_6581e71e",
+ "has_header": true,
+ "word_count": 31,
+ "summary": "Krishna Vamsi Dhulipalla is interested in working on production machine learning, data infrastructure, and real-time systems. He also wants to work on scalable platforms that support cross-functional teams.",
+ "synthetic_queries": [
+ "What are Krishna Vamsi Dhulipalla's career goals in the field of machine learning?",
+ "What type of projects does Krishna Vamsi Dhulipalla want to work on in the future?",
+ "What are Krishna's interests in terms of data infrastructure and real-time systems?"
+ ]
+ }
+ },
+ {
+ "text": "# 🎯 Personal and Professional Goals\n\n## Short-Term Goals (0–6 months)\n\n- Deploy a personal AI chatbot with multi-agent architecture using RAG and open-source LLMs.\n- Publish second paper on DNA foundation model for transcription factor binding in plant genomics (submitted to MLCB).\n- Transition from research-focused work to more production-oriented data engineering roles.\n- Apply for top-tier roles in data engineering, AI infrastructure, or applied ML research.\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla aims to achieve several short-term goals within the next 0-6 months, including deploying a personal AI chatbot, publishing a research paper on DNA foundation models, and transitioning to a production-oriented data engineering role. He also plans to apply for top-tier roles in data engineering, AI infrastructure, or applied ML research.\n\n🔸 Related Questions:\n- What are Krishna Vamsi Dhulipalla's short-term career goals in data engineering and AI?\n- What research projects is Krishna currently working on in the field of plant genomics?\n- What are Krishna's plans for transitioning from research-focused work to industry roles in AI and data engineering?",
+ "metadata": {
+ "source": "goals.md",
+ "header": "# 🎯 Personal and Professional Goals",
+ "chunk_id": "goals.md_#0_8d7193bb",
+ "has_header": true,
+ "word_count": 68,
+ "summary": "Krishna Vamsi Dhulipalla aims to achieve several short-term goals within the next 0-6 months, including deploying a personal AI chatbot, publishing a research paper on DNA foundation models, and transitioning to a production-oriented data engineering role. He also plans to apply for top-tier roles in data engineering, AI infrastructure, or applied ML research.",
+ "synthetic_queries": [
+ "What are Krishna Vamsi Dhulipalla's short-term career goals in data engineering and AI?",
+ "What research projects is Krishna currently working on in the field of plant genomics?",
+ "What are Krishna's plans for transitioning from research-focused work to industry roles in AI and data engineering?"
+ ]
+ }
+ },
+ {
+ "text": "## Mid-Term Goals (6–12 months)\n\n- Contribute to or create an open-source ML/data engineering project (e.g., genomic toolkits, chatbot agents).\n- Refine MLOps skills by deploying containerized models with CI/CD + observability on cloud-native platforms.\n- Scale chatbot to support personal file ingestion, calendar querying, and document Q&A.\n- Prepare for technical interviews and secure a full-time role in a US-based company with visa support.\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla aims to contribute to or create an open-source ML/data engineering project and refine his MLOps skills within the next 6-12 months, while also preparing for technical interviews to secure a full-time role in the US. He also plans to enhance his chatbot project to support various features.\n\n🔸 Related Questions:\n- What are Krishna Vamsi Dhulipalla's goals for improving his machine learning skills?\n- What projects is Krishna planning to work on in the next year?\n- How is Krishna preparing for his career in the US?",
+ "metadata": {
+ "source": "goals.md",
+ "header": "# 🎯 Personal and Professional Goals",
+ "chunk_id": "goals.md_#1_0109ec8b",
+ "has_header": true,
+ "word_count": 65,
+ "summary": "Krishna Vamsi Dhulipalla aims to contribute to or create an open-source ML/data engineering project and refine his MLOps skills within the next 6-12 months, while also preparing for technical interviews to secure a full-time role in the US. He also plans to enhance his chatbot project to support various features.",
+ "synthetic_queries": [
+ "What are Krishna Vamsi Dhulipalla's goals for improving his machine learning skills?",
+ "What projects is Krishna planning to work on in the next year?",
+ "How is Krishna preparing for his career in the US?"
+ ]
+ }
+ },
+ {
+ "text": "## Long-Term Goals (1–3 years)\n\n- Become a senior data engineer or applied ML engineer focused on infrastructure, agent orchestration, or LLM ops.\n- Continue publishing in ML for life sciences, focusing on bioinformatics + transformer applications.\n- Build a framework or product (open-source or startup) that connects genomics, LLMs, and real-time pipelines.\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla's long-term goals include advancing in his data engineering career and making significant contributions to the field of machine learning, particularly in life sciences and genomics. He also aims to build an open-source framework or startup product integrating genomics, LLMs, and real-time pipelines.\n\n🔸 Related Questions:\n- What are Krishna Vamsi Dhulipalla's career aspirations in the field of data engineering and machine learning?\n- What specific areas of research is Krishna interested in pursuing in the field of life sciences?\n- What kind of projects or products is Krishna hoping to develop in the intersection of genomics and LLMs?",
+ "metadata": {
+ "source": "goals.md",
+ "header": "# 🎯 Personal and Professional Goals",
+ "chunk_id": "goals.md_#2_bf737ebf",
+ "has_header": true,
+ "word_count": 53,
+ "summary": "Krishna Vamsi Dhulipalla's long-term goals include advancing in his data engineering career and making significant contributions to the field of machine learning, particularly in life sciences and genomics. He also aims to build an open-source framework or startup product integrating genomics, LLMs, and real-time pipelines.",
+ "synthetic_queries": [
+ "What are Krishna Vamsi Dhulipalla's career aspirations in the field of data engineering and machine learning?",
+ "What specific areas of research is Krishna interested in pursuing in the field of life sciences?",
+ "What kind of projects or products is Krishna hoping to develop in the intersection of genomics and LLMs?"
+ ]
+ }
+ },
+ {
+ "text": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla\n\nI’m a Computer Science graduate student at Virginia Tech (M.S., expected Dec 2024) with 3+ years of experience across data engineering, machine learning research, and real-time analytics. I’m passionate about building intelligent, scalable systems with LLMs, RAG, and big data technologies.\n\n---\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla is a Computer Science graduate student at Virginia Tech with experience in data engineering, machine learning research, and real-time analytics. He is passionate about building intelligent systems with large language models and big data technologies.\n\n🔸 Related Questions:\n- What is Krishna Vamsi Dhulipalla's educational background and work experience?\n- What technologies is Krishna Vamsi Dhulipalla interested in building systems with?\n- What field is Krishna Vamsi Dhulipalla studying and what is his expected graduation date?",
+ "metadata": {
+ "source": "profile.md",
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
+ "chunk_id": "profile.md_#0_90d6e50f",
+ "has_header": true,
+ "word_count": 49,
+ "summary": "Krishna Vamsi Dhulipalla is a Computer Science graduate student at Virginia Tech with experience in data engineering, machine learning research, and real-time analytics. He is passionate about building intelligent systems with large language models and big data technologies.",
+ "synthetic_queries": [
+ "What is Krishna Vamsi Dhulipalla's educational background and work experience?",
+ "What technologies is Krishna Vamsi Dhulipalla interested in building systems with?",
+ "What field is Krishna Vamsi Dhulipalla studying and what is his expected graduation date?"
+ ]
+ }
+ },
+ {
+ "text": "## 🎯 Summary\n\n- 👨‍💻 3+ years in Data Engineering and ML Research\n- 🔁 Focused on LLMs, RAG pipelines, and Genomics\n- ☁️ Experienced with AWS, GCP, and containerized deployments\n- 🔬 Strong background in transformer models, data pipelines, and real-time analytics\n\n---\n\n## 🔭 Current Focus areas\n\n- Fine-tuning and deploying transformer-based genome classification pipelines\n- Building RAG agents and LLM orchestration workflows\n- Architecting real-time data pipelines with Spark, Kafka, and Airflow\n- Containerized, cloud-native deployment (AWS, GCP, Docker)\n\n---\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla has 3+ years of experience in Data Engineering and ML Research, with a focus on transformer models, data pipelines, and real-time analytics, particularly in the areas of LLMs, RAG pipelines, and Genomics. He has experience with cloud platforms like AWS and GCP, and containerized deployments.\n\n🔸 Related Questions:\n- What is Krishna Vamsi Dhulipalla's background and expertise in Data Engineering and ML Research?\n- What are Krishna's current focus areas in terms of research and development?\n- What technologies and platforms is Krishna experienced with in his work on LLMs and Genomics?",
+ "metadata": {
+ "source": "profile.md",
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
+ "chunk_id": "profile.md_#1_0f906409",
+ "has_header": true,
+ "word_count": 83,
+ "summary": "Krishna Vamsi Dhulipalla has 3+ years of experience in Data Engineering and ML Research, with a focus on transformer models, data pipelines, and real-time analytics, particularly in the areas of LLMs, RAG pipelines, and Genomics. He has experience with cloud platforms like AWS and GCP, and containerized deployments.",
+ "synthetic_queries": [
+ "What is Krishna Vamsi Dhulipalla's background and expertise in Data Engineering and ML Research?",
+ "What are Krishna's current focus areas in terms of research and development?",
+ "What technologies and platforms is Krishna experienced with in his work on LLMs and Genomics?"
+ ]
+ }
+ },
+ {
+ "text": "## 🎓 Education\n\n### **Virginia Tech** — M.S. in Computer Science\n\n📍Blacksburg, VA | _Jan 2023 – Dec 2024_ \n**CGPA:** 3.95 / 4.0 \nFocus: Distributed Systems, ML Optimization, Genomics, Transformer Models\n\n### **Vel Tech University** — B.Tech in CSE\n\n📍Chennai, India | _Jun 2018 – May 2022_ \n**CGPA:** 8.24 / 10 \nFocus: Real-Time Analytics Systems, Cloud Fundamentals\n\n---\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla pursued higher education in Computer Science, first earning a B.Tech from Vel Tech University and later an M.S. from Virginia Tech, focusing on areas such as Distributed Systems, ML Optimization, and Genomics. Both degrees were completed with high CGPA scores.\n\n🔸 Related Questions:\n- What are Krishna Vamsi Dhulipalla's educational qualifications?\n- Where did Krishna pursue his higher education in Computer Science?\n- What areas of focus did Krishna have during his master's degree at Virginia Tech?",
+ "metadata": {
+ "source": "profile.md",
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
+ "chunk_id": "profile.md_#2_29b9983f",
+ "has_header": true,
+ "word_count": 58,
+ "summary": "Krishna Vamsi Dhulipalla pursued higher education in Computer Science, first earning a B.Tech from Vel Tech University and later an M.S. from Virginia Tech, focusing on areas such as Distributed Systems, ML Optimization, and Genomics. Both degrees were completed with high CGPA scores.",
+ "synthetic_queries": [
+ "What are Krishna Vamsi Dhulipalla's educational qualifications?",
+ "Where did Krishna pursue his higher education in Computer Science?",
+ "What areas of focus did Krishna have during his master's degree at Virginia Tech?"
+ ]
+ }
+ },
+ {
+ "text": "## 🛠️ Technical Skills\n\n### Programming Languages skills\n\n- Python, R, SQL, JavaScript, TypeScript, FastApi, nodeJs\n\n### Machine Learning & AI skills\n\n- PyTorch, TensorFlow, Transformers (Hugging Face), GANs, XGBoost, SHAP, Langchain, scikit-learn, LLM finetuning, RAG, Prompt Engineering, Text & Image Generation, Self-Supervised Learning, Hyperparameter Optimization, A/B Testing, Synthetic Data Generation, Cross-Domain Adaptation, kNN, Naive Bayes, SVM, Decision Trees/Random Forests, Clustering, PCA, EDA, Model Evaluation\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla has expertise in various programming languages, including Python, R, and JavaScript, as well as a wide range of machine learning and AI skills, including deep learning frameworks and techniques. His technical skills encompass areas such as natural language processing, computer vision, and predictive modeling.\n\n🔸 Related Questions:\n- What programming languages and machine learning frameworks is Krishna Vamsi Dhulipalla proficient in?\n- What are Krishna Vamsi Dhulipalla's areas of expertise in the field of artificial intelligence?\n- What technical skills does Krishna Vamsi Dhulipalla possess that are relevant to data science and predictive modeling?",
+ "metadata": {
+ "source": "profile.md",
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
+ "chunk_id": "profile.md_#3_7b599e79",
+ "has_header": true,
+ "word_count": 65,
+ "summary": "Krishna Vamsi Dhulipalla has expertise in various programming languages, including Python, R, and JavaScript, as well as a wide range of machine learning and AI skills, including deep learning frameworks and techniques. His technical skills encompass areas such as natural language processing, computer vision, and predictive modeling.",
+ "synthetic_queries": [
+ "What programming languages and machine learning frameworks is Krishna Vamsi Dhulipalla proficient in?",
+ "What are Krishna Vamsi Dhulipalla's areas of expertise in the field of artificial intelligence?",
+ "What technical skills does Krishna Vamsi Dhulipalla possess that are relevant to data science and predictive modeling?"
+ ]
+ }
+ },
+ {
+ "text": "### Data Engineering skills\n\n- Apache Kafka, Apache Spark, dbt, Delta Lake, Apache Airflow, ETL, Big Data Workflows, Data Warehousing, Distributed Systems\n\n### Cloud & Infrastructure skills\n\n- **AWS:** S3, Glue, Redshift, ECS, SageMaker, CloudWatch\n- **GCP:** BigQuery, Cloud Composer\n- **Other:** Snowflake, MongoDB\n\n### DevOps & MLOps skills\n\n- Docker, Kubernetes, CI/CD pipelines, MLflow, Weights & Biases (W&B)\n\n### Visualization Tools skills\n\n- Tableau, Plotly, Shiny (R), Looker\n\n### Others skills\n\n- REST APIs, Git, Pandas, NumPy\n\n---\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla has expertise in various technical skills, including Data Engineering, Cloud & Infrastructure, DevOps & MLOps, Visualization Tools, and others. His skills span across technologies such as Apache Kafka, Apache Spark, AWS, GCP, Docker, Kubernetes, and more.\n\n🔸 Related Questions:\n- What are Krishna Vamsi Dhulipalla's technical skills and expertise?\n- What cloud and infrastructure platforms is Krishna experienced with?\n- What tools and technologies does Krishna use for data engineering and visualization?",
+ "metadata": {
+ "source": "profile.md",
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
+ "chunk_id": "profile.md_#4_6082e7b0",
+ "has_header": true,
+ "word_count": 79,
+ "summary": "Krishna Vamsi Dhulipalla has expertise in various technical skills, including Data Engineering, Cloud & Infrastructure, DevOps & MLOps, Visualization Tools, and others. His skills span across technologies such as Apache Kafka, Apache Spark, AWS, GCP, Docker, Kubernetes, and more.",
+ "synthetic_queries": [
+ "What are Krishna Vamsi Dhulipalla's technical skills and expertise?",
+ "What cloud and infrastructure platforms is Krishna experienced with?",
+ "What tools and technologies does Krishna use for data engineering and visualization?"
+ ]
+ }
+ },
+ {
+ "text": "## 💼 Experience\n\n### Data Scientist | Virginia Tech (current)\n\n📍 Blacksburg, VA | _Sep 2024 – Present_\n\n- Designed modular PyTorch pipelines for plant genome classification (94% accuracy)\n- Used Airflow DAGs + dbt to preprocess over 1M biological samples (↑ throughput by 40%)\n- Deployed LLMs via SageMaker + Docker with monitoring in MLflow + CloudWatch\n- Created Python libraries to streamline research cycles (↑ dev productivity by 20%)\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla currently works as a Data Scientist at Virginia Tech, where he has designed and deployed various data pipelines and machine learning models to improve research productivity and efficiency. His projects have achieved notable metrics, such as 94% accuracy in plant genome classification and 40% increase in throughput.\n\n🔸 Related Questions:\n- What is Krishna Vamsi Dhulipalla's current role and what projects has he worked on?\n- What are some notable achievements of Krishna Vamsi Dhulipalla as a Data Scientist?\n- What technologies and tools has Krishna Vamsi Dhulipalla used in his work at Virginia Tech?",
+ "metadata": {
+ "source": "profile.md",
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
279
+ "chunk_id": "profile.md_#5_570ef5c6",
280
+ "has_header": true,
281
+ "word_count": 71,
282
+ "summary": "Krishna Vamsi Dhulipalla currently works as a Data Scientist at Virginia Tech, where he has designed and deployed various data pipelines and machine learning models to improve research productivity and efficiency. His projects have achieved notable metrics, such as 94% accuracy in plant genome classification and 40% increase in throughput.",
283
+ "synthetic_queries": [
284
+ "What is Krishna Vamsi Dhulipalla's current role and what projects has he worked on?",
285
+ "What are some notable achievements of Krishna Vamsi Dhulipalla as a Data Scientist?",
286
+ "What technologies and tools has Krishna Vamsi Dhulipalla used in his work at Virginia Tech?"
287
+ ]
288
+ }
289
+ },
290
+ {
291
+ "text": "### Research Assistant | Virginia Tech\n\n📍 Blacksburg, VA | _Jun 2023 – May 2024_\n\n- ETL pipelines using AWS Glue + Airflow to Redshift (↑ availability by 50%)\n- Built CI/CD for ML model deployment, reducing manual effort by 40%\n- Led reproducibility effort with SageMaker tracking and automated logging\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla worked as a Research Assistant at Virginia Tech from June 2023 to May 2024, where he improved data pipeline availability and automated ML model deployment. He led efforts in reproducibility using SageMaker tracking and automated logging.\n\n🔸 Related Questions:\n- What are Krishna Vamsi Dhulipalla's research experience and accomplishments at Virginia Tech?\n- How did Krishna Vamsi Dhulipalla improve data pipeline efficiency during his tenure at Virginia Tech?\n- What are some examples of Krishna Vamsi Dhulipalla's work in machine learning model deployment and reproducibility?",
292
+ "metadata": {
293
+ "source": "profile.md",
294
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
295
+ "chunk_id": "profile.md_#6_d0244ad6",
296
+ "has_header": true,
297
+ "word_count": 51,
298
+ "summary": "Krishna Vamsi Dhulipalla worked as a Research Assistant at Virginia Tech from June 2023 to May 2024, where he improved data pipeline availability and automated ML model deployment. He led efforts in reproducibility using SageMaker tracking and automated logging.",
299
+ "synthetic_queries": [
300
+ "What are Krishna Vamsi Dhulipalla's research experience and accomplishments at Virginia Tech?",
301
+ "How did Krishna Vamsi Dhulipalla improve data pipeline efficiency during his tenure at Virginia Tech?",
302
+ "What are some examples of Krishna Vamsi Dhulipalla's work in machine learning model deployment and reproducibility?"
303
+ ]
304
+ }
305
+ },
306
+ {
307
+ "text": "### Data Engineer | UJR Technologies Pvt Ltd\n\n📍 Hyderabad, India | _Jul 2021 – Dec 2022_\n\n- Migrated batch ETL → real-time using Kafka + Spark (↓ latency by 30%)\n- Built containerized services with Docker on ECS (↑ deployment speed by 25%)\n- Tuned Snowflake warehouses, optimized materialized views (↓ query time by 40%)\n\n---\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla worked as a Data Engineer at UJR Technologies Pvt Ltd from Jul 2021 to Dec 2022, where he achieved significant improvements in ETL latency, deployment speed, and query time. He utilized technologies such as Kafka, Spark, Docker, and Snowflake to accomplish these feats.\n\n🔸 Related Questions:\n- What were Krishna Vamsi Dhulipalla's accomplishments as a Data Engineer at UJR Technologies?\n- How did Krishna Vamsi Dhulipalla improve the efficiency of data processing and deployment in his previous role?\n- What technologies did Krishna Vamsi Dhulipalla use to optimize data processing and storage during his tenure at UJR Technologies?",
308
+ "metadata": {
309
+ "source": "profile.md",
310
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
311
+ "chunk_id": "profile.md_#7_c27f38dd",
312
+ "has_header": true,
313
+ "word_count": 57,
314
+ "summary": "Krishna Vamsi Dhulipalla worked as a Data Engineer at UJR Technologies Pvt Ltd from Jul 2021 to Dec 2022, where he achieved significant improvements in ETL latency, deployment speed, and query time. He utilized technologies such as Kafka, Spark, Docker, and Snowflake to accomplish these feats.",
315
+ "synthetic_queries": [
316
+ "What were Krishna Vamsi Dhulipalla's accomplishments as a Data Engineer at UJR Technologies?",
317
+ "How did Krishna Vamsi Dhulipalla improve the efficiency of data processing and deployment in his previous role?",
318
+ "What technologies did Krishna Vamsi Dhulipalla use to optimize data processing and storage during his tenure at UJR Technologies?"
319
+ ]
320
+ }
321
+ },
322
+ {
323
+ "text": "## 🧪 Key Projects\n\n### **Real-Time IoT-Based Temperature Forecasting**\n\n- Kafka-based pipeline for 10K+ sensor readings with LLaMA 2-based time series model (91% accuracy)\n- Airflow + Looker dashboards (↓ manual reporting by 30%)\n- S3 lifecycle policies saved 40% storage cost with versioned backups \n 🔗 [GitHub](https://github.com/krishna-creator/Real-Time-IoT-Based-Temperature-Analytics-and-Forecasting)\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla worked on a Real-Time IoT-Based Temperature Forecasting project, utilizing a Kafka-based pipeline and a LLaMA 2-based time series model, achieving 91% accuracy. He also implemented cost-saving measures and automations using Airflow and Looker dashboards.\n\n🔸 Related Questions:\n- What are some notable projects Krishna Vamsi Dhulipalla has worked on?\n- What IoT-based projects has Krishna Vamsi Dhulipalla contributed to?\n- Can you provide an example of Krishna Vamsi Dhulipalla's experience with data pipeline and analytics projects?",
324
+ "metadata": {
325
+ "source": "profile.md",
326
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
327
+ "chunk_id": "profile.md_#8_c6fde132",
328
+ "has_header": true,
329
+ "word_count": 47,
330
+ "summary": "Krishna Vamsi Dhulipalla worked on a Real-Time IoT-Based Temperature Forecasting project, utilizing a Kafka-based pipeline and a LLaMA 2-based time series model, achieving 91% accuracy. He also implemented cost-saving measures and automations using Airflow and Looker dashboards.",
331
+ "synthetic_queries": [
332
+ "What are some notable projects Krishna Vamsi Dhulipalla has worked on?",
333
+ "What IoT-based projects has Krishna Vamsi Dhulipalla contributed to?",
334
+ "Can you provide an example of Krishna Vamsi Dhulipalla's experience with data pipeline and analytics projects?"
335
+ ]
336
+ }
337
+ },
338
+ {
339
+ "text": "### **Proxy TuNER: Cross-Domain NER**\n\n- Developed a proxy tuning method for domain-agnostic BERT\n- 15% generalization gain using gradient reversal + feature alignment\n- 70% cost reduction via logit-level ensembling \n 🔗 [GitHub](https://github.com/krishna-creator/ProxytuNER)\n\n### **IntelliMeet: AI-Powered Conferencing**\n\n- Federated learning, end-to-end encrypted platform\n- Live attention detection using RetinaFace (<200ms latency)\n- Summarization with Transformer-based speech-to-text \n 🔗 [GitHub](https://github.com/krishna-creator/SE-Project---IntelliMeet)\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla developed two notable projects: Proxy TuNER, a cross-domain named entity recognition method that improves generalization and reduces costs, and IntelliMeet, an AI-powered conferencing platform that utilizes federated learning and end-to-end encryption. Both projects have accompanying GitHub repositories.\n\n🔸 Related Questions:\n- What are some notable AI projects developed by Krishna Vamsi Dhulipalla?\n- How has Krishna Vamsi Dhulipalla contributed to advancements in named entity recognition and conferencing technology?\n- What are some examples of Krishna Vamsi Dhulipalla's work in AI and machine learning, and where can I find more information about them?",
340
+ "metadata": {
341
+ "source": "profile.md",
342
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
343
+ "chunk_id": "profile.md_#9_e225dbb3",
344
+ "has_header": true,
345
+ "word_count": 58,
346
+ "summary": "Krishna Vamsi Dhulipalla developed two notable projects: Proxy TuNER, a cross-domain named entity recognition method that improves generalization and reduces costs, and IntelliMeet, an AI-powered conferencing platform that utilizes federated learning and end-to-end encryption. Both projects have accompanying GitHub repositories.",
347
+ "synthetic_queries": [
348
+ "What are some notable AI projects developed by Krishna Vamsi Dhulipalla?",
349
+ "How has Krishna Vamsi Dhulipalla contributed to advancements in named entity recognition and conferencing technology?",
350
+ "What are some examples of Krishna Vamsi Dhulipalla's work in AI and machine learning, and where can I find more information about them?"
351
+ ]
352
+ }
353
+ },
354
+ {
355
+ "text": "## 🧪 Key Projects\n\n### Real-Time IoT-Based Temperature Forecasting\n\n- Kafka-based pipeline for 10K+ sensor readings with LLaMA 2-based time series model (91% accuracy)\n- Airflow + Looker dashboards (↓ manual reporting by 30%)\n- S3 lifecycle policies saved 40% storage cost with versioned backups \n 🔗 [GitHub](https://github.com/krishna-creator/Real-Time-IoT-Based-Temperature-Analytics-and-Forecasting)\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla led a project on real-time IoT-based temperature forecasting, leveraging Kafka, LLaMA 2, and Airflow to achieve 91% accuracy and reduce manual reporting by 30%. The project also utilized S3 lifecycle policies to save 40% storage cost.\n\n🔸 Related Questions:\n- What notable projects has Krishna Vamsi Dhulipalla worked on?\n- How has Krishna Vamsi Dhulipalla applied machine learning models in his projects?\n- What are some examples of Krishna Vamsi Dhulipalla's work in IoT-based analytics and forecasting?",
356
+ "metadata": {
357
+ "source": "profile.md",
358
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
359
+ "chunk_id": "profile.md_#10_662d50be",
360
+ "has_header": true,
361
+ "word_count": 47,
362
+ "summary": "Krishna Vamsi Dhulipalla led a project on real-time IoT-based temperature forecasting, leveraging Kafka, LLaMA 2, and Airflow to achieve 91% accuracy and reduce manual reporting by 30%. The project also utilized S3 lifecycle policies to save 40% storage cost.",
363
+ "synthetic_queries": [
364
+ "What notable projects has Krishna Vamsi Dhulipalla worked on?",
365
+ "How has Krishna Vamsi Dhulipalla applied machine learning models in his projects?",
366
+ "What are some examples of Krishna Vamsi Dhulipalla's work in IoT-based analytics and forecasting?"
367
+ ]
368
+ }
369
+ },
370
+ {
371
+ "text": "### Proxy TuNER: Cross-Domain NER\n\n- Developed a proxy tuning method for domain-agnostic BERT\n- 15% generalization gain using gradient reversal + feature alignment\n- 70% cost reduction via logit-level ensembling \n 🔗 [GitHub](https://github.com/krishna-creator/ProxytuNER)\n\n### IntelliMeet: AI-Powered Conferencing\n\n- Federated learning, end-to-end encrypted platform\n- Live attention detection using RetinaFace (<200ms latency)\n- Summarization with Transformer-based speech-to-text \n 🔗 [GitHub](https://github.com/krishna-creator/SE-Project---IntelliMeet)\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla developed two notable projects: Proxy TuNER, a cross-domain named entity recognition method, and IntelliMeet, an AI-powered conferencing platform. These projects showcase Krishna's expertise in natural language processing, federated learning, and computer vision.\n\n🔸 Related Questions:\n- What AI projects has Krishna Vamsi Dhulipalla worked on?\n- How has Krishna contributed to the development of natural language processing and computer vision?\n- What are some notable accomplishments of Krishna Vamsi Dhulipalla in the field of AI research?",
372
+ "metadata": {
373
+ "source": "profile.md",
374
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
375
+ "chunk_id": "profile.md_#11_e7dc9201",
376
+ "has_header": true,
377
+ "word_count": 58,
378
+ "summary": "Krishna Vamsi Dhulipalla developed two notable projects: Proxy TuNER, a cross-domain named entity recognition method, and IntelliMeet, an AI-powered conferencing platform. These projects showcase Krishna's expertise in natural language processing, federated learning, and computer vision.",
379
+ "synthetic_queries": [
380
+ "What AI projects has Krishna Vamsi Dhulipalla worked on?",
381
+ "How has Krishna contributed to the development of natural language processing and computer vision?",
382
+ "What are some notable accomplishments of Krishna Vamsi Dhulipalla in the field of AI research?"
383
+ ]
384
+ }
385
+ },
386
+ {
387
+ "text": "### Automated Drone Image Analysis\n\n- Real-time crop disease detection using drone imagery\n- Used OpenCV, RAG, and GANs for synthetic data generation\n- Improved detection accuracy by 15% and reduced processing latency by 70%\n\n### COVID-19 Misinformation Tracking\n\n- NLP pipeline with BERT, NLTK, NetworkX on >1M tweets\n- Misinformation detection (89% accuracy)\n- Integrated sentiment analysis, influence tracking, and community detection\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla developed innovative solutions for crop disease detection using drone imagery and for tracking COVID-19 misinformation on Twitter. His techniques achieved significant improvements in accuracy and efficiency.\n\n🔸 Related Questions:\n- What projects has Krishna Vamsi Dhulipalla worked on that involve image analysis and machine learning?\n- How did Krishna Vamsi Dhulipalla use NLP techniques to track COVID-19 misinformation on social media?\n- What are some notable achievements of Krishna Vamsi Dhulipalla in the field of computer vision and data analysis?",
388
+ "metadata": {
389
+ "source": "profile.md",
390
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
391
+ "chunk_id": "profile.md_#12_5fcb1239",
392
+ "has_header": true,
393
+ "word_count": 63,
394
+ "summary": "Krishna Vamsi Dhulipalla developed innovative solutions for crop disease detection using drone imagery and for tracking COVID-19 misinformation on Twitter. His techniques achieved significant improvements in accuracy and efficiency.",
395
+ "synthetic_queries": [
396
+ "What projects has Krishna Vamsi Dhulipalla worked on that involve image analysis and machine learning?",
397
+ "How did Krishna Vamsi Dhulipalla use NLP techniques to track COVID-19 misinformation on social media?",
398
+ "What are some notable achievements of Krishna Vamsi Dhulipalla in the field of computer vision and data analysis?"
399
+ ]
400
+ }
401
+ },
402
+ {
403
+ "text": "### Talking Buddy: Emotional AI Companion\n\n- Built a context-aware chatbot with 68.7K parameter GRU\n- 85% sentiment classification accuracy\n- Deployed across multiple platforms with real-time response\n\n---\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla developed an emotional AI companion, Talking Buddy, a context-aware chatbot with high sentiment classification accuracy and real-time response capabilities. The chatbot was successfully deployed across multiple platforms.\n\n🔸 Related Questions:\n- What AI projects has Krishna Vamsi Dhulipalla worked on?\n- What is Talking Buddy, and what features does it have?\n- What are some examples of Krishna's accomplishments in natural language processing?",
404
+ "metadata": {
405
+ "source": "profile.md",
406
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
407
+ "chunk_id": "profile.md_#13_f73a37ed",
408
+ "has_header": true,
409
+ "word_count": 29,
410
+ "summary": "Krishna Vamsi Dhulipalla developed an emotional AI companion, Talking Buddy, a context-aware chatbot with high sentiment classification accuracy and real-time response capabilities. The chatbot was successfully deployed across multiple platforms.",
411
+ "synthetic_queries": [
412
+ "What AI projects has Krishna Vamsi Dhulipalla worked on?",
413
+ "What is Talking Buddy, and what features does it have?",
414
+ "What are some examples of Krishna's accomplishments in natural language processing?"
415
+ ]
416
+ }
417
+ },
418
+ {
419
+ "text": "## 📜 Certifications\n\n- ✅ Building RAG Agents with LLMs – NVIDIA\n- ✅ Google Cloud Data Engineering Foundations\n- ✅ AWS Machine Learning Specialty\n- ✅ Microsoft MERN Development\n- ✅ End-to-End Real-World Data Engineering with Snowflake\n- ✅ Delivering Data-Driven Decisions with AWS\n- ✅ AICTE-EduSkills Certificate in AWS\n- ✅ Coursera ML Specialization\n > View all credentials: [LinkedIn Certifications](https://www.linkedin.com/in/krishnavamsidhulipalla/)\n\n---\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla has obtained various certifications in technologies such as AI, cloud computing, and data engineering from reputable platforms like NVIDIA, Google Cloud, AWS, and Coursera. These certifications demonstrate his expertise in these areas.\n\n🔸 Related Questions:\n- What certifications does Krishna Vamsi Dhulipalla hold in data engineering and machine learning?\n- What are Krishna's credentials in cloud computing and AI on platforms like AWS and Google Cloud?\n- What type of professional certifications has Krishna Vamsi Dhulipalla obtained to showcase his expertise in tech?",
420
+ "metadata": {
421
+ "source": "profile.md",
422
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
423
+ "chunk_id": "profile.md_#14_2db72caa",
424
+ "has_header": true,
425
+ "word_count": 63,
426
+ "summary": "Krishna Vamsi Dhulipalla has obtained various certifications in technologies such as AI, cloud computing, and data engineering from reputable platforms like NVIDIA, Google Cloud, AWS, and Coursera. These certifications demonstrate his expertise in these areas.",
427
+ "synthetic_queries": [
428
+ "What certifications does Krishna Vamsi Dhulipalla hold in data engineering and machine learning?",
429
+ "What are Krishna's credentials in cloud computing and AI on platforms like AWS and Google Cloud?",
430
+ "What type of professional certifications has Krishna Vamsi Dhulipalla obtained to showcase his expertise in tech?"
431
+ ]
432
+ }
433
+ },
434
+ {
435
+ "text": "## 📚 Publications\n\n- 🧬 _IEEE BIBM 2024_: \n “Leveraging ML for Predicting Circadian Transcription in mRNAs and lncRNAs” \n [DOI: 10.1109/BIBM62325.2024.10822684](https://doi.org/10.1109/BIBM62325.2024.10822684)\n\n- 🌿 _MLCB (Submitted)_: \n “Harshening DNA Foundation Models for TF Binding Prediction in Plants”\n\n---\n\n## 🔗 Links\n\n- 🌐 [Portfolio](http://krishna-dhulipalla.github.io)\n- 🧪 [GitHub](https://github.com/Krishna-dhulipalla)\n- 💼 [LinkedIn](https://www.linkedin.com/in/krishnavamsidhulipalla)\n- 📬 Email: [email protected]\n- 📱 Phone: +1 (540) 558-3528\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla is a researcher with publications in the field of bioinformatics, including a paper on predicting circadian transcription in mRNAs and lncRNAs. He has a portfolio, GitHub, LinkedIn, and contact information available online.\n\n🔸 Related Questions:\n- What are Krishna Vamsi Dhulipalla's research publications?\n- Where can I find Krishna Vamsi Dhulipalla's portfolio and GitHub profile?\n- How can I contact Krishna Vamsi Dhulipalla for collaboration or more information on his research?",
436
+ "metadata": {
437
+ "source": "profile.md",
438
+ "header": "# 👋 Hello, I'm Krishna Vamsi Dhulipalla",
439
+ "chunk_id": "profile.md_#15_ba72264c",
440
+ "has_header": true,
441
+ "word_count": 57,
442
+ "summary": "Krishna Vamsi Dhulipalla is a researcher with publications in the field of bioinformatics, including a paper on predicting circadian transcription in mRNAs and lncRNAs. He has a portfolio, GitHub, LinkedIn, and contact information available online.",
443
+ "synthetic_queries": [
444
+ "What are Krishna Vamsi Dhulipalla's research publications?",
445
+ "Where can I find Krishna Vamsi Dhulipalla's portfolio and GitHub profile?",
446
+ "How can I contact Krishna Vamsi Dhulipalla for collaboration or more information on his research?"
447
+ ]
448
+ }
449
+ },
450
+ {
451
+ "text": "# Current Tasks\n\nThese are the current ongoing tasks Krishna is actively working on:\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla is currently working on several ongoing tasks, which are listed here. These tasks are his active priorities at the moment.\n\n🔸 Related Questions:\n- What are Krishna's current projects?\n- What is Krishna Vamsi Dhulipalla working on right now?\n- What are Krishna's ongoing tasks at the moment?",
452
+ "metadata": {
453
+ "source": "task.md",
454
+ "header": "# Current Tasks",
455
+ "chunk_id": "task.md_#0_28e7171d",
456
+ "has_header": true,
457
+ "word_count": 14,
458
+ "summary": "Krishna Vamsi Dhulipalla is currently working on several ongoing tasks, which are listed here. These tasks are his active priorities at the moment.",
459
+ "synthetic_queries": [
460
+ "What are Krishna's current projects?",
461
+ "What is Krishna Vamsi Dhulipalla working on right now?",
462
+ "What are Krishna's ongoing tasks at the moment?"
463
+ ]
464
+ }
465
+ },
466
+ {
467
+ "text": "- 🔧 Build monolithic personal chatbot with FastAPI, Open Source LLM, and FAISS\n- 🔄 Refactor profile.md and chunk into semantic units for retrieval\n- 📁 Ingest resume, goals, and daily notes into vector DB with metadata\n- 🧠 Add multi-agent support (planner + tool caller) for downstream expansion\n- 📊 Debug and enhance gene co-expression visualization in R Shiny App\n- ✍️ Finalize publication for cross-species TFBS prediction (HyenaDNA-based)\n- 📬 Apply to 3 targeted data roles per week (focus: platform/data infra roles)\n- 📚 Review Kubernetes for ML deployment & NVIDIA's RAG Agent course weekly\n\n---\n🔹 Summary:\nKrishna Vamsi Dhulipalla's tasks include building a personal chatbot, refactoring profile documents, and enhancing data visualization in R Shiny App, among other projects. He is also working on applying to targeted data roles and reviewing Kubernetes for ML deployment.\n\n🔸 Related Questions:\n- What are Krishna Vamsi Dhulipalla's current projects and tasks?\n- What tools and technologies is Krishna using for his personal projects?\n- What are Krishna's goals and job aspirations in the field of data science?",
468
+ "metadata": {
469
+ "source": "task.md",
470
+ "header": "# Current Tasks",
471
+ "chunk_id": "task.md_#1_153da7e0",
472
+ "has_header": false,
473
+ "word_count": 97,
474
+ "summary": "Krishna Vamsi Dhulipalla's tasks include building a personal chatbot, refactoring profile documents, and enhancing data visualization in R Shiny App, among other projects. He is also working on applying to targeted data roles and reviewing Kubernetes for ML deployment.",
475
+ "synthetic_queries": [
476
+ "What are Krishna Vamsi Dhulipalla's current projects and tasks?",
477
+ "What tools and technologies is Krishna using for his personal projects?",
478
+ "What are Krishna's goals and job aspirations in the field of data science?"
479
+ ]
480
+ }
481
+ }
482
+ ]
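Each record above pairs a `text` field with `metadata` carrying `source`, `header`, `chunk_id`, `has_header`, `word_count`, `summary`, and `synthetic_queries`. A quick schema check before indexing catches malformed or duplicated chunks early. This is a minimal sketch based only on the field names visible in the JSON above, not code from this commit:

```python
import json  # use json.load(open("all_chunks.json", encoding="utf-8")) for the real file

REQUIRED_TOP = {"text", "metadata"}
REQUIRED_META = {"source", "header", "chunk_id", "has_header",
                 "word_count", "summary", "synthetic_queries"}

def validate_chunks(chunks):
    """Return (index, problem) pairs for records that break the schema."""
    problems = []
    seen_ids = set()
    for i, chunk in enumerate(chunks):
        missing = REQUIRED_TOP - chunk.keys()
        if missing:
            problems.append((i, f"missing keys: {sorted(missing)}"))
            continue
        meta = chunk["metadata"]
        missing_meta = REQUIRED_META - meta.keys()
        if missing_meta:
            problems.append((i, f"missing metadata: {sorted(missing_meta)}"))
        cid = meta.get("chunk_id")
        if cid in seen_ids:
            # chunk_id doubles as a dedupe key, so repeats are suspicious
            problems.append((i, f"duplicate chunk_id: {cid}"))
        seen_ids.add(cid)
    return problems
```

An empty return value means every chunk carries the full schema and a unique `chunk_id`.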
app.py CHANGED
@@ -1,64 +1,385 @@
1
- import gradio as gr
2
- from huggingface_hub import InferenceClient
3
-
4
- """
5
- For more information on `huggingface_hub` Inference API support, please check the docs: https://huggingface.co/docs/huggingface_hub/v0.22.2/en/guides/inference
6
- """
7
- client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")
8
-
9
-
10
- def respond(
11
- message,
12
- history: list[tuple[str, str]],
13
- system_message,
14
- max_tokens,
15
- temperature,
16
- top_p,
17
- ):
18
- messages = [{"role": "system", "content": system_message}]
19
-
20
- for val in history:
21
- if val[0]:
22
- messages.append({"role": "user", "content": val[0]})
23
- if val[1]:
24
- messages.append({"role": "assistant", "content": val[1]})
25
-
26
- messages.append({"role": "user", "content": message})
27
-
28
- response = ""
29
-
30
- for message in client.chat_completion(
31
- messages,
32
- max_tokens=max_tokens,
33
- stream=True,
34
- temperature=temperature,
35
- top_p=top_p,
36
- ):
37
- token = message.choices[0].delta.content
38
-
39
- response += token
40
- yield response
41
-
42
-
43
- """
44
- For information on how to customize the ChatInterface, peruse the gradio docs: https://www.gradio.app/docs/chatinterface
45
- """
46
- demo = gr.ChatInterface(
47
- respond,
48
- additional_inputs=[
49
- gr.Textbox(value="You are a friendly Chatbot.", label="System message"),
50
- gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
51
- gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
52
- gr.Slider(
53
- minimum=0.1,
54
- maximum=1.0,
55
- value=0.95,
56
- step=0.05,
57
- label="Top-p (nucleus sampling)",
58
- ),
59
- ],
60
- )
61
-
62
-
63
- if __name__ == "__main__":
64
- demo.launch()
1
+ import os
2
+ import json
3
+ import re
4
+ import hashlib
5
+ from functools import partial
6
+ from collections import defaultdict
7
+ from pathlib import Path
8
+ from typing import List, Dict, Any
9
+ import numpy as np
10
+ from dotenv import load_dotenv
11
+ from rich.console import Console
12
+ from rich.style import Style
13
+ from langchain_core.runnables import RunnableLambda
14
+ from langchain_nvidia_ai_endpoints import ChatNVIDIA
15
+ from langchain_core.output_parsers import StrOutputParser
16
+ from langchain_core.prompts import ChatPromptTemplate
17
+ from langchain.schema.runnable.passthrough import RunnableAssign
18
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
19
+ from langchain_huggingface import HuggingFaceEmbeddings
20
+ from langchain.vectorstores import FAISS
21
+ from langchain.docstore.document import Document
22
+ from langchain.retrievers import BM25Retriever
23
+ from langchain_openai import ChatOpenAI
24
+ from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
25
+
26
+ dotenv_path = os.path.join(os.getcwd(), ".env")
27
+ load_dotenv(dotenv_path)
28
+ api_key = os.getenv("NVIDIA_API_KEY")
29
+ os.environ["NVIDIA_API_KEY"] = api_key
30
+
31
+ # Constants
32
+ FAISS_PATH = "faiss_store/v30_600_150"
33
+ CHUNKS_PATH = "all_chunks.json"
34
+ KRISHNA_BIO = """Krishna Vamsi Dhulipalla is a graduate student in Computer Science at Virginia Tech (M.Eng, expected 2024), with over 3 years of experience across data engineering, machine learning research, and real-time analytics. He specializes in building scalable data systems and intelligent LLM-powered applications, with strong expertise in Python, PyTorch, Hugging Face Transformers, and end-to-end ML pipelines.
35
+
36
+ He has led projects involving retrieval-augmented generation (RAG), feature selection for genomic classification, fine-tuning domain-specific LLMs (e.g., DNABERT, HyenaDNA), and real-time forecasting systems using Kafka, Spark, and Airflow. His cloud proficiency spans AWS (S3, SageMaker, ECS, CloudWatch), GCP (BigQuery, Cloud Composer), and DevOps tools like Docker, Kubernetes, and MLflow.
37
+
38
+ Krishna’s academic focus areas include genomic sequence modeling, transformer optimization, MLOps automation, and cross-domain generalization. He has published research in bioinformatics and ML applications for circadian transcription prediction and transcription factor binding.
39
+
40
+ He is certified in NVIDIA’s RAG Agents with LLMs, Google Cloud Data Engineering, AWS ML Specialization, and has a proven ability to blend research and engineering in real-world systems. Krishna is passionate about scalable LLM infra, data-centric AI, and domain-adaptive ML solutions."""
41
+
42
+ def initialize_console():
43
+ console = Console()
44
+ base_style = Style(color="#76B900", bold=True)
45
+ return partial(console.print, style=base_style)
46
+
47
+ pprint = initialize_console()
48
+
49
+ def load_chunks_from_json(path: str = CHUNKS_PATH) -> List[Dict]:
50
+ with open(path, "r", encoding="utf-8") as f:
51
+ return json.load(f)
52
+
53
+ def load_faiss(path: str = FAISS_PATH,
54
+ model_name: str = "sentence-transformers/all-MiniLM-L6-v2") -> FAISS:
55
+ embeddings = HuggingFaceEmbeddings(model_name=model_name)
56
+ return FAISS.load_local(path, embeddings, allow_dangerous_deserialization=True)
57
+
58
+ def initialize_resources():
59
+ vectorstore = load_faiss()
60
+ all_chunks = load_chunks_from_json()
61
+ all_texts = [chunk["text"] for chunk in all_chunks]
62
+ metadatas = [chunk["metadata"] for chunk in all_chunks]
63
+ return vectorstore, all_chunks, all_texts, metadatas
64
+
65
+ vectorstore, all_chunks, all_texts, metadatas = initialize_resources()
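The `BM25Retriever` import alongside the FAISS store suggests hybrid (dense + sparse) retrieval over `all_texts`. One common way to merge the two ranked lists is reciprocal-rank fusion; the sketch below shows the scoring idea in pure Python with hypothetical chunk IDs, independent of any LangChain API:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal-rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense_order  = ["c3", "c1", "c7"]  # hypothetical FAISS result order
sparse_order = ["c1", "c4", "c3"]  # hypothetical BM25 result order
fused = rrf_fuse([dense_order, sparse_order])  # "c1" wins: ranked high in both lists
```

Documents that appear near the top of both lists outrank documents that are first in only one, which is the usual motivation for fusing dense and sparse retrievers.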
66
+
67
+ # LLMs
68
+ repharser_llm = ChatNVIDIA(model="mistralai/mistral-7b-instruct-v0.3") | StrOutputParser()
69
+ relevance_llm = ChatNVIDIA(model="meta/llama3-70b-instruct") | StrOutputParser()
70
+ answer_llm = ChatOpenAI(
71
+ model="gpt-4-1106-preview",
72
+ temperature=0.3,
73
+ openai_api_key=os.getenv("OPENAI_API_KEY"),
74
+ streaming=True,
75
+ callbacks=[StreamingStdOutCallbackHandler()]
76
+ ) | StrOutputParser()
77
+
78
+
79
+ # Prompts
80
+ repharser_prompt = ChatPromptTemplate.from_template(
81
+ "Rewrite the question below in 4 diverse ways to retrieve semantically similar information.Ensure diversity in phrasings across style, voice, and abstraction:\n\nQuestion: {query}\n\nRewrites:"
82
+ )
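The rephraser returns the four rewrites as one text blob (often a numbered list), so downstream code has to split it back into individual queries. A small helper for that, hypothetical and not part of this commit, could look like:

```python
import re

def split_rewrites(raw: str, n: int = 4) -> list:
    """Strip leading numbering or bullets from each line; keep up to n rewrites."""
    cleaned = [re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
               for line in raw.splitlines()]
    return [line for line in cleaned if line][:n]
```

Feeding each returned string to the retriever separately is what makes the multi-query expansion useful.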
83
+
84
+ relevance_prompt = ChatPromptTemplate.from_template("""
85
+ You are Krishna's personal AI assistant validator.
86
+ Your job is to review a user's question and a list of retrieved document chunks.
87
+ Identify which chunks (if any) directly help answer the question. Return **all relevant chunks**.
88
+
89
+ ---
90
+ ⚠️ Do NOT select chunks just because they include keywords or technical terms.
91
+
92
+ Exclude chunks that:
93
+ - Mention universities, CGPA, or education history (they show qualifications, not skills)
94
+ - List certifications or course names (they show credentials, not skills used)
95
+ - Describe goals, future plans, or job aspirations
96
+ - Contain tools mentioned in passing without describing actual usage
97
+
98
+ Only include chunks if they contain **evidence of specific knowledge, tools used, skills applied, or experience demonstrated.**
99
+
100
+ ---
101
+
102
+ 🔎 Examples:
103
+
104
+ Q1: "What are Krishna's skills?"
105
+ - Chunk A: Lists programming languages, ML tools, and projects → ✅
106
+ - Chunk B: Talks about a Coursera certificate in ML → ❌
107
+ - Chunk C: States a CGPA and master’s degree → ❌
108
+ - Chunk D: Describes tools Krishna used in his work → ✅
109
+
110
+ Output:
111
+ {{
112
+ "valid_chunks": [A, D],
113
+ "is_out_of_scope": false,
114
+ "justification": "Chunks A and D describe tools and skills Krishna has actually used."
115
+ }}
116
+
117
+ Q2: "What is Krishna's favorite color?"
118
+ - All chunks are about technical work or academic history → ❌
119
+
120
+ Output:
121
+ {{
122
+ "valid_chunks": [],
123
+ "is_out_of_scope": true,
124
+ "justification": "None of the chunks are related to the user's question about preferences or colors."
125
+ }}
126
+
127
+ ---
128
+
129
+ Now your turn.
130
+
131
+ User Question:
132
+ "{query}"
133
+
134
+ Chunks:
135
+ {contents}
136
+
137
+ Return only the JSON object. Think carefully before selecting any chunk.
138
+ """)
139
+
140
+ answer_prompt_relevant = ChatPromptTemplate.from_template(
141
+ "You are Krishna's personal AI assistant. Your job is to answer the user’s question clearly and professionally using the provided context.\n"
142
+ "Rather than copying sentences, synthesize relevant insights and explain them like a knowledgeable peer.\n\n"
143
+ "Krishna's Background:\n{profile}\n\n"
144
+ "Make your response rich and informative by:\n"
145
+ "- Combining relevant facts from multiple parts of the context\n"
146
+ "- Using natural, human-style language (not just bullet points)\n"
147
+ "- Expanding briefly on tools or skills when appropriate\n"
148
+ "- Avoiding repetition, filler, or hallucinations\n\n"
149
+ "Context:\n{context}\n\n"
150
+ "User Question:\n{query}\n\n"
151
+ "Answer:"
152
+ )
153
+
154
+ answer_prompt_fallback = ChatPromptTemplate.from_template(
155
+ "You are Krishna’s personal AI assistant. The user asked a question unrelated to Krishna’s background.\n"
156
+ "Gently let the user know, and then pivot to something Krishna is actually involved in to keep the conversation helpful.\n\n"
157
+ "Krishna's Background:\n{profile}\n\n"
158
+ "User Question:\n{query}\n\n"
159
+ "Your Answer:"
160
+ )
161
+ # Helper Functions
162
+ def parse_rewrites(raw_response: str) -> list[str]:
163
+ lines = raw_response.strip().split("\n")
164
+ return [line.strip("0123456789. ").strip() for line in lines if line.strip()][:4]
165
+
166
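A quick sanity check of `parse_rewrites`: it drops blank lines, strips leading list numbering, and keeps at most four rewrites. (Note that `strip` with a character set also eats trailing digits and periods, so a rewrite ending in "." loses that character; the sample below sidesteps this by ending each line with "?".) A self-contained copy for illustration:

```python
def parse_rewrites(raw_response: str) -> list[str]:
    # Same logic as the helper above: drop blanks, strip list numbering,
    # keep at most four rewrites.
    lines = raw_response.strip().split("\n")
    return [line.strip("0123456789. ").strip() for line in lines if line.strip()][:4]

raw = """1. What tools does Krishna use day to day?
2. Which technologies appear in Krishna's projects?
3. What is Krishna experienced with?
4. Which skills has Krishna applied in practice?
5. A fifth rewrite that gets truncated away"""

rewrites = parse_rewrites(raw)
# 4 rewrites, numbering removed, fifth line dropped
```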
+ def hybrid_retrieve(inputs, exclude_terms=None):
+     # if exclude_terms is None:
+     #     exclude_terms = ["cgpa", "university", "b.tech", "m.s.", "certification", "coursera", "edx", "goal", "aspiration", "linkedin", "publication", "ieee", "doi", "degree"]
+ 
+     all_queries = inputs["all_queries"]
+     bm25_retriever = BM25Retriever.from_texts(texts=all_texts, metadatas=metadatas)
+     bm25_retriever.k = inputs["k_per_query"]
+     vectorstore = inputs["vectorstore"]
+     alpha = inputs["alpha"]
+     top_k = inputs.get("top_k", 15)
+ 
+     scored_chunks = defaultdict(lambda: {
+         "vector_scores": [],
+         "bm25_score": 0.0,
+         "content": None,
+         "metadata": None,
+     })
+ 
+     for subquery in all_queries:
+         # NOTE: similarity_search_with_score returns raw FAISS scores; for an
+         # L2 index these are distances (lower is better), so the min-max
+         # normalization below assumes a similarity-style index.
+         vec_hits = vectorstore.similarity_search_with_score(subquery, k=inputs["k_per_query"])
+         for doc, score in vec_hits:
+             key = hashlib.md5(doc.page_content.encode("utf-8")).hexdigest()
+             scored_chunks[key]["vector_scores"].append(score)
+             scored_chunks[key]["content"] = doc.page_content
+             scored_chunks[key]["metadata"] = doc.metadata
+ 
+         bm_hits = bm25_retriever.invoke(subquery)
+         for rank, doc in enumerate(bm_hits):
+             key = hashlib.md5(doc.page_content.encode("utf-8")).hexdigest()
+             bm_score = 1.0 - (rank / inputs["k_per_query"])
+             scored_chunks[key]["bm25_score"] += bm_score
+             scored_chunks[key]["content"] = doc.page_content
+             scored_chunks[key]["metadata"] = doc.metadata
+ 
+     all_vec_means = [np.mean(v["vector_scores"]) for v in scored_chunks.values() if v["vector_scores"]]
+     max_vec = max(all_vec_means) if all_vec_means else 1
+     min_vec = min(all_vec_means) if all_vec_means else 0
+ 
+     final_results = []
+     for chunk in scored_chunks.values():
+         vec_score = np.mean(chunk["vector_scores"]) if chunk["vector_scores"] else 0.0
+         norm_vec = (vec_score - min_vec) / (max_vec - min_vec) if max_vec != min_vec else 1.0
+         bm25_score = chunk["bm25_score"] / len(all_queries)
+         final_score = alpha * norm_vec + (1 - alpha) * bm25_score
+ 
+         content = chunk["content"].lower()
+         # if any(term in content for term in exclude_terms):
+         #     continue
+         if final_score < 0.05 or len(content.strip()) < 100:
+             continue
+ 
+         final_results.append({
+             "content": chunk["content"],
+             "source": chunk["metadata"].get("source", ""),
+             "final_score": float(round(final_score, 4)),
+             "vector_score": float(round(vec_score, 4)),
+             "bm25_score": float(round(bm25_score, 4)),
+             "metadata": chunk["metadata"],
+             "summary": chunk["metadata"].get("summary", ""),
+             "synthetic_queries": chunk["metadata"].get("synthetic_queries", [])
+         })
+ 
+     final_results = sorted(final_results, key=lambda x: x["final_score"], reverse=True)
+ 
+     # Deduplicate near-identical chunks by (source, normalized 300-char prefix).
+     seen = set()
+     unique_chunks = []
+     for chunk in final_results:
+         clean_text = re.sub(r'\W+', '', chunk["content"].lower())[:300]
+         fingerprint = (chunk["source"], clean_text)
+         if fingerprint not in seen:
+             seen.add(fingerprint)
+             unique_chunks.append(chunk)
+ 
+     unique_chunks = unique_chunks[:top_k]
+ 
+     return {
+         "query": inputs["query"],
+         "chunks": unique_chunks
+     }
+ 
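The scoring inside `hybrid_retrieve` is easier to see in isolation: each chunk's mean vector score is min-max normalized across chunks, the accumulated BM25 rank score is averaged over the query rewrites, and the two are blended with `alpha`. A standalone sketch of that fusion (`fuse_scores` and the sample values are illustrative, not part of the app):

```python
def fuse_scores(vector_scores, bm25_score, n_queries, alpha, min_vec, max_vec):
    # Mirrors hybrid_retrieve's blend: alpha * normalized vector score
    # plus (1 - alpha) * per-query-averaged BM25 score.
    vec = sum(vector_scores) / len(vector_scores) if vector_scores else 0.0
    norm_vec = (vec - min_vec) / (max_vec - min_vec) if max_vec != min_vec else 1.0
    bm25 = bm25_score / n_queries
    return alpha * norm_vec + (1 - alpha) * bm25

# A chunk at the top of the vector range, found at rank 0 by BM25 for
# one of two rewrites (accumulated bm25_score = 1.0):
score = fuse_scores([0.8], bm25_score=1.0, n_queries=2, alpha=0.7,
                    min_vec=0.2, max_vec=0.8)
# 0.7 * 1.0 + 0.3 * 0.5 = 0.85
```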
+ def safe_json_parse(s: str) -> Dict:
+     fallback = {
+         "valid_chunks": [],
+         "is_out_of_scope": True,
+         "justification": "Fallback due to invalid LLM output"
+     }
+     if not isinstance(s, str) or "valid_chunks" not in s:
+         return fallback
+     try:
+         return json.loads(s)
+     except json.JSONDecodeError:
+         # The model sometimes wraps the JSON in extra prose; fail soft.
+         return fallback
+ 
+ # Rewrite generation
+ rephraser_chain = (
+     repharser_prompt
+     | repharser_llm
+     | RunnableLambda(parse_rewrites)
+ )
+ 
+ generate_rewrites_chain = (
+     RunnableAssign({
+         "rewrites": lambda x: rephraser_chain.invoke({"query": x["query"]})
+     })
+     | RunnableAssign({
+         "all_queries": lambda x: [x["query"]] + x["rewrites"]
+     })
+ )
+ 
+ # Retrieval
+ retrieve_chain = RunnableLambda(hybrid_retrieve)
+ hybrid_chain = generate_rewrites_chain | retrieve_chain
+ 
+ # Validation
+ # Number the chunks (1-based) so the validator's "valid_chunks" indices
+ # line up with the chunks[i-1] lookup in prepare_answer_inputs.
+ extract_validation_inputs = RunnableLambda(lambda x: {
+     "query": x["query"],
+     "contents": [f"{i}. {c['content']}" for i, c in enumerate(x["chunks"], start=1)]
+ })
+ 
+ validation_chain = (
+     extract_validation_inputs
+     | relevance_prompt
+     | relevance_llm
+     | RunnableLambda(safe_json_parse)
+ )
+ 
+ # Answer Generation
+ def prepare_answer_inputs(x: Dict) -> Dict:
+     # "valid_chunks" holds 1-based chunk numbers from the validator.
+     context = KRISHNA_BIO if x["validation"]["is_out_of_scope"] else "\n\n".join(
+         [x["chunks"][i - 1]["content"] for i in x["validation"]["valid_chunks"]])
+ 
+     return {
+         "query": x["query"],
+         "profile": KRISHNA_BIO,
+         "context": context,
+         "use_fallback": x["validation"]["is_out_of_scope"]
+     }
+ 
+ select_and_prompt = RunnableLambda(lambda x:
+     answer_prompt_fallback.invoke(x) if x["use_fallback"]
+     else answer_prompt_relevant.invoke(x))
+ 
+ answer_chain = (
+     prepare_answer_inputs
+     | select_and_prompt
+     | answer_llm  # use the streaming answer model rather than the validator LLM
+ )
+ 
+ # Full Pipeline
+ full_pipeline = (
+     hybrid_chain
+     | RunnableAssign({"validation": validation_chain})
+     | RunnableAssign({"answer": answer_chain})
+ )
+ 
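The context-selection branch in `prepare_answer_inputs` can be exercised in isolation. A self-contained sketch (`KRISHNA_BIO` and the state dicts are placeholder values; in the app they come from the pipeline):

```python
KRISHNA_BIO = "Placeholder profile text for Krishna."  # stand-in for the real bio

def prepare_answer_inputs(x):
    # Same branch as above: fall back to the profile when the validator
    # marks the question out of scope; otherwise join the validated
    # chunks (1-based indices) into the answer context.
    oos = x["validation"]["is_out_of_scope"]
    context = KRISHNA_BIO if oos else "\n\n".join(
        x["chunks"][i - 1]["content"] for i in x["validation"]["valid_chunks"])
    return {"query": x["query"], "profile": KRISHNA_BIO,
            "context": context, "use_fallback": oos}

chunks = [{"content": "FAISS + BM25 hybrid retrieval"},
          {"content": "Unrelated chunk"}]

in_scope = prepare_answer_inputs({
    "query": "What tools has Krishna used?",
    "chunks": chunks,
    "validation": {"valid_chunks": [1], "is_out_of_scope": False},
})
out_of_scope = prepare_answer_inputs({
    "query": "What's the weather?",
    "chunks": chunks,
    "validation": {"valid_chunks": [], "is_out_of_scope": True},
})
```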
+ import gradio as gr
+ 
+ def chat_interface(message, history):
+     inputs = {
+         "query": message,
+         "all_queries": [message],
+         "all_texts": all_texts,
+         "k_per_query": 3,
+         "alpha": 0.7,
+         "vectorstore": vectorstore,
+         "full_document": "",
+     }
+     response = ""
+     for chunk in full_pipeline.stream(inputs):
+         if isinstance(chunk, str):
+             response += chunk
+             yield response
+         elif isinstance(chunk, dict) and "answer" in chunk:
+             response += chunk["answer"]
+             yield response
+ 
+ with gr.Blocks(css="""
+ html, body, .gradio-container {
+     height: 100%;
+     margin: 0;
+     padding: 0;
+ }
+ .gradio-container {
+     width: 90%;
+     max-width: 1000px;
+     margin: 0 auto;
+     padding: 1rem;
+ }
+ 
+ .chatbox-container {
+     display: flex;
+     flex-direction: column;
+     height: 95%;
+ }
+ 
+ .chatbot {
+     flex: 1;
+     overflow-y: auto;
+     min-height: 500px;
+ }
+ 
+ .textbox {
+     margin-top: 1rem;
+ }
+ #component-523 {
+     height: 98%;
+ }
+ """) as demo:
+     with gr.Column(elem_classes="chatbox-container"):
+         gr.Markdown("## 💬 Ask Krishna's AI Assistant")
+         gr.Markdown("💡 Ask anything about Krishna Vamsi Dhulipalla")
+         chatbot = gr.Chatbot(elem_classes="chatbot")
+         textbox = gr.Textbox(placeholder="Ask a question about Krishna...", elem_classes="textbox")
+ 
+         gr.ChatInterface(
+             fn=chat_interface,
+             chatbot=chatbot,
+             textbox=textbox,
+             examples=[
+                 "What are Krishna's research interests?",
+                 "Where did Krishna work?",
+                 "What did he study at Virginia Tech?"
+             ],
+         )
+ 
+ demo.launch()
faiss_store/v30_600_150/index.faiss ADDED
Binary file (46.1 kB)
 
requirements.txt CHANGED
Binary files a/requirements.txt and b/requirements.txt differ