---
title: PogcastGPT
emoji: 💻
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.10.0
app_file: app.py
pinned: false
duplicated_from: somuch4subtlety/pogcastGPT
---
This app uses semantic search to find and summarize relevant sections of the Pogcast to answer a user's question.
The process began by downloading and transcribing Pogcast episodes using [OpenAI’s Whisper](https://github.com/openai/whisper).
The transcriptions were then split into chunks of roughly 500 words, and each chunk was vectorized using [OpenAI’s embedding endpoint](https://beta.openai.com/docs/guides/embeddings).
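
A minimal sketch of the chunking step, assuming whitespace tokenization and non-overlapping chunks. The function name `chunk_words` is hypothetical, and the app itself may split differently (for example, on sentence boundaries or with overlap). The call to the embedding endpoint that would follow for each chunk is not shown.

```python
def chunk_words(text: str, chunk_size: int = 500) -> list[str]:
    """Split a transcript into consecutive chunks of at most chunk_size words."""
    words = text.split()  # naive whitespace tokenization (an assumption)
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# Toy transcript of 1,200 placeholder words.
transcript = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_words(transcript)
print([len(c.split()) for c in chunks])  # [500, 500, 200]
```

The last chunk is simply shorter; padding or merging it into its neighbor is a design choice the original description does not specify.
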
The embeddings and their source text were then stored in a [Pinecone](https://www.pinecone.io) vector database.
When you ask a question, your question is run through the same embedding endpoint, and the resulting vector is compared against all of the vectorized sections using cosine similarity.
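
To illustrate the retrieval step, here is a brute-force sketch of cosine-similarity ranking over an in-memory list. This list is a hypothetical stand-in for Pinecone, which runs the similarity search server-side; the record layout, the names `top_k` and `index`, and the toy 3-dimensional vectors (in place of real embeddings) are all assumptions made for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), or 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], records: list[dict], k: int = 3) -> list[tuple[float, dict]]:
    """Score every stored chunk against the query and return the k best matches."""
    scored = [(cosine_similarity(query_vector, r["values"]), r) for r in records]
    # Sort on the score only, highest first, so the dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical in-memory stand-in for the vector database: each record keeps
# the chunk's embedding ("values") next to its original text for display.
index = [
    {"id": "ep1-chunk0", "values": [1.0, 0.0, 0.0], "text": "chunk text A"},
    {"id": "ep1-chunk1", "values": [0.0, 1.0, 0.0], "text": "chunk text B"},
    {"id": "ep2-chunk0", "values": [0.7, 0.7, 0.0], "text": "chunk text C"},
]

# In the app, this vector would come from embedding the user's question.
query_vector = [0.9, 0.1, 0.0]
for score, record in top_k(query_vector, index, k=2):
    print(f"{record['id']}: {score:.3f}")
```

With this toy query, `ep1-chunk0` ranks first and `ep2-chunk0` second. Storing the text next to each vector is what lets the app display the matching sections without a second lookup.
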
The top results are used as context and passed to [OpenAI’s GPT-3 completion endpoint](https://beta.openai.com/docs/api-reference/completions), along with your question and instructions describing how GPT-3 should answer it.
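
The prompt-assembly step might look roughly like the sketch below. The instruction wording, the `build_prompt` name, the separator, and the `max_chars` budget are hypothetical; the app's actual prompt text is not given here.

```python
def build_prompt(question: str, contexts: list[str], max_chars: int = 6000) -> str:
    """Combine instructions, retrieved transcript chunks, and the question into one prompt."""
    # Hypothetical instruction text; not the app's real prompt.
    instructions = (
        "Answer the question below using the following excerpts "
        "from Pogcast transcripts."
    )
    # Keep the highest-ranked chunks first and stop before exceeding a rough
    # character budget, so the prompt stays within the model's context window.
    selected: list[str] = []
    used = 0
    for ctx in contexts:
        if used + len(ctx) > max_chars:
            break
        selected.append(ctx)
        used += len(ctx)
    context_block = "\n\n---\n\n".join(selected)
    return (
        f"{instructions}\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt("Who hosts the show?", ["chunk text A", "chunk text B"])
print(prompt)
```

The returned string would be sent as the `prompt` argument of a completion request. Which model, `max_tokens`, and `temperature` values the app uses is not documented here.
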
Lastly, the summary answer and the top-matching sections are displayed.

**Note:** The parameters and completion prompt are only loosely tuned, so the bot is likely to hallucinate in its answers.