%% arxiv-rag-mvp / ingestion_flow_diagram.mermaid
sequenceDiagram
    participant PDF as arXiv PDF Document
    participant DL as Document Loader (PyMuPDF)
    participant TS as Text Splitter (RecursiveCharacter)
    participant EM as Embedding Model (OpenAI)
    participant VDB as Vector Database (Qdrant)
    participant DS as Dataset (Hugging Face)

    PDF->>DL: Load document
    Note over DL: extract_images=True
    DL->>TS: Pass extracted text
    TS->>EM: Send text chunks
    EM->>VDB: Store embeddings
    DL->>DS: Store metadata
    DL->>DS: Store extracted text