sequenceDiagram
participant PDF as arXiv PDF Document
participant DL as Document Loader (PyMuPDF)
participant TS as Text Splitter (RecursiveCharacter)
participant EM as Embedding Model (OpenAI)
participant VDB as Vector Database (Qdrant)
participant DS as Dataset (Hugging Face)
PDF->>DL: Load document
Note over DL: extract_images=True
DL->>TS: Pass extracted text
TS->>EM: Send text chunks
EM->>VDB: Store embeddings
DL->>DS: Store metadata
DL->>DS: Store extracted text