Data & Data Structure Components
The data & data structure components include:
- The Document class.
- The document store.
- The vector store.
Data Loader
- PdfLoader
- Layout-aware PdfLoader with table parsing
  - MathPixLoader: To use this loader, you need a MathPix API key. Refer to the MathPix docs for more information.
  - OCRLoader: This loader uses lib-table and a Flax pipeline to perform OCR and read table structure from PDF files (TODO: add more info about deployment of this module).
Output:
- Document: text plus metadata that identifies whether the content is a table or not
  - "source": source file name
  - "type": "table" or "text"
  - "table_origin": original table in markdown format (to be fed to an LLM or visualized using external tools)
  - "page_label": page number in the original PDF document
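
The metadata keys listed above can be illustrated with a minimal sketch. The `Document` dataclass and the `split_by_type` helper below are hypothetical stand-ins written for this example, not the library's actual classes, and the file name and table contents are made up. Only the four metadata keys come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Document:
    """Hypothetical stand-in: a piece of text plus a metadata dictionary."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def split_by_type(docs: List[Document]) -> Tuple[List[Document], List[Document]]:
    """Separate table documents from plain-text documents using the "type" key."""
    tables = [d for d in docs if d.metadata.get("type") == "table"]
    texts = [d for d in docs if d.metadata.get("type") == "text"]
    return tables, texts


# Example loader output (illustrative values only).
docs = [
    Document(
        text="Revenue grew in the second year.",
        metadata={"source": "report.pdf", "type": "text", "page_label": "1"},
    ),
    Document(
        text="Year 2022 revenue 10; year 2023 revenue 12",
        metadata={
            "source": "report.pdf",
            "type": "table",
            # Original table kept as markdown so it can be passed to an LLM.
            "table_origin": "| Year | Revenue |\n|---|---|\n| 2022 | 10 |\n| 2023 | 12 |",
            "page_label": "2",
        },
    ),
]

tables, texts = split_by_type(docs)
```

Keeping the markdown form in "table_origin" means downstream code can choose between the flattened text (for embedding) and the original table layout (for prompting or display).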
Document Store
- InMemoryDocumentStore
Vector Store
- ChromaVectorStore
- InMemoryVectorStore
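
To give a rough idea of what an in-memory vector store does, here is a minimal sketch using cosine similarity. The `TinyVectorStore` class and its `add` and `query` methods are hypothetical names chosen for this example. They are not the API of InMemoryVectorStore or ChromaVectorStore, and the two-dimensional embeddings are toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store mapping document ids to embeddings."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, doc_id: str, embedding: Sequence[float]) -> None:
        # Store a copy so later changes by the caller do not affect the store.
        self._vectors[doc_id] = list(embedding)

    def query(self, embedding: Sequence[float], top_k: int = 1) -> List[Tuple[str, float]]:
        # Score every stored vector, then return the top_k most similar ids.
        scored = [(doc_id, cosine(embedding, vec)) for doc_id, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
store.add("text-1", [1.0, 0.0])
store.add("table-1", [0.0, 1.0])

hits = store.query([0.9, 0.1], top_k=1)
```

In this sketch the store returns only ids and scores. The matching text and metadata would be fetched from the document store by id, which is one reason to keep the two stores separate.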