Spaces:
Paused
Paused
| # Contributing | |
| ## Setting up | |
| - Clone the repo | |
| ```shell | |
| git clone [email protected]:Cinnamon/kotaemon.git | |
| cd kotaemon | |
| ``` | |
| - Install the environment | |
| - Create a conda environment (python >= 3.10 is recommended) | |
| ```shell | |
| conda create -n kotaemon python=3.10 | |
| conda activate kotaemon | |
| # install dependencies | |
| cd libs/kotaemon | |
| pip install -e ".[all]" | |
| ``` | |
| - Or run the installer (one of the `scripts/run_*` scripts depends on your OS), then | |
| you will have all the dependencies installed as a conda environment at | |
| `install_dir/env`. | |
| ```shell | |
| conda activate install_dir/env | |
| ``` | |
| - Pre-commit | |
| ```shell | |
| pre-commit install | |
| ``` | |
| - Test | |
| ```shell | |
| pytest tests | |
| ``` | |
| ## Package overview | |
| `kotaemon` library focuses on the AI building blocks to implement a RAG-based QA application. It consists of base interfaces, core components and a list of utilities: | |
| - Base interfaces: `kotaemon` defines the base interface of a component in a pipeline. A pipeline is also a component. By clearly define this interface, a pipeline of steps can be easily constructed and orchestrated. | |
| - Core components: `kotaemon` implements (or wraps 3rd-party libraries | |
| like Langchain, llama-index,... when possible) commonly used components in | |
| kotaemon use cases. Some of these components are: LLM, vector store, | |
| document store, retriever... For a detailed list and description of these | |
| components, please refer to the [API Reference](../reference/Summary.md) section. | |
| - List of utilities: `kotaemon` provides utilities and tools that are | |
| usually needed in client project. For example, it provides a prompt | |
| engineering UI for AI developers in a project to quickly create a prompt | |
| engineering tool for DMs and QALs. It also provides a command to quickly spin | |
| up a project code base. For a full list and description of these utilities, | |
| please refer to the [Utilities](utilities.md) section. | |
| ```mermaid | |
| mindmap | |
| root((kotaemon)) | |
| Base Interfaces | |
| Document | |
| LLMInterface | |
| RetrievedDocument | |
| BaseEmbeddings | |
| BaseChat | |
| BaseCompletion | |
| ... | |
| Core Components | |
| LLMs | |
| AzureOpenAI | |
| OpenAI | |
| Embeddings | |
| AzureOpenAI | |
| OpenAI | |
| HuggingFaceEmbedding | |
| VectorStore | |
| InMemoryVectorstore | |
| ChromaVectorstore | |
| Agent | |
| Tool | |
| DocumentStore | |
| ... | |
| Utilities | |
| Scaffold project | |
| PromptUI | |
| Documentation Support | |
| ``` | |
| ## Common conventions | |
| - PR title: One-line description (example: Feat: Declare BaseComponent and decide LLM call interface). | |
| - [Encouraged] Provide a quick description in the PR, so that: | |
| - Reviewers can quickly understand the direction of the PR. | |
| - It will be included in the commit message when the PR is merged. | |
| ## Environment caching on PR | |
| - To speed up CI, environments are cached based on the version specified in `__init__.py`. | |
| - Since dependencies versions in `setup.py` are not pinned, you need to pump the version in order to use a new environment. That environment will then be cached and used by your subsequence commits within the PR, until you pump the version again | |
| - The new environment created during your PR is cached and will be available to others once the PR is merged. | |
| - If you are experimenting with new dependencies and want a fresh environment every time, add `[ignore cache]` in your commit message. The CI will create a fresh environment to run your commit and then discard it. | |
| - If your PR include updated dependencies, the recommended workflow would be: | |
| - Doing development as usual. | |
| - When you want to run the CI, push a commit with the message containing `[ignore cache]`. | |
| - Once the PR is final, pump the version in `__init__.py` and push a final commit not containing `[ignore cache]`. | |
| ## Merge PR guideline | |
| - Use squash and merge option | |
| - 1st line message is the PR title. | |
| - The text area is the PR description. | |