writinwaters committed · Commit ce611dd · Parent(s): 2ace2a9

Updated HTTP API reference and Python API reference based on test results (#3090)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update

Files changed:

- api/http_api_reference.md +9 -7
- api/python_api_reference.md +4 -5
api/http_api_reference.md CHANGED

@@ -94,8 +94,10 @@ curl --request POST \
 The configuration settings for the dataset parser, a JSON object containing the following attributes:
 - `"chunk_token_count"`: Defaults to `128`.
 - `"layout_recognize"`: Defaults to `true`.
+- `"html4excel"`: Indicates whether to convert Excel documents into HTML format. Defaults to `false`.
 - `"delimiter"`: Defaults to `"\n!?。;!?"`.
-- `"task_page_size"`: Defaults to `12`.
+- `"task_page_size"`: Defaults to `12`. For PDF only.
+- `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
 
 ### Response
 
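For a concrete picture of the attributes documented above, here is a minimal sketch of a dataset-creation request that passes this `parser_config`. It uses Python's `requests`; the base URL, the `/api/v1/datasets` path, the bearer-token header, and the dataset name are illustrative assumptions rather than values taken from this diff.

```python
import requests

BASE_URL = "http://127.0.0.1:9380"   # assumed local server address
API_KEY = "<YOUR_API_KEY>"           # assumed bearer token

payload = {
    "name": "test_dataset",              # placeholder dataset name
    "parser_config": {
        "chunk_token_count": 128,        # documented default
        "layout_recognize": True,        # documented default
        "html4excel": False,             # convert Excel to HTML; defaults to false
        "delimiter": "\n!?。;!?",        # documented default
        "task_page_size": 12,            # PDF only; documented default
        "raptor": {"use_raptor": False}, # documented default
    },
}

resp = requests.post(
    f"{BASE_URL}/api/v1/datasets",       # assumed endpoint path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
print(resp.json())
```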
@@ -177,7 +179,7 @@ curl --request DELETE \
 
 #### Request parameters
 
-- `"ids"`: (*Body parameter*), `list[string]`
+- `"ids"`: (*Body parameter*), `list[string]`
 The IDs of the datasets to delete. If it is not specified, all datasets will be deleted.
 
 ### Response
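A similarly hedged sketch of the delete call this hunk documents: because omitting `ids` deletes every dataset, passing an explicit list is the safer habit. The endpoint path, base URL, and header are again assumptions for illustration.

```python
import requests

BASE_URL = "http://127.0.0.1:9380"   # assumed local server address
API_KEY = "<YOUR_API_KEY>"           # assumed bearer token

# Delete only the listed datasets; an omitted "ids" field would delete them all.
resp = requests.delete(
    f"{BASE_URL}/api/v1/datasets",   # assumed endpoint path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"ids": ["<DATASET_ID_1>", "<DATASET_ID_2>"]},
)
print(resp.status_code, resp.json())
```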
@@ -241,7 +243,7 @@ curl --request PUT \
 - `"embedding_model"`: (*Body parameter*), `string`
 The updated embedding model name.
 - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
-- `"chunk_method"`: (*Body parameter*), `enum<string>`
+- `"chunk_method"`: (*Body parameter*), `enum<string>`
 The chunking method for the dataset. Available options:
 - `"naive"`: General
 - `"manual`: Manual
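To tie the two body parameters above together, a sketch of an update request; the dataset ID, endpoint path, and model name are placeholders, and, as the reference notes, `embedding_model` can only change while `chunk_count` is `0`.

```python
import requests

BASE_URL = "http://127.0.0.1:9380"      # assumed local server address
API_KEY = "<YOUR_API_KEY>"              # assumed bearer token
DATASET_ID = "<DATASET_ID>"             # placeholder

# Switch the chunking method; "naive" is the documented value for General.
# "embedding_model" may only be updated while the dataset's chunk_count is 0.
payload = {
    "chunk_method": "naive",
    "embedding_model": "<EMBEDDING_MODEL_NAME>",
}

resp = requests.put(
    f"{BASE_URL}/api/v1/datasets/{DATASET_ID}",   # assumed endpoint path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
print(resp.json())
```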
@@ -510,12 +512,12 @@ curl --request PUT \
 - `"one"`: One
 - `"knowledge_graph"`: Knowledge Graph
 - `"email"`: Email
-- `"parser_config"`: (*Body parameter*), `object`
+- `"parser_config"`: (*Body parameter*), `object`
 The parsing configuration for the document:
 - `"chunk_token_count"`: Defaults to `128`.
 - `"layout_recognize"`: Defaults to `true`.
 - `"delimiter"`: Defaults to `"\n!?。;!?"`.
-- `"task_page_size"`: Defaults to `12`.
+- `"task_page_size"`: Defaults to `12`. For PDF only.
 
 ### Response
 
@@ -718,7 +720,7 @@ curl --request DELETE \
 
 - `dataset_id`: (*Path parameter*)
 The associated dataset ID.
-- `"ids"`: (*Body parameter*), `list[string]`
+- `"ids"`: (*Body parameter*), `list[string]`
 The IDs of the documents to delete. If it is not specified, all documents in the specified dataset will be deleted.
 
 ### Response
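The same delete pattern, now combining the path parameter and the body parameter documented above; the `/datasets/{dataset_id}/documents` path, base URL, and header are assumptions for illustration, while `dataset_id` and `ids` are the parameters from this hunk.

```python
import requests

BASE_URL = "http://127.0.0.1:9380"   # assumed local server address
API_KEY = "<YOUR_API_KEY>"           # assumed bearer token
DATASET_ID = "<DATASET_ID>"          # the dataset_id path parameter

# Delete selected documents from one dataset; omitting "ids" would
# delete every document in that dataset.
resp = requests.delete(
    f"{BASE_URL}/api/v1/datasets/{DATASET_ID}/documents",   # assumed path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"ids": ["<DOCUMENT_ID>"]},
)
print(resp.status_code, resp.json())
```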
@@ -1169,7 +1171,7 @@ Failure:
 
 ## Retrieve chunks
 
-**
+**POST** `/api/v1/retrieval`
 
 Retrieves chunks from specified datasets.
 
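With the verb and path now spelled out, a minimal retrieval sketch; only `POST /api/v1/retrieval` comes from this hunk, while the request body fields, base URL, and header are illustrative assumptions.

```python
import requests

BASE_URL = "http://127.0.0.1:9380"   # assumed local server address
API_KEY = "<YOUR_API_KEY>"           # assumed bearer token

# POST /api/v1/retrieval is the method/path documented above;
# the body fields below are assumptions for illustration.
resp = requests.post(
    f"{BASE_URL}/api/v1/retrieval",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"question": "What is RAGFlow?", "dataset_ids": ["<DATASET_ID>"]},
)
print(resp.json())
```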
api/python_api_reference.md CHANGED

@@ -1253,7 +1253,7 @@ Asks a question to start an AI-powered conversation.
 
 #### question: `str` *Required*
 
-The question to start an AI
+The question to start an AI-powered conversation.
 
 #### stream: `bool`
 
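A hedged usage sketch for the two parameters in this section; `session` stands for a chat session created with the SDK as in the reference's surrounding examples, and the `.content` attribute on the yielded messages is an assumption here.

```python
# Sketch only: `session` is assumed to be an existing chat session object
# created with the SDK, as in the surrounding reference examples.
question = "What is RAGFlow?"       # the required `question` argument

# With stream=True, the reply is expected to arrive incrementally as
# message objects rather than in a single response.
for message in session.ask(question, stream=True):
    print(message.content)          # assumed attribute on each yielded message
```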
@@ -1286,7 +1286,7 @@ A list of `Chunk` objects representing references to the message, each containin
 - `content` `str`
 The content of the chunk.
 - `image_id` `str`
-The ID of the snapshot of the chunk.
+The ID of the snapshot of the chunk. Applicable only when the source of the chunk is an image, PPT, PPTX, or PDF file.
 - `document_id` `str`
 The ID of the referenced document.
 - `document_name` `str`
@@ -1295,14 +1295,13 @@ A list of `Chunk` objects representing references to the message, each containin
 The location information of the chunk within the referenced document.
 - `dataset_id` `str`
 The ID of the dataset to which the referenced document belongs.
-- `similarity` `float`
-A composite similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity.
+- `similarity` `float`
+A composite similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity. It is the weighted sum of `vector_similarity` and `term_similarity`.
 - `vector_similarity` `float`
 A vector similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between vector embeddings.
 - `term_similarity` `float`
 A keyword similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between keywords.
 
-
 ### Examples
 
 ```python
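To make the "weighted sum" wording concrete, a small illustrative helper; the weight value is a placeholder standing in for whatever vector-similarity weight the retrieval was configured with, not a documented constant.

```python
def composite_similarity(vector_similarity: float,
                         term_similarity: float,
                         vector_weight: float = 0.3) -> float:
    """Illustrative only: the composite score as a weighted sum of the two
    component scores. The 0.3 default is a placeholder, not a documented value."""
    return vector_weight * vector_similarity + (1.0 - vector_weight) * term_similarity

# Both components lie in [0, 1], so the weighted sum stays in [0, 1]:
print(composite_similarity(0.82, 0.64))  # 0.3 * 0.82 + 0.7 * 0.64 = 0.694
```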