# DeepResearch
[Demo](https://jina.ai/deepsearch#demo) | [API](#official-api) | [Evaluation](#evaluation)
Keep searching, reading webpages, and reasoning until an answer is found (or the token budget is exceeded). Useful for deeply investigating a query.
```mermaid
---
config:
  theme: mc
  look: handDrawn
---
flowchart LR
    subgraph Loop["until budget exceeded"]
        direction LR
        Search["Search"]
        Read["Read"]
        Reason["Reason"]
    end
    Query(["Query"]) --> Loop
    Search --> Read
    Read --> Reason
    Reason --> Search
    Loop --> Answer(["Answer"])
```
Unlike OpenAI's and Gemini's Deep Research capabilities, we focus solely on **delivering accurate answers through our iterative process**. We don't optimize for long-form articles – if you need quick, precise answers from deep search, you're in the right place. If you're looking for long, AI-generated reports like those from OpenAI/Gemini, this isn't for you.
## Install
```bash
git clone https://github.com/jina-ai/node-DeepResearch.git
cd node-DeepResearch
npm install
```
[Installation and deployment video tutorial on YouTube](https://youtu.be/vrpraFiPUyA)
It is also available on npm, but that is not recommended for now, as the code is still under active development.
## Usage
We use Gemini (latest `gemini-2.0-flash`), OpenAI, or a [local LLM](#use-local-llm) for reasoning, and [Jina Reader](https://jina.ai/reader) for searching and reading webpages. You can get a free Jina API key with 1M tokens from jina.ai.
```bash
export GEMINI_API_KEY=...  # for gemini
# export OPENAI_API_KEY=... # for openai
# export LLM_PROVIDER=openai # for openai
export JINA_API_KEY=jina_...  # free jina api key, get from https://jina.ai/reader

npm run dev $QUERY
```
### Official API
You can also use our official DeepSearch API, hosted and optimized by Jina AI:
```
https://deepsearch.jina.ai/v1/chat/completions
```
You can use it with any OpenAI-compatible client. For Bearer authentication, use your Jina API key from https://jina.ai.
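For example, here is a minimal sketch using the official `openai` Node.js SDK pointed at the DeepSearch endpoint. The base URL and model name come from this README; the query and the `JINA_API_KEY` environment variable are only placeholders:
```typescript
// Minimal sketch: call the hosted DeepSearch API through the OpenAI Node.js SDK.
// The base URL and model name are from this README; everything else is illustrative.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://deepsearch.jina.ai/v1",
  apiKey: process.env.JINA_API_KEY, // your Jina API key is used as the Bearer token
});

const completion = await client.chat.completions.create({
  model: "jina-deepsearch-v1",
  messages: [{ role: "user", content: "what is the latest blog post's title from jina ai?" }],
});

console.log(completion.choices[0].message.content);
```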
#### Client integration guidelines
If you are building a web/local/mobile client that uses the `Jina DeepSearch API`, here are some design guidelines:
- Our API is fully compatible with the [OpenAI API schema](https://platform.openai.com/docs/api-reference/chat/create), which should greatly simplify the integration process. The model name is `jina-deepsearch-v1`.
- Our DeepSearch API is a reasoning+search grounding LLM, so it's best for questions that require deep reasoning and search.
- Two special tag pairs are introduced, `<think>...</think>` and `<references>...</references>`. Please render them with care (see the sketch after this list).
- Guide the user to get a Jina API key from https://jina.ai; new API keys come with 1M free tokens.
- There are rate limits, [between 10 RPM and 30 RPM depending on the API key tier](https://jina.ai/contact-sales#rate-limit).
- [Download the Jina AI logo here](https://jina.ai/logo-Jina-1024.zip)
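As a starting point for rendering, a client could separate the thinking trace and the references from the answer text. The tag names come from this README; the helper below is only a hypothetical sketch:
```typescript
// Hypothetical helper: split a DeepSearch message into thinking, references, and answer
// so each part can be rendered differently (e.g. collapsible thinking, footnote references).
function splitDeepSearchContent(content: string): {
  think: string | null;
  references: string | null;
  answer: string;
} {
  const think = content.match(/<think>([\s\S]*?)<\/think>/)?.[1]?.trim() ?? null;
  const references = content.match(/<references>([\s\S]*?)<\/references>/)?.[1]?.trim() ?? null;
  // Whatever remains outside the special tags is the answer text.
  const answer = content
    .replace(/<think>[\s\S]*?<\/think>/g, "")
    .replace(/<references>[\s\S]*?<\/references>/g, "")
    .trim();
  return { think, references, answer };
}
```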
## Demo
> The demo was recorded with `gemini-1.5-flash`; the latest `gemini-2.0-flash` leads to much better results!
Query: `"what is the latest blog post's title from jina ai?"`
3 steps; answer is correct!

Query: `"what is the context length of readerlm-v2?"`
2 steps; answer is correct!

Query: `"list all employees from jina ai that u can find, as many as possible"`
11 steps; partially correct! But I'm not in the list :(

Query: `"who will be the biggest competitor of Jina AI"`
42 steps; it's a future-prediction kind of question, so it's arguably correct! At the moment I'm not seeing `weaviate` as a competitor, but I'm open to the future "I told you so" moment.

More examples:
```
# example: no tool calling
npm run dev "1+1="
npm run dev "what is the capital of France?"

# example: 2-step
npm run dev "what is the latest news from Jina AI?"

# example: 3-step
npm run dev "what is the twitter account of jina ai's founder"

# example: 13-step, ambiguous question (no definition of "big")
npm run dev "who is bigger? cohere, jina ai, voyage?"

# example: open question, research-like, long chain of thoughts
npm run dev "who will be president of US in 2028?"
npm run dev "what should be jina ai strategy for 2025?"
```
## Use Local LLM
> Note: not every LLM works with our reasoning flow; we need models that support structured output (sometimes called JSON Schema output or object output) well. Feel free to propose a PR to add more open-source LLMs to the working list.
If you use Ollama or LMStudio, you can redirect the reasoning request to your local LLM by setting the following environment variables:
```bash
export LLM_PROVIDER=openai # yes, that's right - for local llm we still use openai client
export OPENAI_BASE_URL=http://127.0.0.1:1234/v1 # your local llm endpoint
export OPENAI_API_KEY=whatever # random string would do, as we don't use it (unless your local LLM has authentication)
export DEFAULT_MODEL_NAME=qwen2.5-7b # your local llm model name
```
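Conceptually, these variables just point the OpenAI client at your local endpoint. The sketch below shows how such a configuration is typically consumed; it is illustrative, not the project's actual source:
```typescript
// Sketch: how the environment variables above map onto an OpenAI-compatible client
// talking to a local LLM server (e.g. Ollama or LMStudio). Illustrative only.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: process.env.OPENAI_BASE_URL,             // e.g. http://127.0.0.1:1234/v1
  apiKey: process.env.OPENAI_API_KEY ?? "whatever", // most local servers ignore it
});

const response = await client.chat.completions.create({
  model: process.env.DEFAULT_MODEL_NAME ?? "qwen2.5-7b",
  messages: [{ role: "user", content: "Reply with a JSON object {\"ok\": true}." }],
  response_format: { type: "json_object" }, // the model must handle structured output well
});

console.log(response.choices[0].message.content);
```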
## OpenAI-Compatible Server API
If you have a GUI client that supports the OpenAI API (e.g. [CherryStudio](https://docs.cherry-ai.com/), [Chatbox](https://github.com/Bin-Huang/chatbox)), you can simply configure it to use this server.

Start the server:
```bash
# Without authentication
npm run serve

# With authentication (clients must provide this secret as Bearer token)
npm run serve --secret=your_secret_token
```
The server will start on http://localhost:3000 with the following endpoint:
### POST /v1/chat/completions
```bash
# Without authentication
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jina-deepsearch-v1",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

# With authentication (when server is started with --secret)
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_secret_token" \
  -d '{
    "model": "jina-deepsearch-v1",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true
  }'
```
Response format:
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "jina-deepsearch-v1",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "YOUR FINAL ANSWER"
    },
    "logprobs": null,
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
For streaming responses (`stream: true`), the server sends chunks in this format:
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "jina-deepsearch-v1",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [{
    "index": 0,
    "delta": {
      "content": "..."
    },
    "logprobs": null,
    "finish_reason": null
  }]
}
```
Note: the think content in streaming responses is wrapped in XML tags:
```
<think>
[thinking steps...]
</think>
[final answer]
```
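A streaming client therefore needs to buffer the deltas and treat everything inside `<think>...</think>` as the reasoning trace. Here is a rough sketch against the local server above; the parsing logic is illustrative only:
```typescript
// Rough sketch: consume the streaming endpoint and separate the <think>...</think>
// reasoning trace from the final answer. Parsing logic is illustrative only.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/v1",
  apiKey: "your_secret_token", // only required when the server was started with --secret
});

const stream = await client.chat.completions.create({
  model: "jina-deepsearch-v1",
  messages: [{ role: "user", content: "what is the latest blog post's title from jina ai?" }],
  stream: true,
});

let buffer = "";
for await (const chunk of stream) {
  buffer += chunk.choices[0]?.delta?.content ?? "";
}

const think = buffer.match(/<think>([\s\S]*?)<\/think>/)?.[1]?.trim() ?? "";
const answer = buffer.replace(/<think>[\s\S]*?<\/think>/, "").trim();
console.log({ think, answer });
```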
## Docker Setup
### Build Docker Image
To build the Docker image for the application, run the following command:
```bash
docker build -t deepresearch:latest .
```
### Run Docker Container
To run the Docker container, use the following command:
```bash
docker run -p 3000:3000 --env GEMINI_API_KEY=your_gemini_api_key --env JINA_API_KEY=your_jina_api_key deepresearch:latest
```
### Docker Compose
You can also use Docker Compose to manage multi-container applications. To start the application with Docker Compose, run:
```bash
docker-compose up
```
## How Does it Work?
Not sure a flowchart helps, but here it is:
```mermaid
flowchart TD
    Start([Start]) --> Init[Initialize context & variables]
    Init --> CheckBudget{Token budget<br/>exceeded?}
    CheckBudget -->|No| GetQuestion[Get current question<br/>from gaps]
    CheckBudget -->|Yes| BeastMode[Enter Beast Mode]

    GetQuestion --> GenPrompt[Generate prompt]
    GenPrompt --> ModelGen[Generate response<br/>using Gemini]
    ModelGen --> ActionCheck{Check action<br/>type}

    ActionCheck -->|answer| AnswerCheck{Is original<br/>question?}
    AnswerCheck -->|Yes| EvalAnswer[Evaluate answer]
    EvalAnswer --> IsGoodAnswer{Is answer<br/>definitive?}
    IsGoodAnswer -->|Yes| HasRefs{Has<br/>references?}
    HasRefs -->|Yes| End([End])
    HasRefs -->|No| GetQuestion
    IsGoodAnswer -->|No| StoreBad[Store bad attempt<br/>Reset context]
    StoreBad --> GetQuestion
    AnswerCheck -->|No| StoreKnowledge[Store as intermediate<br/>knowledge]
    StoreKnowledge --> GetQuestion

    ActionCheck -->|reflect| ProcessQuestions[Process new<br/>sub-questions]
    ProcessQuestions --> DedupQuestions{New unique<br/>questions?}
    DedupQuestions -->|Yes| AddGaps[Add to gaps queue]
    DedupQuestions -->|No| DisableReflect[Disable reflect<br/>for next step]
    AddGaps --> GetQuestion
    DisableReflect --> GetQuestion

    ActionCheck -->|search| SearchQuery[Execute search]
    SearchQuery --> NewURLs{New URLs<br/>found?}
    NewURLs -->|Yes| StoreURLs[Store URLs for<br/>future visits]
    NewURLs -->|No| DisableSearch[Disable search<br/>for next step]
    StoreURLs --> GetQuestion
    DisableSearch --> GetQuestion

    ActionCheck -->|visit| VisitURLs[Visit URLs]
    VisitURLs --> NewContent{New content<br/>found?}
    NewContent -->|Yes| StoreContent[Store content as<br/>knowledge]
    NewContent -->|No| DisableVisit[Disable visit<br/>for next step]
    StoreContent --> GetQuestion
    DisableVisit --> GetQuestion

    BeastMode --> FinalAnswer[Generate final answer] --> End
```
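In plain words: the agent keeps a queue of open questions ("gaps"), asks the model to pick one action per step (answer, reflect, search, or visit), and keeps looping until it produces a definitive, referenced answer or the token budget runs out, at which point Beast Mode forces a final answer. The sketch below restates the flowchart as code; every type and helper is a hypothetical stand-in, not the project's actual API:
```typescript
// Highly simplified sketch of the loop in the flowchart above.
// Every type and helper here is a hypothetical stand-in, not the project's real API.
type Action =
  | { type: "answer"; text: string; references: string[] }
  | { type: "reflect"; subQuestions: string[] }
  | { type: "search"; query: string }
  | { type: "visit"; urls: string[] };

interface Tools {
  askModel(question: string, knowledge: string[]): Promise<Action>;   // one reasoning step
  evaluateAnswer(question: string, answer: string): Promise<boolean>; // "is it definitive?" check
  search(query: string): Promise<string[]>; // returns URLs
  readUrl(url: string): Promise<string>;    // returns page content
  tokensUsed(): number;
}

async function deepResearch(originalQuestion: string, tokenBudget: number, t: Tools): Promise<string> {
  const gaps: string[] = [originalQuestion]; // queue of open questions
  const knowledge: string[] = [];            // accumulated intermediate knowledge
  const urlsToVisit: string[] = [];

  while (t.tokensUsed() < tokenBudget) {
    const question = gaps.shift() ?? originalQuestion;
    const action = await t.askModel(question, knowledge);

    switch (action.type) {
      case "answer":
        if (question === originalQuestion && (await t.evaluateAnswer(question, action.text))) {
          return action.text;              // definitive, referenced answer: done
        }
        knowledge.push(action.text);       // otherwise store as intermediate knowledge
        break;
      case "reflect":
        gaps.push(...action.subQuestions); // add new unique sub-questions to the queue
        break;
      case "search":
        urlsToVisit.push(...(await t.search(action.query))); // store URLs for future visits
        break;
      case "visit":
        knowledge.push(...(await Promise.all(action.urls.map((u) => t.readUrl(u))))); // store content
        break;
    }
  }

  // Beast Mode: budget exceeded, force a final answer from whatever has been gathered.
  const final = await t.askModel(originalQuestion, knowledge);
  return final.type === "answer" ? final.text : "";
}
```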
## Evaluation
I kept the evaluation simple: LLM-as-a-judge over a set of [ego questions](./src/evals/ego-questions.json). These are questions about Jina AI whose answers I know 100%, but LLMs do not.
I mainly look at 3 things: total steps, total tokens, and the correctness of the final answer.
```bash
npm run eval ./src/evals/questions.json
```
Here's the table comparing plain `gemini-2.0-flash` and `gemini-2.0-flash + node-deepresearch` on the ego set.
Plain `gemini-2.0-flash` can be run by setting `tokenBudget` to zero, skipping the while-loop and directly answering the question.
It should not be surprising that plain `gemini-2.0-flash` has a 0% pass rate, as I intentionally filtered out the questions that LLMs can answer.
| Metric | gemini-2.0-flash | #188f1bb |
|--------|------------------|----------|
| Pass Rate | 0% | 75% |
| Average Steps | 1 | 4 |
| Maximum Steps | 1 | 13 |
| Minimum Steps | 1 | 2 |
| Median Steps | 1 | 3 |
| Average Tokens | 428 | 68,574 |
| Median Tokens | 434 | 31,541 |
| Maximum Tokens | 463 | 363,655 |
| Minimum Tokens | 374 | 7,963 |