feat: Updated README
---
pipeline_tag: reinforcement-learning
---

# mem-agent

Based on [Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507), this model was trained with GSPO (Zheng et al., 2025) on an agent scaffold built around an Obsidian-like memory system and the tools required to interact with it. The model was trained on the following subtasks:
- Retrieval: retrieving relevant information from the memory system when needed. In this subtask, we also trained the model to filter the retrieved information and/or obfuscate it completely.
- Updating: updating the memory system with new information.
- Clarification: asking for clarification when the user query is unclear or contradicts information in the memory system.

The tools in the scaffold are:
```python
# File Operations
create_file(file_path: str, content: str = "") -> bool  # Auto-creates parent directories
update_file(file_path: str, old_content: str, new_content: str) -> Union[bool, str]  # Returns True or an error message
read_file(file_path: str) -> str
delete_file(file_path: str) -> bool
check_if_file_exists(file_path: str) -> bool

# Directory Operations
create_dir(dir_path: str) -> bool
list_files() -> str  # Shows tree structure of current working directory
check_if_dir_exists(dir_path: str) -> bool

# Utilities
get_size(file_or_dir_path: str) -> int  # Size in bytes; empty path = total memory size
go_to_link(link_string: str) -> bool
```
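
For illustration, the sketch below shows how these tools might compose in a single model turn (the response format that wraps such a turn is described next). The paths and contents are hypothetical; inside the sandbox the tools are in scope as plain functions, and the error-handling pattern just follows the `update_file` signature above.

```python
# Hypothetical sketch of tool use within one turn; paths are illustrative.
if not check_if_file_exists("entities/acme_corp.md"):
    # create_file auto-creates the entities/ parent directory if needed.
    create_file("entities/acme_corp.md", "# Acme Corporation\n")

# update_file returns True on success or an error message string on failure.
result = update_file(
    "user.md",
    "- company: unknown",
    "- company: [[entities/acme_corp.md]]",
)
if result is not True:
    print(result)  # surface the error so the model sees it on the next turn
```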

The model structures its responses with `<think>`, `<python>`, and `<reply>` tags, emitting `<reply>` only once it is done interacting with the memory. Each `<python>` block is executed in a sandbox that exposes the tools above, and the results of the block are returned to the model in a `<result>` tag, forming the agentic loop.
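
The following is a minimal, hypothetical sketch of the host side of this loop. `generate` (the model call) and `tools` (the functions listed above) are assumptions standing in for the real implementations; the actual scaffold's sandboxing, prompt format, and stop conditions differ.

```python
import contextlib
import io
import re

PYTHON_RE = re.compile(r"<python>(.*?)</python>", re.DOTALL)
REPLY_RE = re.compile(r"<reply>(.*?)</reply>", re.DOTALL)

def run_agent(generate, tools, max_turns=10):
    transcript = ""
    for _ in range(max_turns):
        completion = generate(transcript)
        transcript += completion
        reply = REPLY_RE.search(completion)
        if reply:
            # <reply> means the model is done interacting with the memory.
            return reply.group(1).strip()
        code = PYTHON_RE.search(completion)
        if code:
            # Run the <python> block with only the tools in scope and feed
            # captured stdout back to the model inside a <result> tag.
            stdout = io.StringIO()
            with contextlib.redirect_stdout(stdout):
                exec(code.group(1), dict(tools))  # use a real sandbox in practice
            transcript += f"\n<result>\n{stdout.getvalue()}</result>\n"
    return None
```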

The model is also trained to handle optional filters, supplied by the user between `<filter>` tags after the user query. These filters are used to filter the retrieved information and/or obfuscate it completely.
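
For example (an illustrative query, not verbatim training data), a filtered request might look like:

```markdown
Where does my mother live?
<filter>
Do not reveal birth dates; city-level location only.
</filter>
```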

## Benchmark

We evaluated this model and several other open and closed models on our benchmark, **md-memory-bench**, using OpenAI's o3 as the judge. All models except driaforall/mem-agent and Qwen/Qwen3-4B-Thinking-2507 were accessed through OpenRouter.

| Model | Retrieval | Update | Clarification | Filter | Overall |
|-------|-----------|--------|---------------|--------|---------|
| qwen/qwen3-235b-a22b-thinking-2507 | 0.9091 | 0.6363 | 0.4545 | 1.0000 | 0.7857 |
| driaforall/mem-agent | 0.8636 | 0.7272 | 0.3636 | 0.9167 | 0.7500 |
| z-ai/glm-4.5 | 0.7727 | 0.8181 | 0.3636 | 0.9167 | 0.7321 |
| deepseek/deepseek-chat-v3.1 | 0.6818 | 0.5454 | 0.5454 | 0.8333 | 0.6607 |
| google/gemini-2.5-pro | 0.7273 | 0.4545 | 0.2727 | 1.0000 | 0.6429 |
| google/gemini-2.5-flash | 0.7727 | 0.3636 | 0.2727 | 0.9167 | 0.6250 |
| openai/gpt-5 | 0.6818 | 0.5454 | 0.2727 | 0.9167 | 0.6250 |
| anthropic/claude-opus-4.1 | 0.6818 | 0.0000 | 0.8181 | 0.5833 | 0.5536 |
| Qwen/Qwen3-4B-Thinking-2507 | 0.4545 | 0.0000 | 0.2727 | 0.7500 | 0.3929 |
| moonshotai/kimi-k2 | 0.3181 | 0.2727 | 0.1818 | 0.6667 | 0.3571 |

With only 4B parameters, our model ranks second on the benchmark, beating every other open and closed model except qwen/qwen3-235b-a22b-thinking-2507. It achieves an overall score of 0.75, a significant improvement over the base Qwen model's 0.3929.

## Usage

While the model can be used on its own, we recommend running it as an MCP server for a larger model, which can then use it to interact with the memory system. For this, see [our repo](https://huggingface.co/driaforall/mem-agent-mcp), which contains instructions for both the MCP setup and standalone CLI usage.
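
For a quick standalone test without the MCP setup, the model can be loaded through the standard transformers chat API. The snippet below is a minimal sketch: the system prompt here is only a placeholder, and the full prompt (tool signatures, memory location, response format) comes from the scaffold in the repo above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "driaforall/mem-agent"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    # Placeholder system prompt; the real one is defined by the scaffold.
    {"role": "system", "content": "You are mem-agent. Respond with <think>, <python> and <reply> tags."},
    {"role": "user", "content": "Where do I work?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```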

### Memory

The model uses a Markdown-based memory system with links, inspired by Obsidian. The general structure of the memory is:
```
memory/
├── user.md
└── entities/
    ├── [entity_name_1].md
    ├── [entity_name_2].md
    └── ...
```

- `user.md` is the main file. It contains information about the user and their relationships, with one link to the corresponding entity file per relationship, in the format `[[entities/[entity_name].md]]`. This link format must be followed strictly (see the sketch after this list).
- `entities/` is the directory that contains the entity files.
- Each entity file follows the same structure as `user.md`.
- Modifying the memory manually does not require restarting the MCP server.
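
To make the link format concrete, here is a small, hypothetical sketch that walks the memory from `user.md` by resolving links of this strict format. `read_file` is stubbed with plain file I/O so it runs outside the sandbox, and `MEMORY_ROOT` is an assumed location.

```python
import re
from pathlib import Path

MEMORY_ROOT = Path("memory")  # assumed location of the memory directory

def read_file(file_path: str) -> str:
    # Stand-in for the scaffold's read_file tool, for use outside the sandbox.
    return (MEMORY_ROOT / file_path).read_text()

# Links must match the strict [[entities/<entity_name>.md]] format.
LINK_RE = re.compile(r"\[\[(entities/[^\]]+\.md)\]\]")

user_md = read_file("user.md")
for path in LINK_RE.findall(user_md):
    print(path)
    print(read_file(path))
```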

### Example user.md

```markdown
# User Information
- user_name: John Doe
- birth_date: 1990-01-01
- birth_location: New York, USA
- living_location: Enschede, Netherlands
- zodiac_sign: Capricorn

## User Relationships
- company: [[entities/acme_corp.md]]
- mother: [[entities/jane_doe.md]]
```

### Example entity files (jane_doe.md and acme_corp.md)

```markdown
# Jane Doe
- relationship: Mother
- birth_date: 1965-01-01
- birth_location: New York, USA
```

```markdown
# Acme Corporation
- industry: Software Development
- location: Enschede, Netherlands
```

The model is trained on this memory standard, so for good results it should be used with a memory system that follows it. Our MCP server repo includes memory export tools for sources such as ChatGPT and Notion.

## References
- [GSPO](https://arxiv.org/pdf/2507.18071), Zheng et al., 2025