Post
27

We ran Feather DB v0.8.0 on LongMemEval (ICLR 2025) — 500 questions across real multi-session conversations, up to 115K tokens each.
**Score: 0.693** · GPT-4o full-context baseline: 0.640
Full 500-question run with Gemini-Flash: **$2.40**
Per-axis breakdown:
→ Info-extraction: **0.942**
→ Knowledge-update: **0.714**
→ Multi-session: **0.606**
→ Temporal: **0.477** ← the hard one, Phase 9 addresses this
Architecture: Hybrid BM25+dense · adaptive temporal decay · embedded (no server) · p50 = 0.19ms · MIT
Raw results + audit JSONs: Hawky-ai/longmemeval-results
We ran Feather DB v0.8.0 on LongMemEval (ICLR 2025) — 500 questions across real multi-session conversations, up to 115K tokens each.
**Score: 0.693** · GPT-4o full-context baseline: 0.640
Full 500-question run with Gemini-Flash: **$2.40**
Per-axis breakdown:
→ Info-extraction: **0.942**
→ Knowledge-update: **0.714**
→ Multi-session: **0.606**
→ Temporal: **0.477** ← the hard one, Phase 9 addresses this
Architecture: Hybrid BM25+dense · adaptive temporal decay · embedded (no server) · p50 = 0.19ms · MIT
pip install feather-dbRaw results + audit JSONs: Hawky-ai/longmemeval-results