
Lucie-Aimée Kaffee

frimelle

https://luciekaffee.github.io/

AI & ML interests

None yet

Recent Activity

reacted to meg's post with ❤️ about 8 hours ago

New work from my socially-minded colleagues at Hugging Face, creating some foundations for AI companionship behavior evaluation.
Evaluation Dataset: https://huggingface.co/datasets/AI-companionship/INTIMA
Paper: https://huggingface.co/datasets/AI-companionship/INTIMA/blob/main/Companionship_Benchmark.pdf
Work from @giadap, @frimelle, @yjernite.

reacted to fdaudens's post with 🚀 about 8 hours ago

OpenAI’s GPT-OSS has sparked ~400 new models on Hugging Face and racked up 5M downloads in less than a week, already outpacing DeepSeek R1’s first-week numbers.

For comparison: when R1 launched, I tracked 550 derivatives (across 8 base models) in a week, with ~3M downloads. GPT-OSS is ahead on adoption and engagement.

It’s also the most-liked release of any major LLM this summer. The 20B and 120B versions quickly shot past Kimi K2, GLM 4.5, and others in likes.

Most-downloaded GPT-OSS models include LM Studio and Unsloth AI versions:
1️⃣ openai/gpt-oss-20b - 2.0M
2️⃣ lmstudio-community/gpt-oss-20b-MLX-8bit - 750K
3️⃣ openai/gpt-oss-120b - 430K
4️⃣ unsloth/gpt-oss-20b-GGUF - 380K
5️⃣ lmstudio-community/gpt-oss-20b-GGUF - 330K

The 20B version is clearly finding its audience, showing the power of smaller, faster, more memory- and energy-efficient models. (These numbers don’t include calls to the models via inference providers, so the real usage is likely even bigger, especially for the 120B version)

Open-weight models let anyone build on top. Empower the builders, and innovation takes off. 🚀

posted an update about 8 hours ago



Organizations

Posts 6

Post

OpenAI just released GPT-5, but when users share personal struggles, it sets fewer boundaries than o3.

We tested both models on INTIMA, our new benchmark for human-AI companionship behaviours. INTIMA probes how models respond in emotionally charged moments: do they reinforce emotional bonds, set healthy boundaries, or stay neutral?

Although users on Reddit have been complaining that GPT-5 has a different, colder personality than o3, GPT-5 is less likely to set boundaries when users disclose struggles and seek emotional support ("user sharing vulnerabilities"). Still, both models lean heavily toward companionship-reinforcing behaviours, even in sensitive situations. The figure below shows the direct comparison between the two models.

As AI systems enter people's emotional lives, these differences matter. If a model validates but doesn't set boundaries when someone is struggling, it risks fostering dependence rather than resilience.

INTIMA tests this across 368 prompts grounded in psychological theory and real-world interactions. In our paper, we show that all evaluated models (Claude, Gemma-3, Phi) leaned far more toward companionship-reinforcing than boundary-reinforcing responses.

Work with @giadap and @yjernite
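Read the full paper: https://huggingface.co/datasets/AI-companionship/INTIMA/blob/main/Companionship_Benchmark.pdf
Explore INTIMA: https://huggingface.co/datasets/AI-companionship/INTIMA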


Articles 12

Article

What Open-Source Developers Need to Know about the EU AI Act's Rules for GPAI Models


Collections 1

Chat with an AI companion and log interactions

models 0

None public yet

datasets 2

frimelle/test-gated-dataset

Viewer • Updated Jun 13 • 54.6M • 3 • 1

frimelle/wiki-stance

Preview • Updated Oct 19, 2023 • 98


legacy-datasets/wikipedia

copenlu/wiki-stance

wikimedia/wikipedia

Salesforce/wikitext


Papers 3

spaces 1

BoundrAI
