Evaluation datasets

albertvillanova

posted an update about 22 hours ago

The latest smolagents release supports GPT-5: build agents that think, plan, and act.
⚡ Upgrade now and put GPT-5 to work!

albertvillanova

posted an update 2 days ago

🚀 smolagents v1.21.0 is here!
Now with improved safety in the local Python executor: dunder calls are blocked!
⚠️ The local executor is still not fully isolated: for untrusted code, use a remote executor (Docker, E2B, or Wasm) instead.
✨ Many bug fixes: more reliable code.
👉 https://github.com/huggingface/smolagents/releases/tag/v1.21.0
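
To make "dunder calls are blocked" concrete, here is a minimal sketch of one way such a check could work, using only Python's standard `ast` module. It is an illustration, not the smolagents implementation: the function names `find_dunder_usage` and `check_code` are hypothetical, and a name-based check like this does not isolate code on its own.

```python
import ast


def _is_dunder(name: str) -> bool:
    """True for names of the form __something__."""
    return len(name) > 4 and name.startswith("__") and name.endswith("__")


def find_dunder_usage(source: str) -> list[str]:
    """Return the sorted dunder names that `source` references.

    Hypothetical helper: it flags attribute access (obj.__class__) and
    bare names (__import__), which covers the usual escape routes.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and _is_dunder(node.attr):
            found.add(node.attr)
        elif isinstance(node, ast.Name) and _is_dunder(node.id):
            found.add(node.id)
    return sorted(found)


def check_code(source: str) -> None:
    """Raise ValueError if the snippet references any dunder name."""
    hits = find_dunder_usage(source)
    if hits:
        raise ValueError(f"dunder access is not allowed: {', '.join(hits)}")
```

For example, `check_code("x = ().__class__.__bases__")` raises, while `check_code("print(1 + 2)")` passes. The sketch is deliberately strict (it would also reject harmless uses such as `__name__`), and as the post says, a static filter is no substitute for a remote executor when the code is untrusted.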

hynky

updated a dataset 22 days ago

lighteval/treb_table_retrieval

Viewer • Updated 22 days ago • 500 • 623

hynky

updated a dataset 23 days ago

lighteval/squad_v2

Viewer • Updated 23 days ago • 142k • 214

hynky

published a dataset 23 days ago

lighteval/squad_v2

Viewer • Updated 23 days ago • 142k • 214

hynky

updated a dataset 23 days ago

lighteval/wikitablequestions

Viewer • Updated 23 days ago • 18.5k • 794

hynky

published 2 datasets 23 days ago

lighteval/wikitablequestions

Viewer • Updated 23 days ago • 18.5k • 794

lighteval/treb_table_retrieval

Viewer • Updated 22 days ago • 500 • 623

albertvillanova

posted an update about 1 month ago

🚀 New in smolagents v1.20.0: Remote Python Execution via WebAssembly (Wasm)

We've just merged a major new capability into the smolagents framework: the CodeAgent can now execute Python code remotely in a secure, sandboxed WebAssembly environment!

🔧 Powered by Pyodide and Deno, this new WasmExecutor lets your agent-generated Python code run safely, without relying on Docker or local execution.

Why this matters:
✅ Isolated execution = no host access
✅ No need for Python on the user's machine
✅ Safer evaluation of arbitrary code
✅ Compatible with serverless / edge agent workloads
✅ Ideal for constrained or untrusted environments
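
As a rough sketch of how the executor choice might be wired up: the helper below picks a remote executor for untrusted code and falls back to local execution otherwise. `pick_executor` and `build_agent` are hypothetical names, and the `executor_type` keyword with the values "local", "docker", "e2b", and "wasm" is an assumption to verify against the smolagents documentation for your installed version.

```python
from typing import Any

# Remote executors named in the posts; "local" is in-process execution.
REMOTE_EXECUTORS = ("docker", "e2b", "wasm")


def pick_executor(untrusted: bool, preferred_remote: str = "wasm") -> str:
    """Choose an executor name: remote for untrusted code, else local."""
    if preferred_remote not in REMOTE_EXECUTORS:
        raise ValueError(f"unknown remote executor: {preferred_remote!r}")
    return preferred_remote if untrusted else "local"


def build_agent(model: Any, untrusted: bool = True) -> Any:
    """Build a CodeAgent with the chosen executor.

    The import is lazy so pick_executor works without smolagents installed.
    Assumption: CodeAgent accepts an `executor_type` keyword.
    """
    from smolagents import CodeAgent

    return CodeAgent(
        tools=[],
        model=model,
        executor_type=pick_executor(untrusted),
    )
```

With these defaults, `pick_executor(True)` returns `"wasm"` and `pick_executor(False)` returns `"local"`. Since the post says the Wasm executor is powered by Pyodide and Deno, running it presumably requires Deno to be available on the host.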

This is just the beginning: a focused initial implementation with known limitations, and a solid MVP designed for secure, sandboxed use cases. 💡

💡 We're inviting the open-source community to help evolve this executor:
• Tackle more advanced Python features
• Expand compatibility
• Add test coverage
• Shape the next-gen secure agent runtime

🔗 Check out the PR: https://github.com/huggingface/smolagents/pull/1261

Let's reimagine what agent-driven Python execution can look like: remote-first, wasm-secure, and community-built.

This feature is live in smolagents v1.20.0!
Try it out.
Break things. Extend it. Give us feedback.
Let's build safer, smarter agents, together 🧠⚙️

👉 https://github.com/huggingface/smolagents/releases/tag/v1.20.0

#smolagents #WebAssembly #Python #AIagents #Pyodide #Deno #OpenSource #HuggingFace #AgenticAI