Announcing RealPerformance, a dataset of functional issues in language models that mirrors failure patterns identified through rigorous testing of real LLM agents
🔥 Announcing FLUX-Juiced: The Fastest Image Generation Endpoint (2.6x faster)!
Inference optimisations are widely applied and can cut generation time substantially, but their impact on output quality often remains unclear. So we decided to challenge the status quo and built our own optimised version of FLUX.1 [dev], called FLUX-juiced.
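The announcement doesn't spell out FLUX-juiced's optimisation recipe, so purely as a point of reference, here is a minimal sketch of the kind of baseline-vs-optimised comparison involved, using diffusers' FluxPipeline with torch.compile as one illustrative speed-up (not the actual FLUX-juiced stack); it assumes a CUDA GPU with enough memory for FLUX.1 [dev].

```python
# Illustrative only: one common inference optimisation (torch.compile) applied to
# FLUX.1 [dev] via diffusers. This is NOT the actual FLUX-juiced recipe.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Compile the transformer so repeated generations run faster after a one-off warm-up.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)

image = pipe(
    "a photo of an astronaut riding a horse on the moon",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_sample.png")
```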
RealHarm: A Collection of Real-World Language Model Application Failures
I'm David from Giskard, and we work on securing your Agents. Today, we are launching RealHarm: a dataset of real-world problematic interactions with AI agents, drawn from publicly reported incidents.
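If the dataset is published on the Hugging Face Hub, loading it would look something like the sketch below; the repository id is an assumption made for illustration, not taken from the announcement.

```python
# Hypothetical loading snippet: the dataset id "giskardai/realharm" is an assumption,
# not confirmed by the announcement.
from datasets import load_dataset

realharm = load_dataset("giskardai/realharm")
print(realharm)  # inspect the available splits and fields
```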
In this unit, you'll learn:
- Offline Evaluation – benchmark and iterate on your agent using datasets.
- Online Evaluation – continuously track key metrics such as latency, costs, and user feedback.
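As a rough illustration of the offline part, the sketch below runs an agent over a tiny labelled dataset and tracks accuracy and latency; `run_agent` is a hypothetical stand-in for whatever agent you build in the course, and the metric is deliberately simplistic.

```python
# Framework-agnostic sketch of offline evaluation: run the agent on a small labelled
# dataset and record accuracy and latency. `run_agent` is a hypothetical placeholder.
import time

eval_set = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is 2 + 2?", "expected": "4"},
]

def run_agent(question: str) -> str:
    # Replace this stub with a call to your actual agent.
    return "placeholder answer"

correct, latencies = 0, []
for example in eval_set:
    start = time.perf_counter()
    answer = run_agent(example["question"])
    latencies.append(time.perf_counter() - start)
    correct += int(example["expected"].lower() in answer.lower())

print(f"accuracy:    {correct / len(eval_set):.2f}")
print(f"avg latency: {sum(latencies) / len(latencies):.3f}s")
```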
We just published the LlamaIndex unit for the agents course, and it offers a great contrast with the smolagents unit by looking at:
- What makes llama-index stand out
- How LlamaHub is used for integrations
- Creating QueryEngine components
- Using agents and tools
- Agentic and multi-agent workflows
The team has been working flat out on this for a few weeks, supported by Logan Markewich and Laurie Voss over at LlamaIndex.
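To give a feel for the QueryEngine and agent pieces listed above, here is a minimal sketch, assuming a recent llama-index install where these classes live under llama_index.core, a ./data folder with a few documents, and a configured LLM and embedding model (by default an OpenAI key); it is an illustration, not the unit's exact code.

```python
# Minimal illustration (not the course's exact code): build a QueryEngine over local
# documents, expose it as a tool, and let a ReAct agent decide when to call it.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool
from llama_index.core.agent import ReActAgent

# Build a QueryEngine over the files in ./data (path is an assumption).
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Wrap the QueryEngine as a tool that an agent can call.
docs_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="docs",
    description="Answers questions about the local documents.",
)

# A ReAct-style agent that decides when to use the tool.
agent = ReActAgent.from_tools([docs_tool], verbose=True)
print(agent.chat("Summarise the documents in one sentence."))
```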