Optimum: The Last v1 Release

Optimum v1.27 marks the final major release in the v1 series. As we close this chapter, we're laying the groundwork for a more modular and community-driven future:
- Optimum v2: A lightweight core package for porting Transformers, Diffusers, or Sentence-Transformers to specialized AI hardware/software/accelerators.
- Optimum-ONNX: A dedicated package where the ONNX/ONNX Runtime ecosystem lives and evolves, faster-moving and decoupled from the Optimum core.

🎯 Why this matters:
- A clearer governance path for ONNX, fostering stronger community collaboration and an improved developer experience.
- Faster innovation in a more modular, open-source environment.

💡 What this means:
- More transparency, broader participation, and faster development driven by the community and key actors in the ONNX ecosystem (PyTorch, Microsoft, Joshua Lochner, ...)
- A cleaner, more maintainable core Optimum, focused on extending HF libraries to specialized AI hardware/software/accelerator tooling and used by our partners (Intel Corporation, Amazon Web Services (AWS), AMD, NVIDIA, FuriosaAI, ...)

🛠️ Major updates I worked on in this release:
- Added support for Transformers v4.53 and SmolLM3 in ONNX/ONNX Runtime.
- Solved batched inference/generation for all supported decoder model architectures (LLMs).

✨ Big shoutout to @echarlaix for leading the refactoring work that cleanly separated the ONNX exporter logic and enabled the creation of Optimum-ONNX.

The conclusion is interesting: "Our findings highlight that the Gaudi 2, by leveraging FP8, achieves higher throughput-to-power efficiency during LLM inference"
One aspect of AI hardware accelerators that is often overlooked is how they consume less energy than GPUs. It's nice to see researchers starting to carry out experiments to measure this!