Rai
sr-rai
Recent Activity
upvoted an article about 1 month ago
SmolLM3: smol, multilingual, long-context reasoner
posted an update 4 months ago
ExLlamaV3 is out, and it introduces EXL3, a new SOTA quantization format!
"The conversion process is designed to be simple and efficient and requires only an input model (in HF format) and a target bitrate. By computing Hessians on the fly and thanks to a fused Viterbi kernel, the quantizer can convert a model in a single step, taking a couple of minutes for smaller models, up to a few hours for larger ones (70B+) (on a single RTX 4090 or equivalent GPU.)"
Repo: https://github.com/turboderp-org/exllamav3
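
To get a feel for what a "target bitrate" means in practice, here is a rough back-of-the-envelope sketch. It assumes the bitrate is expressed in bits per weight and that every parameter is stored at that rate, which ignores embeddings, output heads, and file metadata. The function name `estimate_quantized_size_gb` is made up for illustration and is not part of ExLlamaV3.

```python
def estimate_quantized_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough size of the quantized weights in GB (10^9 bytes).

    Assumption: all parameters are stored at the same average bitrate.
    Real checkpoints will differ somewhat.
    """
    if num_params <= 0 or bits_per_weight <= 0:
        raise ValueError("num_params and bits_per_weight must be positive")
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


if __name__ == "__main__":
    # Hypothetical model sizes and bitrates, chosen only for illustration.
    for label, params in [("8B", 8e9), ("70B", 70e9)]:
        for bpw in (3.0, 4.0, 6.0):
            size = estimate_quantized_size_gb(params, bpw)
            print(f"{label} model at {bpw} bits/weight: ~{size:.1f} GB")
```

By this estimate a 70B model at 4.0 bits per weight comes to roughly 35 GB of weights, so the result of converting a large model can be bigger than the memory of the single GPU used to convert it.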