Incredible work!! They claim this is the year of recursive language models (I hope so). As models get bigger and better, managing their context windows to fit longer prompts has been a standing engineering problem. They propose an inference technique where, instead of feeding the entire prompt directly into the transformer, the model externally crunches a long prompt down into snippets that it can recursively call itself on. This could make models cheaper and more efficient, but I doubt big tech will adopt it since they profit more from the current approach (bigger models = longer context windows = more expensive models). Once again such work came from the academia/OSS community, cuz I doubt big tech would have shared these findings lol. They probably have much better inference methods that we may never know of haha.
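To illustrate the idea (this is just my own toy sketch, not the paper's actual method): split a too-long prompt into window-sized chunks, have the model compress each chunk, then recurse on the concatenated result until it fits. `call_model` here is a hypothetical stand-in that fakes compression by truncation, so the example runs without any API.

```python
def call_model(text: str, instruction: str) -> str:
    # Hypothetical model call; a real implementation would hit an LLM API
    # with `instruction` (e.g. "summarize"). Here we fake compression by
    # keeping the first half of the text so the sketch is runnable.
    return text[: max(1, len(text) // 2)]

def recursive_compress(prompt: str, window: int) -> str:
    """Recursively shrink `prompt` until it fits within `window` characters."""
    if len(prompt) <= window:
        return prompt
    # Split the prompt into window-sized chunks and compress each one.
    chunks = [prompt[i : i + window] for i in range(0, len(prompt), window)]
    compressed = "".join(call_model(c, "summarize") for c in chunks)
    # Recurse on the concatenated summaries; each pass roughly halves
    # the text, so this terminates.
    return recursive_compress(compressed, window)
```

A real version would replace the truncation with an actual summarization call and could let the model decide which snippets to recurse into, but the control flow is the same.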
Paper: https://arxiv.org/pdf/2512.24601