TensorBLEU - GPU-based vectorized BLEU score for in-training optimization
Today I published my next paper, introducing TensorBLEU: Vectorized GPU-based BLEU Score Implementation for Per-Sentence In-Training Evaluation (2510.05485), an optimization dedicated to Reinforcement Learning rewards based on BLEU score. It achieves over a 10x speedup over NLTK's version on a small T4 GPU, and even a 40x speedup on an A100 GPU.
It's not exactly linguistically correct BLEU, because it's based on token IDs rather than text n-grams. That's a conscious choice: it skips computationally expensive token decoding when the score serves only as a reward signal. This was previously possible with NLTK's `sentence_bleu`, but it required moving tensors with token IDs from GPU to CPU, converting them to lists and computing the scores in a Python loop, creating a significant performance bottleneck.
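For illustration, here's a minimal sketch of that baseline pattern, assuming padded ID tensors of shape (batch, seq_len); real code would also strip padding tokens before scoring:

```python
import torch
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def nltk_batch_bleu(hyp_ids: torch.Tensor, ref_ids: torch.Tensor) -> torch.Tensor:
    """Slow baseline: per-sentence BLEU on token IDs via NLTK."""
    smooth = SmoothingFunction().method1
    hyps = hyp_ids.tolist()  # device-to-host copy + Python list conversion
    refs = ref_ids.tolist()
    scores = [
        sentence_bleu([ref], hyp, smoothing_function=smooth)  # Python loop over the batch
        for hyp, ref in zip(hyps, refs)
    ]
    return torch.tensor(scores, device=hyp_ids.device)  # back to GPU for the reward
```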
In our case, at Reactive AI (ReactiveAI), we use BLEU as part of the reward in Memory Reinforcement Learning (MRL) of Reactive Transformer models (Reactive Transformer (RxT) - Stateful Real-Time Processing for Event-Driven Reactive Language Models (2510.03561)), combined with cosine similarity. To rate memory quality, we calculate BLEU and cosine similarity between the generated answer and the reference answer from the dataset, as well as between the generated answer and the previous interaction(s), to ensure that the current answer carries some information from previous time-steps. Cosine similarity is calculated on GPU, but the BLEU calculation with NLTK has to be performed on CPU, with a lot of data movement and conversion. When a whole episode (generating a batch of answers, memory updates and reward calculation) takes e.g. 6 seconds, even 0.5s for the reward is noticeable, so we decided to optimize it.
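A rough sketch of that reward shape, with hypothetical function names, mixing coefficients and weights (the actual MRL reward in rxlm may differ):

```python
import torch
import torch.nn.functional as F

def mrl_reward(gen_emb: torch.Tensor, ref_emb: torch.Tensor, prev_emb: torch.Tensor,
               bleu_ref: torch.Tensor, bleu_prev: torch.Tensor,
               w_ref: float = 0.5, w_prev: float = 0.5) -> torch.Tensor:
    # Semantic similarity terms, computed entirely on GPU.
    cos_ref = F.cosine_similarity(gen_emb, ref_emb, dim=-1)
    cos_prev = F.cosine_similarity(gen_emb, prev_emb, dim=-1)
    # Mix each similarity with the matching BLEU term (0.5/0.5 split is illustrative).
    r_ref = 0.5 * (cos_ref + bleu_ref)     # generated answer vs. reference answer
    r_prev = 0.5 * (cos_prev + bleu_prev)  # generated answer vs. previous interaction(s)
    return w_ref * r_ref + w_prev * r_prev
```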
The TensorBLEU calculation is performed on GPU for the whole batch, at the sentence or corpus level - `tensor_sentence_bleu` or `tensor_corpus_bleu` from `rxlm.metrics.tensorbleu` (https://github.com/RxAI-dev/rxlm).
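A usage sketch; the exact call signature is my assumption, so check the repo for the real API:

```python
import torch
from rxlm.metrics.tensorbleu import tensor_sentence_bleu

# Token-ID tensors for a batch of generated and reference answers (dummy data here).
hyp_ids = torch.randint(0, 50_000, (32, 128), device="cuda")
ref_ids = torch.randint(0, 50_000, (32, 128), device="cuda")

# Assumed call shape: one BLEU score per sequence, computed entirely on the GPU.
per_sentence_bleu = tensor_sentence_bleu(hyp_ids, ref_ids)
```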
Please check the paper and upvote it if you like it :)