Bug in attention map computation

#3 · opened by gionii

In the following line, https://huggingface.co/Synthyra/ESMplusplus_small/blob/main/modeling_esm_plusplus.py#L324, you are updating `attention_mask` rather than `attn_bias`, which is the tensor actually used to mask the attention values.

I am assuming you followed this template: https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
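
For reference, here is a minimal sketch of that masking step, assuming the code mirrors the reference implementation from those docs. The function name `sdpa_with_mask` and the boolean `attention_mask` convention are illustrative, not the exact code at the linked line:

```python
# Minimal sketch of the masking pattern from the PyTorch SDPA reference
# implementation (illustrative names, not the exact code at the linked line).
import math
import torch

def sdpa_with_mask(query, key, value, attention_mask=None):
    L, S = query.size(-2), key.size(-2)
    scale = 1 / math.sqrt(query.size(-1))
    attn_bias = torch.zeros(L, S, dtype=query.dtype, device=query.device)

    if attention_mask is not None:
        # Buggy pattern described above: the result is written back into
        # attention_mask, so attn_bias stays all zeros and nothing is masked.
        # attention_mask = attn_bias.masked_fill(attention_mask.logical_not(), float("-inf"))

        # Fix: update attn_bias, which is what actually gets added to the scores.
        attn_bias = attn_bias.masked_fill(attention_mask.logical_not(), float("-inf"))

    scores = query @ key.transpose(-2, -1) * scale
    scores = scores + attn_bias
    attn = torch.softmax(scores, dim=-1)
    return attn @ value, attn
```

Because the buggy variant raises no error, the attention map simply attends to padded positions silently, which makes this easy to miss.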

Synthyra org

Hi @gionii,

Thanks for pointing this out! We have fixed the typo.

If you have any other questions or comments please let me know.
Best,
Logan