Ahmadzei's picture
update 1
57bdca5
raw
history blame contribute delete
321 Bytes
Since the hash can be a bit random, several hash functions are used in practice
(determined by a n_rounds parameter) and then are averaged together.
Local attention
Longformer uses local attention: often, the local context (e.g., what are the two tokens to the
left and right?) is enough to take action for a given token.