If you have very long texts, this matrix can be huge and take far too much space on the GPU. To alleviate
that, axial positional encodings factorize that big matrix \(E\) into two smaller matrices \(E_{1}\) and \(E_{2}\), with
dimensions \(l_{1} \times d_{1}\) and \(l_{2} \times d_{2}\), such that \(l_{1} \times l_{2} = l\) and
\(d_{1} + d_{2} = d\). Because the lengths multiply while the hidden dimensions only add, the two factors end up much smaller than the original matrix.
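
Below is a minimal sketch of this idea, assuming illustrative sizes \(l_{1}=128\), \(l_{2}=512\), \(d_{1}=64\), \(d_{2}=192\) (these numbers, and the tensor names, are not taken from any specific model configuration); it shows how the full \(l \times d\) encoding can be rebuilt on the fly from the two small matrices:

```python
import torch

l1, l2 = 128, 512          # factor the sequence length: l = l1 * l2 = 65536
d1, d2 = 64, 192           # split the hidden size:      d = d1 + d2 = 256

# Two small learnable matrices instead of one huge (l x d) matrix.
E1 = torch.randn(l1, d1)   # shape (l1, d1)
E2 = torch.randn(l2, d2)   # shape (l2, d2)

# Reconstruct the full (l, d) encoding: broadcast E1 along the second axis
# and E2 along the first, then concatenate on the feature dimension.
full = torch.cat(
    [
        E1[:, None, :].expand(l1, l2, d1),
        E2[None, :, :].expand(l1, l2, d2),
    ],
    dim=-1,
).reshape(l1 * l2, d1 + d2)

print(full.shape)  # torch.Size([65536, 256])

# Stored parameters drop from l * d = 16,777,216
# to l1 * d1 + l2 * d2 = 106,496.
print(E1.numel() + E2.numel())
```

In this sketch only \(E_{1}\) and \(E_{2}\) would be stored as parameters; the full matrix is just a broadcasted combination of the two, which is where the memory saving comes from.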