
160 intermediate checkpoints from the tr1-13B training.

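These checkpoints have a bug in them. While we are fixing things, if you want to use any of them, please run the checkpoint file through this script first. It rewrites the file in place, so keep a copy if you need the original: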

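python -c '
import sys, torch

f = sys.argv[1]                         # path to the checkpoint file to patch
sd = torch.load(f, map_location="cpu")  # load the state dict on CPU, no GPU needed
d = 2048                                # side length of the square mask
for k in list(sd.keys()):
    if k.endswith(".attn.bias"):
        # overwrite the buffer with a lower-triangular mask of ones, shape (1, 1, d, d)
        sd[k] = torch.tril(torch.ones((d, d), dtype=torch.float16)).view(1, 1, d, d)
torch.save(sd, f)                       # write the patched state dict back in place
' global_step594/pytorch_model.bin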
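
To check whether a checkpoint has already been patched, something along these lines may help. This is only a sketch: the helper names `causal_mask` and `unpatched_keys` are made up for illustration, the key names in the toy state dict are hypothetical, and `d=4` is used just to keep the example small (the script above uses `d=2048`).

```python
import torch


def causal_mask(d: int) -> torch.Tensor:
    """Lower-triangular mask of ones, shaped (1, 1, d, d), in float16."""
    return torch.tril(torch.ones((d, d), dtype=torch.float16)).view(1, 1, d, d)


def unpatched_keys(sd: dict, d: int) -> list:
    """Return the ".attn.bias" keys whose tensor differs from the expected mask."""
    expected = causal_mask(d)
    bad = []
    for k, v in sd.items():
        if not k.endswith(".attn.bias"):
            continue  # only the attention bias buffers are checked
        if v.shape != expected.shape or not torch.equal(v.to(expected.dtype), expected):
            bad.append(k)
    return bad


# Toy state dict with hypothetical key names and a small d, for illustration only.
toy = {
    "h.0.attn.bias": torch.zeros(1, 1, 4, 4, dtype=torch.float16),  # not a causal mask
    "h.1.attn.bias": causal_mask(4),                                 # already correct
    "h.0.attn.c_attn.weight": torch.randn(4, 12),                    # ignored by the check
}
print(unpatched_keys(toy, d=4))  # prints ['h.0.attn.bias']
```

For a real file, the same check would presumably be `unpatched_keys(torch.load(path, map_location="cpu"), d=2048)`; an empty list means every `.attn.bias` buffer already matches the mask the script writes.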