self_attn.k_proj.bias are all 0 for all layers
#50
by DaleMeng - opened
Thanks for your great work!
I noticed that the gpt-oss models enable many bias weights, including in the attention, router, and MLP parts.
By printing the bias values, I found that self_attn.k_proj.bias is all zeros for every layer, in both gpt-oss-20b and gpt-oss-120b.
Is this expected behavior?
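For reference, this is roughly how I checked it (a minimal sketch; the Hub id `openai/gpt-oss-20b` and the `model.model.layers` attribute path are assumptions about the standard transformers layout):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint (Hub id is an assumption; adjust to the model you have locally)
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", torch_dtype=torch.bfloat16)

# Walk the decoder layers and check whether every k_proj bias is all zeros
for i, layer in enumerate(model.model.layers):
    bias = layer.self_attn.k_proj.bias
    print(f"layer {i}: k_proj.bias all zero = {torch.all(bias == 0).item()}")
```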
DaleMeng changed discussion status to closed
DaleMeng changed discussion status to open
Yes, I'm seeing the same thing.
The k_proj biases are supposed to be zero. The original OAI checkpoint has a fused QKV projection whose fused bias contains 5120 elements (80 heads x 64 per head). The 80 heads split into 64 q + 8 k + 8 v, and in the original checkpoint the biases for the k heads are all zeros.
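To make the layout concrete, here is a small sketch of how such a fused bias splits into q/k/v slices. The head counts follow the numbers above; the tensor itself is just a placeholder, not the real checkpoint weights:

```python
import torch

# Head counts from the fused layout: 64 q heads + 8 k heads + 8 v heads, 64 dims each
head_dim = 64
num_q, num_k, num_v = 64, 8, 8

# Placeholder fused bias of 5120 elements (in the real checkpoint this comes
# from the fused QKV projection)
fused_bias = torch.zeros((num_q + num_k + num_v) * head_dim)

# Split the fused bias into the per-projection slices
q_bias, k_bias, v_bias = torch.split(
    fused_bias, [num_q * head_dim, num_k * head_dim, num_v * head_dim]
)
print(q_bias.shape, k_bias.shape, v_bias.shape)
# -> torch.Size([4096]) torch.Size([512]) torch.Size([512])
# In the actual checkpoint the k slice is all zeros, which is why
# self_attn.k_proj.bias shows up as zero after the projection is un-fused.
```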