Error in model (also in original): scaling only the q matrix, not the qk.T dot product (qk.T/sqrt(dim_per_head)) (#21627)

* Error in model: scaling only the q matrix, not the qk.T dot product (qk.T/sqrt(dim_per_head))

  As per Vaswani et al., 2017, p. 4, the scores should be computed as torch.matmul(q, k.transpose(2, 3)) / math.sqrt(dim_per_head), not by scaling q / math.sqrt(dim_per_head).

  https://arxiv.org/pdf/1912.05372.pdf

  The error was in the original FlauBERT repo and effectively scales the queries but not the values; cf. https://github.com/getalp/Flaubert/pull/45/commits/6d176880ca3a1a8dfa2b76c97030bb51c5e917b8

* Update modeling_flaubert.py

  Update to https://github.com/huggingface/transformers/pull/21627
  make fixup
  make repo_consistency

* Update modeling_xlm.py
* Update modeling_flaubert.py
* Update modeling_xlm.py
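For reference, a minimal NumPy sketch of scaled dot-product attention in the form given by Vaswani et al. (2017), softmax(q·kᵀ / sqrt(dim_per_head))·v, with the division applied to the q·kᵀ scores as in the corrected expression above. NumPy stands in for torch here, and the function name `scaled_dot_product_attention` and the `(batch, n_heads, seq_len, dim_per_head)` layout are assumptions for illustration; this is not the code in `modeling_flaubert.py` or `modeling_xlm.py`.

```python
import math

import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Hypothetical helper: softmax(q @ k^T / sqrt(d)) @ v.

    q: (batch, n_heads, qlen, dim_per_head)
    k: (batch, n_heads, klen, dim_per_head)
    v: (batch, n_heads, klen, dim_v)
    Returns (context, weights).
    """
    dim_per_head = q.shape[-1]
    # Scale the q.k^T dot product, mirroring
    # torch.matmul(q, k.transpose(2, 3)) / math.sqrt(dim_per_head)
    scores = np.matmul(q, np.swapaxes(k, 2, 3)) / math.sqrt(dim_per_head)
    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Attention-weighted sum of the values
    context = np.matmul(weights, v)
    return context, weights


# Usage with random inputs (shapes are arbitrary examples)
rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 5, 8))
k = rng.standard_normal((2, 4, 5, 8))
v = rng.standard_normal((2, 4, 5, 8))
context, weights = scaled_dot_product_attention(q, k, v)
print(context.shape)  # (2, 4, 5, 8)
```

Each row of `weights` is a probability distribution over the keys, and `context` has one output vector per query position.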