Openness increases the rate of improvement
Over the past week, Unsloth has been hard at work finding and fixing Gemma bugs. At first, Google showcased Gemma’s promising results; however, many problems, such as discrepancies in loss values, made us step in to help Gemma live up to its initial promise.
We've already pushed all the fixes to our free Colab notebooks, but not yet elsewhere. Here are the bugs we found (small illustrative sketches of several of them follow below):
1. Must add <bos>
2. Paper typo? <end_of_turn>model
3. sqrt(3072)=55.4256 but bfloat16 is 55.5
4. Layernorm (w+1) should be done in float32
5. Keras mixed_bfloat16 RoPE is wrong
6. RoPE is sensitive to a*(1/x) vs a/x
7. RoPE should be float32 not bfloat16 (Fixed in Hugging Face 4.38.2)
8. GELU should be approx tanh not exact (Ongoing PR)
https://unsloth.ai/blog/gemma-bugs
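Bugs 1 and 2 concern the prompt format. Below is a minimal sketch of a hand-built Gemma chat prompt using the publicly documented control tokens; the example question is ours, and you should verify the exact template against your tokenizer before relying on it.

```python
# Hedged sketch of a hand-built Gemma chat prompt (assumes the public
# <bos>/<start_of_turn>/<end_of_turn> control tokens).
prompt = (
    "<bos>"                               # bug 1: <bos> must be prepended
    "<start_of_turn>user\n"
    "Why is the sky blue?<end_of_turn>\n"
    "<start_of_turn>model\n"              # bug 2: the paper's "<end_of_turn>model" is a typo
)
```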
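Bug 3 is pure arithmetic: Gemma scales embeddings by sqrt(hidden_size) = sqrt(3072), and the dtype the scale is stored in changes the number. A quick check (assumes PyTorch is available):

```python
import math
import torch

exact = math.sqrt(3072)                                  # 55.42562584220407
bf16 = torch.tensor(exact, dtype=torch.bfloat16).item()  # rounds to 55.5
print(exact, bf16)
```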
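For bug 4, here is a minimal sketch of what "layernorm (w+1) in float32" means, written as a generic RMSNorm; this is an illustration under our reading of the issue, not Gemma's official kernel.

```python
import torch

def rmsnorm_w_plus_one(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Do the normalization and the (weight + 1) shift in float32,
    # casting back to the activation dtype only at the end.
    orig_dtype = x.dtype
    x = x.float()
    x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)
    return (x * (1.0 + weight.float())).to(orig_dtype)
```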
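Bugs 5–7 are all about RoPE numerics. The sketch below (base 10000 and a 256-wide rotary dimension are illustrative choices, not pulled from any config) shows that position * (1/x) and position / x are not bit-identical in floating point, and that bfloat16 cannot even represent large positions exactly:

```python
import torch

base, dim = 10000.0, 256                       # illustrative values
x = base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)

pos = torch.arange(0, 8192, dtype=torch.float32)
a = pos[:, None] * (1.0 / x)                   # a * (1/x)
b = pos[:, None] / x                           # a / x
print((a - b).abs().max())                     # usually tiny but nonzero -> different rotary angles

# float32 vs bfloat16: many large integer positions collapse to the same bf16 value
pos_bf16 = pos.to(torch.bfloat16).float()
print(pos_bf16.unique().numel(), "distinct bf16 positions out of", pos.numel())
```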
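And for bug 8, the two GELU variants really do differ slightly; a quick comparison (assumes PyTorch):

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-4.0, 4.0, steps=9)
exact = F.gelu(x, approximate="none")    # erf-based "exact" GELU
approx = F.gelu(x, approximate="tanh")   # tanh approximation Gemma expects
print((exact - approx).abs().max())      # small but nonzero difference
```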
Source: gonzo-обзоры ML статей
2024-03-07 08:08:28