Entropy collapse occurs in GRPO when clipping constrains exploration tokens (low probability) far more than exploitation tokens (high probability). Both are clipped by the same relative factor (1 + ε), so a token with old probability p can rise to at most p(1 + ε), an absolute increase of at most εp. That cap is much more restrictive for exploration tokens, whose p is small.
Parameters: ε = 0.2, εhigh = 0.28; old-policy token probabilities 0.01 (exploration) and 0.9 (exploitation).
Comparison: Symmetric [1 - ε, 1 + ε] vs Asymmetric [1 - ε, 1 + εhigh]
| Token (old probability) | Clipping | Range | Max Allowed | Increase | Improvement |
|---|---|---|---|---|---|
| Exploration (0.01) | Symmetric | [1 - ε, 1 + ε] | 0.012 | +0.002 | |
| Exploration (0.01) | Asymmetric | [1 - ε, 1 + εhigh] | 0.0128 | +0.0028 | +40% |
| Exploitation (0.9) | Symmetric | [1 - ε, 1 + ε] | 1.08 | +0.18 | |
| Exploitation (0.9) | Asymmetric | [1 - ε, 1 + εhigh] | 1.152 | +0.252 | +40% |
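The figures in the comparison can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the helper names `clip_headroom` and `compare_clipping` are hypothetical, and the defaults ε = 0.2 and εhigh = 0.28 are assumptions chosen because they match the numbers shown (0.012 and 1.08 for the symmetric case).

```python
def clip_headroom(p_old, eps_upper):
    """Largest probability the upper clip permits, and the gain over p_old."""
    max_allowed = p_old * (1.0 + eps_upper)
    return max_allowed, max_allowed - p_old


def compare_clipping(p_old, eps=0.2, eps_high=0.28):
    """Compare symmetric (1 + eps) and asymmetric (1 + eps_high) upper bounds."""
    sym_max, sym_gain = clip_headroom(p_old, eps)
    asym_max, asym_gain = clip_headroom(p_old, eps_high)
    return {
        "sym_max": sym_max,
        "sym_gain": sym_gain,
        "asym_max": asym_max,
        "asym_gain": asym_gain,
        # Relative extra headroom; equals eps_high / eps - 1 for any p_old.
        "improvement": asym_gain / sym_gain - 1.0,
    }


# Assumed old-policy probabilities: one exploration token, one exploitation token.
for p_old in (0.01, 0.9):
    row = compare_clipping(p_old)
    print(p_old, {k: round(v, 4) for k, v in row.items()})
```

Because the bound is multiplicative, the relative improvement depends only on εhigh / ε, which is why both tokens show the same +40%.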
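Key Insight: With symmetric clipping, the exploitation token's allowed increase (+0.18) is 90× that of the exploration token (+0.002). Asymmetric clipping (εhigh = 0.28) gives the exploration token +40% more headroom (+0.0028), which narrows that gap to about 64× when measured against the symmetric exploitation increase of +0.18. If both tokens use the raised bound, each gains 40% and the ratio stays at 90×, because the bound is still multiplicative. Note also that 1.08 and 1.152 are clip limits rather than attainable probabilities: a probability cannot exceed 1, so the exploitation token's real headroom is at most 0.1.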
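To see how the raised upper bound acts inside the objective, here is a minimal sketch of a per-token PPO-style clipped surrogate with separate lower and upper clip ranges. It is an assumption-laden illustration, not a full GRPO implementation: the function name `clipped_objective` and its parameters are hypothetical, and group-relative advantage normalization and any KL term are omitted.

```python
def clipped_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Per-token surrogate: min(r * A, clip(r, 1 - eps_low, 1 + eps_high) * A).

    ratio     -- importance ratio pi_new(token) / pi_old(token)
    advantage -- advantage estimate for the token
    """
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)


def is_clipped(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """True when the clipped branch is active, so the token contributes no gradient."""
    if advantage > 0:
        return ratio > 1.0 + eps_high
    if advantage < 0:
        return ratio < 1.0 - eps_low
    return False


# A token whose probability has grown by 25% and that has a positive advantage.
ratio, advantage = 1.25, 1.0
print("symmetric clipped: ", is_clipped(ratio, advantage, eps_low=0.2, eps_high=0.2))
print("asymmetric clipped:", is_clipped(ratio, advantage, eps_low=0.2, eps_high=0.28))
```

With the symmetric range the token is already past 1 + ε and stops receiving gradient; with the raised upper bound it is still inside the range and can keep increasing. The lower bound is left unchanged, so tokens with negative advantage are treated the same in both settings.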