Why Symmetric Clipping Hurts Exploration

Understanding the asymmetric impact of clipping on exploration vs exploitation tokens in GRPO

In GRPO, symmetric clipping constrains exploration tokens (low probability) far more than exploitation tokens (high probability), which contributes to entropy collapse. Both kinds of token are clipped by the same relative factor (1 + ε), but the absolute increase that factor permits is proportional to πold, so it is much smaller for exploration tokens.
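
The arithmetic behind this can be written out in one step. The following is a sketch that assumes the usual PPO-style importance ratio; the symbols r_t, o_t and q are notation chosen here, not taken from the source. Clipping is not a hard constraint, but for a token with positive advantage it removes any incentive to push the ratio past 1 + ε, so the bound acts as an effective ceiling on a single update:

```latex
\begin{align*}
r_t(\theta) &= \frac{\pi_\theta(o_t \mid q, o_{<t})}{\pi_{\text{old}}(o_t \mid q, o_{<t})} \le 1 + \varepsilon \\
\Longrightarrow \quad \pi_\theta(o_t \mid q, o_{<t}) &\le (1 + \varepsilon)\, \pi_{\text{old}}(o_t \mid q, o_{<t}) \\
\Longrightarrow \quad \pi_\theta - \pi_{\text{old}} &\le \varepsilon \, \pi_{\text{old}}
\end{align*}
```

The allowed absolute increase is ε · πold, so a token at πold = 0.01 gets one ninetieth of the room that a token at πold = 0.9 gets.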

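Parameters

ε = 0.2 (lower bound and symmetric upper bound, as implied by 0.01 → 0.012), εhigh = 0.28 (asymmetric upper bound)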

Comparison: Symmetric [1 - ε, 1 + ε] vs Asymmetric [1 - ε, 1 + εhigh]

Exploration Token (πold = 0.01)
  Symmetric Clipping [1 - ε, 1 + ε]: Max Allowed 0.012, Increase +0.002
  Asymmetric Clipping [1 - ε, 1 + εhigh]: Max Allowed 0.0128, Increase +0.0028
  Improvement: +40%

Exploitation Token (πold = 0.9)
  Symmetric Clipping [1 - ε, 1 + ε]: Max Allowed 1.08, Increase +0.18
  Asymmetric Clipping [1 - ε, 1 + εhigh]: Max Allowed 1.152, Increase +0.252
  Improvement: +40%

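Key Insight: With symmetric clipping (ε = 0.2), the exploitation token's allowed increase (+0.18) is 90× the exploration token's (+0.002). Raising the upper bound to εhigh = 0.28 gives the exploration token 40% more headroom (+0.0028), which is about 64× smaller than that same +0.18. Two caveats apply: the 64× figure holds the exploitation increase at its symmetric value (against the asymmetric +0.252 the ratio is still 90×), and since a probability cannot exceed 1, the bounds 1.08 and 1.152 are nominal, so the exploitation token's real headroom is at most 0.1 under either scheme. The extra room from εhigh therefore matters mainly for low-probability tokens.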
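
The figures in the comparison can be reproduced with a few lines of Python. This is a minimal sketch: the helper names `max_allowed` and `headroom` are hypothetical, ε = 0.2 is inferred from the 0.01 → 0.012 bound, and the `cap_at_one` option is an addition here to show the effect of a probability being limited to 1.

```python
def max_allowed(p_old: float, eps_high: float) -> float:
    """Nominal upper clip bound on the new probability: (1 + eps_high) * p_old."""
    return (1.0 + eps_high) * p_old


def headroom(p_old: float, eps_high: float, cap_at_one: bool = False) -> float:
    """Largest absolute increase in probability that the upper clip bound allows.

    With cap_at_one=True the bound is limited to 1.0, since a probability
    cannot exceed 1 regardless of what the clip range nominally permits.
    """
    bound = max_allowed(p_old, eps_high)
    if cap_at_one:
        bound = min(bound, 1.0)
    return bound - p_old


EPS = 0.2        # symmetric upper bound (assumed; inferred from 0.01 -> 0.012)
EPS_HIGH = 0.28  # asymmetric upper bound, as stated in the text

if __name__ == "__main__":
    for name, p_old in [("exploration", 0.01), ("exploitation", 0.9)]:
        sym = headroom(p_old, EPS)
        asym = headroom(p_old, EPS_HIGH)
        gain = asym / sym - 1  # relative gain in headroom from the wider bound
        print(f"{name:12s} p_old={p_old:<5} sym=+{sym:.4f} asym=+{asym:.4f} gain={gain:+.0%}")

    # Ratio of nominal headroom between the two tokens under each scheme.
    print("symmetric ratio: ", headroom(0.9, EPS) / headroom(0.01, EPS))
    print("asymmetric ratio:", headroom(0.9, EPS_HIGH) / headroom(0.01, EPS_HIGH))

    # Headroom of the exploitation token once the bound is capped at 1.0.
    print("capped headroom: ", headroom(0.9, EPS_HIGH, cap_at_one=True))
```

Calling `headroom` with `cap_at_one=True` makes the asymmetry explicit: the high-probability token cannot use the wider bound, while the low-probability token gains the full 40%.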

Reference

This visualization is based on insights from: GRPO Tricks by Cameron R. Wolfe