Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR