Understanding RL for model training, and future directions with GRAPE