RL is more information-inefficient than you thought