Unweaving warp specialization on modern tensor core GPUs