The math behind tiled vs. naive matrix multiplication in CUDA