Speeding up PyTorch inference by 87% on Apple devices with AI-generated Metal kernels