Sharing a base model in GPU VRAM across multiple inference stack processes [video]