FP16 Training PyTorch, 8 (and NCCL >= 2.