Installation Guide for Ascend NPU#
Xinference can run on Ascend NPU. Follow the instructions below to install it.
Warning
The open-source version relies on Transformers for inference, which can be slow on chips like the 310P3. We provide an enterprise version that supports the MindIE engine, offering better performance and compatibility for Ascend NPU. Refer to Xinference Enterprise.
Installing PyTorch and Ascend extension for PyTorch#
Install the CPU version of PyTorch and the corresponding Ascend extension.
Take PyTorch v2.1.0 as an example.
pip3 install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cpu
Then install the Ascend extension for PyTorch.
pip3 install 'numpy<2.0'
pip3 install decorator
pip3 install torch-npu==2.1.0.post3
Run the command below to check that it correctly prints the Ascend NPU count.
python -c "import torch; import torch_npu; print(torch.npu.device_count())"
Installing Xinference#
pip3 install xinference
Now you can use Xinference according to the documentation.
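As a quick smoke test, you can start a local Xinference server; the host and port below are illustrative values, not requirements.

xinference-local --host 0.0.0.0 --port 9997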
The Transformers backend is the only engine available for Ascend NPU in the open-source version.
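When launching a model, you can select the Transformers engine explicitly. The sketch below uses the Python client; the endpoint, model name, and size are placeholders that assume a local server started as above, so adjust them to your deployment.

from xinference.client import Client

# Endpoint and model choice are assumptions for illustration; adjust to your setup.
client = Client("http://127.0.0.1:9997")

model_uid = client.launch_model(
    model_name="qwen2-instruct",   # example model name; pick any supported model
    model_engine="transformers",   # the only engine available on Ascend NPU (open source)
    model_size_in_billions=7,      # needed when a model ships in multiple sizes
)
print(model_uid)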
Enterprise Support#
If you encounter any performance or other issues on Ascend NPU, please reach out to us via the link.