Installation Guide for Ascend NPU#

Xinference can run on Ascend NPU. Follow the instructions below to install it.

Warning

The open-source version relies on Transformers for inference, which can be slow on chips such as the 310P3. We provide an enterprise version that supports the MindIE engine, offering better performance and compatibility for Ascend NPU. Refer to Xinference Enterprise for details.

Installing PyTorch and Ascend extension for PyTorch#

Install the CPU version of PyTorch and the corresponding Ascend extension.

Take PyTorch v2.1.0 as an example.

pip3 install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cpu

Then install the Ascend extension for PyTorch.

pip3 install 'numpy<2.0'
pip3 install decorator
pip3 install torch-npu==2.1.0.post3

Run the command below to check whether it correctly prints the Ascend NPU count.

python -c "import torch; import torch_npu; print(torch.npu.device_count())"

Installing Xinference#

pip3 install xinference

Now you can use Xinference according to the documentation. The Transformers backend is the only engine supported for Ascend NPU in the open-source version.
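For example, you can start a local Xinference server and then launch a model with the Transformers engine. This is a minimal sketch; the model name qwen2-instruct, the host, and the port are illustrative and should be adjusted to your setup.

xinference-local --host 0.0.0.0 --port 9997
xinference launch --model-name qwen2-instruct --model-engine transformers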

Enterprise Support#

If you encounter any performance or other issues on Ascend NPU, please reach out to us via the link.