llama-cpp-python 0.3.9 Prebuilt Wheel with CUDA Support for Windows
This repository provides a prebuilt Python wheel for llama-cpp-python (version 0.3.9) with NVIDIA CUDA support, for Windows 10/11 (x64) systems. This wheel enables GPU-accelerated inference for large language models (LLMs) using the llama.cpp
library, simplifying setup by eliminating the need to compile from source. The wheel is compatible with Python 3.10 and supports NVIDIA GPUs, including the latest Blackwell architecture.
Available Wheel
llama_cpp_python-0.3.9-cp310-cp310-win_amd64.whl
(Python 3.10, CUDA 12.8)
Compatibility
The prebuilt wheel targets NVIDIA Blackwell GPUs but has also been tested and confirmed working on previous-generation NVIDIA hardware. Tested GPUs include:
- NVIDIA RTX 5090 (Blackwell)
- NVIDIA RTX 3090 (Ampere)
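After installing, you can confirm that the wheel was built with CUDA support before loading a model. The sketch below assumes the low-level `llama_supports_gpu_offload` binding exposed by llama-cpp-python 0.3.x; it returns None if the package is not installed at all.

```python
def cuda_build_available():
    """Return True/False if llama-cpp-python reports GPU offload support,
    or None when the package is not installed."""
    try:
        # Low-level binding exposed by llama-cpp-python (assumption: 0.3.x API).
        from llama_cpp import llama_supports_gpu_offload
    except ImportError:
        return None  # llama-cpp-python is not installed in this environment
    return bool(llama_supports_gpu_offload())

if __name__ == "__main__":
    print("CUDA-enabled build:", cuda_build_available())
```

If this prints False, the installed wheel was likely a CPU-only build pulled from PyPI rather than the prebuilt CUDA wheel from this repository.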
Installation
To install the wheel, download it and run the following command in your Python 3.10 environment:
pip install llama_cpp_python-0.3.9-cp310-cp310-win_amd64.whl
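Once installed, GPU-accelerated inference only requires telling the `Llama` constructor to offload layers to the GPU. The sketch below is a minimal example; the model path is hypothetical (any local GGUF file works), and the helper function simply collects the relevant keyword arguments.

```python
def gpu_llama_kwargs(model_path, n_ctx=4096):
    """Build keyword arguments for llama_cpp.Llama that offload all
    model layers to the GPU (n_gpu_layers=-1 means 'every layer')."""
    return {
        "model_path": model_path,
        "n_gpu_layers": -1,  # offload all layers; set a smaller number if VRAM is limited
        "n_ctx": n_ctx,      # context window size in tokens
    }

if __name__ == "__main__":
    from llama_cpp import Llama

    # Hypothetical model file; replace with any GGUF model you have locally.
    llm = Llama(**gpu_llama_kwargs("models/llama-3-8b-instruct.Q4_K_M.gguf"))
    out = llm("Q: What is CUDA? A:", max_tokens=64)
    print(out["choices"][0]["text"])
```

Lowering `n_gpu_layers` trades VRAM usage for speed, which is useful on cards with less memory than the RTX 3090/5090 listed above.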