llama-cpp-python 0.3.9 Prebuilt Wheel with CUDA Support for Windows
This repository provides a prebuilt Python wheel for llama-cpp-python (version 0.3.9) with NVIDIA CUDA support, for Windows 10/11 (x64) systems. This wheel enables GPU-accelerated inference for large language models (LLMs) using the llama.cpp
library, simplifying setup by eliminating the need to compile from source. The wheel is compatible with Python 3.10 and supports NVIDIA GPUs, including the latest Blackwell architecture.
Available Wheel
llama_cpp_python-0.3.9-cp310-cp310-win_amd64.whl
(Python 3.10, CUDA 12.8)
Compatibility
The prebuilt wheel targets NVIDIA Blackwell GPUs but has also been tested and confirmed working on previous-generation NVIDIA hardware. Tested GPUs include:
- NVIDIA RTX 5090 (Blackwell)
- NVIDIA RTX 3090 (Ampere)
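After installing, you can confirm that the wheel was built with CUDA support before loading a model. The sketch below assumes the low-level `llama_supports_gpu_offload` binding exposed by llama-cpp-python 0.3.x; it returns None if the package is not installed at all.

```python
def cuda_build_available():
    """Return True/False if llama-cpp-python reports GPU offload support,
    or None when the package is not installed."""
    try:
        # Low-level binding exposed by llama-cpp-python (assumption: 0.3.x API).
        from llama_cpp import llama_supports_gpu_offload
    except ImportError:
        return None  # llama-cpp-python is not installed in this environment
    return bool(llama_supports_gpu_offload())

if __name__ == "__main__":
    print("CUDA-enabled build:", cuda_build_available())
```

If this prints False, the installed wheel was likely a CPU-only build pulled from PyPI rather than the prebuilt CUDA wheel from this repository.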
Installation
To install the wheel, download it and run the following command in your Python 3.10 environment:
pip install llama_cpp_python-0.3.9-cp310-cp310-win_amd64.whl
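Once installed, GPU-accelerated inference only requires telling the `Llama` constructor to offload layers to the GPU. The sketch below is a minimal example; the model path is hypothetical (any local GGUF file works), and the helper function simply collects the relevant keyword arguments.

```python
def gpu_llama_kwargs(model_path, n_ctx=4096):
    """Build keyword arguments for llama_cpp.Llama that offload all
    model layers to the GPU (n_gpu_layers=-1 means 'every layer')."""
    return {
        "model_path": model_path,
        "n_gpu_layers": -1,  # offload all layers; set a smaller number if VRAM is limited
        "n_ctx": n_ctx,      # context window size in tokens
    }

if __name__ == "__main__":
    from llama_cpp import Llama

    # Hypothetical model file; replace with any GGUF model you have locally.
    llm = Llama(**gpu_llama_kwargs("models/llama-3-8b-instruct.Q4_K_M.gguf"))
    out = llm("Q: What is CUDA? A:", max_tokens=64)
    print(out["choices"][0]["text"])
```

Lowering `n_gpu_layers` trades VRAM usage for speed, which is useful on cards with less memory than the RTX 3090/5090 listed above.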