llama-cpp-python 0.3.9 Prebuilt Wheel with CUDA Support for Windows

This repository provides a prebuilt Python wheel for llama-cpp-python (version 0.3.9) with NVIDIA CUDA support, for Windows 10/11 (x64) systems. This wheel enables GPU-accelerated inference for large language models (LLMs) using the llama.cpp library, simplifying setup by eliminating the need to compile from source. The wheel is compatible with Python 3.10 and supports NVIDIA GPUs, including the latest Blackwell architecture.

Available Wheel

  • llama_cpp_python-0.3.9-cp310-cp310-win_amd64.whl (Python 3.10, CUDA 12.8)

Compatibility

The prebuilt wheel targets NVIDIA Blackwell GPUs, and has also been tested and confirmed working on previous-generation NVIDIA GPUs. Confirmed-compatible cards include:

  • NVIDIA RTX 5090 (Blackwell)
  • NVIDIA RTX 3090 (Ampere)

Installation

To install the wheel, use the following command in your Python 3.10 environment:

pip install llama_cpp_python-0.3.9-cp310-cp310-win_amd64.whl
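After installation, you can sanity-check that the CUDA build is active before loading a model. The sketch below is illustrative and assumes the wheel above is installed in the active Python 3.10 environment; the `model.gguf` path in the comments is a placeholder, not a file shipped with this repository.

```python
# Post-install sanity check (a sketch; assumes the CUDA wheel above is
# installed in the active Python 3.10 environment).
try:
    from llama_cpp import Llama, llama_supports_gpu_offload
except ImportError:
    # llama-cpp-python is not installed in this environment.
    Llama = None

if Llama is not None:
    # Returns True when llama.cpp was compiled with GPU (CUDA) support,
    # as this prebuilt wheel is.
    print("GPU offload available:", llama_supports_gpu_offload())

    # To offload every model layer to the GPU ("model.gguf" is a placeholder path):
    # llm = Llama(model_path="model.gguf", n_gpu_layers=-1)
    # print(llm("Q: What is 2+2? A:", max_tokens=8)["choices"][0]["text"])
```

Setting `n_gpu_layers=-1` asks llama.cpp to place all layers on the GPU; a smaller positive value splits the model between GPU and CPU when VRAM is limited.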