|
# Simple autogenerated Python bindings for ggml |
|
|
|
This folder contains: |
|
|
|
- Scripts to generate full Python bindings from ggml headers (+ stubs for autocompletion in IDEs) |
|
- Some barebones utils (see [ggml/utils.py](./ggml/utils.py)): |
|
- `ggml.utils.init` builds a context that's freed automatically when the pointer gets GC'd |
|
- `ggml.utils.copy` **copies between same-shaped tensors (numpy or ggml), w/ automatic (de/re)quantization** |
|
- `ggml.utils.numpy` returns a numpy view over a ggml tensor; if it's quantized, it returns a copy instead (requires `allow_copy=True`). See the follow-on sketch after the example below.
|
- Very basic examples (anyone want to port [llama2.c](https://github.com/karpathy/llama2.c)?)
|
|
|
Provided you set `GGML_LIBRARY=.../path/to/libggml_shared.so` (see instructions below), it's trivial to do some operations on quantized tensors: |
|
|
|
```python
# Make sure libllama.so is in your [DY]LD_LIBRARY_PATH, or set GGML_LIBRARY=.../libggml_shared.so

from ggml import lib, ffi
from ggml.utils import init, copy, numpy
import numpy as np

ctx = init(mem_size=12*1024*1024)
n = 256
n_threads = 4

a = lib.ggml_new_tensor_1d(ctx, lib.GGML_TYPE_Q5_K, n)
b = lib.ggml_new_tensor_1d(ctx, lib.GGML_TYPE_F32, n) # Can't both be quantized
sum = lib.ggml_add(ctx, a, b) # All zeroes for now. Will be quantized too!

gf = ffi.new('struct ggml_cgraph*')
lib.ggml_build_forward_expand(gf, sum)

copy(np.array([i for i in range(n)], np.float32), a)
copy(np.array([i*100 for i in range(n)], np.float32), b)

lib.ggml_graph_compute_with_ctx(ctx, gf, n_threads)

print(numpy(a, allow_copy=True))
# 0. 1.0439453 2.0878906 3.131836 4.1757812 5.2197266 ...
print(numpy(b))
# 0. 100. 200. 300. 400. 500. ...
print(numpy(sum, allow_copy=True))
# 0. 105.4375 210.875 316.3125 421.75 527.1875 ...
```
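
To see the transparent (de/re)quantization at work, here's a small follow-on sketch. It assumes the snippet above has just run, and that `ggml.utils.numpy` returns a writable zero-copy view for non-quantized tensors, as described in the list above:

```python
# Follow-on sketch, continuing from the snippet above.
orig = np.arange(n, dtype=np.float32)      # what we copied into `a`
dequantized = numpy(a, allow_copy=True)    # Q5_K data, dequantized into a fresh array
print(np.max(np.abs(dequantized - orig)))  # non-zero: Q5_K quantization is lossy

# For non-quantized tensors, `numpy` should return a zero-copy view,
# so writing through the array updates the tensor's data in place.
view = numpy(b)
view[0] = 42.0
print(numpy(b)[0])  # 42.0
```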
|
|
|
### Prerequisites |
|
|
|
You'll need a shared library of ggml to use the bindings. |
|
|
|
#### Build libggml_shared.so or libllama.so |
|
|
|
As of this writing, the simplest option is to use [ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp)'s generated `libggml_shared.so` or `libllama.so`, which you can build as follows:
|
|
|
```bash
git clone https://github.com/ggerganov/llama.cpp
# On a CUDA-enabled system, add -DLLAMA_CUBLAS=1
# On a Mac, add -DLLAMA_METAL=1
cmake llama.cpp \
  -B llama_build \
  -DCMAKE_C_FLAGS=-Ofast \
  -DLLAMA_NATIVE=1 \
  -DLLAMA_LTO=1 \
  -DBUILD_SHARED_LIBS=1 \
  -DLLAMA_MPI=1 \
  -DLLAMA_BUILD_TESTS=0 \
  -DLLAMA_BUILD_EXAMPLES=0
( cd llama_build && make -j )

# On a Mac, this will be libggml_shared.dylib instead
export GGML_LIBRARY=$PWD/llama_build/libggml_shared.so
# Alternatively, you can just copy it to your system's lib dir, e.g. /usr/local/lib
```
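
To sanity-check the build, a quick (hypothetical) smoke test is to load the library through the bindings and call a trivial function; `ggml_type_name` is part of ggml's public C API:

```python
# Hypothetical smoke test: run with GGML_LIBRARY pointing at the freshly built library.
from ggml import ffi, lib

# Should print a recognizable type name (e.g. "q5_K") if the library loaded correctly.
print(ffi.string(lib.ggml_type_name(lib.GGML_TYPE_Q5_K)).decode())
```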
|
|
|
#### (Optional) Regenerate the bindings and stubs |
|
|
|
If you add or change any signatures in the C API, you'll want to regenerate the bindings ([ggml/cffi.py](./ggml/cffi.py)) and stubs ([ggml/__init__.pyi](./ggml/__init__.pyi)).
|
|
|
Luckily it's a one-liner using [regenerate.py](./regenerate.py): |
|
|
|
```bash
pip install -q cffi

python regenerate.py
```
|
|
|
By default it assumes `llama.cpp` was cloned at `../../../llama.cpp` (alongside the `ggml` folder). You can override this with:
|
|
|
```bash
C_INCLUDE_DIR=$LLAMA_CPP_DIR python regenerate.py
```
|
|
|
You can also edit [api.h](./api.h) to control which files are included in the generated bindings (defaults to `llama.cpp/ggml*.h`).
|
|
|
In fact, if you wanted to generate bindings only for the current version of the `ggml` repo itself (instead of `llama.cpp`; you'd lose support for k-quants), you could run:
|
|
|
```bash
API=../../include/ggml/ggml.h python regenerate.py
```
|
|
|
## Develop |
|
|
|
Run tests: |
|
|
|
```bash
pytest
```
|
|
|
### Alternatives |
|
|
|
This example's goal is to showcase [cffi](https://cffi.readthedocs.io/)-generated bindings that are trivial to use and update, but there are already alternatives in the wild: |
|
|
|
- https://github.com/abetlen/ggml-python: these bindings seem to be hand-written and use [ctypes](https://docs.python.org/3/library/ctypes.html). They come with [high-quality API reference docs](https://ggml-python.readthedocs.io/en/latest/api-reference/#ggml.ggml) that can also be used with the bindings in this folder, but they don't expose Metal, CUDA, MPI or OpenCL calls, don't support transparent (de/re)quantization like this example does (see the [ggml.utils](./ggml/utils.py) module), and won't pick up your local changes.
|
|
|
- https://github.com/abetlen/llama-cpp-python: these bindings expose the C++ `llama.cpp` interface, which this example cannot easily be extended to support (`cffi` only generates bindings for C libraries).
|
|
|
- [pybind11](https://github.com/pybind/pybind11) and [nanobind](https://github.com/wjakob/nanobind) are two alternatives to cffi that support binding C++ libraries, but neither seems to have an automatic generator (writing bindings by hand is rather time-consuming).
|
|