Guide: Using a Custom Fine-Tuned Model with bitnet.cpp
This document outlines the process of downloading a custom fine-tuned model, converting it to the GGUF format, compiling the necessary C++ code, and running inference.
Prerequisites
Before you begin, ensure you have the following prerequisites installed and configured:
- Python 3.9 or later
- CMake 3.22 or later
- A C++ compiler (e.g., clang, g++)
- The Hugging Face Hub CLI (`huggingface-cli`)
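If you want a quick sanity check before starting, the commands below verify the tool versions and install the Hub CLI. This is a minimal sketch assuming a Linux-style shell with pip available; adjust the compiler binary name to whatever is installed on your machine.

```bash
# Verify prerequisite versions (use g++ instead of clang++ if that is your compiler)
python3 --version   # expect 3.9 or later
cmake --version     # expect 3.22 or later
clang++ --version   # or: g++ --version

# huggingface-cli ships with the huggingface_hub package
pip install -U "huggingface_hub[cli]"
huggingface-cli --help
```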
Step 1: Download the Custom Model
In this guide, we will use the tuandunghcmut/BitNET-Summarization model, which was fine-tuned for summarization tasks, as an example. We will download it and place it in a directory that the setup_env.py script can recognize.
huggingface-cli download tuandunghcmut/BitNET-Summarization --local-dir models/BitNet-b1.58-2B-4T
This command downloads the model into the models/BitNet-b1.58-2B-4T directory. Placing the custom model under the base model's directory name is a workaround that lets the existing scripts recognize it.
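To confirm the download completed, list the directory. The file names below are illustrative; a typical Hugging Face checkout contains a config, tokenizer files, and one or more .safetensors shards, but the exact contents depend on the repository.

```bash
ls models/BitNet-b1.58-2B-4T
# Typical contents (names vary by repository):
#   config.json  generation_config.json  tokenizer.json  tokenizer_config.json  model.safetensors
```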
Step 2: Convert the Model to GGUF Format
The downloaded model is in the .safetensors format and must be converted to GGUF before bitnet.cpp can use it. We will use the convert-helper-bitnet.py script for this; however, the script needs a few modifications to work with this custom model.
Modifications to the Conversion Scripts
utils/convert-helper-bitnet.py: Add the `--skip-unknown` flag to the `cmd_convert` list so that unknown tensor names are ignored.

```python
cmd_convert = [
    sys.executable,
    str(convert_script),
    str(model_dir),
    "--vocab-type", "bpe",
    "--outtype", "f32",
    "--concurrency", "1",
    "--outfile", str(gguf_f32_output),
    "--skip-unknown",
]
```

utils/convert-hf-to-gguf-bitnet.py:
- Add the `BitNetForCausalLM` architecture to the `@Model.register` decorator for the `BitnetModel` class.
- Change the `set_vocab` method in the `BitnetModel` class to use `_set_vocab_gpt2()`.

```python
@Model.register("BitNetForCausalLM", "BitnetForCausalLM")
class BitnetModel(Model):
    model_arch = gguf.MODEL_ARCH.BITNET

    def set_vocab(self):
        self._set_vocab_gpt2()
```
Running the Conversion
After making these changes, run the conversion script:
python utils/convert-helper-bitnet.py models/BitNet-b1.58-2B-4T
This will create the ggml-model-i2s-bitnet.gguf file in the model directory.
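To verify the conversion, check for the GGUF output in the model directory. Depending on how the helper script cleans up after itself, an intermediate f32 GGUF may also be present.

```bash
ls -lh models/BitNet-b1.58-2B-4T/*.gguf
```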
Step 3: Compile bitnet.cpp
Next, compile the C++ code with the setup_env.py script, using the i2_s quantization type.
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
This command will compile the C++ code and create the necessary binaries.
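bitnet.cpp is built on top of llama.cpp, so the compiled executables typically end up under build/bin. The exact path and binary names are an assumption here; check the script's output if your layout differs.

```bash
ls build/bin
# Expect llama.cpp-style executables such as llama-cli; run_inference.py wraps
# one of these, so you normally do not need to invoke them directly.
```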
Step 4: Run Inference
Finally, we can run inference with the converted model.
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "Hello"
This will load the model and generate a response to the prompt "Hello".
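Since this model was fine-tuned for summarization, a more realistic invocation wraps the text to be summarized in the prompt. The extra flags below (-n for the number of tokens to generate, -t for CPU threads) follow the usual llama.cpp conventions; run `python run_inference.py --help` to confirm which options your version of the script exposes, and note that the exact prompt template depends on how the model was fine-tuned.

```bash
python run_inference.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -p "Summarize the following article: <paste article text here> Summary:" \
  -n 256 \
  -t 8
```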
Build Environment
This project was built on a CPU-only machine with the following specifications:
- CPU: AMD EPYC 9754 128-Core Processor
- Memory: 251 GiB
Fine-Tuning
The tuandunghcmut/BitNET-Summarization model was fine-tuned using a Quantization-Aware Training (QAT) process, relying on the BitNet layer support provided by the Hugging Face library.