|
# SAM.cpp |
|
|
|
Inference of Meta's [Segment Anything Model](https://github.com/facebookresearch/segment-anything/) in pure C/C++ |
|
|
|
## Description |
|
|
|
The example currently supports only the [ViT-B SAM model checkpoint](https://huggingface.co/facebook/sam-vit-base). |
|
|
|
## Next steps |
|
|
|
- [X] Reduce memory usage by utilizing the new ggml-alloc |
|
- [X] Remove redundant graph nodes |
|
- [ ] Make inference faster |
|
- [X] Fix the difference in output masks compared to the PyTorch implementation |
|
- [X] Filter masks based on stability score |
|
- [ ] Add support for user input |
|
- [ ] Support F16 for heavy F32 ops |
|
- [ ] Test quantization |
|
- [X] Support bigger model checkpoints |
|
- [ ] GPU support |
|
|
|
## Quick start |
|
```bash |
|
git clone https://github.com/ggerganov/ggml |
|
cd ggml |
|
|
|
# Install Python dependencies |
|
python3 -m pip install -r requirements.txt |
|
|
|
# Convert PTH model to ggml |
|
python convert-pth-to-ggml.py examples/sam/sam_vit_b_01ec64.pth . 1 |
|
|
|
# Build ggml + examples |
|
mkdir build && cd build |
|
cmake .. && make -j4 |
|
|
|
# run inference |
|
./bin/sam -t 16 -i ../img.jpg -m examples/sam/ggml-model-f16.bin |
|
``` |
|
|
|
## Downloading and converting the model checkpoints |
|
|
|
You can download a [model checkpoint](https://github.com/facebookresearch/segment-anything/tree/main#model-checkpoints) and convert it to `ggml` format using the script `convert-pth-to-ggml.py`: |
|
|
|
``` |
|
# Convert PTH model to ggml |
|
python convert-pth-to-ggml.py examples/sam/sam_vit_b_01ec64.pth . 1 |
|
``` |
|
|
|
## Example output on M2 Ultra |
|
``` |
|
$ ▶ make -j sam && time ./bin/sam -t 8 -i img.jpg |
|
[ 28%] Built target common |
|
[ 71%] Built target ggml |
|
[100%] Built target sam |
|
main: seed = 1693224265 |
|
main: loaded image 'img.jpg' (680 x 453) |
|
sam_image_preprocess: scale = 0.664062 |
|
main: preprocessed image (1024 x 1024) |
|
sam_model_load: loading model from 'models/sam-vit-b/ggml-model-f16.bin' - please wait ... |
|
sam_model_load: n_enc_state = 768 |
|
sam_model_load: n_enc_layer = 12 |
|
sam_model_load: n_enc_head = 12 |
|
sam_model_load: n_enc_out_chans = 256 |
|
sam_model_load: n_pt_embd = 4 |
|
sam_model_load: ftype = 1 |
|
sam_model_load: qntvr = 0 |
|
operator(): ggml ctx size = 202.32 MB |
|
sam_model_load: ...................................... done |
|
sam_model_load: model size = 185.05 MB / num tensors = 304 |
|
embd_img |
|
dims: 64 64 256 1 f32 |
|
First & Last 10 elements: |
|
-0.05117 -0.06408 -0.07154 -0.06991 -0.07212 -0.07690 -0.07508 -0.07281 -0.07383 -0.06779 |
|
0.01589 0.01775 0.02250 0.01675 0.01766 0.01661 0.01811 0.02051 0.02103 0.03382 |
|
sum: 12736.272313 |
|
|
|
Skipping mask 0 with iou 0.705935 below threshold 0.880000 |
|
Skipping mask 1 with iou 0.762136 below threshold 0.880000 |
|
Mask 2: iou = 0.947081, stability_score = 0.955437, bbox (371, 436), (144, 168) |
|
|
|
|
|
main: load time = 51.28 ms |
|
main: total time = 2047.49 ms |
|
|
|
real 0m2.068s |
|
user 0m16.343s |
|
sys 0m0.214s |
|
``` |
|
|
|
Input point is (414.375, 162.796875) (currently hardcoded) |
|
|
|
Input image: |
|
|
|
 |
|
|
|
Output mask (mask_out_2.png in build folder): |
|
|
|
 |
|
|
|
## References |
|
|
|
- [ggml](https://github.com/ggerganov/ggml) |
|
- [SAM](https://segment-anything.com/) |
|
- [SAM demo](https://segment-anything.com/demo) |
|
|