bobig committed
Commit eb2c20c · verified · Parent: df766b3

Update README.md

Files changed (1): README.md (+9 −3)
README.md CHANGED
@@ -13,9 +13,15 @@ tags:
 - mlx
 ---
 
-# bobig/DeepScaleR-1.5B-6.5bit-21.4
+# bobig/DeepScaleR-1.5B-6.5bit
 
-The Model [bobig/DeepScaleR-1.5B-6.5bit-21.4](https://huggingface.co/bobig/DeepScaleR-1.5B-6.5bit-21.4) was
+This works as a draft model for speculative decoding in [LM Studio 0.3.10 beta](https://lmstudio.ai/docs/advanced/speculative-decoding).
+
+Try it with: [mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-4.5bit](https://huggingface.co/mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-4.5bit)
+
+You should see roughly 30% faster TPS on math/code prompts, even with "thinking".
+
+The Model [bobig/DeepScaleR-1.5B-6.5bit](https://huggingface.co/bobig/DeepScaleR-1.5B-6.5bit) was
 converted to MLX format from [agentica-org/DeepScaleR-1.5B-Preview](https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview)
 using mlx-lm version **0.21.4**.
 
@@ -28,7 +34,7 @@ pip install mlx-lm
 ```python
 from mlx_lm import load, generate
 
-model, tokenizer = load("bobig/DeepScaleR-1.5B-6.5bit-21.4")
+model, tokenizer = load("bobig/DeepScaleR-1.5B-6.5bit")
 
 prompt = "hello"
 
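
The draft-model pairing the new README describes can also be exercised with mlx-lm directly rather than through LM Studio. The snippet below is a minimal sketch, not part of the commit: it assumes a recent mlx-lm release in which `generate`/`stream_generate` accept a `draft_model` keyword (older releases do not), and the prompt is purely illustrative.

```python
# Minimal sketch (not from the README): speculative decoding with mlx-lm.
# Assumption: your mlx-lm release exposes a `draft_model` keyword on
# generate()/stream_generate(); check your installed version if this errors.
from mlx_lm import load, generate

# Target model: the large 32B quant the README suggests pairing with.
model, tokenizer = load("mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-4.5bit")

# Draft model: this repo's 1.5B quant. Its tokenizer must be compatible
# with the target's for the drafted tokens to be verifiable.
draft_model, _ = load("bobig/DeepScaleR-1.5B-6.5bit")

# A math/code-style prompt, where draft acceptance (and thus speedup) is highest.
messages = [{"role": "user", "content": "Write a Python function that returns the nth prime."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# The small model proposes a few tokens per step; the 32B model verifies
# them in one forward pass and keeps the agreeing prefix.
text = generate(model, tokenizer, prompt=prompt, draft_model=draft_model, verbose=True)
```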
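In versions that ship the flag, the same pairing can be tried from the command line, e.g. `mlx_lm.generate --model mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-4.5bit --draft-model bobig/DeepScaleR-1.5B-6.5bit --prompt "..."`; treat the exact flag name as an assumption and confirm with `mlx_lm.generate --help`.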