Update README.md
README.md CHANGED
@@ -13,9 +13,15 @@ tags:
 - mlx
 ---
 
 # bobig/DeepScaleR-1.5B-6.5bit
 
-The Model [bobig/DeepScaleR-1.5B-6.5bit](https://huggingface.co/bobig/DeepScaleR-1.5B-6.5bit) was
+This model works as a draft model for speculative decoding in [LM Studio 0.3.10 beta](https://lmstudio.ai/docs/advanced/speculative-decoding).
+
+Try it with [mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-4.5bit](https://huggingface.co/mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-4.5bit):
+
+You should see roughly 30% faster TPS on math/code prompts, even with "thinking".
+
+The Model [bobig/DeepScaleR-1.5B-6.5bit](https://huggingface.co/bobig/DeepScaleR-1.5B-6.5bit) was
 converted to MLX format from [agentica-org/DeepScaleR-1.5B-Preview](https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview)
 using mlx-lm version **0.21.4**.
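The same target/draft pairing can also be run with mlx-lm directly instead of LM Studio. Below is a minimal sketch, assuming an mlx-lm recent enough (roughly 0.19 and later, so the card's 0.21.4 qualifies) for `generate()` to forward a `draft_model` keyword to `stream_generate()`; the model names come from this card, and the prompt is only illustrative.

```python
from mlx_lm import load, generate

# Target: the large 32B model; draft: this repo's small 6.5-bit model.
# Speculative decoding requires the two models to share a tokenizer/vocabulary.
target, tokenizer = load("mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-4.5bit")
draft_model, _ = load("bobig/DeepScaleR-1.5B-6.5bit")

prompt = "Write a binary search in Python."
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# The draft model proposes several tokens per step and the target verifies
# them in a single forward pass; accepted runs are what buy the TPS gain.
response = generate(target, tokenizer, prompt=prompt, draft_model=draft_model, verbose=True)
```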
@@ -28,7 +34,7 @@ pip install mlx-lm
 ```python
 from mlx_lm import load, generate
 
-model, tokenizer = load("bobig/DeepScaleR-1.5B-6.5bit
+model, tokenizer = load("bobig/DeepScaleR-1.5B-6.5bit")
 
 prompt = "hello"
 
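The usage example in the hunk above stops at the prompt. For reference, the stock mlx-lm model-card template that this README follows typically finishes with a chat-template check and the `generate()` call; a sketch based on that standard template, not this card's exact remaining text:

```python
from mlx_lm import load, generate

model, tokenizer = load("bobig/DeepScaleR-1.5B-6.5bit")

prompt = "hello"

# Models shipped with a chat template expect the prompt wrapped in it.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```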