bobig committed
Commit eb2c20c · verified · Parent: df766b3

Update README.md

Files changed (1): README.md (+9 −3)
README.md CHANGED
@@ -13,9 +13,15 @@ tags:
 - mlx
 ---
 
-# bobig/DeepScaleR-1.5B-6.5bit-21.4
+# bobig/DeepScaleR-1.5B-6.5bit
 
-The Model [bobig/DeepScaleR-1.5B-6.5bit-21.4](https://huggingface.co/bobig/DeepScaleR-1.5B-6.5bit-21.4) was
+This works as a draft model for speculative decoding in [LM Studio 0.3.10 beta](https://lmstudio.ai/docs/advanced/speculative-decoding).
+
+Try it with: [mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-4.5bit](https://huggingface.co/mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-4.5bit)
+
+You should see roughly 30% faster TPS on math/code prompts, even with "thinking".
+
+The Model [bobig/DeepScaleR-1.5B-6.5bit](https://huggingface.co/bobig/DeepScaleR-1.5B-6.5bit) was
 converted to MLX format from [agentica-org/DeepScaleR-1.5B-Preview](https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview)
 using mlx-lm version **0.21.4**.
 
@@ -28,7 +34,7 @@ pip install mlx-lm
 ```python
 from mlx_lm import load, generate
 
-model, tokenizer = load("bobig/DeepScaleR-1.5B-6.5bit-21.4")
+model, tokenizer = load("bobig/DeepScaleR-1.5B-6.5bit")
 
 prompt = "hello"
 
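
The draft-model pairing the new README describes can also be exercised with mlx-lm directly rather than through LM Studio. The snippet below is a minimal sketch, not part of the commit: it assumes a recent mlx-lm release in which `generate`/`stream_generate` accept a `draft_model` keyword (older releases do not), and the prompt is purely illustrative.

```python
# Minimal sketch (not from the README): speculative decoding with mlx-lm.
# Assumption: your mlx-lm release exposes a `draft_model` keyword on
# generate()/stream_generate(); check your installed version if this errors.
from mlx_lm import load, generate

# Target model: the large 32B quant the README suggests pairing with.
model, tokenizer = load("mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-4.5bit")

# Draft model: this repo's 1.5B quant. Its tokenizer must be compatible
# with the target's for the drafted tokens to be verifiable.
draft_model, _ = load("bobig/DeepScaleR-1.5B-6.5bit")

# A math/code-style prompt, where draft acceptance (and thus speedup) is highest.
messages = [{"role": "user", "content": "Write a Python function that returns the nth prime."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# The small model proposes a few tokens per step; the 32B model verifies
# them in one forward pass and keeps the agreeing prefix.
text = generate(model, tokenizer, prompt=prompt, draft_model=draft_model, verbose=True)
```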
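In versions that ship the flag, the same pairing can be tried from the command line, e.g. `mlx_lm.generate --model mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-4.5bit --draft-model bobig/DeepScaleR-1.5B-6.5bit --prompt "..."`; treat the exact flag name as an assumption and confirm with `mlx_lm.generate --help`.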