PowerInfer/SmallThinker-4BA0.6B-Instruct

Text Generation · feature-extraction


wdl339 committed on Jul 27 · Commit ad742d6 (verified) · 1 parent: c487e66

Update README.md

Files changed (1)

README.md +12 -0


@@ -24,6 +24,18 @@ without relying on the cloud.
 For the MMLU evaluation, we use a 0-shot CoT setting.
+## Speed
+| Model                                                          | Memory (GiB)    | i9 14900 | 1+13 8gen4 | rk3588 (16G) | rk3576 | Raspberry Pi 5 | RDK X5 | rk3566 |
+|----------------------------------------------------------------|-----------------|----------|------------|--------------|--------|----------------|--------|--------|
+| SmallThinker 4B + sparse FFN + sparse lm_head                  | 2.24            | 108.17   | 78.99      | 39.76        | 15.10  | 28.77          | 7.23   | 6.33   |
+| SmallThinker 4B + sparse FFN + sparse lm_head + limited memory | 1 (limit)       | 29.99    | 20.91      | 15.04        | 2.60   | 0.75           | 0.67   | 0.74   |
+| Qwen3 0.6B                                                     | 0.6             | 148.56   | 94.91      | 45.93        | 15.29  | 27.44          | 13.32  | 9.76   |
+| Qwen3 1.7B                                                     | 1.3             | 62.24    | 41.00      | 20.29        | 6.09   | 11.08          | 6.35   | 4.15   |
+| Qwen3 1.7B + limited memory                                    | 1 (limit)       | 2.66     | 1.09       | 1.00         | 0.47   | -              | -      | 0.11   |
+| Gemma3n E2B                                                    | 1 (theoretical) | 36.88    | 27.06      | 12.50        | 3.80   | 6.66           | 3.46   | 2.45   |
+Note: i9 14900 and 1+13 8gen4 use 4 threads; the other devices use whichever thread count achieves the maximum speed.
 ## Model Card
 <div align="center">