manaestras committed
Commit f59be00 · verified · 1 Parent(s): 93d6b13

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +19 -19
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  base_model:
- - tencent/Hunyuan-4B-Instruct
+ - tencent/Hunyuan-4B-Pretrain
  library_name: transformers
  ---
 
@@ -14,7 +14,7 @@ library_name: transformers
 
  <p align="center">
  🤗&nbsp;<a href="https://huggingface.co/tencent/"><b>HuggingFace</b></a>&nbsp;|&nbsp;
- 🤖&nbsp;<a href="https://modelscope.cn/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct"><b>ModelScope</b></a>&nbsp;|&nbsp;
+ 🤖&nbsp;<a href="https://modelscope.cn/models/Tencent-Hunyuan/"><b>ModelScope</b></a>&nbsp;|&nbsp;
  🪡&nbsp;<a href="https://github.com/Tencent/AngelSlim/tree/main"><b>AngelSlim</b></a>
  </p>
 
@@ -25,10 +25,10 @@ library_name: transformers
  </p>
 
  <p align="center">
- <a href="https://github.com/Tencent-Hunyuan/Hunyuan-7B"><b>GITHUB</b></a> |
- <a href="https://cnb.cool/tencent/hunyuan/Hunyuan-7B"><b>cnb.cool</b></a> |
- <a href="https://github.com/Tencent-Hunyuan/Hunyuan-7B/blob/main/LICENSE"><b>LICENSE</b></a> |
- <a href="https://raw.githubusercontent.com/Tencent-Hunyuan/Hunyuan-A13B/main/assets/1751881231452.jpg"><b>WeChat</b></a> |
+ <a href="https://github.com/Tencent-Hunyuan/"><b>GITHUB</b></a> |
+ <a href="https://cnb.cool/tencent/hunyuan/"><b>cnb.cool</b></a> |
+ <a href="https://github.com/Tencent-Hunyuan/Hunyuan-4B/blob/main/LICENSE"><b>LICENSE</b></a> |
+ <a href="https://raw.githubusercontent.com/Tencent-Hunyuan/Hunyuan-A13B/main/assets/1751881231452.jpg"><b>WeChat</b></a> |
  <a href="https://discord.gg/bsPcMEtV7v"><b>Discord</b></a>
  </p>
 
@@ -52,7 +52,7 @@ We have released a series of Hunyuan dense models, comprising both pre-trained a
 
  ## Benchmark
 
- Note: The following benchmarks were evaluated with the TRT-LLM backend on several **base models**.
+ Note: The following benchmarks were evaluated with the TRT-LLM backend on several **base models**.
 
  | Model | Hunyuan-0.5B-Pretrain | Hunyuan-1.8B-Pretrain | Hunyuan-4B-Pretrain | Hunyuan-7B-Pretrain |
  |:------------------:|:---------------:|:--------------:|:-------------:|:---------------:|
@@ -90,7 +90,7 @@ First, please install transformers. We will merge it into the main branch later.
  ```SHELL
  pip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca
  ```
- Our model defaults to using slow-thinking reasoning, and there are two ways to disable CoT reasoning.
+ Our model defaults to using slow-thinking reasoning, and there are two ways to disable CoT reasoning.
  1. Pass **"enable_thinking=False"** when calling apply_chat_template.
  2. Adding **"/no_think"** before the prompt will force the model not to perform CoT reasoning. Similarly, adding **"/think"** before the prompt will force the model to perform CoT reasoning.
 
@@ -113,7 +113,7 @@ messages = [
  tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt",
      enable_thinking=True  # Toggle thinking mode (default: True)
  )
- 
+ 
  outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)
 
  output_text = tokenizer.decode(outputs[0])
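For readers following along, here is a minimal sketch of the two CoT controls described in the hunks above, assuming the released tokenizer's chat template honors `enable_thinking` and the `/no_think` prefix as documented (the prompt text is illustrative):

```python
from transformers import AutoTokenizer

# Assumption: tencent/Hunyuan-7B-Instruct ships the chat template described above.
tokenizer = AutoTokenizer.from_pretrained("tencent/Hunyuan-7B-Instruct", trust_remote_code=True)
messages = [{"role": "user", "content": "Why is seawater salty?"}]

# Option 1: disable CoT via the template argument.
prompt_no_cot = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

# Option 2: force CoT off (or on with /think) via a prompt prefix.
messages_prefixed = [{"role": "user", "content": "/no_think Why is seawater salty?"}]
prompt_prefixed = tokenizer.apply_chat_template(
    messages_prefixed, tokenize=False, add_generation_prompt=True
)
```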
@@ -274,7 +274,7 @@ We use FP8-static quantization, FP8 quantization adopts 8-bit floating point for
  ### Int4 Quantization
  We use the GPTQ and AWQ algorithms to achieve W4A16 quantization.
 
- GPTQ processes the model weights layer by layer, using a small amount of calibration data to minimize the reconstruction error of the quantized weights; the weights are adjusted layer by layer via an optimization procedure that approximates the inverse Hessian. The process eliminates the need to retrain the model and requires only a small amount of calibration data to quantize the weights, improving inference efficiency and lowering the deployment threshold.
+ GPTQ processes the model weights layer by layer, using a small amount of calibration data to minimize the reconstruction error of the quantized weights; the weights are adjusted layer by layer via an optimization procedure that approximates the inverse Hessian. The process eliminates the need to retrain the model and requires only a small amount of calibration data to quantize the weights, improving inference efficiency and lowering the deployment threshold.
  AWQ uses a small amount of calibration data (no retraining required) to measure the amplitude of the activation values. For each weight channel, a scaling coefficient s is computed to expand the numerical range of important weights, allowing more information to be retained during quantization.
 
  You can quantize the model yourself with [AngleSlim](https://github.com/tencent/AngelSlim), or directly download and use our released quantized models [LINK](https://huggingface.co/).
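To make the AWQ description above concrete, here is a toy NumPy sketch of the idea: expand important input channels by a per-channel coefficient s derived from calibration activation amplitudes, quantize, then fold the inverse scale back. This is an illustration only, not the AngelSlim implementation; the `alpha` exponent and the symmetric 4-bit scheme are assumptions made for the example:

```python
import numpy as np

def quantize_w4(w):
    # Naive symmetric 4-bit round-to-nearest, one scale per output channel.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0 + 1e-12
    return np.clip(np.round(w / scale), -8, 7) * scale  # return dequantized weights

def awq_style_quantize(w, act_amp, alpha=0.5):
    # Per-input-channel coefficient s from activation amplitudes: high-activation
    # ("important") channels are expanded before quantization, and the inverse
    # scale is folded back afterwards (in practice, into the activations).
    s = act_amp ** alpha
    s = s / s.mean()  # keep overall weight magnitude roughly unchanged
    return quantize_w4(w * s[None, :]) / s[None, :]
```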
@@ -296,19 +296,19 @@ This subsection describes the Benchmark metrics for the Hunyuan quantitative mod
 
  For deployment, you can use frameworks such as **TensorRT-LLM**, **vLLM**, or **SGLang** to serve the model and create an OpenAI-compatible API endpoint.
 
- image: https://hub.docker.com/r/hunyuaninfer/hunyuan-7B/tags
+ image: https://hub.docker.com/r/hunyuaninfer/hunyuan-7B/tags
 
 
  ### TensorRT-LLM
 
- #### Docker Image
+ #### Docker Image
 
  We provide a pre-built Docker image based on the latest version of TensorRT-LLM.
 
  We use tencent/Hunyuan-7B-Instruct as an example.
  - To get started:
 
- https://hub.docker.com/r/hunyuaninfer/hunyuan-large/tags
+ https://hub.docker.com/r/hunyuaninfer/hunyuan-large/tags
 
  ```
  docker pull hunyuaninfer/hunyuan-7B:hunyuan-moe-7B-trtllm
@@ -359,14 +359,14 @@ trtllm-serve \
  Please use vLLM version v0.10.0 or higher for inference.
 
  We use tencent/Hunyuan-7B-Instruct as an example.
- - Download the model files:
+ - Download the model files:
    - Hugging Face: downloaded automatically by vLLM.
    - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-7B-Instruct`
- 
+ 
  - Model downloaded via Hugging Face:
  ```shell
  export MODEL_PATH=tencent/Hunyuan-7B-Instruct
- ```
+ ```
 
  - Model downloaded via ModelScope:
  ```shell
@@ -386,7 +386,7 @@ python3 -m vllm.entrypoints.openai.api_server \
  --quantization experts_int8 \
  --served-model-name hunyuan \
  2>&1 | tee log_server.txt
- ```
+ ```
  - After the service script runs successfully, run the request script:
  ```shell
  curl http://0.0.0.0:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{
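Because the endpoint is OpenAI-compatible, the curl call above can also be issued with the `openai` Python client. This sketch assumes the vLLM server from the previous step is listening on port 8000 with `--served-model-name hunyuan`:

```python
from openai import OpenAI

# Local vLLM server; any placeholder API key works.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="hunyuan",  # matches --served-model-name
    messages=[{"role": "user", "content": "Why is seawater salty?"}],
)
print(resp.choices[0].message.content)
```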
@@ -474,7 +474,7 @@ python3 -m vllm.entrypoints.openai.api_server \
 
  ### SGLang
 
- #### Docker Image
+ #### Docker Image
 
  We also provide a pre-built Docker image based on the latest version of SGLang.
 
@@ -504,4 +504,4 @@ docker run --entrypoint="python3" --gpus all \
 
  ## Contact Us
 
- If you would like to leave a message for our R&D or product teams, feel free to contact our open-source team. You can also reach us by email ([email protected]).
+ If you would like to leave a message for our R&D or product teams, feel free to contact our open-source team. You can also reach us by email ([email protected]).
 