---
license: llama3.1
---

## Introduction
This is a vLLM-compatible FP8 post-training quantized (PTQ) model based on [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
For the detailed quantization scheme, refer to the official documentation of the [AMD Quark 0.2.0 quantizer](https://quark.docs.amd.com/latest/index.html).

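As a rough illustration of what FP8 PTQ does to the weights, the sketch below shows per-tensor static scaling to the e4m3 format. This is an illustrative assumption only, not the Quark API; the actual scheme used for this checkpoint (per-tensor vs. per-channel scales, activation scaling, etc.) is the one described in the Quark documentation linked above.

```python
# Illustrative per-tensor FP8 (e4m3) weight quantization -- NOT the Quark quantizer API.
import torch

def quantize_fp8_per_tensor(w: torch.Tensor):
    """Quantize a weight tensor to float8_e4m3 with a single dequantization scale."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max            # 448.0 for e4m3
    scale = w.abs().max().clamp(min=1e-12).float() / fp8_max  # per-tensor scale
    w_fp8 = (w.float() / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return w_fp8, scale

w = torch.randn(4096, 4096, dtype=torch.float16)
w_fp8, scale = quantize_fp8_per_tensor(w)
w_approx = w_fp8.float() * scale                 # dequantized reconstruction
print((w.float() - w_approx).abs().max())        # worst-case quantization error
```
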
## Quickstart

To run this FP8 model on the vLLM framework:

### Model Preparation
1. Build the ROCm vLLM Docker image from this [dockerfile](https://github.com/ROCm/vllm/blob/main/Dockerfile.rocm) and launch a vLLM Docker container.

```sh
docker build -f Dockerfile.rocm -t vllm_test .
docker run --rm -it --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 16G vllm_test:latest
```

2. Clone the baseline [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) model.
3. Clone this [fp8 model](https://huggingface.co/amd/Meta-Llama-3.1-8B-Instruct-fp8-quark-vllm) (one way to fetch both repositories is sketched after this list).
4. Copy llama.safetensors and llama.json from the [fp8 model](https://huggingface.co/amd/Meta-Llama-3.1-8B-Instruct-fp8-quark-vllm) into the snapshot directory of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) with the commands below. Your snapshot commit (8c22764a7e3675c50d4c7c9a4edb474456022b16 here) may differ.
```sh
cp llama.json ~/models--meta-llama--Meta-Llama-3.1-8B-Instruct/snapshots/8c22764a7e3675c50d4c7c9a4edb474456022b16/.
cp llama.safetensors ~/models--meta-llama--Meta-Llama-3.1-8B-Instruct/snapshots/8c22764a7e3675c50d4c7c9a4edb474456022b16/.
```

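Steps 2 and 3 do not prescribe a download method. One possible way (an assumption, not part of this card) is to clone both repositories with git-lfs, as sketched below. The paths used in step 4 follow the Hugging Face Hub cache layout (models--ORG--NAME/snapshots/COMMIT), so adjust the destination paths to wherever your copy of the baseline model actually lives.

```sh
# Hypothetical fetch of the base model and the fp8 checkpoint. The gated
# meta-llama repository requires accepting its license and authenticating
# with your Hugging Face credentials.
git lfs install
git clone https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
git clone https://huggingface.co/amd/Meta-Llama-3.1-8B-Instruct-fp8-quark-vllm
```
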
### Running fp8 model

```sh
# single GPU
python run_vllm_fp8.py

# 8 GPUs
torchrun --standalone --nproc_per_node=8 run_vllm_fp8.py
```

```python
# run_vllm_fp8.py
from vllm import LLM, SamplingParams
prompt = "Write me an essay about bear and knight"

model_name = "/workspace/models--meta-llama--Meta-Llama-3.1-8B-Instruct/snapshots/8c22764a7e3675c50d4c7c9a4edb474456022b16/"
tp = 1  # single GPU
tp = 8  # 8 GPUs -- keep only the tp value that matches the launch command above

model = LLM(model=model_name, tensor_parallel_size=tp, max_model_len=8192,
            trust_remote_code=True, dtype="float16", quantization="fp8",
            quantized_weights_path="/llama.safetensors")
sampling_params = SamplingParams(
    top_k=1,
    ignore_eos=True,
    max_tokens=200,
)
result = model.generate(prompt, sampling_params=sampling_params)
print(result)
```
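
`LLM.generate` returns a list of vLLM `RequestOutput` objects, so `print(result)` above dumps the full objects. To print only the generated text for each prompt, index into the outputs, for example:

```python
# Print just the generated completion text instead of the whole RequestOutput repr.
for request_output in result:
    print(request_output.outputs[0].text)
```
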
### Running fp16 model (For comparison)

```sh
# single GPU
python run_vllm_fp16.py

# 8 GPUs
torchrun --standalone --nproc_per_node=8 run_vllm_fp16.py
```

```python
# run_vllm_fp16.py
from vllm import LLM, SamplingParams
prompt = "Write me an essay about bear and knight"

model_name = "/workspace/models--meta-llama--Meta-Llama-3.1-8B-Instruct/snapshots/8c22764a7e3675c50d4c7c9a4edb474456022b16/"
tp = 1  # single GPU
tp = 8  # 8 GPUs -- keep only the tp value that matches the launch command above
model = LLM(model=model_name, tensor_parallel_size=tp, max_model_len=8192,
            trust_remote_code=True, dtype="bfloat16")
sampling_params = SamplingParams(
    top_k=1,
    ignore_eos=True,
    max_tokens=200,
)
result = model.generate(prompt, sampling_params=sampling_params)
print(result)
```

#### License
Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.