jerryzh168 commited on
Commit
d6542af
·
verified ·
1 Parent(s): ea29a3b

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +5 -1
README.md CHANGED
@@ -98,7 +98,11 @@ print(f"{save_to} model:", benchmark_fn(quantized_model.generate, **inputs, max_
98
  ```
99
 
100
  # Serving with vllm
101
- We can use the same command we used in serving benchmarks to serve the model with vllm
 
 
 
 
102
  ```
103
  vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
104
  ```
 
98
  ```
99
 
100
  # Serving with vllm
101
+ Need to install vllm nightly to get some recent changes
102
+ ```
103
+ pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
104
+ ```
105
+
106
  ```
107
  vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
108
  ```