Deploying a finetuned model on Triton using the vLLM backend

#157
by Prabhjot410 - opened

I am following this tutorial: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/vllm_backend/docs/llama_multi_lora_tutorial.html
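For reference, my model repository follows the layout from that tutorial; roughly like this (the oss-20b name points at my finetuned checkpoint, and the model.json contents are trimmed to the engine args I set):

    model_repository/
    └── oss-20b/
        ├── 1/
        │   └── model.json     <- vLLM engine args ("model" path, gpu_memory_utilization, ...)
        └── config.pbtxt       <- backend: "vllm", decoupled model transaction policy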

and am getting this error:
E1013 12:43:04.029817 1 model_lifecycle.cc:654] "failed to load 'oss-20b' version 1: Internal: ValueError: 'aimv2' is already used by a Transformers config, pick another name.
Then I updated the versions in my Dockerfile: pip install -U transformers>=4.55.0 kernels torch==2.6.0
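The relevant part of the Dockerfile looks roughly like this (the base image tag is approximate; note the quotes around the transformers spec so that >= isn't treated as a shell redirect inside RUN):

    # Base image: one of the recent Triton containers with the vLLM backend (tag approximate)
    FROM nvcr.io/nvidia/tritonserver:24.08-vllm-python-py3

    # Upgrade transformers and pin torch; quoting keeps >= from being parsed as a redirect
    RUN pip install -U "transformers>=4.55.0" kernels torch==2.6.0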
After that I got this error:
E1013 12:51:12.611620 1 model_lifecycle.cc:654] "failed to load 'oss-20b' version 1: Internal: ModuleNotFoundError: Could not import module 'ProcessorMixin'. Are this object's requirements defined correctly?
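I can share the exact package versions inside the container if that helps; running something like this in the container should print them (assuming python3 is on the path and all three packages import cleanly):

    python3 -c "import transformers, vllm, torch; print(transformers.__version__, vllm.__version__, torch.__version__)"

Any pointers on which transformers / vllm / torch combination the Triton vLLM backend expects, or on what I'm missing in the setup, would be appreciated.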
