- 02 Feb, 2025 1 commit
Russell Bryant authored
I noticed during testing that I was getting a lot of these deprecation warnings about `lora_local_path`:

```
DeprecationWarning: The 'lora_local_path' attribute is deprecated and will be removed in a future version. Please use 'lora_path' instead.
```

The check used for emitting this warning was always True, even when the parameter was not actually specified, because the field name is always in `__struct_fields__`. We should be checking for a non-None value instead.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
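The difference between the two checks described in this commit message can be sketched as follows. This is a minimal illustration, not the project's actual code: `LoRARequestSketch`, `buggy_should_warn`, `fixed_should_warn`, and `warn_if_deprecated` are hypothetical names, and the class is a plain-Python stand-in that declares `__struct_fields__` by hand to mimic a struct-style class.

```python
import warnings
from typing import Optional


class LoRARequestSketch:
    """Hypothetical stand-in for a struct-like request class."""

    # Struct-style classes list every declared field here, whether or not
    # the caller supplied a value for it.
    __struct_fields__ = ("lora_name", "lora_path", "lora_local_path")

    def __init__(self, lora_name: str, lora_path: str = "",
                 lora_local_path: Optional[str] = None):
        self.lora_name = lora_name
        self.lora_path = lora_path
        self.lora_local_path = lora_local_path


def buggy_should_warn(req: LoRARequestSketch) -> bool:
    # Always True: the field name is declared on the class itself.
    return "lora_local_path" in req.__struct_fields__


def fixed_should_warn(req: LoRARequestSketch) -> bool:
    # True only when the caller actually passed the deprecated argument.
    return req.lora_local_path is not None


def warn_if_deprecated(req: LoRARequestSketch) -> None:
    # Emit the deprecation warning using the corrected check.
    if fixed_should_warn(req):
        warnings.warn(
            "The 'lora_local_path' attribute is deprecated and will be "
            "removed in a future version. Please use 'lora_path' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
```

With the membership check, a request built only with `lora_path` still triggers the warning; with the non-None check it stays silent unless the deprecated argument was passed.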
- 20 Sep, 2024 1 commit
Jiaxin Shan authored
- 06 Sep, 2024 1 commit
Jiaxin Shan authored
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
- 19 Aug, 2024 1 commit
SangBin Cho authored
- 22 Jul, 2024 1 commit
Jiaxin Shan authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
- 09 Jul, 2024 1 commit
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
- 18 May, 2024 1 commit
SangBin Cho authored
Currently we need to call the rotary embedding kernel once for each LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, keyed by scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
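The idea of one rotary layer holding several cos-sin caches can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the kernel from this commit: it assumes linear position scaling (position divided by the scaling factor), stores all caches back to back in one list, and records a start offset per factor so that a single lookup pass serves tokens belonging to different LoRAs. All function names here are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Row = Tuple[List[float], List[float]]  # (cos values, sin values) for one position


def build_cos_sin_cache(rotary_dim: int, max_position: int, base: float,
                        scaling_factor: float) -> List[Row]:
    """Build one cache for one scaling factor (assumes linear scaling)."""
    inv_freq = [1.0 / (base ** (i / rotary_dim)) for i in range(0, rotary_dim, 2)]
    rows: List[Row] = []
    # A larger factor stretches the usable context, so the cache grows too.
    for pos in range(int(max_position * scaling_factor)):
        t = pos / scaling_factor
        angles = [t * f for f in inv_freq]
        rows.append(([math.cos(a) for a in angles], [math.sin(a) for a in angles]))
    return rows


def build_batched_cache(rotary_dim: int, max_position: int, base: float,
                        scaling_factors: List[float]
                        ) -> Tuple[List[Row], Dict[float, int]]:
    """Concatenate the per-factor caches and remember where each one starts."""
    cache: List[Row] = []
    offsets: Dict[float, int] = {}
    for factor in scaling_factors:
        offsets[factor] = len(cache)
        cache.extend(build_cos_sin_cache(rotary_dim, max_position, base, factor))
    return cache, offsets


def lookup_batched(cache: List[Row], offsets: Dict[float, int],
                   positions: List[int], factors: List[float]) -> List[Row]:
    """Fetch cos/sin rows for a mixed batch in one pass over the tokens."""
    return [cache[offsets[f] + p] for p, f in zip(positions, factors)]
```

Because each token carries its own offset into the shared cache, a batch mixing requests with different scaling factors needs one lookup rather than one call per LoRA.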
- 23 Jan, 2024 1 commit
Antoni Baum authored
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>