SIYIXNI / vllm, commit 1af090b5 (Unverified)
Authored Jan 31, 2024 by Zhuohan Li; committed by GitHub on Jan 31, 2024
Bump up version to v0.3.0 (#2656)
Parent: 3dad9444
Changes: 3 changed files, with 7 additions and 3 deletions (+7 -3).

README.md               +3 -1
docs/source/index.rst   +3 -1
vllm/__init__.py        +1 -1
README.md

@@ -46,7 +46,7 @@ vLLM is fast with:
 - Efficient management of attention key and value memory with **PagedAttention**
 - Continuous batching of incoming requests
 - Fast model execution with CUDA/HIP graph
-- Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [SqueezeLLM](https://arxiv.org/abs/2306.07629)
+- Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [SqueezeLLM](https://arxiv.org/abs/2306.07629), FP8 KV Cache
 - Optimized CUDA kernels

 vLLM is flexible and easy to use with:

@@ -57,6 +57,8 @@ vLLM is flexible and easy to use with:
 - Streaming outputs
 - OpenAI-compatible API server
 - Support NVIDIA GPUs and AMD GPUs
+- (Experimental) Prefix caching support
+- (Experimental) Multi-lora support

 vLLM seamlessly supports many Hugging Face models, including the following architectures:
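The quantization methods listed in the README correspond to a load-time option on the LLM entrypoint that vllm/__init__.py exports. A minimal sketch, assuming the `quantization` keyword accepted by vLLM releases of this era; the model name below is hypothetical:

# Minimal sketch: selecting a quantization backend at load time.
# The model name is a placeholder; the `quantization` keyword is an
# assumption based on vLLM releases around v0.3.0.
from vllm import LLM, SamplingParams

llm = LLM(model="some-org/some-model-awq", quantization="awq")  # or "gptq", "squeezellm"
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What is PagedAttention?"], params)
print(outputs[0].outputs[0].text)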
docs/source/index.rst

@@ -31,7 +31,7 @@ vLLM is fast with:
 * Efficient management of attention key and value memory with **PagedAttention**
 * Continuous batching of incoming requests
 * Fast model execution with CUDA/HIP graph
-* Quantization: `GPTQ <https://arxiv.org/abs/2210.17323>`_, `AWQ <https://arxiv.org/abs/2306.00978>`_, `SqueezeLLM <https://arxiv.org/abs/2306.07629>`_
+* Quantization: `GPTQ <https://arxiv.org/abs/2210.17323>`_, `AWQ <https://arxiv.org/abs/2306.00978>`_, `SqueezeLLM <https://arxiv.org/abs/2306.07629>`_, FP8 KV Cache
 * Optimized CUDA kernels

 vLLM is flexible and easy to use with:

@@ -42,6 +42,8 @@ vLLM is flexible and easy to use with:
 * Streaming outputs
 * OpenAI-compatible API server
 * Support NVIDIA GPUs and AMD GPUs
+* (Experimental) Prefix caching support
+* (Experimental) Multi-lora support

 For more information, check out the following:
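The two experimental bullets are the features this release advertises. A hedged sketch of the multi-LoRA path, assuming the `enable_lora` flag and the per-request LoRARequest described in vLLM documentation from this period; the base model and adapter path are placeholders:

# Hedged sketch of the experimental multi-LoRA path. enable_lora and
# lora_request are assumptions based on vLLM docs from this period;
# the adapter name, id, and path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
lora = LoRARequest("sql_adapter", 1, "/path/to/sql_lora_adapter")  # name, int id, local path
outputs = llm.generate(
    ["Translate to SQL: list all users"],
    SamplingParams(max_tokens=64),
    lora_request=lora,
)

Because the adapter is chosen per generate call, one engine can serve many adapters over the same base weights, which is what makes the support "multi"-LoRA.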
vllm/__init__.py

@@ -8,7 +8,7 @@ from vllm.entrypoints.llm import LLM
 from vllm.outputs import CompletionOutput, RequestOutput
 from vllm.sampling_params import SamplingParams

-__version__ = "0.2.7"
+__version__ = "0.3.0"

 __all__ = [
     "LLM",
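Because the version string lives in vllm/__init__.py, the bump is visible directly on the installed package; the expected value comes straight from the diff above:

# Minimal check that an installed build carries the bumped version.
# The expected string "0.3.0" comes straight from the diff above.
import vllm

assert vllm.__version__ == "0.3.0"
print(vllm.__version__)

Release packaging typically derives the distribution version from this same string, so pinning with vllm==0.3.0 would resolve to a build containing this commit.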