vllm / Commit 1af090b5 (unverified)
Authored Jan 31, 2024 by Zhuohan Li; committed by GitHub, Jan 31, 2024
Bump up version to v0.3.0 (#2656)
Parent: 3dad9444
Showing 3 changed files with 7 additions and 3 deletions
README.md                +3  -1
docs/source/index.rst    +3  -1
vllm/__init__.py         +1  -1
README.md

@@ -46,7 +46,7 @@ vLLM is fast with:
 - Efficient management of attention key and value memory with **PagedAttention**
 - Continuous batching of incoming requests
 - Fast model execution with CUDA/HIP graph
-- Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [SqueezeLLM](https://arxiv.org/abs/2306.07629)
+- Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [SqueezeLLM](https://arxiv.org/abs/2306.07629), FP8 KV Cache
 - Optimized CUDA kernels

 vLLM is flexible and easy to use with:

@@ -57,6 +57,8 @@ vLLM is flexible and easy to use with:
 - Streaming outputs
 - OpenAI-compatible API server
 - Support NVIDIA GPUs and AMD GPUs
+- (Experimental) Prefix caching support
+- (Experimental) Multi-lora support

 vLLM seamlessly supports many Hugging Face models, including the following architectures:
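The README hunks above add FP8 KV Cache to the quantization bullet and append two experimental feature bullets. For context on the OpenAI-compatible API server listed among those features, a minimal client sketch follows; it assumes a vLLM server is already running locally (for example, one started with python -m vllm.entrypoints.openai.api_server), and the model name, port, and dummy API key are illustrative placeholders rather than anything introduced by this commit.

# Minimal client sketch for the OpenAI-compatible API server mentioned above.
# Assumes a vLLM server is already running locally; the model name, port, and
# dummy API key are placeholder assumptions, not part of this commit.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="EMPTY",                      # dummy key; the local server typically needs no real key
)

completion = client.completions.create(
    model="facebook/opt-125m",  # placeholder model name
    prompt="vLLM is",
    max_tokens=32,
)
print(completion.choices[0].text)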
docs/source/index.rst

@@ -31,7 +31,7 @@ vLLM is fast with:
 * Efficient management of attention key and value memory with **PagedAttention**
 * Continuous batching of incoming requests
 * Fast model execution with CUDA/HIP graph
-* Quantization: `GPTQ <https://arxiv.org/abs/2210.17323>`_, `AWQ <https://arxiv.org/abs/2306.00978>`_, `SqueezeLLM <https://arxiv.org/abs/2306.07629>`_
+* Quantization: `GPTQ <https://arxiv.org/abs/2210.17323>`_, `AWQ <https://arxiv.org/abs/2306.00978>`_, `SqueezeLLM <https://arxiv.org/abs/2306.07629>`_, FP8 KV Cache
 * Optimized CUDA kernels

 vLLM is flexible and easy to use with:

@@ -42,6 +42,8 @@ vLLM is flexible and easy to use with:
 * Streaming outputs
 * OpenAI-compatible API server
 * Support NVIDIA GPUs and AMD GPUs
+* (Experimental) Prefix caching support
+* (Experimental) Multi-lora support

 For more information, check out the following:
vllm/__init__.py

@@ -8,7 +8,7 @@ from vllm.entrypoints.llm import LLM
 from vllm.outputs import CompletionOutput, RequestOutput
 from vllm.sampling_params import SamplingParams

-__version__ = "0.2.7"
+__version__ = "0.3.0"

 __all__ = [
     "LLM",
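The only functional change in vllm/__init__.py is the version string; the surrounding context lines show the names the package re-exports (LLM, SamplingParams, CompletionOutput, RequestOutput). A minimal sketch of checking the bumped version and exercising that public API is shown below; the model name and sampling settings are placeholder assumptions.

# Minimal sketch: check the bumped version and use the re-exported public API.
# The model name and sampling settings are placeholder assumptions.
import vllm
from vllm import LLM, SamplingParams

print(vllm.__version__)  # "0.3.0" after this commit

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)

for request_output in llm.generate(["Hello, my name is"], params):
    # Each RequestOutput carries one or more CompletionOutput entries.
    print(request_output.outputs[0].text)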