OpenDAS / vllm_cscc
Commit 7c8566aa
Authored Sep 20, 2024 by omrishiv; committed by GitHub, Sep 20, 2024

[Doc] neuron documentation update (#8671)

Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>

Parent: b4e4eda9

Showing 2 changed files with 3 additions and 3 deletions:
docs/source/getting_started/neuron-installation.rst (+2 -2)
docs/source/index.rst (+1 -1)

docs/source/getting_started/neuron-installation.rst
...
@@ -3,8 +3,8 @@
 Installation with Neuron
 ========================
-vLLM 0.3.3 onwards supports model inferencing and serving on AWS Trainium/Inferentia with Neuron SDK.
-At the moment Paged Attention is not supported in Neuron SDK, but naive continuous batching is supported in transformers-neuronx.
+vLLM 0.3.3 onwards supports model inferencing and serving on AWS Trainium/Inferentia with Neuron SDK with continuous batching.
+Paged Attention and Chunked Prefill are currently in development and will be available soon.
 Data types currently supported in Neuron SDK are FP16 and BF16.
 Requirements
...

docs/source/index.rst
...
@@ -43,7 +43,7 @@ vLLM is flexible and easy to use with:
 * Tensor parallelism and pipeline parallelism support for distributed inference
 * Streaming outputs
 * OpenAI-compatible API server
-* Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Neuron.
+* Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Trainium and Inferentia Accelerators.
 * Prefix caching support
 * Multi-lora support
...