OpenDAS / vllm_cscc: commit b205e846 (unverified)

[Doc][TPU] Add models and features supporting matrix. (#20230)

Authored Jul 01, 2025 by QiliangCui; committed by GitHub on Jul 02, 2025.

Signed-off-by: Qiliang Cui <cuiq@google.com>

Parent: be0cfb2b
Showing 3 changed files with 54 additions and 17 deletions:

- docs/.nav.yml (+1, -0)
- docs/features/compatibility_matrix.md (+17, -17)
- docs/models/hardware_supported_models/tpu.md (+36, -0)
docs/.nav.yml

```diff
@@ -39,6 +39,7 @@ nav:
     - models/generative_models.md
     - models/pooling_models.md
     - models/extensions
+    - Hardware Supported Models: models/hardware_supported_models
   - Features:
     - features/compatibility_matrix.md
     - features/*
```
docs/features/compatibility_matrix.md

```diff
@@ -59,23 +59,23 @@ th:not(:first-child) {
 ## Feature x Hardware
 
-| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD |
-|-----------------------------------------------------------|--------------------|----------|----------|-------|----------|--------------------|-------|
-| [CP][chunked-prefill] | [❌](gh-issue:2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| [APC][automatic-prefix-caching] | [❌](gh-issue:3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Prompt Adapter">prmpt adptr</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8475) | ✅ |
-| [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
-| <abbr title="Pooling Models">pooling</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ |
-| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
-| <abbr title="Multimodal Inputs">mm</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
-| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8477) | ✅ |
-| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | TPU |
+|-----------------------------------------------------------|---------------------|-----------|-----------|--------|------------|--------------------|--------|-----|
+| [CP][chunked-prefill] | [❌](gh-issue:2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [APC][automatic-prefix-caching] | [❌](gh-issue:3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| <abbr title="Prompt Adapter">prmpt adptr</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8475) | ✅ | ❌ |
+| [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
+| <abbr title="Pooling Models">pooling</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❌ |
+| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| <abbr title="Multimodal Inputs">mm</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8477) | ✅ | ❌ |
+| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
 
 !!! note
     Please refer to [Feature support through NxD Inference backend][feature-support-through-nxd-inference-backend] for features supported on AWS Neuron hardware
```
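Because the updated table records TPU support feature by feature, the new column can be read programmatically. A minimal sketch, assuming rows copied from the table into a string: `column_support` and `split_row` are hypothetical helpers, not part of vLLM, and only a bare ✅ counts as supported, so linked ❌ cells and ❔ read as unsupported.

```python
import re

# Rows copied from the updated "Feature x Hardware" table (subset).
MATRIX = """
| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | TPU |
|---|---|---|---|---|---|---|---|---|
| [CP][chunked-prefill] | [❌](gh-issue:2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [APC][automatic-prefix-caching] | [❌](gh-issue:3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
| <abbr title="Pooling Models">pooling</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❌ |
"""


def split_row(line: str) -> list[str]:
    # Hypothetical helper: drop the outer pipes, then split into trimmed cells.
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def column_support(table_md: str, hardware: str) -> dict[str, bool]:
    """Hypothetical helper: map each feature label to True if its cell is a bare ✅."""
    lines = table_md.strip().splitlines()
    col = split_row(lines[0]).index(hardware)
    result: dict[str, bool] = {}
    for line in lines[2:]:  # skip the header and separator rows
        cells = split_row(line)
        # Reduce "<abbr ...>pooling</abbr>" and "[CP][chunked-prefill]" to plain labels.
        label = re.sub(r"<abbr[^>]*>(.*?)</abbr>", r"\1", cells[0])
        label = re.sub(r"\[([^\]]*)\]\[[^\]]*\]", r"\1", label)
        result[label] = cells[col] == "✅"
    return result


tpu = column_support(MATRIX, "TPU")
print(sorted(name for name, ok in tpu.items() if ok))  # ['APC', 'CP', 'LoRA']
```

Comparing against a bare ✅ keeps the sketch conservative: a cell that links to a GitHub issue is always a ❌ in this table, so it never counts as supported.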
docs/models/hardware_supported_models/tpu.md (new file, mode 100644)

```markdown
---
title: TPU
---
[](){ #tpu-supported-models }

# TPU Supported Models

## Text-only Language Models

| Model | Architecture | Supported |
|-----------------------------------------------------|--------------------------------|-----------|
| mistralai/Mixtral-8x7B-Instruct-v0.1 | MixtralForCausalLM | 🟨 |
| mistralai/Mistral-Small-24B-Instruct-2501 | MistralForCausalLM | ✅ |
| mistralai/Codestral-22B-v0.1 | MistralForCausalLM | ✅ |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | MixtralForCausalLM | ❌ |
| meta-llama/Llama-3.3-70B-Instruct | LlamaForCausalLM | ✅ |
| meta-llama/Llama-3.1-8B-Instruct | LlamaForCausalLM | ✅ |
| meta-llama/Llama-3.1-70B-Instruct | LlamaForCausalLM | ✅ |
| meta-llama/Llama-4-* | Llama4ForConditionalGeneration | ❌ |
| microsoft/Phi-3-mini-128k-instruct | Phi3ForCausalLM | 🟨 |
| microsoft/phi-4 | Phi3ForCausalLM | ❌ |
| google/gemma-3-27b-it | Gemma3ForConditionalGeneration | 🟨 |
| google/gemma-3-4b-it | Gemma3ForConditionalGeneration | ❌ |
| deepseek-ai/DeepSeek-R1 | DeepseekV3ForCausalLM | ❌ |
| deepseek-ai/DeepSeek-V3 | DeepseekV3ForCausalLM | ❌ |
| RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 | LlamaForCausalLM | ✅ |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 | LlamaForCausalLM | ✅ |
| Qwen/Qwen3-8B | Qwen3ForCausalLM | ✅ |
| Qwen/Qwen3-32B | Qwen3ForCausalLM | ✅ |
| Qwen/Qwen2.5-7B-Instruct | Qwen2ForCausalLM | ✅ |
| Qwen/Qwen2.5-32B | Qwen2ForCausalLM | ✅ |
| Qwen/Qwen2.5-14B-Instruct | Qwen2ForCausalLM | ✅ |
| Qwen/Qwen2.5-1.5B-Instruct | Qwen2ForCausalLM | 🟨 |

✅ Runs and optimized.
🟨 Runs and correct but not optimized to green yet.
❌ Does not pass accuracy test or does not run.
```
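The TPU table records status per checkpoint, not per architecture: MixtralForCausalLM appears as 🟨 (Mixtral-8x7B) and as ❌ (Mixtral-8x22B), and Phi3ForCausalLM and Gemma3ForConditionalGeneration are split the same way. A minimal sketch that surfaces such cases, assuming rows copied from the table into a string; `statuses_by_architecture` is a hypothetical helper, not a vLLM API.

```python
from collections import defaultdict

# Rows copied from the "TPU Supported Models" table (subset).
TPU_MODELS = """
| Model | Architecture | Supported |
|---|---|---|
| mistralai/Mixtral-8x7B-Instruct-v0.1 | MixtralForCausalLM | 🟨 |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | MixtralForCausalLM | ❌ |
| microsoft/Phi-3-mini-128k-instruct | Phi3ForCausalLM | 🟨 |
| microsoft/phi-4 | Phi3ForCausalLM | ❌ |
| Qwen/Qwen3-8B | Qwen3ForCausalLM | ✅ |
| Qwen/Qwen3-32B | Qwen3ForCausalLM | ✅ |
"""


def statuses_by_architecture(table_md: str) -> dict[str, set[str]]:
    """Hypothetical helper: map each architecture to the statuses of its checkpoints."""
    result: dict[str, set[str]] = defaultdict(set)
    for line in table_md.strip().splitlines()[2:]:  # skip header and separator
        _model, arch, status = (c.strip() for c in line.strip().strip("|").split("|"))
        result[arch].add(status)
    return dict(result)


by_arch = statuses_by_architecture(TPU_MODELS)
# Architectures whose checkpoints do not all share one status.
mixed = sorted(arch for arch, statuses in by_arch.items() if len(statuses) > 1)
print(mixed)  # ['MixtralForCausalLM', 'Phi3ForCausalLM']
```

In other words, look up the exact checkpoint row; a ✅ for one model of an architecture does not carry over to another model of the same architecture.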