OpenDAS / vllm_cscc
Commit 5a9c236d, authored Sep 27, 2024 by zhuwenwen
update version and deps
parent 539aa992
Showing 4 changed files with 7 additions and 5 deletions:
README.md (+1 -1)
requirements-rocm.txt (+1 -0)
setup.py (+1 -1)
vllm/attention/backends/rocm_flash_attn.py (+4 -3)
README.md
@@ -83,7 +83,7 @@ VLLM_INSTALL_PUNICA_KERNELS=1 python3 setup.py install
 + If pip install downloads are too slow, add a mirror: -i https://pypi.tuna.tsinghua.edu.cn/simple/
 ## Verification
-- python -c "import vllm; print(vllm.\_\_version__)" reports the installed version, which stays in sync with the official release, for example 0.6.1.post2;
+- python -c "import vllm; print(vllm.\_\_version__)" reports the installed version, which stays in sync with the official release, for example 0.6.2;
 ## Known Issue
 - None
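The verification step amounts to comparing the reported version against the expected release. A minimal sketch of that comparison, assuming the reported string may carry a post-release or local suffix (as in the `0.6.1.post2` example). The helper names `release_of` and `matches_release` are hypothetical and not part of vLLM:

```python
import re


def release_of(version: str) -> tuple:
    """Return the leading numeric release of a version string as ints.

    Hypothetical helper: it keeps only the leading dotted integers and
    ignores suffixes such as ".post2" or a "+local" label.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def matches_release(reported: str, expected: str) -> bool:
    """True when both strings share the same numeric release."""
    return release_of(reported) == release_of(expected)
```

After `import vllm`, a call such as `matches_release(vllm.__version__, "0.6.2")` would then express the check described in the README.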
requirements-rocm.txt
@@ -9,6 +9,7 @@ ray >= 2.10.0
 peft
 pytest-asyncio
 tensorizer>=2.9.0
+setuptools_scm
 torch == 2.3.0
 triton == 2.1.0
setup.py
@@ -399,7 +399,7 @@ def get_version_add(sha: Optional[str] = None) -> str:
 try:
     __version__ = "0.6.2"
     __version_tuple__ = (0, 6, 2)
-    __dcu_version__ = f'0.6.2+{version}
+    __dcu_version__ = f'0.6.2+{version}'
     from vllm.version import __version__, __version_tuple__, __dcu_version__
 except Exception as e:
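The `__dcu_version__` line joins the base release and a build-specific label with `+`. A minimal sketch of that composition, assuming the label is a non-empty string. The diff does not show how `version` is computed, so `local_label` and both function names below are hypothetical:

```python
BASE_VERSION = "0.6.2"


def dcu_version(local_label: str, base: str = BASE_VERSION) -> str:
    """Join the base release and a local build label with '+'."""
    if not local_label:
        raise ValueError("local_label must be non-empty")
    return f"{base}+{local_label}"


def version_tuple(base: str = BASE_VERSION) -> tuple:
    """Split a dotted numeric release into a tuple of ints."""
    return tuple(int(part) for part in base.split("."))
```

Deriving the tuple from the same base string keeps `__version__` and `__version_tuple__` from drifting apart when the release number is bumped.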
vllm/attention/backends/rocm_flash_attn.py
@@ -572,9 +572,10 @@ class ROCmFlashAttentionImpl(AttentionImpl):
 num_seqs, num_heads, head_size = decode_query.shape
 block_size = value_cache.shape[3]
 gqa_ratio = num_heads // self.num_kv_heads
-use_custom = _use_rocm_custom_paged_attention(
-    decode_query.dtype, head_size, block_size, gqa_ratio,
-    decode_meta.max_decode_seq_len)
+# use_custom = _use_rocm_custom_paged_attention(
+#     decode_query.dtype, head_size, block_size, gqa_ratio,
+#     decode_meta.max_decode_seq_len)
+use_custom = False
 if use_custom:
     max_seq_len = decode_meta.max_decode_seq_len
     max_num_partitions = (
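The effect of hard-coding `use_custom = False` is that the `if use_custom:` branch is never taken, regardless of dtype, head size, or sequence length. A minimal sketch of the same gating, written with an explicit switch instead of commented-out code. The names `choose_decode_path`, `supports_custom`, and `force_fallback` are hypothetical, and `supports_custom` only stands in for the real eligibility check, whose conditions are not shown in this diff:

```python
from typing import Callable


def gqa_ratio(num_heads: int, num_kv_heads: int) -> int:
    """Number of query heads that share each key/value head."""
    if num_heads <= 0 or num_kv_heads <= 0 or num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a positive multiple of num_kv_heads")
    return num_heads // num_kv_heads


def choose_decode_path(
    num_heads: int,
    num_kv_heads: int,
    supports_custom: Callable[[int], bool],
    force_fallback: bool = True,
) -> str:
    """Return "custom" or "fallback" for the decode step.

    supports_custom is a hypothetical stand-in for the eligibility check;
    force_fallback=True mirrors hard-coding use_custom = False.
    """
    ratio = gqa_ratio(num_heads, num_kv_heads)
    use_custom = False if force_fallback else supports_custom(ratio)
    return "custom" if use_custom else "fallback"
```

With the default `force_fallback=True`, `choose_decode_path(32, 8, lambda r: True)` returns `"fallback"` even though the stand-in check would accept the configuration, which is the behavior this commit introduces.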