OpenDAS / vllm_cscc
Commit 9765b5c4 (Unverified)
Authored Mar 29, 2024 by Hongxia Yang
Committed by GitHub, Mar 29, 2024
[ROCm][Bugfix] Fixed several bugs related to rccl path and attention selector logic (#3699)
Parent: 430530fc
Changes: 4
Showing 4 changed files with 5 additions and 5 deletions (+5 -5)
Dockerfile.rocm  +1 -1
requirements-rocm.txt  +1 -1
vllm/attention/backends/xformers.py  +2 -2
vllm/model_executor/parallel_utils/pynccl.py  +1 -1
Dockerfile.rocm
@@ -90,6 +90,6 @@ RUN cd /app \
     && cd ..
 RUN python3 -m pip install --upgrade pip
-RUN python3 -m pip install --no-cache-dir ray[all]
+RUN python3 -m pip install --no-cache-dir ray[all]==2.9.3
 CMD ["/bin/bash"]
requirements-rocm.txt
@@ -5,7 +5,7 @@ starlette
 requests
 py-cpuinfo
 psutil
-ray >= 2.9
+ray == 2.9.3
 sentencepiece # Required for LLaMA tokenizer.
 numpy
 tokenizers>=0.15.0
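Note on the pin: a minimal sketch of what changes when `ray >= 2.9` becomes `ray == 2.9.3`. The helper names (`parse_version`, `satisfies_old_spec`, `satisfies_new_spec`, `installed_ray_version`) are hypothetical and not part of this commit. The comparison is simplified to purely numeric dotted versions and does not implement full PEP 440 matching.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a purely numeric dotted version such as "2.9.3" (simplified)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_old_spec(installed: str) -> bool:
    # Old requirement: ray >= 2.9 (any later release is accepted).
    return parse_version(installed) >= (2, 9)


def satisfies_new_spec(installed: str) -> bool:
    # New requirement: ray == 2.9.3 (only this exact release is accepted).
    return parse_version(installed) == (2, 9, 3)


def installed_ray_version() -> Optional[str]:
    """Return the installed ray version, or None if ray is not installed."""
    try:
        return metadata.version("ray")
    except metadata.PackageNotFoundError:
        return None
```

Under the old specifier a later release such as 2.10.0 would be accepted; under the new one only 2.9.3 is.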
vllm/attention/backends/xformers.py
@@ -405,8 +405,8 @@ def _check_use_naive_attention() -> bool:
     if not is_hip():
         return False
     # For ROCm, check whether flash attention is installed or not.
-    has_flash_attn = importlib.util.find_spec("flash_attn") is None
-    if not has_flash_attn:
+    use_naive_attention = importlib.util.find_spec("flash_attn") is None
+    if use_naive_attention:
         logger.warning("flash_attn is not installed. Using naive attention. "
                        "This will take significantly more GPU memory.")
         return True
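Note on the selector fix: in the removed lines, `has_flash_attn` was assigned `find_spec(...) is None`, so it was True when flash_attn was missing, and the following `if not has_flash_attn` inverted the check a second time. The sketch below reproduces both versions as pure functions so the difference can be seen directly. The function names and the `on_rocm` / `flash_attn_installed` parameters are hypothetical, and the fall-through `return False` is an assumption, since the hunk does not show the end of the function.

```python
import importlib.util


def old_selector(on_rocm: bool, flash_attn_installed: bool) -> bool:
    """Logic of the removed lines (fall-through value is assumed)."""
    if not on_rocm:
        return False
    # Misnamed: this is True when flash_attn is NOT installed.
    has_flash_attn = not flash_attn_installed
    if not has_flash_attn:
        return True
    return False  # assumption: not visible in the hunk


def new_selector(on_rocm: bool, flash_attn_installed: bool) -> bool:
    """Logic of the added lines (fall-through value is assumed)."""
    if not on_rocm:
        return False
    use_naive_attention = not flash_attn_installed
    if use_naive_attention:
        return True
    return False  # assumption: not visible in the hunk


def flash_attn_is_installed() -> bool:
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec("flash_attn") is not None
```

With these definitions, `old_selector(True, True)` picks naive attention even though flash_attn is present, while `new_selector(True, True)` does not.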
vllm/model_executor/parallel_utils/pynccl.py
@@ -41,7 +41,7 @@ else:
     if torch.version.cuda is not None:
         so_file = "libnccl.so.2"
     elif torch.version.hip is not None:
-        so_file = "librccl.so.2"
+        so_file = "librccl.so.1"
     else:
         raise ValueError("NCCL only supports CUDA and ROCm backends.")
     logger.debug(f"Loading nccl from library {so_file}")
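Note on the library name: a minimal sketch of the backend-to-soname selection shown in the hunk, written without a torch dependency so it can be run anywhere. `select_nccl_library` and `try_load` are hypothetical helpers, not functions from pynccl.py; the two version arguments stand in for `torch.version.cuda` and `torch.version.hip`.

```python
import ctypes
from typing import Optional


def select_nccl_library(cuda_version: Optional[str],
                        hip_version: Optional[str]) -> str:
    """Pick the shared-library name for the active backend."""
    if cuda_version is not None:
        return "libnccl.so.2"
    if hip_version is not None:
        # After this commit the ROCm path uses soname version 1.
        return "librccl.so.1"
    raise ValueError("NCCL only supports CUDA and ROCm backends.")


def try_load(so_file: str) -> Optional[ctypes.CDLL]:
    """Attempt to load the library; return None if it cannot be found."""
    try:
        return ctypes.CDLL(so_file)
    except OSError:
        # Raised by ctypes when the dynamic loader cannot open the file.
        return None
```

A failed load returns None here instead of raising, which makes it easy to check whether a given soname is present on a machine before relying on it.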