"git@developer.sourcefind.cn:change/sglang.git" did not exist on "57eec0bfbce964e347ef2affb999e03416f22325"
Unverified commit 9ffe1f1e authored by Daniël de Kok, committed by GitHub

Do not initialize scratch space when there are no ExLlamaV2 layers (#2015)

# What does this PR do?

Do not attempt to allocate ExLlamaV2 scratch buffers when there are no
ExLlamaV2 layers. This avoids a crash during warmup for models that cannot
use ExLlama kernels when ExLlamaV2 is installed.

## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests), Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and [here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
parent 824edf28
```diff
@@ -145,6 +145,11 @@ def set_device(device):
 def create_exllama_buffers(max_total_tokens: int):
     global LAYERS, DEVICE
+    # No need to initialize scratch space if there are no layers
+    # that use ExLLamav2.
+    if len(LAYERS) == 0:
+        return
     # Find the size of the scratch space.
     scratch_bytes = max(
         layer.scratch_space_fixed(max_input_len=max_total_tokens, max_batch_size=1)
...
```
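
For context (not part of the commit): a minimal sketch of the failure mode the guard avoids, assuming the scratch-size computation reduces over `LAYERS` with `max()`, as the truncated hunk suggests. The standalone module below is hypothetical and only mimics the shape of the real code.

```python
# Hypothetical, simplified stand-in for the ExLlamaV2 buffer setup;
# names mirror the hunk above, but this is not the real module.
LAYERS = []  # assumption: no ExLlamaV2 layers were registered for this model


def create_exllama_buffers(max_total_tokens: int):
    # Guard added by this PR: with no ExLlamaV2 layers there is
    # nothing to size, so skip scratch-space allocation entirely.
    if len(LAYERS) == 0:
        return

    # Without the guard, max() over an empty generator raises
    # ValueError("max() arg is an empty sequence") during warmup.
    scratch_bytes = max(
        layer.scratch_space_fixed(max_input_len=max_total_tokens, max_batch_size=1)
        for layer in LAYERS
    )
    print(f"would allocate {scratch_bytes} bytes of scratch space")


create_exllama_buffers(max_total_tokens=4096)  # no-op when LAYERS is empty
```

Since warmup appears to call this setup unconditionally, returning early keeps the check local to the buffer-creation path rather than requiring every caller to test for ExLlamaV2 layers first.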