Unverified Commit 935a77fb authored by Maxime Laboissonnière's avatar Maxime Laboissonnière Committed by GitHub

Fix exllama wrongfully loading (#990)

# What does this PR do?
The
[changes](https://github.com/huggingface/text-generation-inference/pull/986/files#diff-b72e45030214e50c8ff6e3be837057b3f3368b9779fd942ca680f949fe069eafR176)
disabling exllama on old compute had the unintended consequence of not
setting `use_exllama` to `False` when `HAS_EXLLAMA` is `False` **and**
`CAN_EXLLAMA` is `False`. This PR fixes that.
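In simplified terms, the before/after control flow can be sketched as below (a minimal standalone sketch with hypothetical helper names, not the actual `Weights` code):

```python
def resolve_use_exllama_old(use_exllama: bool, has_exllama: bool, can_exllama: bool) -> bool:
    # Pre-fix logic: use_exllama was only reset when the kernels *could*
    # have been installed (can_exllama) but were not (not has_exllama).
    if use_exllama:
        if not has_exllama and can_exllama:
            use_exllama = False  # never reached when can_exllama is False
    return use_exllama


def resolve_use_exllama_new(use_exllama: bool, has_exllama: bool, can_exllama: bool) -> bool:
    # Fixed logic: use_exllama is reset whenever the kernels are missing,
    # regardless of whether they could have been built.
    if use_exllama:
        if not has_exllama:
            use_exllama = False
    return use_exllama


# The bug: on old compute (HAS_EXLLAMA=False, CAN_EXLLAMA=False) the flag
# incorrectly stayed True before the fix.
print(resolve_use_exllama_old(True, False, False))  # True (bug)
print(resolve_use_exllama_new(True, False, False))  # False (fixed)
```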

## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?
@OlivierDehaene @Narsil 

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

parent a9fdfb24
```diff
@@ -173,10 +173,11 @@ class Weights:
         from text_generation_server.utils.layers import HAS_EXLLAMA, CAN_EXLLAMA

         if use_exllama:
-            if not HAS_EXLLAMA and CAN_EXLLAMA:
-                logger.warning(
-                    "Exllama GPTQ cuda kernels (which are faster) could have been used, but are not currently installed, try using BUILD_EXTENSIONS=True"
-                )
-                use_exllama = False
+            if not HAS_EXLLAMA:
+                if CAN_EXLLAMA:
+                    logger.warning(
+                        "Exllama GPTQ cuda kernels (which are faster) could have been used, but are not currently installed, try using BUILD_EXTENSIONS=True"
+                    )
+                use_exllama = False
             else:
                 logger.info("Using exllama kernels")
```