Commit b465cd01 authored by gk

GPTQ: fix README and support default names

parent b296c4f6
@@ -7,7 +7,7 @@ This project provides a unified framework to test generative language models on
 Features:
 - 200+ tasks implemented. See the [task-table](./docs/task_table.md) for a complete list.
-- Support for models loaded via [transformers](https://github.com/huggingface/transformers/) (with [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ)), [GPT-NeoX](https://github.com/EleutherAI/gpt-neox), and [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/), with a flexible tokenization-agnostic interface.
+- Support for models loaded via [transformers](https://github.com/huggingface/transformers/) (including quantization via [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ)), [GPT-NeoX](https://github.com/EleutherAI/gpt-neox), and [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/), with a flexible tokenization-agnostic interface.
 - Support for commercial APIs including [OpenAI](https://openai.com), [goose.ai](https://goose.ai), and [TextSynth](https://textsynth.com/).
 - Support for evaluation on adapters (e.g. LoRA) supported in [HuggingFace's PEFT library](https://github.com/huggingface/peft).
 - Evaluating with publicly available prompts ensures reproducibility and comparability between papers.
@@ -111,12 +111,12 @@ python main.py \
     --device cuda:0
 ```
-GPTQ models can be loaded by specifying their file names in `,quantized=NAME` in the `model_args` argument:
+GPTQ quantized models can be loaded by installing [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) (`pip install auto-gptq[triton]`) and adding `,quantized=NAME` (or `,quantized=True` for default names) to the `model_args` argument:
 ```bash
 python main.py \
     --model hf-causal-experimental \
-    --model_args pretrained=model-directory,quantized=model.safetensors \
+    --model_args pretrained=model-name-or-path,quantized=model.safetensors \
     --tasks hellaswag
 ```
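For example, to let AutoGPTQ resolve the quantized checkpoint by its default file name instead of spelling it out (an illustrative invocation of the new option; `model-name-or-path` is a placeholder):

```bash
python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=model-name-or-path,quantized=True \
    --tasks hellaswag
```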
...
@@ -70,7 +70,7 @@ class HuggingFaceAutoLM(BaseLM):
     def __init__(
         self,
         pretrained: str,
-        quantized: Optional[str] = None,
+        quantized: Optional[Union[bool, str]] = None,
         tokenizer: Optional[str] = None,
         subfolder: Optional[str] = None,
         revision: Optional[str] = "main",
@@ -95,8 +95,9 @@ class HuggingFaceAutoLM(BaseLM):
             The HuggingFace Hub model ID name or the path to a pre-trained
             model to load. This is effectively the `pretrained_model_name_or_path`
             argument of `from_pretrained` in the HuggingFace `transformers` API.
-        quantized (str, optional, defaults to None):
-            File name of a GPTQ model to load.
+        quantized (str or bool, optional, defaults to None):
+            File name of a GPTQ quantized model to load. Set to `True` to use the
+            default name of the quantized model.
         add_special_tokens (bool, optional, defaults to True):
             Whether to add special tokens to the input sequences. If `None`, the
             default value will be set to `True` for seq2seq models (e.g. T5) and
@@ -229,7 +230,7 @@ class HuggingFaceAutoLM(BaseLM):
         self,
         *,
         pretrained: str,
-        quantized: Optional[str] = None,
+        quantized: Optional[Union[bool, str]] = None,
         revision: str,
         subfolder: str,
         device_map: Optional[Union[str, _DeviceMapping]] = None,
@@ -255,11 +256,11 @@ class HuggingFaceAutoLM(BaseLM):
             from auto_gptq import AutoGPTQForCausalLM
             model = AutoGPTQForCausalLM.from_quantized(
                 pretrained,
-                model_basename=Path(quantized).stem,
+                model_basename=None if quantized is True else Path(quantized).stem,
                 device_map=device_map,
                 max_memory=max_memory,
                 trust_remote_code=trust_remote_code,
-                use_safetensors=quantized.endswith('.safetensors'),
+                use_safetensors=True if quantized is True else quantized.endswith('.safetensors'),
                 use_triton=True,
             )
             return model
...
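For reference, the branch changed above boils down to a single AutoGPTQ call. A minimal standalone sketch, assuming AutoGPTQ is installed and `model-name-or-path` points at a directory (or Hub ID) holding a GPTQ checkpoint; both names are placeholders:

```python
# Minimal sketch of the loading path exercised by quantized=True.
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "model-name-or-path",  # placeholder: directory or Hub ID with quantized weights
    model_basename=None,   # None lets AutoGPTQ fall back to its default file name
    use_safetensors=True,  # quantized=True implies the safetensors default above
    use_triton=True,       # matches the auto-gptq[triton] install extra
)
```

Passing `model_basename=None` is what "default names" means here: AutoGPTQ locates the quantized weight file itself rather than requiring the caller to name it.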