Commit b465cd01 authored by gk

GPTQ: fix README and support default names

parent b296c4f6
@@ -7,7 +7,7 @@ This project provides a unified framework to test generative language models on
 Features:
 - 200+ tasks implemented. See the [task-table](./docs/task_table.md) for a complete list.
-- Support for models loaded via [transformers](https://github.com/huggingface/transformers/) (with [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ)), [GPT-NeoX](https://github.com/EleutherAI/gpt-neox), and [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/), with a flexible tokenization-agnostic interface.
+- Support for models loaded via [transformers](https://github.com/huggingface/transformers/) (including quantization via [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ)), [GPT-NeoX](https://github.com/EleutherAI/gpt-neox), and [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/), with a flexible tokenization-agnostic interface.
 - Support for commercial APIs including [OpenAI](https://openai.com), [goose.ai](https://goose.ai), and [TextSynth](https://textsynth.com/).
 - Support for evaluation on adapters (e.g. LoRA) supported in [HuggingFace's PEFT library](https://github.com/huggingface/peft).
 - Evaluating with publicly available prompts ensures reproducibility and comparability between papers.
@@ -111,12 +111,12 @@ python main.py \
     --device cuda:0
 ```
-GPTQ models can be loaded by specifying their file names in `,quantized=NAME` in the `model_args` argument:
+GPTQ quantized models can be loaded by installing [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) (`pip install auto-gptq[triton]`) and adding `,quantized=NAME` (or `,quantized=True` for default names) to the `model_args` argument:
 ```bash
 python main.py \
     --model hf-causal-experimental \
-    --model_args pretrained=model-directory,quantized=model.safetensors \
+    --model_args pretrained=model-name-or-path,quantized=model.safetensors \
     --tasks hellaswag
 ```
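For example, to let AutoGPTQ resolve the quantized checkpoint by its default file name instead of spelling it out (an illustrative invocation of the new option; `model-name-or-path` is a placeholder):

```bash
python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=model-name-or-path,quantized=True \
    --tasks hellaswag
```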
...
@@ -70,7 +70,7 @@ class HuggingFaceAutoLM(BaseLM):
     def __init__(
         self,
         pretrained: str,
-        quantized: Optional[str] = None,
+        quantized: Optional[Union[bool, str]] = None,
         tokenizer: Optional[str] = None,
         subfolder: Optional[str] = None,
         revision: Optional[str] = "main",
@@ -95,8 +95,9 @@ class HuggingFaceAutoLM(BaseLM):
             The HuggingFace Hub model ID name or the path to a pre-trained
             model to load. This is effectively the `pretrained_model_name_or_path`
             argument of `from_pretrained` in the HuggingFace `transformers` API.
-        quantized (str, optional, defaults to None):
-            File name of a GPTQ model to load.
+        quantized (str or bool, optional, defaults to None):
+            File name of a GPTQ quantized model to load. Set to `True` to use the
+            default name of the quantized model.
         add_special_tokens (bool, optional, defaults to True):
             Whether to add special tokens to the input sequences. If `None`, the
             default value will be set to `True` for seq2seq models (e.g. T5) and
@@ -229,7 +230,7 @@ class HuggingFaceAutoLM(BaseLM):
         self,
         *,
         pretrained: str,
-        quantized: Optional[str] = None,
+        quantized: Optional[Union[bool, str]] = None,
         revision: str,
         subfolder: str,
         device_map: Optional[Union[str, _DeviceMapping]] = None,
@@ -255,11 +256,11 @@ class HuggingFaceAutoLM(BaseLM):
             from auto_gptq import AutoGPTQForCausalLM
             model = AutoGPTQForCausalLM.from_quantized(
                 pretrained,
-                model_basename=Path(quantized).stem,
+                model_basename=None if quantized is True else Path(quantized).stem,
                 device_map=device_map,
                 max_memory=max_memory,
                 trust_remote_code=trust_remote_code,
-                use_safetensors=quantized.endswith('.safetensors'),
+                use_safetensors=True if quantized is True else quantized.endswith('.safetensors'),
                 use_triton=True,
             )
             return model
...
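For reference, the branch changed above boils down to a single AutoGPTQ call. A minimal standalone sketch, assuming AutoGPTQ is installed and `model-name-or-path` points at a directory (or Hub ID) holding a GPTQ checkpoint; both names are placeholders:

```python
# Minimal sketch of the loading path exercised by quantized=True.
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "model-name-or-path",  # placeholder: directory or Hub ID with quantized weights
    model_basename=None,   # None lets AutoGPTQ fall back to its default file name
    use_safetensors=True,  # quantized=True implies the safetensors default above
    use_triton=True,       # matches the auto-gptq[triton] install extra
)
```

Passing `model_basename=None` is what "default names" means here: AutoGPTQ locates the quantized weight file itself rather than requiring the caller to name it.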