Unverified commit ff41a856 authored by Ankush, committed by GitHub

Bugfix/hf tokenizer gguf override (#3098)

* fix(hf-gguf): skip gguf_file if external tokenizer is provided

* docs(readme): add instructions for evaluating GGUF models with Hugging Face backend
parent 8c1016cb
@@ -110,6 +110,28 @@ lm_eval --model hf \
> [!Note]
> Just like you can provide a local path to `transformers.AutoModel`, you can also provide a local path to `lm_eval` via `--model_args pretrained=/path/to/model`
#### Evaluating GGUF Models

`lm-eval` supports evaluating models in GGUF format through the Hugging Face (`hf`) backend, letting you evaluate quantized models produced by llama.cpp conversions that `transformers` can load via its `AutoModel` classes.

To evaluate a GGUF model, pass the path to the directory containing the model weights, the `gguf_file` name, and optionally a separate `tokenizer` path using the `--model_args` flag.

**🚨 Important Note:**
If no separate tokenizer is provided, Hugging Face will attempt to reconstruct the tokenizer from the GGUF file, which can take **hours** or even hang indefinitely. Passing a separate tokenizer avoids this and can cut tokenizer loading time from hours to seconds.
**✅ Recommended usage:**
```bash
lm_eval --model hf \
--model_args pretrained=/path/to/gguf_folder,gguf_file=model-name.gguf,tokenizer=/path/to/tokenizer \
--tasks hellaswag \
--device cuda:0 \
--batch_size 8
```
> [!Tip]
> Ensure the tokenizer path points to a valid Hugging Face tokenizer directory (e.g., one containing `tokenizer_config.json`, `vocab.json`, etc.).
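As a quick sanity check before launching a long evaluation, you can confirm the directory loads as a standalone tokenizer. A minimal sketch, assuming `transformers` is installed; the path is a placeholder for whatever you pass as `tokenizer=...`:

```python
from transformers import AutoTokenizer

# Placeholder: the same directory you pass as `tokenizer=...` above.
tok = AutoTokenizer.from_pretrained("/path/to/tokenizer")

# If this loads in seconds and round-trips text, the directory is usable.
print(tok.decode(tok("Hello, GGUF!").input_ids))
```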
#### Multi-GPU Evaluation with Hugging Face `accelerate`
We support three main ways of using Hugging Face's [accelerate 🚀](https://github.com/huggingface/accelerate) library for multi-GPU evaluation.
@@ -727,7 +727,7 @@ class HFLM(TemplateLM):
     }
     # gguf format embeds tokenizer and is not compatible with hf tokenizer `use_fast` param
-    if gguf_file is not None:
+    if not tokenizer and gguf_file is not None:
         kwargs["gguf_file"] = gguf_file
     else:
         kwargs["use_fast"] = use_fast_tokenizer