Unverified Commit d35008f1 authored by Lucia Quirke, committed by GitHub

Enable steering HF models (#2749)



* Enable steering HF models
Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>

* increase HF download timeout

* Update readme; improve steering vector device handling

* Update latest news

* remove HF timeout increase

* fix tests

* ignore sae lens test

* fix accidental force push

---------
Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>
parent 14b0bd26
......@@ -61,7 +61,7 @@ jobs:
# pip install bleurt@https://github.com/google-research/bleurt/archive/b610120347ef22b494b6d69b4316e303f5932516.zip#egg=bleurt
# if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Test with pytest
run: python -m pytest --showlocals -s -vv -n=auto --ignore=tests/models/test_neuralmagic.py --ignore=tests/models/test_openvino.py
run: python -m pytest --showlocals -s -vv -n=auto --ignore=tests/models/test_neuralmagic.py --ignore=tests/models/test_openvino.py --ignore=tests/models/test_hf_steered.py
- name: Archive artifacts
uses: actions/upload-artifact@v4
with:
......
......@@ -5,6 +5,7 @@
---
*Latest News 📣*
- [2025/03] Added support for steering HF models!
- [2025/02] Added [SGLang](https://docs.sglang.ai/) support!
- [2024/09] We are prototyping allowing users of LM Evaluation Harness to create and evaluate on text+image multimodal input, text output tasks, and have just added the `hf-multimodal` and `vllm-vlm` model types and `mmmu` task as a prototype feature. We welcome users to try out this in-progress feature and stress-test it for themselves, and suggest they check out [`lmms-eval`](https://github.com/EvolvingLMMs-Lab/lmms-eval), a wonderful project originally forking off of the lm-evaluation-harness, for a broader range of multimodal tasks, models, and features.
- [2024/07] [API model](docs/API_guide.md) support has been updated and refactored, introducing support for batched and async requests, and making it significantly easier to customize and use for your own purposes. **To run Llama 405B, we recommend using VLLM's OpenAI-compliant API to host the model, and use the `local-completions` model type to evaluate the model.**
......@@ -157,6 +158,50 @@ To learn more about model parallelism and how to use it with the `accelerate` li
**Note: we do not currently support multi-node evaluations natively, and advise using either an externally hosted server to run inference requests against, or creating a custom integration with your distributed framework [as is done for the GPT-NeoX library](https://github.com/EleutherAI/gpt-neox/blob/main/eval_tasks/eval_adapter.py).**
### Steered Hugging Face `transformers` models
To evaluate a Hugging Face `transformers` model with steering vectors applied, specify the model type as `steered` and provide a `steer_path` pointing to either a PyTorch file containing pre-defined steering vectors, or a CSV file that specifies how to derive steering vectors from pretrained `sparsify` or `sae_lens` models. The CSV route requires installing the corresponding optional dependency (`pip install lm_eval[sparsify]` or `pip install lm_eval[sae_lens]`).
Specify pre-defined steering vectors:
```python
import torch
steer_config = {
"layers.3": {
"steering_vector": torch.randn(1, 768),
"bias": torch.randn(1, 768),
"steering_coefficient": 1,
"action": "add"
},
}
torch.save(steer_config, "steer_config.pt")
```
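In addition to `add`, the model docstring documents a `clamp` action, which fixes the component of the activations along the steering direction to a set value rather than adding to it. A hypothetical config in the same file format (the random vector here is purely illustrative):

```python
import torch

# Sketch of a clamp-action steering config; same schema as above,
# with "action" switched to "clamp". Bias is optional and may be None.
steer_config = {
    "layers.3": {
        "steering_vector": torch.randn(1, 768),
        "bias": None,
        "steering_coefficient": 8.0,  # value the direction is clamped to
        "action": "clamp",
    },
}
torch.save(steer_config, "steer_config_clamp.pt")
```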
Specify derived steering vectors:
```python
import pandas as pd
pd.DataFrame({
"loader": ["sparsify"],
"action": ["add"],
"sparse_model": ["EleutherAI/sae-pythia-70m-32k"],
"hookpoint": ["layers.3"],
"feature_index": [30],
"steering_coefficient": [10.0],
}).to_csv("steer_config.csv", index=False)
```
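Derivation from `sae_lens` models works the same way; the extra `sae_id` column selects an SAE within a release. A sketch using the values from the model's docstring example:

```python
import pandas as pd

# Sketch of a sae_lens-based steering config CSV (values mirror the
# docstring example in lm_eval.models.hf_steered).
pd.DataFrame({
    "loader": ["sae_lens"],
    "action": ["add"],
    "sparse_model": ["gemma-scope-2b-pt-res-canonical"],
    "hookpoint": ["layers.20"],
    "feature_index": [12082],
    "steering_coefficient": [240.0],
    "sae_id": ["layer_20/width_16k/canonical"],
}).to_csv("steer_config_sae_lens.csv", index=False)
```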
Run the evaluation harness with steering vectors applied:
```bash
lm_eval --model steered \
--model_args pretrained=EleutherAI/pythia-160m,steer_path=steer_config.pt \
--tasks lambada_openai,hellaswag \
--device cuda:0 \
--batch_size 8
```
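Under the hood, the `clamp` action (defined on the steered model class) normalizes the steering vector, projects out the activation component along that direction, and replaces it with `direction * value`. A minimal numeric sketch of that math, independent of any model:

```python
import torch

acts = torch.tensor([[3.0, 4.0]])            # toy activations [batch, features]
steering_vector = torch.tensor([0.0, 2.0])   # unnormalized steering direction

direction = steering_vector / torch.norm(steering_vector)  # unit vector [0., 1.]
proj = torch.sum(acts * direction, dim=-1, keepdim=True)   # component along direction
orthogonal = acts - proj * direction                       # remaining component
clamped = orthogonal + direction * 10.0                    # clamp direction to 10.0
```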
### NVIDIA `nemo` models
[NVIDIA NeMo Framework](https://github.com/NVIDIA/NeMo) is a generative AI framework built for researchers and PyTorch developers working on language models.
......@@ -523,8 +568,10 @@ Extras dependencies can be installed via `pip install -e ".[NAME]"`
| multilingual | For multilingual tokenizers |
| optimum | For running Intel OpenVINO models |
| promptsource | For using PromptSource prompts |
| sae_lens | For using SAELens to steer models |
| sentencepiece | For using the sentencepiece tokenizer |
| sparseml | For using NM's SparseML models |
| sparsify | For using Sparsify to steer models |
| testing | For running library test suite |
| vllm | For loading models with vLLM |
| zeno | For visualizing results with Zeno |
......
......@@ -3,6 +3,7 @@ from . import (
api_models,
dummy,
gguf,
hf_steered,
hf_vlms,
huggingface,
ibm_watsonx_ai,
......
from contextlib import contextmanager
from functools import partial
from pathlib import Path
from typing import Any, Callable, Generator, Optional, Union
import torch
from peft.peft_model import PeftModel
from torch import Tensor, nn
from transformers import PreTrainedModel
from lm_eval.api.registry import register_model
from lm_eval.models.huggingface import HFLM
@contextmanager
def steer(
model: Union[PreTrainedModel, PeftModel], hook_to_steer: dict[str, Callable]
) -> Generator[None, Any, None]:
"""
Context manager that temporarily hooks models and steers them.
Args:
model: The transformer model to hook
hook_to_steer: Dictionary mapping hookpoints to steering functions
Yields:
None
"""
def create_hook(hookpoint: str):
def hook_fn(module: nn.Module, input: Any, output: Tensor):
# If output is a tuple (like in some transformer layers), take first element
if isinstance(output, tuple):
output = (hook_to_steer[hookpoint](output[0]), *output[1:]) # type: ignore
else:
output = hook_to_steer[hookpoint](output)
return output
return hook_fn
handles = []
hookpoints = list(hook_to_steer.keys())
for name, module in model.base_model.named_modules():
if name in hookpoints:
handle = module.register_forward_hook(create_hook(name))
handles.append(handle)
if len(handles) != len(hookpoints):
raise ValueError(f"Not all hookpoints could be resolved: {hookpoints}")
try:
yield None
finally:
for handle in handles:
handle.remove()
@register_model("steered")
class SteeredModel(HFLM):
hook_to_steer: dict[str, Callable]
def __init__(
self,
pretrained: str,
steer_path: str,
device: Optional[str] = None,
**kwargs,
):
"""
HFLM with a steered forward pass.
To derive steering vectors from a sparse model loadable with sparsify or sae_lens,
provide the path to a CSV file with the following columns (example rows are provided below):
loader,action,sparse_model,hookpoint,feature_index,steering_coefficient,sae_id,description,
sparsify,add,EleutherAI/sae-pythia-70m-32k,layers.3,30,10.0,,,
sae_lens,add,gemma-scope-2b-pt-res-canonical,layers.20,12082,240.0,layer_20/width_16k/canonical,increase dogs,
To load steering vectors directly, provide the path to a pytorch (.pt) file with content in the following format:
{
hookpoint: {
"steering_vector": <torch.Tensor>,
"steering_coefficient": <float>,
"action": <Literal["add", "clamp"]>,
"bias": <torch.Tensor | None>,
},
...
}
"""
super().__init__(pretrained=pretrained, device=device, **kwargs)
if steer_path.endswith(".pt") or steer_path.endswith(".pth"):
with open(steer_path, "rb") as f:
steer_config: dict[str, dict[str, Any]] = torch.load(
f, weights_only=True
)
elif steer_path.endswith(".csv"):
steer_config = self.derive_steer_config(steer_path)
else:
raise ValueError(f"Unknown steer file type: {steer_path}")
hook_to_steer = {}
for hookpoint, steer_info in steer_config.items():
action = steer_info["action"]
steering_coefficient = steer_info["steering_coefficient"]
steering_vector = (
steer_info["steering_vector"].to(self.device).to(self.model.dtype)
)
bias = (
steer_info["bias"].to(self.device).to(self.model.dtype)
if steer_info["bias"] is not None
else None
)
if action == "add":
# Steers the model by adding some multiple of a steering vector to all sequence positions.
# Bind the loop variables as lambda defaults so each hookpoint keeps its own
# vector and coefficient (a plain closure would capture the last loop values).
hook_to_steer[hookpoint] = (
lambda acts, vector=steering_vector, coef=steering_coefficient: acts
+ coef * vector
)
elif action == "clamp":
hook_to_steer[hookpoint] = partial(
self.clamp,
steering_vector=steering_vector,
value=steering_coefficient,
bias=bias,
)
else:
raise ValueError(f"Unknown action: {action}")
self.hook_to_steer = hook_to_steer
@classmethod
def derive_steer_config(cls, steer_path: str):
"""Derive a dictionary of steering vectors from the sparse model(s) specified in a CSV file."""
import pandas as pd
df = pd.read_csv(steer_path)
steer_data: dict[str, dict[str, Any]] = {}
if any(df["loader"] == "sparsify"):
from sparsify import SparseCoder
if any(df["loader"] == "sae_lens"):
from sae_lens import SAE
sae_cache = {}
def load_from_sae_lens(sae_release: str, sae_id: str):
cache_key = (sae_release, sae_id)
if cache_key not in sae_cache:
sae_cache[cache_key] = SAE.from_pretrained(sae_release, sae_id)[0]
return sae_cache[cache_key]
for _, row in df.iterrows():
action = row.get("action", "add")
sparse_name = row["sparse_model"]
hookpoint = row["hookpoint"]
feature_index = int(row["feature_index"])
steering_coefficient = float(row["steering_coefficient"])
loader = row.get("loader", "sparsify")
if loader == "sparsify":
name_path = Path(sparse_name)
sparse_coder = (
SparseCoder.load_from_disk(name_path / hookpoint)
if name_path.exists()
else SparseCoder.load_from_hub(sparse_name, hookpoint)
)
assert sparse_coder.W_dec is not None
steering_vector = sparse_coder.W_dec[feature_index]
bias = sparse_coder.b_dec
elif loader == "sae_lens":
sparse_coder = load_from_sae_lens(
sae_release=sparse_name, sae_id=row["sae_id"]
)
steering_vector = sparse_coder.W_dec[feature_index]
bias = sparse_coder.b_dec
if hookpoint == "" or pd.isna(hookpoint):
hookpoint = sparse_coder.cfg.hook_name
else:
raise ValueError(f"Unknown loader: {loader}")
steer_data[hookpoint] = {
"action": action,
"steering_coefficient": steering_coefficient,
"steering_vector": steering_vector,
"bias": bias,
}
return steer_data
@classmethod
def clamp(
cls,
acts: Tensor,
steering_vector: Tensor,
value: float,
bias: Optional[Tensor] = None,
):
"""Clamps a direction of the activations to be the steering vector * the value.
Args:
acts (Tensor): The activations tensor to edit of shape [batch, pos, features]
steering_vector (Tensor): A direction to clamp of shape [features]
value (float): Value to clamp the direction to
bias (Tensor | None): Optional bias to add to the activations
Returns:
Tensor: The modified activations with the specified direction clamped
"""
if bias is not None:
acts = acts - bias
direction = steering_vector / torch.norm(steering_vector)
proj_magnitude = torch.sum(acts * direction, dim=-1, keepdim=True)
orthogonal_component = acts - proj_magnitude * direction
clamped = orthogonal_component + direction * value
if bias is not None:
return clamped + bias
return clamped
def forward(self, *args, **kwargs):
with torch.no_grad():
with steer(self.model, self.hook_to_steer):
return self.model.forward(*args, **kwargs)
def _model_call(self, *args, **kwargs):
with steer(self.model, self.hook_to_steer):
return super()._model_call(*args, **kwargs)
def _model_generate(self, *args, **kwargs):
with steer(self.model, self.hook_to_steer):
return super()._model_generate(*args, **kwargs)
......@@ -70,7 +70,9 @@ math = ["sympy>=1.12", "antlr4-python3-runtime==4.11", "math_verify[antlr4_11_0]
multilingual = ["nagisa>=0.2.7", "jieba>=0.42.1", "pycountry"]
optimum = ["optimum[openvino]"]
promptsource = ["promptsource>=0.2.3"]
sae_lens = ["sae_lens"]
sentencepiece = ["sentencepiece>=0.1.98"]
sparsify = ["sparsify"]
sparseml = ["sparseml-nightly[llm]>=1.8.0.20240404"]
testing = ["pytest", "pytest-cov", "pytest-xdist"]
vllm = ["vllm>=0.4.2"]
......@@ -91,7 +93,9 @@ all = [
"lm_eval[multilingual]",
"lm_eval[openai]",
"lm_eval[promptsource]",
"lm_eval[sae_lens]",
"lm_eval[sentencepiece]",
"lm_eval[sparsify]",
"lm_eval[sparseml]",
"lm_eval[testing]",
"lm_eval[vllm]",
......
# ruff: noqa
from __future__ import annotations
import os
import sys
from pathlib import Path
import numpy as np
import pytest
import torch
from lm_eval import tasks
from lm_eval.api.instance import Instance
pytest.skip("dependency conflict on CI", allow_module_level=True)
os.environ["TOKENIZERS_PARALLELISM"] = "false"
task_manager = tasks.TaskManager()
TEST_STRING = "foo bar"
class Test_SteeredModel:
from lm_eval.models.hf_steered import SteeredModel
torch.use_deterministic_algorithms(True)
task_list = task_manager.load_task_or_group(["arc_easy", "gsm8k", "wikitext"])
version_minor = sys.version_info.minor
multiple_choice_task = task_list["arc_easy"] # type: ignore
multiple_choice_task.build_all_requests(limit=10, rank=0, world_size=1)
MULTIPLE_CH: list[Instance] = multiple_choice_task.instances
generate_until_task = task_list["gsm8k"] # type: ignore
generate_until_task._config.generation_kwargs["max_gen_toks"] = 10
generate_until_task.set_fewshot_seed(1234) # fewshot random generator seed
generate_until_task.build_all_requests(limit=10, rank=0, world_size=1)
generate_until: list[Instance] = generate_until_task.instances
rolling_task = task_list["wikitext"] # type: ignore
rolling_task.build_all_requests(limit=10, rank=0, world_size=1)
ROLLING: list[Instance] = rolling_task.instances
MULTIPLE_CH_RES = [
-41.79737854003906,
-42.964412689208984,
-33.909732818603516,
-37.055198669433594,
-22.980390548706055,
-20.268718719482422,
-14.76205062866211,
-27.887500762939453,
-15.797225952148438,
-15.914306640625,
-13.01901626586914,
-18.053699493408203,
-13.33236312866211,
-13.35921859741211,
-12.12301254272461,
-11.86703109741211,
-47.02234649658203,
-47.69982147216797,
-36.420310974121094,
-50.065345764160156,
-16.742475509643555,
-18.542402267456055,
-26.460208892822266,
-20.307228088378906,
-17.686725616455078,
-21.752883911132812,
-33.17183303833008,
-39.21712112426758,
-14.78198528289795,
-16.775150299072266,
-11.49817180633545,
-15.404842376708984,
-13.141255378723145,
-15.870940208435059,
-15.29050064086914,
-12.36030387878418,
-44.557891845703125,
-55.43851089477539,
-52.66646194458008,
-56.289222717285156,
]
generate_until_RES = [
" The average of $2.50 each is $",
" A robe takes 2 bolts of blue fiber and half",
" $50,000 in repairs.\n\nQuestion",
" He runs 1 sprint 3 times a week.",
" They feed each of her chickens three cups of mixed",
" The price of the glasses is $5, but",
" The total percentage of students who said they like to",
" Carla is downloading a 200 GB file. Normally",
" John drives for 3 hours at a speed of 60",
" Eliza sells 4 tickets to 5 friends so she",
]
ROLLING_RES = [
-3604.61328125,
-19778.67626953125,
-8835.119384765625,
-27963.37841796875,
-7636.4351806640625,
-9491.43603515625,
-41047.35205078125,
-8396.804443359375,
-45966.24645996094,
-7159.05322265625,
]
LM = SteeredModel(
pretrained="EleutherAI/pythia-70m",
device="cpu",
dtype="float32",
steer_path="tests/testconfigs/sparsify_intervention.csv",
)
def test_load_with_sae_lens(self) -> None:
from lm_eval.models.hf_steered import SteeredModel
SteeredModel(
pretrained="EleutherAI/pythia-70m",
device="cpu",
dtype="float32",
steer_path="tests/testconfigs/sae_lens_intervention.csv",
)
assert True
def test_loglikelihood(self) -> None:
res = self.LM.loglikelihood(self.MULTIPLE_CH)
_RES, _res = self.MULTIPLE_CH_RES, [r[0] for r in res]
# log samples to CI
dir_path = Path("test_logs")
dir_path.mkdir(parents=True, exist_ok=True)
file_path = dir_path / f"outputs_log_{self.version_minor}.txt"
file_path = file_path.resolve()
with open(file_path, "w", encoding="utf-8") as f:
f.write("\n".join(str(x) for x in _res))
assert np.allclose(_res, _RES, atol=1e-2)
# check indices for Multiple Choice
argmax_RES, argmax_res = (
np.argmax(np.array(_RES).reshape(-1, 4), axis=1),
np.argmax(np.array(_res).reshape(-1, 4), axis=1),
)
assert (argmax_RES == argmax_res).all()
def test_generate_until(self) -> None:
res = self.LM.generate_until(self.generate_until)
assert res == self.generate_until_RES
def test_loglikelihood_rolling(self) -> None:
res = self.LM.loglikelihood_rolling(self.ROLLING)
assert np.allclose(res, self.ROLLING_RES, atol=1e-1)
def test_tok_encode(self) -> None:
res = self.LM.tok_encode(TEST_STRING)
assert res == [12110, 2534]
def test_tok_decode(self) -> None:
res = self.LM.tok_decode([12110, 2534])
assert res == TEST_STRING
def test_batch_encode(self) -> None:
res = self.LM.tok_batch_encode([TEST_STRING, "bar foo"])[0].tolist()
assert res == [[12110, 2534], [2009, 17374]]
def test_model_generate(self) -> None:
context = self.LM.tok_batch_encode([TEST_STRING])[0]
res = self.LM._model_generate(context, max_length=10, stop=["\n\n"])
res = self.LM.tok_decode(res[0])
assert res == "foo bar\n<bazhang> !info bar"
loader,action,sparse_model,hookpoint,feature_index,steering_coefficient,sae_id,description,
sae_lens,add,gemma-scope-2b-pt-res-canonical,layers.20,12082,10.0,layer_20/width_16k/canonical,increase dogs,
loader,action,sparse_model,hookpoint,feature_index,steering_coefficient,sae_id,description,
sparsify,add,EleutherAI/sae-pythia-70m-32k,layers.3,30,0.1,,,