feat: Add Weights and Biases support (#1339)
* add wandb as extra dependency
* wandb metrics logging
* refactor
* log samples as tables
* fix linter
* refactor: put in a class
* change dir
* add panels
* log eval as table
* improve tables logging
* improve reports logging
* precommit run
* ruff check
* handle importing reports api gracefully
* ruff
* compare results
* minor pre-commit fixes
* build comparison report
* ruff check
* log results as artifacts
* remove comparison script
* update dependency
* type annotate and docstring
* add example
* update readme
* fix typo
* teardown
* handle outside wandb run
* gracefully fail reports creation
* precommit checks
* add report url to summary
* use wandb printer for better url stdout
* fix ruff
* handle N/A and groups
* fix eval table
* remove unused var
* update wandb version req + disable reports stdout
* move reports feature to TODO
* add label to multi-choice question data
* log model predictions
* lints
* loglikelihood_rolling
* log eval result for groups
* log tables by group for better handling
* precommit
* choices column for multi-choice
* gracefully fail wandb
* remove reports feature
* track system metrics + total eval time + stdout
---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
New file: lm_eval/logging_utils.py (mode 0 → 100644)