    feat: Add Weights and Biases support (#1339) · 2683fbbb
    Ayush Thakur authored
    
    
    * add wandb as extra dependency
    
    * wandb metrics logging
    
    * refactor
    
    * log samples as tables
    
    * fix linter
    
    * refactor: put in a class
    
    * change dir
    
    * add panels
    
    * log eval as table
    
    * improve tables logging
    
    * improve reports logging
    
    * precommit run
    
    * ruff check
    
    * handle importing reports api gracefully
    
    * ruff
    
    * compare results
    
    * minor pre-commit fixes
    
    * build comparison report
    
    * ruff check
    
    * log results as artifacts
    
    * remove comparison script
    
    * update dependency
    
    * type annotate and docstring
    
    * add example
    
    * update readme
    
    * fix typo
    
    * teardown
    
    * handle outside wandb run
    
    * gracefully fail reports creation
    
    * precommit checks
    
    * add report url to summary
    
    * use wandb printer for better url stdout
    
    * fix ruff
    
    * handle N/A and groups
    
    * fix eval table
    
    * remove unused var
    
    * update wandb version req + disable reports stdout
    
    * remove reports feature to TODO
    
    * add label to multi-choice question data
    
    * log model predictions
    
    * lints
    
    * loglikelihood_rolling
    
    * log eval result for groups
    
    * log tables by group for better handling
    
    * precommit
    
    * choices column for multi-choice
    
    * graciously fail wandb
    
    * remove reports feature
    
    * track system metrics + total eval time + stdout
    
    ---------
    Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
logging_utils.py 14.1 KB