    Set `cache_dir` for `evaluate.load()` in example scripts (#28422) · 95091e15
    Alex Hedges authored
    While using `run_clm.py`,[^1] I noticed that some files were being added
    to my global cache, not the local cache. I set the `cache_dir` parameter
    for the one call to `evaluate.load()`, which partially solved the
    problem. I figured that while I was fixing the one script upstream, I
    might as well fix the problem in all other example scripts that I could.
    
    There are still some files being added to my global cache, but this
    appears to be a bug in `evaluate` itself. This commit at least moves
    some of the files into the local cache, which is better than before.
    
    To create this PR, I made the following regex-based transformation:
    `evaluate\.load\((.*?)\)` -> `evaluate.load($1,
    cache_dir=model_args.cache_dir)`. (The escaping belongs only in the
    search pattern, not in the replacement.) After using that, I manually fixed
    all modified files with `ruff` serving as useful guidance. During the
    process, I removed one existing usage of the `cache_dir` parameter in a
    script that did not have a corresponding `--cache-dir` argument
    declared.
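
    As a minimal sketch of what the transformation amounts to (assuming
    Python's `re` module here; the actual tool used to apply it may have
    differed):

    ```python
    import re

    # Append cache_dir=model_args.cache_dir to each evaluate.load(...) call.
    PATTERN = re.compile(r"evaluate\.load\((.*?)\)")
    REPLACEMENT = r"evaluate.load(\1, cache_dir=model_args.cache_dir)"

    line = 'metric = evaluate.load("accuracy")'
    print(PATTERN.sub(REPLACEMENT, line))
    # -> metric = evaluate.load("accuracy", cache_dir=model_args.cache_dir)
    ```

    Note the non-greedy `(.*?)`, which stops at the first closing parenthesis;
    calls whose arguments themselves contain parentheses would still need the
    manual cleanup pass described above.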
    
    [^1]: I specifically used `pytorch/language-modeling/run_clm.py` from
    v4.34.1 of the library. For the original code, see the following URL:
    https://github.com/huggingface/transformers/tree/acc394c4f5e1283c19783581790b3dc3105a3697/examples/pytorch/language-modeling/run_clm.py.