  1. 14 Oct, 2021 1 commit
  2. 11 Oct, 2021 1 commit
  3. 08 Oct, 2021 1 commit
    • Adds `PreTrainedModel.framework` attribute (#13817) · de344815
      Stella Biderman authored
      
      
      * Added `framework` attribute
      
      * Update modeling_utils.py
      
      * Update modeling_flax_utils.py
      
      * Update modeling_tf_utils.py
      
      * Update modeling_utils.py
      
      * Update modeling_tf_utils.py
      
      * Update modeling_tf_utils.py
      
      * Update modeling_flax_utils.py
      
      * Update modeling_tf_utils.py
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * Update modeling_tf_utils.py
      
      * Update modeling_flax_utils.py
      
      * string -> str
      
      * Update modeling_tf_utils.py
      
      * string -> str
      
      * fixup
      
      * make flake happy
      Co-authored-by: patil-suraj <surajp815@gmail.com>
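The commit gives every model a `framework` attribute so calling code can tell which backend a model belongs to. The sketch below shows one way to do that, assuming a read-only property on each base class. The class names are hypothetical stand-ins and are not the real PyTorch, TensorFlow or Flax base classes. The returned strings `"pt"`, `"tf"` and `"flax"` are also an assumption made for illustration.

```python
class ToyPreTrainedModel:
    """Hypothetical stand-in for a PyTorch base model class."""

    @property
    def framework(self) -> str:
        # Identifies this as a PyTorch model (value assumed for illustration).
        return "pt"


class ToyTFPreTrainedModel:
    """Hypothetical stand-in for a TensorFlow base model class."""

    @property
    def framework(self) -> str:
        return "tf"


class ToyFlaxPreTrainedModel:
    """Hypothetical stand-in for a Flax base model class."""

    @property
    def framework(self) -> str:
        return "flax"


def describe(model) -> str:
    # Callers can branch on a plain string instead of isinstance checks
    # against classes from frameworks that may not be installed.
    return f"{type(model).__name__} runs on {model.framework}"


for m in (ToyPreTrainedModel(), ToyTFPreTrainedModel(), ToyFlaxPreTrainedModel()):
    print(describe(m))
```

A property with no setter keeps the value read-only on instances. Assigning `m.framework = "x"` raises `AttributeError`.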
  4. 07 Oct, 2021 2 commits
  5. 05 Oct, 2021 1 commit
  6. 24 Sep, 2021 1 commit
    • Make assertions only if actually chunking forward (#13598) · 678bb248
      Josh Devins authored
      This moves the input-dimension assertion into a block that only runs when the function actually performs chunked forwarding. That is often not the case at inference time, and PyTorch-tracing a model that contains this assertion produces a tracing warning:
      
      TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
        input_tensor.shape[chunk_dim] == tensor_shape for input_tensor in input_tensors
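The control-flow change can be sketched without PyTorch. In this framework-free sketch, plain Python lists stand in for tensors, and chunking happens along their only dimension. Both choices are assumptions made for illustration, and `chunked_forward` is a hypothetical name rather than the library's function. The point is placement: the shape check lives inside the chunking branch, so the non-chunking path evaluates none of the data-dependent Python booleans that trigger the `TracerWarning` above.

```python
from typing import Callable, List, Sequence


def chunked_forward(
    forward_fn: Callable[..., List[float]],
    chunk_size: int,
    *inputs: Sequence[float],
) -> List[float]:
    """Apply forward_fn to the inputs, optionally in chunks along dimension 0."""
    # No chunking requested: call forward_fn directly and skip all shape checks.
    if chunk_size <= 0:
        return forward_fn(*inputs)

    # Chunking requested: only now validate that every input has the same length.
    length = len(inputs[0])
    if any(len(x) != length for x in inputs):
        raise ValueError("all inputs must have the same size along the chunk dimension")
    if length % chunk_size != 0:
        raise ValueError(f"size {length} is not a multiple of chunk_size {chunk_size}")

    # Run forward_fn on each slice and concatenate the results.
    outputs: List[float] = []
    for start in range(0, length, chunk_size):
        chunk = [x[start : start + chunk_size] for x in inputs]
        outputs.extend(forward_fn(*chunk))
    return outputs


def add(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [x + y for x, y in zip(a, b)]


print(chunked_forward(add, 0, [1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
print(chunked_forward(add, 2, [1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

In this sketch, mismatched inputs are only rejected on the chunking path. On the direct path, `zip` silently truncates, so a real forward function would have to fail on bad shapes itself.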
  7. 23 Sep, 2021 1 commit
    • 1x model size CPU memory usage for `from_pretrained` (#13466) · 62832c96
      Stas Bekman authored
      * one possible solution
      
      * low mem from_pretrained
      
      * edge cases
      
      * solve the persistent buffers
      
      * style
      
      * parametrize
      
      * for later
      
      * proper solution
      
      * cleanup
      
      * refactor; rework based on suggestions
      
      * revert splitting into 2 parts, move checks into main func
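The title's claim is that peak CPU memory drops to roughly one model size instead of two. A toy accounting model can illustrate why. This is not the library's implementation: `MemoryTracker`, `load_regular` and `load_low_mem` are hypothetical, and "memory" here is just a count of parameter elements. The regular path allocates randomly initialised weights and then a full state dict. The low-memory path starts from placeholder parameters, so the state dict is the only full copy.

```python
from typing import Dict


class MemoryTracker:
    """Counts live parameter elements and records the peak."""

    def __init__(self) -> None:
        self.live = 0
        self.peak = 0

    def alloc(self, n: int) -> None:
        self.live += n
        self.peak = max(self.peak, self.live)

    def free(self, n: int) -> None:
        self.live -= n


def load_regular(param_sizes: Dict[str, int], mem: MemoryTracker) -> None:
    # 1. Instantiate the model: every parameter is allocated and randomly initialised.
    for n in param_sizes.values():
        mem.alloc(n)
    # 2. Load the checkpoint's state dict: a second full copy is now in memory.
    for n in param_sizes.values():
        mem.alloc(n)
    # 3. Copy the weights into the model, then drop the state dict.
    for n in param_sizes.values():
        mem.free(n)


def load_low_mem(param_sizes: Dict[str, int], mem: MemoryTracker) -> None:
    # 1. Instantiate an "empty" model: parameters are placeholders, nothing allocated.
    # 2. Load the checkpoint's state dict: this is the only full copy.
    for n in param_sizes.values():
        mem.alloc(n)
    # 3. Hand each loaded tensor to the model directly, without a second copy.


sizes = {"embeddings": 400, "encoder": 500, "head": 100}
regular, low = MemoryTracker(), MemoryTracker()
load_regular(sizes, regular)
load_low_mem(sizes, low)
print(regular.peak, low.peak)  # 2000 1000
```

Both paths end with the same 1000 live elements. Only the transient peak differs.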
  8. 22 Sep, 2021 1 commit
  9. 17 Sep, 2021 1 commit
  10. 16 Sep, 2021 1 commit
  11. 15 Sep, 2021 1 commit
  12. 08 Sep, 2021 1 commit
  13. 30 Aug, 2021 1 commit
  14. 26 Aug, 2021 1 commit
  15. 06 Aug, 2021 1 commit
    • Tpu tie weights (#13030) · 7fcee113
      Sylvain Gugger authored
      * Fix tied weights on TPU
      
      * Manually tie weights in no trainer examples
      
      * Fix for test
      
      * One last missing
      
      * Getting owned by my scripts
      
      * Address review comments
      
      * Fix test
      
      * Fix tests
      
      * Fix reformer tests
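Tied weights mean two modules share one parameter object. A device move that copies parameters one by one can silently break that sharing, which is why the fix re-ties weights after moving. The pure-Python sketch below assumes a hypothetical `ToyLM` whose `to_device` copies each attribute independently, standing in for moving a model to an XLA device. `tie_weights` borrows the name of the model method the commit calls, but the class is not the library's code.

```python
class Param:
    """A minimal parameter holder (hypothetical stand-in for a tensor)."""

    def __init__(self, data):
        self.data = list(data)


class ToyLM:
    def __init__(self) -> None:
        self.input_embeddings = Param([0.1, 0.2, 0.3])
        self.output_embeddings = self.input_embeddings  # tied: same object

    def tie_weights(self) -> None:
        # Re-point the output layer at the input embeddings.
        self.output_embeddings = self.input_embeddings

    def to_device(self) -> "ToyLM":
        # Hypothetical device move that copies every parameter separately,
        # so two references to one object become two independent objects.
        self.input_embeddings = Param(self.input_embeddings.data)
        self.output_embeddings = Param(self.output_embeddings.data)
        return self

    def is_tied(self) -> bool:
        return self.output_embeddings is self.input_embeddings


model = ToyLM()
print(model.is_tied())  # True
model.to_device()
print(model.is_tied())  # False: the move broke the tie
model.tie_weights()
print(model.is_tied())  # True again
```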
  16. 04 Aug, 2021 1 commit
  17. 17 Jul, 2021 1 commit
  18. 13 Jul, 2021 3 commits
  19. 08 Jul, 2021 1 commit
  20. 01 Jul, 2021 2 commits
  21. 29 Jun, 2021 1 commit
  22. 23 Jun, 2021 1 commit
  23. 14 Jun, 2021 1 commit
  24. 02 Jun, 2021 1 commit
  25. 12 May, 2021 2 commits
  26. 06 May, 2021 1 commit
  27. 05 May, 2021 1 commit
  28. 03 May, 2021 1 commit
  29. 30 Apr, 2021 1 commit
    • [DeepSpeed] fp32 support (#11499) · 4e7bf94e
      Stas Bekman authored
      * prep for deepspeed==0.3.16
      
      * new version
      
      * too soon
      
      * support and test fp32 mode
      
      * troubleshooting doc start
      
      * workaround no longer needed
      
      * add fp32 doc
      
      * style
      
      * cleanup, add tf32 note
      
      * clarify
      
      * release was made
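Under DeepSpeed, fp32 mode amounts to not enabling fp16 mixed precision in the config. The sketch below assumes the DeepSpeed config is held as a Python dict. Only the `fp16.enabled` and `zero_optimization.stage` keys come from DeepSpeed's config format. The two configs and the `precision_mode` helper are hypothetical and are not part of either library.

```python
from typing import Any, Dict

# Hypothetical DeepSpeed configs expressed as Python dicts.
fp16_config: Dict[str, Any] = {
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}
fp32_config: Dict[str, Any] = {
    "fp16": {"enabled": False},  # mixed precision off: train in full fp32
    "zero_optimization": {"stage": 2},
}


def precision_mode(config: Dict[str, Any]) -> str:
    """Report which precision a DeepSpeed-style config requests (hypothetical helper)."""
    # A missing "fp16" section is treated here the same as {"enabled": False}.
    if config.get("fp16", {}).get("enabled", False):
        return "fp16"
    return "fp32"


print(precision_mode(fp16_config))  # fp16
print(precision_mode(fp32_config))  # fp32
print(precision_mode({"zero_optimization": {"stage": 2}}))  # fp32
```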
  30. 26 Apr, 2021 2 commits
  31. 23 Apr, 2021 2 commits
    • Trainer push to hub (#11328) · bf2e0cf7
      Sylvain Gugger authored
      
      
      * Initial support for upload to hub
      
      * push -> upload
      
      * Fixes + examples
      
      * Fix torchhub test
      
      * Torchhub test I hate you
      
      * push_model_to_hub -> push_to_hub
      
      * Apply mixin to other pretrained models
      
      * Remove ABC inheritance
      
      * Add tests
      
      * Typo
      
      * Run tests
      
      * Install git-lfs
      
      * Change approach
      
      * Add push_to_hub to all
      
      * Staging test suite
      
      * Typo
      
      * Maybe like this?
      
      * More deps
      
      * Cache
      
      * Adapt name
      
      * Quality
      
      * MOAR tests
      
      * Put it in testing_utils
      
      * Docs + torchhub last hope
      
      * Styling
      
      * Wrong method
      
      * Typos
      
      * Update src/transformers/file_utils.py
      Co-authored-by: Julien Chaumond <julien@huggingface.co>
      
      * Address review comments
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Julien Chaumond <julien@huggingface.co>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
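The commit message describes applying a push-to-hub mixin to other pretrained classes and dropping ABC inheritance. Below is a minimal sketch of that mixin pattern, assuming the mixin only needs its host class to provide `save_pretrained`. `HubMixinSketch`, `ToyModel`, `ToyTokenizer` and the injected `upload` callable are all hypothetical, and no real Hub API is called.

```python
import os
import tempfile
from typing import Callable, List


class HubMixinSketch:
    """Adds push_to_hub to any class that implements save_pretrained (hypothetical)."""

    def push_to_hub(self, repo_name: str, upload: Callable[[str, str], List[str]]) -> List[str]:
        # Save into a scratch directory, then hand that directory to the uploader.
        with tempfile.TemporaryDirectory() as tmp_dir:
            self.save_pretrained(tmp_dir)
            return upload(tmp_dir, repo_name)


class ToyModel(HubMixinSketch):
    def save_pretrained(self, save_directory: str) -> None:
        with open(os.path.join(save_directory, "weights.bin"), "wb") as f:
            f.write(b"\x00")


class ToyTokenizer(HubMixinSketch):
    def save_pretrained(self, save_directory: str) -> None:
        with open(os.path.join(save_directory, "vocab.txt"), "w") as f:
            f.write("[PAD]\n")


def fake_upload(folder: str, repo_name: str) -> List[str]:
    # Stand-in for a real upload: just report which files would be pushed.
    return [f"{repo_name}/{name}" for name in sorted(os.listdir(folder))]


print(ToyModel().push_to_hub("my-model", fake_upload))  # ['my-model/weights.bin']
print(ToyTokenizer().push_to_hub("my-tokenizer", fake_upload))  # ['my-tokenizer/vocab.txt']
```

Because the mixin is a plain class rather than an ABC, it adds behaviour without imposing an abstract-method contract. In this sketch, a host class missing `save_pretrained` would only fail with an `AttributeError` at call time.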
    • [Flax] Big FlaxBert Refactor (#11364) · 8c9b5fcb
      Patrick von Platen authored
      * improve flax
      
      * refactor
      
      * typos
      
      * Update src/transformers/modeling_flax_utils.py
      
      * Apply suggestions from code review
      
      * Update src/transformers/modeling_flax_utils.py
      
      * fix typo
      
      * improve error tolerance
      
      * typo
      
      * correct nasty saving bug
      
      * fix from pretrained
      
      * correct tree map
      
      * add note
      
      * correct weight tying
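Flax keeps parameters as nested dictionaries. Saving, loading and tree-wide transforms often go through a flattened view keyed by parameter paths. The pure-Python sketch below shows that round trip plus a simple tree map. It is modelled on the behaviour of `flax.traverse_util.flatten_dict` and `unflatten_dict` but written from scratch. The function names and the example parameter tree are hypothetical. Non-dict values are treated as leaves here, whereas real Flax leaves would be arrays. The sketch does not reproduce the commit's actual fixes.

```python
from typing import Any, Callable, Dict, Tuple


def flatten(tree: Dict[str, Any], prefix: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], Any]:
    """Map each leaf of a nested dict to a tuple of keys (its path)."""
    flat: Dict[Tuple[str, ...], Any] = {}
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: Dict[Tuple[str, ...], Any]) -> Dict[str, Any]:
    """Rebuild the nested dict from path-keyed leaves."""
    tree: Dict[str, Any] = {}
    for path, value in flat.items():
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return tree


def tree_map(fn: Callable[[Any], Any], tree: Dict[str, Any]) -> Dict[str, Any]:
    """Apply fn to every leaf while keeping the nesting intact."""
    return unflatten({path: fn(leaf) for path, leaf in flatten(tree).items()})


params = {
    "embeddings": {"word_embeddings": {"embedding": 0.5}},
    "encoder": {"layer_0": {"kernel": 1.0, "bias": 0.0}},
}
print(flatten(params))
print(tree_map(lambda x: x * 2, params))
```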
  32. 14 Apr, 2021 1 commit
  33. 08 Apr, 2021 1 commit
    • [DeepSpeed] ZeRO Stage 3 (#10753) · c6d66484
      Stas Bekman authored
      
      
      * synced gpus
      
      * fix
      
      * fix
      
      * need to use t5-small for quality tests
      
      * notes
      
      * complete merge
      
      * fix a disappearing std stream problem
      
      * start zero3 tests
      
      * wip
      
      * tune params
      
      * sorting out the pre-trained model loading
      
      * reworking generate loop wip
      
      * wip
      
      * style
      
      * fix tests
      
      * split the tests
      
      * refactor tests
      
      * wip
      
      * parameterized
      
      * fix
      
      * workout the resume from non-ds checkpoint pass + test
      
      * cleanup
      
      * remove no longer needed code
      
      * split getter/setter functions
      
      * complete the docs
      
      * suggestions
      
      * gpus and their compute capabilities link
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * style
      
      * remove invalid param
      
      * automatically configure zero3 params that rely on hidden size
      
      * make _get_resized_embeddings zero3-aware
      
      * add test exercising resize_token_embeddings()
      
      * add docstring
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
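One bullet in this commit says ZeRO-3 parameters that depend on the model's hidden size are now configured automatically. The sketch below illustrates that idea. It assumes the formulas later documented for the `"auto"` values in the Transformers DeepSpeed integration: `reduce_bucket_size = hidden_size * hidden_size`, `stage3_prefetch_bucket_size = 0.9 * hidden_size * hidden_size` and `stage3_param_persistence_threshold = 10 * hidden_size`. Whether this commit used exactly these values is an assumption. `fill_zero3_from_hidden_size` is a hypothetical helper that fills in only the keys the user left out.

```python
from typing import Any, Dict


def fill_zero3_from_hidden_size(config: Dict[str, Any], hidden_size: int) -> Dict[str, Any]:
    """Fill hidden-size-dependent ZeRO-3 settings the user did not set (hypothetical helper)."""
    zero = config.setdefault("zero_optimization", {})
    defaults = {
        "reduce_bucket_size": hidden_size * hidden_size,
        "stage3_prefetch_bucket_size": int(0.9 * hidden_size * hidden_size),
        "stage3_param_persistence_threshold": 10 * hidden_size,
    }
    for key, value in defaults.items():
        # Only fill in values the user left out; never override explicit ones.
        zero.setdefault(key, value)
    return config


config = {"zero_optimization": {"stage": 3, "reduce_bucket_size": 500_000_000}}
fill_zero3_from_hidden_size(config, hidden_size=768)
print(config["zero_optimization"])
```

Deriving these sizes from `hidden_size` scales the communication buckets with the model's layer width. This spares users from hand-tuning them per model.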