- 07 Feb, 2022 1 commit
Michael Benayoun authored
* Change the way tracing happens, enabling dynamic axes out of the box
* Update the tests and modeling xlnet
* Add the non-recording of leaf modules to avoid recording more values for the methods to record than what will be seen at tracing time (which would otherwise desynchronize the recorded values and the values that need to be given to the proxies during tracing, causing errors).
* Comments and making tracing work for gpt-j and xlnet
* Refactor things related to num_choices (and batch_size, sequence_length)
* Update fx to work on PyTorch 1.10
* Postpone autowrap_function feature usage for later
* Add copyrights
* Remove unnecessary file
* Fix issue with add_new_model_like
* Apply suggestions
- 22 Sep, 2021 1 commit
Sylvain Gugger authored
* Make gradient_checkpointing a training argument
* Update src/transformers/modeling_utils.py
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/configuration_utils.py
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix tests
* Style
* document Gradient Checkpointing as a performance feature
* Small rename
* PoC for not using the config
* Adapt BC to new PoC
* Forgot to save
* Rollout changes to all other models
* Fix typo

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
- 10 Sep, 2021 1 commit
Suraj Patil authored
* simplify local attention
* update tests
* add a comment and use torch.bitwise_xor
- 21 Jul, 2021 1 commit
Lysandre Debut authored
* Expose get_config() on ModelTesters
* Typo
- 28 May, 2021 1 commit
Bhadresh Savani authored
* seq classification changes
* fix tests
- 20 May, 2021 1 commit
Michael Benayoun authored
Cleaner and more scalable implementation of symbolic tracing with torch.fx, with support for new architectures:
  - ALBERT
  - DistilBERT
  - MobileBERT
  - MegatronBERT
  - GPT2
  - GPT Neo

Co-authored-by: Michael Benayoun <michael@huggingface.co>
- 20 Apr, 2021 1 commit
Suraj Patil authored
* create local attention mask once
* remove old method, address Patrick's comment
- 06 Apr, 2021 1 commit
Suraj Patil authored
* better names
* add attention mixin
* all slow tests in one class
* make helper methods static so we can test
* add local attention tests
* better names
* doc
* apply review suggestions
- 30 Mar, 2021 2 commits
Suraj Patil authored
* fix checkpoint names
* auto model
* fix doc
Suraj Patil authored
* let's begin
* boom boom
* fix out proj in attn
* fix attention
* fix local attention
* add tokenizer
* fix imports
* autotokenizer
* fix checkpoint name
* cleanup
* more clean-up
* more cleanup
* output attentions
* fix attn mask creation
* fix imports
* config doc
* add tests
* add slow tests
* quality
* add conversion script
* copyright
* typo
* another bites the dust
* fix attention tests
* doc
* add embed init in convert function
* fix copies
* remove tokenizer
* enable caching
* address review comments
* improve config and create attn layer list internally
* more consistent naming
* init hf config from mesh-tf config json file
* remove neo tokenizer from doc
* handle attention_mask in local attn layer
* attn_layers => attention_layers
* add tokenizer_class in config
* fix docstring
* raise if len of attention_layers is not same as num_layers
* remove tokenizer_class from config
* more consistent naming
* fix doc
* fix checkpoint names
* fp16 compat
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>