Commits · 0fe17f375a4f0fdd9aea260d0645ccfd4896e958 · chenpangpang / transformers

07 Feb, 2022 1 commit

FX tracing improvement (#14321) · 0fe17f37

Michael Benayoun authored Feb 07, 2022

* Change the way tracing happens, enabling dynamic axes out of the box

* Update the tests and modeling xlnet

* Do not record leaf modules, to avoid recording more values for the methods to record than what will be seen at tracing time (which would otherwise desynchronize the recorded values and the values that need to be given to the proxies during tracing, causing errors).

* Comments and making tracing work for gpt-j and xlnet

* Refactor things related to num_choices (and batch_size, sequence_length)

* Update fx to work on PyTorch 1.10

* Postpone autowrap_function feature usage for later

* Add copyrights

* Remove unnecessary file

* Fix issue with add_new_model_like

* Apply suggestions

0fe17f37

22 Sep, 2021 1 commit

Make gradient_checkpointing a training argument (#13657) · 27d46397

Sylvain Gugger authored Sep 22, 2021



* Make gradient_checkpointing a training argument

* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/configuration_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix tests

* Style

* document Gradient Checkpointing as a performance feature

* Small rename

* PoC for not using the config

* Adapt BC to new PoC

* Forgot to save

* Rollout changes to all other models

* Fix typo
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>

27d46397

10 Sep, 2021 1 commit
- [GPT-Neo] Simplify local attention (#13491) · 010965dc
  Suraj Patil authored Sep 10, 2021
```
* simplify local attention

* update tests

* add a comment and use torch.bitwise_xor
```
  010965dc
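
The commit above mentions `torch.bitwise_xor` in connection with local attention. As a rough illustration of how an XOR can produce a local (sliding-window) causal mask, here is a minimal sketch in plain Python lists rather than PyTorch tensors. The helper names `tril` and `local_causal_mask` and the `window_size` parameter are assumptions for illustration, not code taken from the commit.

```python
def tril(n, diagonal=0):
    """Return an n x n 0/1 matrix where entry (i, j) is 1 when j <= i + diagonal."""
    return [[1 if j <= i + diagonal else 0 for j in range(n)] for i in range(n)]


def local_causal_mask(n, window_size):
    """Causal mask in which each position sees only the last `window_size` positions.

    Hypothetical helper: XOR-ing the full causal triangle with a copy shifted
    down by `window_size` clears every position that is too far in the past.
    """
    causal = tril(n)                  # 1 where j <= i
    too_old = tril(n, -window_size)   # 1 where j <= i - window_size
    return [
        [c ^ o for c, o in zip(causal_row, old_row)]
        for causal_row, old_row in zip(causal, too_old)
    ]


mask = local_causal_mask(5, window_size=2)
for row in mask:
    print(row)
```

With a window of 2, each row keeps at most two ones: the current position and the one before it.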
21 Jul, 2021 1 commit
- Expose get_config() on ModelTesters (#12812) · c3d9ac76
  Lysandre Debut authored Jul 21, 2021
```
* Expose get_config() on ModelTesters

* Typo
```
  c3d9ac76
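
One way to read "Expose get_config() on ModelTesters" is that building the config becomes its own method, which the input-preparation method then calls. The sketch below shows that pattern under stated assumptions: `ToyConfig` and `ToyModelTester` are hypothetical stand-ins, not classes from the repository, and the inputs are plain lists instead of tensors.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a model configuration class."""
    vocab_size: int
    hidden_size: int


class ToyModelTester:
    """Hypothetical tester that exposes config creation as a separate method."""

    def __init__(self, batch_size=2, seq_length=4, vocab_size=99, hidden_size=32):
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size

    def get_config(self):
        # Callers can obtain the config without also building inputs.
        return ToyConfig(vocab_size=self.vocab_size, hidden_size=self.hidden_size)

    def prepare_config_and_inputs(self):
        # Reuse get_config() so there is a single source for the config.
        config = self.get_config()
        input_ids = [
            [(i + j) % self.vocab_size for j in range(self.seq_length)]
            for i in range(self.batch_size)
        ]
        return config, input_ids


tester = ToyModelTester()
config = tester.get_config()
same_config, input_ids = tester.prepare_config_and_inputs()
```

Keeping config creation separate means a test that only needs a small config does not have to generate inputs it will never use.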
28 May, 2021 1 commit
- Added Sequence Classification class in GPTNeo (#11906) · e1205e47
  Bhadresh Savani authored May 28, 2021
```
* seq classification changes

* fix tests
```
  e1205e47
20 May, 2021 1 commit

A cleaner and more scalable implementation of symbolic tracing (#11763) · f4a0d6ff

Michael Benayoun authored May 20, 2021



Cleaner and more scalable implementation of symbolic tracing with torch.fx, which also adds support for new architectures:
- ALBERT
- DistilBERT
- MobileBERT
- MegatronBERT
- GPT2
- GPT Neo
Co-authored-by: Michael Benayoun <michael@huggingface.co>

f4a0d6ff

20 Apr, 2021 1 commit
- [GPTNeo] create local attention mask ones (#11335) · cfd2eaa8
  Suraj Patil authored Apr 20, 2021
```
* create local attention mask ones

* remove old method, address Patrick's comment
```
  cfd2eaa8
06 Apr, 2021 1 commit

[WIP] GPT Neo cleanup (#10985) · 2a8115f0

Suraj Patil authored Apr 06, 2021

* better names

* add attention mixin

* all slow tests in one class

* make helper methods static so we can test

* add local attention tests

* better names

* doc

* apply review suggestions

2a8115f0

30 Mar, 2021 2 commits

GPT Neo few fixes (#10968) · 83d38c9f
Suraj Patil authored Mar 30, 2021
```
* fix checkpoint names

* auto model

* fix doc
```
83d38c9f

GPT Neo (#10848) · 86026437

Suraj Patil authored Mar 30, 2021



* let's begin

* boom boom

* fix out proj in attn

* fix attention

* fix local attention

* add tokenizer

* fix imports

* autotokenizer

* fix checkpoint name

* cleanup

* more clean-up

* more cleanup

* output attentions

* fix attn mask creation

* fix imports

* config doc

* add tests

* add slow tests

* quality

* add conversion script

* copyright

* typo

* another bites the dust

* fix attention tests

* doc

* add embed init in convert function

* fix copies

* remove tokenizer

* enable caching

* address review comments

* improve config and create attn layer list internally

* more consistent naming

* init hf config from mesh-tf config json file

* remove neo tokenizer from doc

* handle attention_mask in local attn layer

* attn_layers => attention_layers

* add tokenizer_class in config

* fix docstring

* raise if len of attention_layers is not same as num_layers

* remove tokenizer_class from config

* more consistent naming

* fix doc

* fix checkpoint names

* fp16 compat

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

86026437
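
The GPT Neo commit above mentions creating the attention layer list internally and raising when the length of `attention_layers` differs from `num_layers`. A minimal sketch of such a check follows. The compact `(pattern, repeats)` input format and the function names are assumptions for illustration, not the library's confirmed API.

```python
def expand_attention_types(attention_types):
    """Expand (pattern, repeats) pairs into a flat per-layer list.

    Hypothetical format: (["global", "local"], 2) expands to
    ["global", "local", "global", "local"].
    """
    layers = []
    for pattern, repeats in attention_types:
        layers.extend(pattern * repeats)
    return layers


def build_attention_layers(attention_types, num_layers):
    """Build the per-layer attention list and validate its length."""
    attention_layers = expand_attention_types(attention_types)
    if len(attention_layers) != num_layers:
        raise ValueError(
            f"attention_layers has {len(attention_layers)} entries "
            f"but num_layers is {num_layers}; the two must match."
        )
    return attention_layers


layers = build_attention_layers([(["global", "local"], 2)], num_layers=4)
print(layers)
```

Failing early at configuration time gives a clearer error than a mismatch that only surfaces once the model's layers are being constructed.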