- 23 Jan, 2023 1 commit
Joao Gante authored
- 26 Oct, 2022 1 commit
Patrick von Platen authored
* add first generation tutorial
* [Flax] Add subfolder functionality
* [Flax] Add subfolder functionality
* up
* finish
* delete file and re-add test
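
For context, `subfolder` is the `from_pretrained` argument that points at weights stored in a subdirectory of a Hub repo; a minimal sketch, assuming a hypothetical repository whose Flax checkpoint lives under `flax/` (the repo id and layout are placeholders, not taken from this commit):

```python
from transformers import FlaxBertModel

# Hypothetical repo layout: the Flax weights live under a "flax/" subfolder.
# `subfolder` tells from_pretrained where to look inside the repository.
model = FlaxBertModel.from_pretrained("your-username/your-flax-model", subfolder="flax")
```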
- 09 Sep, 2022 1 commit
Sanchit Gandhi authored
* [JAX] Replace all jax.tree_* calls with jax.tree_util.tree_*
* fix double tree_util
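
For reference, newer JAX releases deprecate the top-level `jax.tree_*` aliases in favour of `jax.tree_util.tree_*`; a minimal sketch of the substitution this commit performs across the codebase:

```python
import jax
import jax.numpy as jnp

params = {"dense": {"kernel": jnp.ones((2, 2)), "bias": jnp.zeros(2)}}

# Before: jax.tree_map(lambda x: x * 2, params)   (deprecated alias)
# After:  the same call routed through jax.tree_util
doubled = jax.tree_util.tree_map(lambda x: x * 2, params)
```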
- 12 Aug, 2022 1 commit
Arthur authored
* initial commit
* add small test
* add cross pt tf flag to test
* fix quality
* style
* update test with new repo
* fix failing test
* update
* fix wrong param ordering
* style
* update based on review
* update related to recent new caching mechanism
* quality
* Update based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* quality and style
* Update src/transformers/modeling_flax_utils.py
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
- 01 Aug, 2022 1 commit
Sylvain Gugger authored
* Rewrite push_to_hub to use upload_files
* Adapt the doc a bit
* Address review comments and clean doc
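
The user-facing call is unchanged by this rewrite; only the upload path underneath moved to direct file uploads. A hedged sketch (the repo id is a placeholder and pushing requires a logged-in Hub account):

```python
from transformers import FlaxBertModel

model = FlaxBertModel.from_pretrained("bert-base-uncased")

# Uploads the saved files directly to the Hub instead of going through a local
# git clone; "your-username/flax-bert-demo" is a placeholder repo id.
model.push_to_hub("your-username/flax-bert-demo")
```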
- 01 Jul, 2022 1 commit
Sanchit Gandhi authored
* [Flax] Add remat (gradient checkpointing)
* fix variable naming in test
* flip: checkpoint using a method
* fix naming
* fix class naming
* apply PVP's suggestions from code review
* make fix-copies
* fix big-bird, electra, roberta
* cookie-cutter
* fix flax big-bird
* move test to common
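
For background, the remat referred to here is Flax's rematerialisation (gradient checkpointing) transform; a minimal sketch of the underlying `flax.linen.remat` primitive, not the exact wiring used inside the Transformers models:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class MLP(nn.Module):
    @nn.compact
    def __call__(self, x):
        # nn.remat recomputes this layer's activations in the backward pass
        # instead of storing them, trading extra compute for lower memory.
        RematDense = nn.remat(nn.Dense)
        x = nn.relu(RematDense(128)(x))
        return nn.Dense(1)(x)

model = MLP()
params = model.init(jax.random.PRNGKey(0), jnp.ones((4, 16)))
```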
- 22 Jun, 2022 1 commit
Arthur authored
- 21 Jun, 2022 2 commits
Yih-Dar authored
* rename to check_pt_flax_outputs
* update check_pt_flax_outputs
* use 5e-5 for BigBird PT/Flax test
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
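
The renamed check compares PyTorch and Flax outputs element-wise under a small tolerance; a rough, self-contained sketch of that kind of comparison (the model choice and the 5e-5 tolerance are illustrative, this is not the test code itself):

```python
import numpy as np
import torch
from transformers import BertTokenizer, BertModel, FlaxBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("PyTorch and Flax should agree.", return_tensors="np")

pt_model = BertModel.from_pretrained("bert-base-uncased")
# from_pt=True converts the PyTorch weights, so both models share parameters.
fx_model = FlaxBertModel.from_pretrained("bert-base-uncased", from_pt=True)

with torch.no_grad():
    pt_out = pt_model(**{k: torch.from_numpy(np.asarray(v)) for k, v in inputs.items()})
fx_out = fx_model(**inputs)

# Element-wise check of the final hidden states under a small tolerance,
# in the spirit of check_pt_flax_outputs (5e-5 is the value mentioned above).
np.testing.assert_allclose(
    np.asarray(fx_out.last_hidden_state), pt_out.last_hidden_state.numpy(), atol=5e-5
)
```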
Lysandre Debut authored
* Prepare CI for v0.8.0
* pin hfh (revert before merge)
* Revert "pin hfh (revert before merge)"
  This reverts commit a0103140e1c77b810ffcb735192968bc03be3e1f.
* Test rc3
* Test latest rc
* Unpin to the RC
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
- 19 Apr, 2022 1 commit
Suraj Patil authored
* begin do_init
* add params_shape_tree
* raise error if params are accessed when do_init is False
* don't allow do_init=False when keys are missing
* make shape tree a property
* assign self._params at the end
* add test for do_init
* add do_init arg to all flax models
* fix param setting
* disable do_init for composite models
* update test
* add do_init in FlaxBigBirdForMultipleChoice
* better names and errors
* improve test
* style
* add a warning when do_init=False
* remove extra if
* set params after _required_params
* add test for from_pretrained
* do_init => _do_init
* change warning to info
* fix typo
* add params in init_weights
* add params to gpt neo init
* add params to init_weights
* update do_init test
* Trigger CI
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update template
* trigger CI
* style
* style
* fix template
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
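
For context, `_do_init=False` lets a Flax model be loaded without eagerly materialising random weights: `from_pretrained` then returns the module wrapper and the parameter dict separately. A hedged sketch (the model name is just an example):

```python
from transformers import FlaxBertModel

# With _do_init=False no random parameters are allocated up front;
# from_pretrained returns the model and the loaded params as a pair.
model, params = FlaxBertModel.from_pretrained("bert-base-uncased", _do_init=False)

# The params then have to be passed explicitly at call time, e.g.:
# outputs = model(input_ids, params=params)
```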
- 29 Mar, 2022 1 commit
Yih-Dar authored
* fix - set output_attentions to True
* Update tests/test_modeling_flax_common.py
* update for has_attentions
* overwrite check_outputs in FlaxBigBirdModelTest
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
- 18 Mar, 2022 1 commit
Yih-Dar authored
* Make test_equivalence_pt_to_flax more aggressive
* Make test_equivalence_flax_to_pt more aggressive
* don't use to_tuple
* clean-up
* fix missing test cases + testing on GPU
* fix conversion
* fix `ValueError: assignment destination is read-only`
* Add type checking
* commit to revert later
* Fix
* fix
* fix device
* better naming
* clean-up
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
- 09 Feb, 2022 1 commit
Suraj Patil authored
* fix test_model_outputs_equivalence
* fix tuple outputs for blenderbot
- 20 Dec, 2021 1 commit
Sylvain Gugger authored
* Add a main_input_name attribute to all models
* Fix tests
* Wtf Vs Code?
* Update src/transformers/models/imagegpt/modeling_imagegpt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Style
* Fix copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
- 17 Dec, 2021 1 commit
Daniel Stancl authored
* Implement head_mask for Flax BERT and other models copied from BERT
* Remove `from jax._src.nn.functions import sigmoid`
  Remove `from jax._src.nn.functions import sigmoid` unintentionally added by IDE
* Remove no more valid copy statement
* Apply patil-suraj's suggestions from code review
* Apply suggestions from the code review
* Update Flax template
* Fix a typo
* Also update template for CausalLM modules
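
After this change the Flax BERT-family models accept a `head_mask` at call time, mirroring the PyTorch API; a hedged sketch, assuming the usual (num_layers, num_heads) mask shape:

```python
import jax.numpy as jnp
from transformers import BertTokenizer, FlaxBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = FlaxBertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Masking attention heads", return_tensors="np")

# 1.0 keeps a head, 0.0 silences it; here we disable head 0 of layer 0.
head_mask = jnp.ones((model.config.num_hidden_layers, model.config.num_attention_heads))
head_mask = head_mask.at[0, 0].set(0.0)

outputs = model(**inputs, head_mask=head_mask)
```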
- 11 Nov, 2021 2 commits
Suraj Patil authored
* fix loading flax bf16 weights in pt
* fix clip test
* fix t5 test
* add logging statement
* Update src/transformers/modeling_flax_pytorch_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* switch back to native any
* fix check for bf16 weights
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
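
For context, Flax checkpoints can store weights in bfloat16, which the Flax-to-PyTorch conversion has to detect and upcast; a rough sketch of that idea on a toy parameter tree (not the actual code in modeling_flax_pytorch_utils.py):

```python
import jax
import jax.numpy as jnp
from flax.traverse_util import flatten_dict

# Toy parameter tree standing in for a loaded Flax checkpoint with one bf16 leaf.
params = {"encoder": {"kernel": jnp.ones((2, 2), dtype=jnp.bfloat16), "bias": jnp.zeros(2)}}

# Detect whether any leaf is stored in bfloat16 ...
has_bf16 = any(p.dtype == jnp.bfloat16 for p in flatten_dict(params).values())

# ... and upcast those leaves to float32 before handing the tree to PyTorch,
# since bf16 arrays are awkward to convert through NumPy.
if has_bf16:
    params = jax.tree_util.tree_map(
        lambda p: p.astype(jnp.float32) if p.dtype == jnp.bfloat16 else p, params
    )
```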
Suraj Patil authored
* fix inits
* fix embed dtype
* fix embed dtype
* add test to check default dtype
* quality
* add type conversion methods for flax models
* more robust casting
* cast sinusoidal positions
* update pegasus
* update albert
* update test
* make sure dtype is passed to every module
* style
* fix electra dense
* fix t5
* quality
* add more tests
* better name
* use the dtype for lm head computation
* fix albert
* style
* fix albert embed dtype
* more tests
* fix vision enc-dec
* cleanup
* fix embed dtype pegasus
* fix default param test
* doc
* update template
* fix final_logits_bias dtype
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix doc
* fix doc
* add detailed docstring for dtype parameter
* remove unnecessary import
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
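
The type-conversion methods added here (`to_bf16`, `to_fp16`, `to_fp32`) cast the parameter tree rather than the module itself; a hedged sketch of the intended usage (model name illustrative):

```python
from transformers import FlaxBertModel

model = FlaxBertModel.from_pretrained("bert-base-uncased")

# Cast the parameters to bfloat16 for cheaper inference, then back to float32;
# each call returns a new parameter tree, the module object is unchanged.
params_bf16 = model.to_bf16(model.params)
params_fp32 = model.to_fp32(params_bf16)
```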
- 02 Nov, 2021 1 commit
Sylvain Gugger authored
* Update Transformers to huggingface_hub >= 0.1.0
* Forgot to save...
* Style
* Fix test
- 21 Oct, 2021 1 commit
Li-Huai (Allan) Lin authored
* Fix
* Style
* Name
* Fix tests
* Style
* Remove embed sizes checking
* Disable some tests
* Fix
* Apply suggestion
- 12 Aug, 2021 1 commit
Patrick von Platen authored
* up
* up
* up
- 05 Aug, 2021 1 commit
Patrick von Platen authored
* finish PR
* add tests
* correct tests
* finish
* correct other flax tests
* better naming
* correct naming
* finish
* apply Sylvain's suggestions
- 04 Aug, 2021 1 commit
Patrick von Platen authored
* [Flax] Align device name in docs
* make style
* fix import error
- 13 Jul, 2021 1 commit
Sylvain Gugger authored
* Add option to load a pretrained model with mismatched shapes
* Fail at loading when mismatched shapes in Flax
* Fix tests
* Update src/transformers/modeling_flax_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
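
The option referred to here surfaces as the `ignore_mismatched_sizes` flag on `from_pretrained`; a hedged sketch of reusing a classifier checkpoint trained with a different label count (the repo id is a placeholder):

```python
from transformers import FlaxBertForSequenceClassification

# "your-username/bert-classifier-flax" stands in for any checkpoint whose head was
# trained with a different number of labels. Without ignore_mismatched_sizes=True
# this raises at load time; with it, the mismatched head weights are discarded and
# re-initialised while all other weights are kept.
model = FlaxBertForSequenceClassification.from_pretrained(
    "your-username/bert-classifier-flax", num_labels=7, ignore_mismatched_sizes=True
)
```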
- 23 Jun, 2021 1 commit
Sylvain Gugger authored
* Clean push to hub API
* Create working dir if it does not exist
* Different tweak
* New API + all models + test Flax
* Adds the Trainer clean up
* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* (nit) output types
* No need to set clone_from when folder exists
* Update src/transformers/trainer.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Add generated_from_trainer tag
* Update to new version
* Fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
- 21 Jun, 2021 2 commits
Patrick von Platen authored
* fix_torch_device_generate_test
* remove @
* fix flax save pretrained test
Suraj Patil authored
* boom boom
* remove flax clip example
* allow loading head model with base model weights
* add test
* fix imports
* disable save, load test for clip
* add test_save_load_to_base
- 14 Jun, 2021 3 commits
Vasudev Gupta authored
* add flax bert
* bert -> bigbird
* original_full ported
* add debugger
* init block sparse
* fix copies ; gelu_fast -> gelu_new
* block sparse port
* fix block sparse
* block sparse working
* all ckpts working
* fix-copies
* make quality
* init tests
* temporary fix for FlaxBigBirdForMultipleChoice
* skip test_attention_outputs
* fix
* gelu_fast -> gelu_new ; fix multiple choice model
* remove nsp
* fix sequence classifier
* fix
* make quality
* make fix-copies
* finish
* Delete debugger.ipynb
* Update src/transformers/models/big_bird/modeling_flax_big_bird.py
* make style
* finish
* bye bye jit flax tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Patrick von Platen authored
* fix_torch_device_generate_test
* remove @
* upload
Daniel Stancl authored
* Start working on FlaxBart
* Create modeling_flax_bart.py
* Write FlaxBartAttention
* Add FlaxBartEncoderLayer
* Add FlaxBartDecoderLayer and some typing
* Add helper function for FlaxBart
* shift_tokens_right
* _make_causal_mask
* _expand_mask
* Add PositionalEmbedding and fix init_std naming
* Add FlaxBartPretrainedModel
* Add FlaxBartEncoder
* Add FlaxBartEncoder
* Add FlaxBartEncoder among modules to be imported
* YET WE CANNOT INITIALIZE THAT!! :(
* Make BartEncoder working
  Change BartEncoder to instance of nn.Module so far
* Add FlaxBartDecoder
* Add FlaxBartModel
* TODO to make model run -> Prepare model inputs
* Resolve padding
* Add FlaxBartModel
* Add FlaxBartModel into importable modules
* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules
* make style; not properly working
* make style; make quality not pass due to some import I left
* Remove TODO for padding_idx in nn.Embed so far
* Add FlaxBartForConditionalGeneration
* Incorporate Flax model output classes, i.e. return_dict
* Add other models and incorporate use_cache arg
* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering
* Incorporate use_cache arg from PyTorch implementation
* Add all necessary Flax output utils
* Add FlaxBartForCausalLM; not working yet
* Add minor improvements; still lacks some functionality
* Update docs, src and tests
* Add support of FlaxBart to docs/source
* Fix some bugs in FlaxBart source code
* Add some necessary tests for FlaxBart models - jit_compilation not passing
* Fix tests and add test_head_masking
* Fix tests for @jax.jit computation
* Add test_head_masking
* Migrate FlaxBart tests from jax.numpy to numpy
* Remove FlaxBartForCausalLM
* Clean repo
* fix bart model weight structure
* Fix FlaxBartForSequenceClassification
  Slicing is not possible to use below jit, therefore, selecting sentence representation from hidden_states must be changed.
* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence
* Allow testing for FlaxBartForQA for pt_flax equivalence
* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6
* remove past_key_values
* remove inputs_embeds and make input_ids required
* add position ids
* re-write attention layer
* fix dataclass
* fix pos embeds and attention output
* fix pos embeds
* expose encode method
* expose decode method
* move docstring to top
* add cache for causal attn layer
* remove head masking for now
* s2s greedy search first pass
* boom boom
* fix typos
* fix greedy generate for bart
* use encoder, decoder layers instead of num_hidden_layers
* handle encoder_outputs
* cleanup
* simplify decoding
* more clean-up
* typos
* Change header + add {decoder_,}position_ids into 2 models
* add BartConfig
* fix existing tests
* add encode, decode methods
* Fix shift_tokens_right for JIT compilation + clarify one condition
* fix decode
* encoder => encode
* simplify generate
* add tests for encode and decode
* style
* add tests for cache
* fix equivalence tests
* sample generate now works with seq2seq
* generation tests
* initialize dense layers
* docstring and cleanup
* quality
* remove get/set input_embeddings
* address Patrick's suggestions
* decode for every model, remove encoder_outputs from call
* update tests accordingly
* decode returns only decoder outputs and logits
* fix arguments
* doc encode, decode methods
* correct base_model_prefix
* fix test for seq classif model
* fix docs
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
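
The `encode`/`decode` methods exposed above split the seq2seq forward pass so the encoder runs once and its outputs are reused while decoding; a hedged sketch (checkpoint name illustrative):

```python
import jax.numpy as jnp
from transformers import BartTokenizer, FlaxBartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = FlaxBartForConditionalGeneration.from_pretrained("facebook/bart-base")

inputs = tokenizer("Flax BART exposes encode and decode.", return_tensors="np")

# Run the encoder once ...
encoder_outputs = model.encode(input_ids=inputs["input_ids"])

# ... then call the decoder with its own inputs plus the cached encoder outputs.
decoder_input_ids = jnp.array([[model.config.decoder_start_token_id]])
decoder_outputs = model.decode(decoder_input_ids, encoder_outputs)
print(decoder_outputs.logits.shape)
```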
- 01 Jun, 2021 1 commit
Suraj Patil authored
* add flax CLIP
* default input_shape
* add tests
* fix test
* fix name
* fix docs
* fix shapes
* attend at least 1 token
* flax conv to torch conv
* return floats
* fix equivalence tests
* fix import
* return attention_weights and update tests
* fix docstrings
* address Patrick's comments
* input_shape arg
* add tests for get_image_features and get_text_features methods
* fix tests
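
`get_text_features` and `get_image_features`, tested here, return the projected embeddings without computing the full image-text similarity; a hedged sketch using the text side only (checkpoint name as in the CLIP docs):

```python
from transformers import CLIPTokenizer, FlaxCLIPModel

model = FlaxCLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

inputs = tokenizer(["a photo of a cat", "a photo of a dog"], padding=True, return_tensors="np")

# Projected text embeddings, one vector per input string; get_image_features
# does the same for pixel_values produced by the CLIP image processor.
text_features = model.get_text_features(**inputs)
print(text_features.shape)
```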
- 28 May, 2021 1 commit
Jayendra authored
* Added logic to return attention from flax-bert model and added test cases to check that
* Added new line at the end of file to test_modeling_flax_common.py
* fixing code style
* Fixing Roberta and Electra models too from copying bert
* Added temporary hack to not run test_attention_outputs for FlaxGPT2
* Returning attention weights from GPT2 and changed the tests accordingly.
* last fixes
* bump flax dependency
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
- 26 May, 2021 1 commit
Patrick von Platen authored
* fix_torch_device_generate_test
* remove @
* change dataclasses to flax ones
* fix typo
* fix jitted tests
* fix bert & electra
- 18 May, 2021 1 commit
Suraj Patil authored
* flax gpt2
* combine masks
* handle shared embeds
* add causal LM sample
* style
* add tests
* style
* fix imports, docs, quality
* don't use cache
* add cache
* add cache 1st version
* make use cache work
* start adding test for generation
* finish generation loop compilation
* rewrite test
* finish
* update
* update
* apply Sylvain's suggestions
* update
* refactor
* fix typo
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
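
The caching work listed above is what makes autoregressive generation with the Flax GPT-2 port practical; a hedged sketch of greedy generation (checkpoint and lengths are illustrative):

```python
from transformers import GPT2Tokenizer, FlaxGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = FlaxGPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Flax GPT-2 can generate", return_tensors="np")

# generate() reuses the key/value cache added in this commit, so each new token
# attends over cached states instead of re-running the whole prefix.
output_ids = model.generate(
    inputs["input_ids"], max_length=20, do_sample=False, pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(output_ids.sequences[0], skip_special_tokens=True))
```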
- 04 May, 2021 1 commit
Patrick von Platen authored
* add flax roberta
* make style
* correct initialization
* modify model to save weights
* fix copied from
* fix copied from
* correct some more code
* add more roberta models
* Apply suggestions from code review
* merge from master
* finish
* finish docs
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
- 29 Apr, 2021 1 commit
Patrick von Platen authored
* add attentions & hidden states
* add model outputs + docs
* finish docs
* finish tests
* finish impl
* del @
* finish
* finish
* correct test
* apply Sylvain's suggestions
* Update src/transformers/models/bert/modeling_flax_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* simplify more
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
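
After this change the Flax models return structured outputs that can carry per-layer attentions and hidden states; a hedged sketch (model name illustrative):

```python
from transformers import BertTokenizer, FlaxBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = FlaxBertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Structured Flax outputs", return_tensors="np")
outputs = model(**inputs, output_attentions=True, output_hidden_states=True)

# One attention tensor per layer, and one hidden-state tensor per layer plus the embeddings.
print(len(outputs.attentions), outputs.attentions[0].shape)
print(len(outputs.hidden_states), outputs.hidden_states[0].shape)
```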
- 23 Apr, 2021 1 commit
Patrick von Platen authored
* improve flax
* refactor
* typos
* Update src/transformers/modeling_flax_utils.py
* Apply suggestions from code review
* Update src/transformers/modeling_flax_utils.py
* fix typo
* improve error tolerance
* typo
* correct nasty saving bug
* fix from pretrained
* correct tree map
* add note
* correct weight tying
- 31 Mar, 2021 1 commit
Patrick von Platen authored
* add first code structures
* add all bert models
* add to init and docs
* correct docs
* make style
- 30 Mar, 2021 1 commit
Patrick von Platen authored
* save intermediate
* finish first version
* delete some more
* improve import
* fix roberta
* Update src/transformers/modeling_flax_pytorch_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_flax_pytorch_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* small corrections
* apply all comments
* fix deterministic
* make fix-copies
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
- 18 Mar, 2021 1 commit
Patrick von Platen authored
* Create modeling_flax_eletra with code copied from modeling_flax_bert
* Add ElectraForMaskedLM and ElectraForPretraining
* Add modeling test for Flax electra and fix naming and arg in Flax Electra model
* Add documentation
* Fix code style
* Create modeling_flax_eletra with code copied from modeling_flax_bert
* Add ElectraForMaskedLM and ElectraForPretraining
* Add modeling test for Flax electra and fix naming and arg in Flax Electra model
* Add documentation
* Fix code style
* Fix code quality
* Adjust tol in assert_almost_equal due to very small difference between model output, ranging 0.0010 - 0.0016
* Remove redundant ElectraPooler
* save intermediate
* adapt
* correct bert flax design
* adapt roberta as well
* finish roberta flax
* finish
* apply suggestions
* apply suggestions
Co-authored-by: Chris Nguyen <anhtu2687@gmail.com>
- 16 Mar, 2021 1 commit
Patrick von Platen authored
* make flax tests pytorch independent
* fix typo
* finish
* improve circle ci
* fix return tensors
* correct flax test
* re-add sentencepiece
* last tokenizer fixes
* finish maybe now