1. 15 Jun, 2021 5 commits
  2. 14 Jun, 2021 19 commits
    • Stas Bekman's avatar
      04028317
    • [style] consistent nn. and nn.functional: part 4 `examples` (#12156) · 88e84186
      Stas Bekman authored
      * consistent nn. and nn.functional: p4 examples
      
      * restore
    • [style] consistent nn. and nn.functional: part 3 `tests` (#12155) · 372ab9cd
      Stas Bekman authored
      * consistent nn. and nn.functional: p3 templates
      
      * restore
    • Flax Big Bird (#11967) · d9c0d08f
      Vasudev Gupta authored
      
      
      * add flax bert
      
      * bert -> bigbird
      
      * original_full ported
      
      * add debugger
      
      * init block sparse
      
      * fix copies ; gelu_fast -> gelu_new
      
      * block sparse port
      
      * fix block sparse
      
      * block sparse working
      
      * all ckpts working
      
      * fix-copies
      
      * make quality
      
      * init tests
      
      * temporary fix for FlaxBigBirdForMultipleChoice
      
      * skip test_attention_outputs
      
      * fix
      
      * gelu_fast -> gelu_new ; fix multiple choice model
      
      * remove nsp
      
      * fix sequence classifier
      
      * fix
      
      * make quality
      
      * make fix-copies
      
      * finish
      
      * Delete debugger.ipynb
      
      * Update src/transformers/models/big_bird/modeling_flax_big_bird.py
      
      * make style
      
      * finish
      
      * bye bye jit flax tests
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    • Stas Bekman's avatar
      a156da9a
    • [Flax] Fix flax pt equivalence tests (#12154) · 007be9e4
      Patrick von Platen authored
      * fix_torch_device_generate_test
      
      * remove @
      
      * upload
    • Adding TFWav2Vec2Model (#11617) · d438eee0
      Will Rice authored
      
      
      * [WIP] Add TFWav2Vec2Model
      
      Work in progress for adding a TensorFlow version of Wav2Vec2
      
      * feedback changes
      
      * small fix
      
      * Test Feedback Round 1
      
      * Add SpecAugment and CTC Loss
      
      * correct spec augment mask creation
      
      * docstring and correct copyright
      
      * correct bugs
      
      * remove bogus file
      
      * finish tests correction
      
      * del unnecessary layers
      
      * Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * make style
      
      * correct final bug
      
      * Feedback Changes
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    • [style] consistent nn. and nn.functional (#12124) · 1ed2ebf6
      Stas Bekman authored
      * consistent nn. and nn.functional
      
      * fix glitch
      
      * fix glitch #2
    • [optim] implement AdafactorSchedule (#12123) · ff7c8168
      Stas Bekman authored
      
      
      * implement AdafactorSchedule
      
      * typo
      
      * fix
      
      * Update src/transformers/optimization.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    • fix error message (#12148) · fe357648
      Suraj Patil authored
    • [lm examples] Replicate --config_overrides addition to other LM examples (#12135) · 9de62cfb
      Kumar Abhishek authored
      
      
      * [lm examples] Replicate --config_overrides addition to other LM examples
      
      * Removing no trainer files changes
      
      * Update README
      Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
    • Use text_column_name variable instead of "text" (#12132) · cd7961b6
      Nicholas Broad authored
      
      
      * Use text_column_name variable instead of "text"
      
      `text_column_name` was already defined above the lines I changed and is also used below them.

      This is a very minor change. If a dataset does not use "text" as the column name, `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. This makes the function a little more robust, though I would assume that 90%+ of datasets use "text" anyway.
      
      * black formatting
      
      * make style
      Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
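
The column fallback described in the commit message above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code from the example scripts: `pick_text_column`, `make_tokenize_function`, and `whitespace_tokenizer` are hypothetical names, and the whitespace splitter merely stands in for a real tokenizer.

```python
def pick_text_column(column_names):
    """Prefer a column literally named "text"; otherwise fall back to the first column."""
    return "text" if "text" in column_names else column_names[0]


def make_tokenize_function(tokenizer, text_column_name):
    """Build a batch-level function that reads from the chosen column, not a hard-coded "text"."""
    def tokenize_function(examples):
        return tokenizer(examples[text_column_name])
    return tokenize_function


def whitespace_tokenizer(texts):
    """Hypothetical stand-in for a real tokenizer: splits each string on whitespace."""
    return {"tokens": [t.split() for t in texts]}


# A batch whose text lives in a column that is NOT called "text".
batch = {"sentence": ["hello world", "big bird"], "label": [0, 1]}

text_column_name = pick_text_column(list(batch.keys()))
tokenize_function = make_tokenize_function(whitespace_tokenizer, text_column_name)
result = tokenize_function(batch)
```

With a hard-coded `examples["text"]`, the batch above would raise a `KeyError`; reading through `text_column_name` lets the same function handle both layouts.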
    • Don't log anything before logging is setup in examples (#12121) · b8ab5413
      Sylvain Gugger authored
      * Don't log anything before logging is setup in examples
      
      * Last example
    • [Flax] Add links to google colabs (#12146) · 7566fefa
      Patrick von Platen authored
      * fix_torch_device_generate_test
      
      * remove @
      
      * add colab links
    • Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810) · 476ba679
      SaulLu authored
      
      
      * feature for tokenizer without slow/legacy version
      
      * format
      
      * modify common test
      
      * add tests
      
      * add PreTrainedTokenizerFast to AutoTokenizer
      
      * format
      
      * change tokenizer common test in order to be able to run test without a slow version
      
      * update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`
      
      * add AutoTokenizer test

      * replace `if self.tokenizer_class is not None` with `if self.tokenizer_class is None`
      
      * remove obsolete change in comment
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/tokenization_utils_fast.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * change `get_main_tokenizer` into `get_tokenizers`
      
      * clarify `get_tokenizers` method
      
      * homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`
      
      * add `test_rust_tokenizer = False` to tokenizers that don't define a fast version
      
      * `test_rust_tokenizer = False` for BertJapaneseTokenizer
      
      * `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    • FlaxBart (#11537) · 4a51b1dd
      Daniel Stancl authored
      
      
      * Start working on FlaxBart
      
      * Create modeling_flax_bart.py
      
      * Write FlaxBartAttention
      
      * Add FlaxBartEncoderLayer
      
      * Add FlaxBartDecoderLayer and some typing
      
      * Add helper function for FlaxBart
      
      * shift_tokens_right
      
      * _make_causal_mask
      
      * _expand_mask
      
      * Add PositionalEmbedding and fix init_std naming
      
      * Add FlaxBartPretrainedModel
      
      * Add FlaxBartEncoder
      
      * Add FlaxBartEncoder
      
      * Add FlaxBartEncoder among modules to be imported
      
      * YET WE CANNOT INITIALIZE THAT!! :(
      
      * Make BartEncoder working
      
      Change BartEncoder to instance of nn.Module so far
      
      * Add FlaxBartDecoder
      
      * Add FlaxBartModel
      
      * TODO to make model run -> Prepare model inputs
      
      * Resolve padding
      
      * Add FlaxBartModel
      
      * Add FlaxBartModel into importable modules
      
      * Remove FlaxBartEncoder and FlaxBartDecoder from importable modules
      
      * make style; not properly working
      
      * make style; make quality not pass due to some import I left
      
      * Remove TODO for padding_idx in nn.Embed so far
      
      * Add FlaxBartForConditionalGeneration
      
      * Incorporate Flax model output classes, i.e. return_dict
      
      * Add other models and incorporate use_cache arg
      
      * Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering
      
      * Incorporate use_cache arg from PyTorch implementation
      
      * Add all necessary Flax output utils
      
      * Add FlaxBartForCausalLM; not working yet
      
      * Add minor improvements; still lacks some functionality
      
      * Update docs, src and tests
      
      * Add support of FlaxBart to docs/source
      
      * Fix some bugs in FlaxBart source code

      * Add some necessary tests for FlaxBart models - jit_compilation not passing
      
      * Fix tests and add test_head_masking
      
      * Fix tests for @jax.jit computation
      
      * Add test_head_masking
      
      * Migrate FlaxBart tests from jax.numpy to numpy
      
      * Remove FlaxBartForCausalLM
      
      * Clean repo
      
      * fix bart model weight structure
      
      * Fix FlaxBartForSequenceClassification
      
      Slicing cannot be used under jit, so the way the sentence
      representation is selected from hidden_states must be changed.
      
      * Allow FlaxBartForSequenceClassification for testing pt_flax equivalence
      
      * Allow testing for FlaxBartForQA for pt_flax equivalence
      
      * Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6
      
      * remove past_key_values
      
      * remove inputs_embeds and make input_ids required
      
      * add position ids
      
      * re-write attention layer
      
      * fix dataclass
      
      * fix pos embeds and attention output
      
      * fix pos embeds
      
      * expose encode method
      
      * expose decode method
      
      * move docstring to top
      
      * add cache for causal attn layer
      
      * remove head masking for now
      
      * s2s greedy search first pass
      
      * boom boom
      
      * fix typos
      
      * fix greedy generate for bart
      
      * use encoder, decoder layers instead of num_hidden_layers
      
      * handle encoder_outputs
      
      * cleanup
      
      * simplify decoding
      
      * more clean-up
      
      * typos
      
      * Change header + add {decoder_,}position_ids into 2 models
      
      * add BartConfig
      
      * fix existing tests
      
      * add encode, decode methods
      
      * Fix shift_tokens_right for JIT compilation + clarify one condition
      
      * fix decode
      
      * encoder => encode
      
      * simplify generate
      
      * add tests for encode and decode
      
      * style
      
      * add tests for cache
      
      * fix equivalence tests
      
      * sample generate now works with seq2seq
      
      * generation tests
      
      * initialize dense layers
      
      * docstring and cleanup
      
      * quality
      
      * remove get/set input_embeddings
      
      * address Patrick's suggestions
      
      * decode for every model, remove encoder_outputs from call
      
      * update tests accordingly
      
      * decode returns only decoder outputs and logits
      
      * fix arguments
      
      * doc encode, decode methods
      
      * correct base_model_prefix
      
      * fix test for seq classif model
      
      * fix docs
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
    • add readme for flax clm (#12111) · d36fce82
      Suraj Patil authored
      
      
      * add readme for flax clm
      
      * use section link for tokenizer
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * update metrics
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    • Add mlm pretraining xla torch readme (#12011) · 16c0efca
      Patrick von Platen authored
      
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * upload
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * Update examples/flax/language-modeling/README.md
      
      * add more info
      
      * finish
      
      * fix
      Co-authored-by: Patrick von Platen <patrick@huggingface.co>
    • Fix megatron_gpt2 attention block's causal mask (#12007) · ecd6efe7
      Guido Novati authored
      
      
      * Fix megatron_gpt2 attention block's causal mask.
      
      * compatibility with checkpoints created with recent versions of Megatron-LM
      
      * added integration test for the released Megatron-GPT2 model
      
      * code style changes
      
      * added option to megatron conversion script to read from config file
      Co-authored-by: Guido Novati <gnovati@nvidia.com>
  3. 13 Jun, 2021 1 commit
  4. 11 Jun, 2021 3 commits
  5. 10 Jun, 2021 10 commits
  6. 09 Jun, 2021 2 commits