1. 23 Feb, 2023 3 commits
  2. 22 Feb, 2023 12 commits
  3. 21 Feb, 2023 8 commits
  4. 20 Feb, 2023 9 commits
    • 8b3db33a
      Sylvain Gugger authored
    • Fix-rag-finetune-project-requirement (#21697) · 4194e5f4
      Arthur authored
      pin pytorch lightning requirement
    • Add EfficientNet (#21563) · 49ab1623
      Alara Dirik authored
      * Add EfficientNet to transformers
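      For context, a minimal usage sketch of the newly added model classes follows; the checkpoint name ("google/efficientnet-b0") and the input image file are assumptions for illustration, not taken from the commit.

      import torch
      from PIL import Image
      from transformers import AutoImageProcessor, EfficientNetForImageClassification

      # Sketch only: checkpoint name is an assumed example, not stated in this log.
      processor = AutoImageProcessor.from_pretrained("google/efficientnet-b0")
      model = EfficientNetForImageClassification.from_pretrained("google/efficientnet-b0")

      image = Image.open("cat.png")  # any local RGB image (hypothetical file)
      inputs = processor(images=image, return_tensors="pt")
      with torch.no_grad():
          logits = model(**inputs).logits
      print(model.config.id2label[logits.argmax(-1).item()])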
    • [`bnb`] fix `bnb` decoders bug (#21688) · c9a06714
      Younes Belkada authored
      * fix `bnb` decoders bug
      
      * make fixup
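      The 8-bit loading path this fix concerns is typically exercised as in the sketch below; the checkpoint name is an assumed example, and this is the surrounding usage, not the fix itself.

      # Requires bitsandbytes and a CUDA GPU; sketch of loading an
      # encoder-decoder model in 8-bit, the code path the fix touches.
      from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("t5-small")
      model = AutoModelForSeq2SeqLM.from_pretrained(
          "t5-small",
          load_in_8bit=True,
          device_map="auto",
      )
      inputs = tokenizer("translate English to German: Hello", return_tensors="pt").to(model.device)
      print(tokenizer.decode(model.generate(**inputs)[0], skip_special_tokens=True))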
    • add GPTSAN model (reopen) (#21291) · f56174ac
      tanreinama authored
      * add GPTSAN-Japanese
      
      * add GPTSAN (repeated across many intermediate commits, including one marked "update for review")
      
      * fix typo in comment text
      
      * add GPTSAN (further intermediate commits)
      
      * fix document and comments
      
      * fix class name GPTSAN->GPTSan
      
      * fix import and test for tokenizer
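      A minimal generation sketch with the new model class; the checkpoint name "Tanrei/GPTSAN-japanese" and the prompt are assumptions for illustration, not taken from this log.

      from transformers import AutoTokenizer, GPTSanJapaneseForConditionalGeneration

      # Sketch only: checkpoint name is an assumed example.
      tokenizer = AutoTokenizer.from_pretrained("Tanrei/GPTSAN-japanese")
      model = GPTSanJapaneseForConditionalGeneration.from_pretrained("Tanrei/GPTSAN-japanese")

      inputs = tokenizer("織田信長は、", return_tensors="pt")
      outputs = model.generate(inputs.input_ids, max_new_tokens=20)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))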
    • Fix quality · c87bbe1f
      Sylvain Gugger authored
    • Fix for non-contiguous label tensors in VisionEncoderDecoder (#21582) · 011cc17a
      Morgan McGuire authored
      * add prints
      
      * add shape
      
      * add reshape
      
      * clean up
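      The "add reshape" step reflects standard PyTorch behaviour: `.view(-1)` fails on a non-contiguous tensor, while `.reshape(-1)` copies as needed. A standalone illustration (not the PR's code):

      import torch

      labels = torch.arange(12).reshape(3, 4).t()  # transpose makes the tensor non-contiguous
      assert not labels.is_contiguous()

      try:
          labels.view(-1)        # raises RuntimeError on non-contiguous tensors
      except RuntimeError as err:
          print("view failed:", err)

      flat = labels.reshape(-1)  # reshape copies when necessary, so it succeeds
      print(flat.shape)          # torch.Size([12])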
    • add flax whisper implementation (#20479) · 2840272c
      Andy Ehrenberg authored
      
      * add flax whisper implementation
      
      * revert change to setup
      
      * remove unused imports
      
      * revert generation changes
      
      * flax whisper docs
      
      * docs
      
      * import order
      
      * import sorting
      
      * isort
      
      * add dummy objects
      
      * doc formatting
      
      * formatting
      
      * remove trailing whitespaces
      
      * fix flax whisper docs
      
      * add generation logic to unlock flax whisper
      
      * remove scans
      
      * give credits to Flax Bart implementation
      
      * remove unused imports
      
      * add license
      
      * remove assert
      
      * more credits to Bart
      
      * fix style
      
      * formatting
      
      * support left padding
      
      * add flax whisper generation test
      
      * remove copied from comments whenever not a full copy
      
      * fix docstrings for logits processors
      
      * revert change to FlaxForceTokensLogitsProcessor
      
      * revert doc changes
      
      * improve generation docs
      
      * reorganize
      
      * formatting
      
      * cleanup docs
      
      * add tests
      
      * handle empty list case
      
      * fix forced decoder ids in flax tests
      
      * add flax whisper to inits
      
      * update dummy objects
      
      * docs for FlaxAutoModelForSpeechSeq2Seq
      
      * fix decoder_position_ids computation in pretrained model decode/__call__ fns
      
      * add Copied from statements as necessary
      
      * compute position_ids only in __call__ and decode methods of pretrained model subclasses
      
      * improve readability of compute positional embeddings
      
      * check dimensionality of input_features instead of hidden_states
      
      * copied from statement for init_cache
      
      * formatting
      
      * fix copies
      
      * fix copies
      
      * pass attention mask to encoder layers
      
      * fix decoder module outputs
      
      * set dtype
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * smaller flax model for whisper test
      
      * Update src/transformers/generation/flax_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/whisper/modeling_flax_whisper.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update tests/models/whisper/test_modeling_flax_whisper.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * cleanup
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/whisper/modeling_flax_whisper.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * bias cleanup
      
      * doc fix
      
      * align style for force tokens processor
      
      * readability
      
      * fix input shape in tests
      
      * revert FlaxGenerationMixin docstring
      
      * formatting
      
      * fix tests
      
      * fix imports
      
      * consistent encoder hidden states
      
      * consistent hidden states
      
      * input shapes
      
      * typo
      
      * partial class trick
      
      * partial class for input shape
      
      * base_class with correct input shape
      
      * partial base classes
      
      * match by name
      
      * set main_input_name
      
      * compare on names
      
      * formatting
      
      * remove unused import
      
      * safer position ids computation
      
      * safer position id computation
      
      * Update src/transformers/models/whisper/modeling_flax_whisper.py
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/whisper/modeling_flax_whisper.py
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * remove identical inherited tests
      
      * fix prompt ids in tests
      
      * use generation config
      
      * use jnp array
      
      * better var names
      
      * more explicit bias use
      
      * import transformers
      
      * formatting
      
      * test formatting
      
      * remove unused imports
      
      * remove unused imports
      
      * formatting
      
      * isort
      
      * docs
      
      * fix ln orders for encoder hidden states
      
      * whisper unique generation stuff
      
      * flake
      
      * use finfo for attention bias
      
      * docs
      
      * Update src/transformers/generation/flax_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * docs
      
      * add timestamp flax test
      
      * jit for timestamps
      
      * formatting
      
      * clean up timestamps processor
      
      * formatting
      
      * remove if_true
      
      * cleanup
      
      ---------
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
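      A short sketch of what the new Flax port enables; the checkpoint name and the dummy audio are assumptions for illustration, not taken from this log.

      import numpy as np
      from transformers import WhisperProcessor, FlaxWhisperForConditionalGeneration

      processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
      # from_pt=True converts PyTorch weights in case no Flax weights are published.
      model = FlaxWhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny", from_pt=True)

      audio = np.zeros(16000, dtype=np.float32)  # 1 second of silence at 16 kHz (placeholder input)
      inputs = processor(audio, sampling_rate=16000, return_tensors="np")

      generated = model.generate(inputs.input_features)
      print(processor.batch_decode(generated.sequences, skip_special_tokens=True))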
    • Enable PyTorch/XLA Fully Sharded Data Parallel (FSDP) (#21406) · 7735e040
      AlexWertheim authored
      
      * Reinserted import statement accidentally removed during rebasing.
      
      * Added auto_wrap functionality, restructured XLA FSDP logic to more closely match PyTorch FSDP logic.
      
      * Fixed flag descriptions; changed several instances of fsdp_ to xla_fsdp_; pass in auto_wrap_policy and auto_wrapper_callable directly to avoid lambda saving.
      
      * Moved XLA FSDP logic to be adjacent to Fairscale FSDP logic in trainer.
      
      * Formatted changes in accordance with HF style requirements.
      
      * Added back in warning which was accidentally removed.
      
      * - Merged XLA FSDP training arguments into `fsdp_config`
      - Added `xla` boolean flag to `fsdp_config` to specify XLA FSDP wrapping
      - Merged XLA FSDP wrapping logic into FSDP wrapping logic within trainer
        class
      
      * Cleaned up errors, moved argument to fsdp_config
      
      - Set `xla` and `xla_fsdp_grad_ckpt` flags by default in fsdp_config
      - Added missing colons following conditionals
      - Moved `fsdp_transformer_layer_cls_to_wrap` to `fsdp_config`
      - Modified `fsdp_transformer_layer_cls_to_wrap` to be list of strings,
        not just one string
      - Changed Fairscale FSDP logic to allow for set of layer classes to wrap
      - Removed unnecessary checks for `xla_fsdp`
      
      * Corrected small errors, improved layer class flag
      
      - Correctly set default values for `xla` and `xla_fsdp_grad_ckpt`
        arguments
      - Made `fsdp_transformer_layer_cls_to_wrap` a list of strings instead of
        a single string
      - Added processing to ensure that `fsdp_transformer_layer_cls_to_wrap`
        works as expected if passed as a single string
      - Updated PyTorch FSDP logic to accept a list of layers to wrap, as done
        with XLA FSDP
      - Replaced instances of `getattr()` with `.get()` for dictionary
        retrievals with default values, including when setting
        `fsdp_min_num_params`
      - Corrected `self.fsdp is not None` to `len(self.fsdp) > 0`
      - Removed extraneous `xla_fsdp` argument descriptions from outside
        `fsdp_config`
      
      * Changed xla-fsdp-settings to be dictionary
      
      - Modified xla-fsdp-settings to be entered directly as dictionary
        instead of loaded through JSON file
      - Made small style corrections
      
      * Reverted unintentional local_rank TPU check
      
      * Do not block XLA FSDP if local rank is -1
      
      * Rebased and applied automatic formatting
      
      - Rebased
      - Applied automatic formatting changes via `make style`
      
      * Applied automatic formatting with latest version of black
      
      * Replaced  expression with
      
      * Reran black examples tests src utils
      ruff examples tests src utils --fix
      make autogenerate_code
      make[1]: Entering directory '/usr/local/google/home/awertheim/HF-FSDP-PR/transformers'
      make[1]: Leaving directory '/usr/local/google/home/awertheim/HF-FSDP-PR/transformers' after additional formatting changes
      
      * Additional automatic formatting changes
      
      * Remove unnecessary whitespace characters from src/transformers/training_args.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      ---------
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
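      The end state described above corresponds roughly to the sketch below; the key names follow the commit's description of `fsdp_config`, while the concrete values (layer class, thresholds) are assumptions for illustration.

      from transformers import TrainingArguments

      # Sketch of enabling the XLA FSDP path via fsdp_config; values are illustrative only.
      training_args = TrainingArguments(
          output_dir="out",
          fsdp="full_shard auto_wrap",
          fsdp_config={
              "xla": True,                       # route FSDP wrapping through PyTorch/XLA
              "xla_fsdp_grad_ckpt": True,        # gradient checkpointing on the XLA FSDP path
              "xla_fsdp_settings": {},           # extra settings forwarded to torch_xla's FSDP wrapper
              "fsdp_transformer_layer_cls_to_wrap": ["T5Block"],  # now a list of class names
              "fsdp_min_num_params": 0,
          },
      )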
  5. 17 Feb, 2023 7 commits
  6. 16 Feb, 2023 1 commit