1. 01 Sep, 2021 6 commits
  2. 31 Aug, 2021 6 commits
      GPT-J-6B (#13022) · c02cd95c
      Stella Biderman authored
      
      
      * Test GPTJ implementation
      
      * Fixed conflicts
      
      * Update __init__.py
      
      * Update __init__.py
      
      * change GPT_J to GPTJ
      
      * fix missing imports and typos
      
      * use einops for now
      (need to change to torch ops later)
      
      * Use torch ops instead of einsum
      
      * remove einops deps
      
      * Update configuration_auto.py
      
      * Added GPT J
      
      * Update gptj.rst
      
      * Update __init__.py
      
      * Update test_modeling_gptj.py
      
      * Added GPT J
      
      * Changed configs to match GPT2 instead of GPT Neo
      
      * Removed non-existent sequence model
      
      * Update configuration_auto.py
      
      * Update configuration_auto.py
      
      * Update configuration_auto.py
      
      * Update modeling_gptj.py
      
      * Update modeling_gptj.py
      
      * Progress on updating configs to agree with GPT2
      
      * Update modeling_gptj.py
      
      * num_layers -> n_layer
      
      * layer_norm_eps -> layer_norm_epsilon
      
      * attention_layers -> num_hidden_layers
      
      * Update modeling_gptj.py
      
      * attention_pdrop -> attn_pdrop
      
      * hidden_act -> activation_function
      
      * Update configuration_gptj.py
      
      * Update configuration_gptj.py
      
      * Update configuration_gptj.py
      
      * Update configuration_gptj.py
      
      * Update configuration_gptj.py
      
      * Update modeling_gptj.py
      
      * Update modeling_gptj.py
      
      * Update modeling_gptj.py
      
      * Update modeling_gptj.py
      
      * Update modeling_gptj.py
      
      * Update modeling_gptj.py
      
      * fix layernorm and lm_head size
      delete attn_type
      
      * Update docs/source/model_doc/gptj.rst
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * removed claim that GPT J uses local attention
      
      * Removed GPTJForSequenceClassification
      
      * Update src/transformers/models/gptj/configuration_gptj.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Removed unsupported boilerplate
      
      * Update tests/test_modeling_gptj.py
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update tests/test_modeling_gptj.py
      Co-authored-by: Eric Hallahan <eric@hallahans.name>
      
      * Update tests/test_modeling_gptj.py
      Co-authored-by: Eric Hallahan <eric@hallahans.name>
      
      * Update tests/test_modeling_gptj.py
      Co-authored-by: Eric Hallahan <eric@hallahans.name>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * Update __init__.py
      
      * Update configuration_gptj.py
      
      * Update modeling_gptj.py
      
      * Corrected indentation
      
      * Remove stray backslash
      
      * Delete .DS_Store
      
      * Delete .DS_Store
      
      * Delete .DS_Store
      
      * Delete .DS_Store
      
      * Delete .DS_Store
      
      * Update docs to match
      
      * Remove tf loading
      
      * Remove config.jax
      
      * Remove stray `else:` statement
      
      * Remove references to `load_tf_weights_in_gptj`
      
      * Adapt tests to match output from GPT-J 6B
      
      * Apply suggestions from code review
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * Default `activation_function` to `gelu_new`
      
      - Specify the approximate formulation of GELU to ensure parity with the default setting of `jax.nn.gelu()`
      
      * Fix part of the config documentation
      
      * Revert "Update configuration_auto.py"
      
      This reverts commit e9860e9c043b6ebf57a0e705044e9ec9ba2263bb.
      
      * Revert "Update configuration_auto.py"
      
      This reverts commit cfaaae4c4dc70f1fbe9abd60fc8bd0b863b8c011.
      
      * Revert "Update configuration_auto.py"
      
      This reverts commit 687788954fd0cfbc567fa1202d56a4ff9271944f.
      
      * Revert "Update configuration_auto.py"
      
      This reverts commit 194d024ea87d4fcef0dcb08e57f52c47511a9fc6.
      
      * Hyphenate GPT-J
      
      * Undid sorting of the models alphabetically
      
      * Reverting previous commit
      
      * fix style and quality issues
      
      * Update docs/source/model_doc/gptj.rst
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/__init__.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update tests/test_modeling_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/__init__.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/gptj/configuration_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/gptj/configuration_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/gptj/configuration_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Replaced GPTJ-specific code with generic code
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Made the code always use rotary positional encodings
      
      * Update index.rst
      
      * Fix documentation
      
      * Combine attention classes
      
      - Condense all attention operations into `GPTJAttention`
      - Replicate GPT-2 and improve code clarity by renaming `GPTJAttention.attn_pdrop` and `GPTJAttention.resid_pdrop` to `GPTJAttention.attn_dropout` and `GPTJAttention.resid_dropout`
      
      * Removed `config.rotary_dim` from tests
      
      * Update test_modeling_gptj.py
      
      * Update test_modeling_gptj.py
      
      * Fix formatting
      
      * Removed deprecated argument `layer_id` from `GPTJAttention`
      
      * Update modeling_gptj.py
      
      * Update modeling_gptj.py
      
      * Fix code quality
      
      * Restore model functionality
      
      * Save `lm_head.weight` in checkpoints
      
      * Fix crashes when loading with reduced precision
      
      * Refactor `self._attn(...)` and rename layer weights
      
      * make sure logits are in fp32 for sampling
      
      * improve docs
      
      * Add `GPTJForCausalLM` to `TextGenerationPipeline` whitelist
      
      * Added GPT-J to the README
      
      * Fix doc/readme consistency
      
      * Add rough parallelization support
      
      - Remove unused imports and variables
      - Clean up docstrings
      - Port experimental parallelization code from GPT-2 into GPT-J
      
      * Clean up loose ends
      
      * Fix index.rst
      Co-authored-by: kurumuz <kurumuz1@gmail.com>
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Eric Hallahan <eric@hallahans.name>
      Co-authored-by: Leo Gao <54557097+leogao2@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: your_github_username <your_github_email>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
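      Several of the commits above replace the einops-based rotary position embedding code with plain torch ops. The snippet below is a hedged, illustrative sketch of that pattern only; the helper names and tensor shapes are assumptions for the example, not necessarily the exact code that landed in modeling_gptj.py.

      ```python
      # Illustrative sketch: rotary position embeddings with plain torch ops
      # (no einops/einsum), in the spirit of the GPT-J commits above. Helper
      # names and shapes are assumptions; see modeling_gptj.py for what shipped.
      import torch


      def fixed_pos_embedding(rotary_dim: int, seq_len: int, base: float = 10000.0):
          # One frequency per pair of rotary dimensions.
          inv_freq = 1.0 / (base ** (torch.arange(0, rotary_dim, 2).float() / rotary_dim))
          positions = torch.arange(seq_len).float()
          sinusoid = positions[:, None] * inv_freq[None, :]  # (seq_len, rotary_dim // 2)
          return torch.sin(sinusoid), torch.cos(sinusoid)


      def rotate_every_two(x: torch.Tensor) -> torch.Tensor:
          # (x1, x2, x3, x4, ...) -> (-x2, x1, -x4, x3, ...)
          x1 = x[..., ::2]
          x2 = x[..., 1::2]
          return torch.stack((-x2, x1), dim=-1).flatten(-2)


      def apply_rotary_pos_emb(x: torch.Tensor, sin: torch.Tensor, cos: torch.Tensor) -> torch.Tensor:
          # x: (batch, seq_len, num_heads, rotary_dim); sin/cos: (seq_len, rotary_dim // 2)
          sin = torch.repeat_interleave(sin[None, :, None, :], 2, dim=-1)
          cos = torch.repeat_interleave(cos[None, :, None, :], 2, dim=-1)
          return (x * cos) + (rotate_every_two(x) * sin)
      ```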
      TF/Numpy variants for all DataCollator classes (#13105) · 854260ca
      Matt authored
      
      
      * Adding a TF variant of the DataCollatorForTokenClassification to get feedback
      
      * Added a Numpy variant and a post_init check to fail early if a missing import is found
      
      * Fixed call to Numpy variant
      
      * Added a couple more of the collators
      
      * Update src/transformers/data/data_collator.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Fixes, style pass, finished DataCollatorForSeq2Seq
      
      * Added all the LanguageModeling DataCollators, except SOP and PermutationLanguageModeling
      
      * Adding DataCollatorForPermutationLanguageModeling
      
      * Style pass
      
      * Add missing `__call__` for PLM
      
      * Remove `post_init` checks for frameworks because the imports inside them were making us fail code quality checks
      
      * Remove unused imports
      
      * First attempt at some TF tests
      
      * A second attempt to make any of those tests actually work
      
      * TF tests, round three
      
      * TF tests, round four
      
      * TF tests, round five
      
      * TF tests, all enabled!
      
      * Style pass
      
      * Merging tests into `test_data_collator.py`
      
      * Merging tests into `test_data_collator.py`
      
      * Fixing up test imports
      
      * Fixing up test imports
      
      * Trying shuffling the conditionals around
      
      * Commenting out non-functional old tests
      
      * Completed all tests for all three frameworks
      
      * Style pass
      
      * Fixed test typo
      
      * Style pass
      
      * Move standard `__call__` method to mixin
      
      * Rearranged imports for `test_data_collator`
      
      * Fix data collator typo "torch" -> "pt"
      
      * Fixed the most embarrassingly obvious bug
      
      * Update src/transformers/data/data_collator.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Renaming mixin
      
      * Updating docs
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Dalton Walker <dalton_walker@icloud.com>
      Co-authored-by: Andrew Romans <andrew.romans@hotmail.com>
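      One of the commits above moves the shared `__call__` into a mixin so each collator only implements per-framework collation. Below is a hedged sketch of that dispatch idea; the attribute and method names are assumptions for illustration rather than a quote of the final API.

      ```python
      # Sketch of a framework-dispatching collator mixin: __call__ routes to a
      # per-framework method based on the desired tensor type ("pt"/"tf"/"np").
      class DataCollatorMixin:
          def __call__(self, features, return_tensors=None):
              if return_tensors is None:
                  return_tensors = self.return_tensors  # set by each concrete collator
              if return_tensors == "pt":
                  return self.torch_call(features)
              if return_tensors == "tf":
                  return self.tf_call(features)
              if return_tensors == "np":
                  return self.numpy_call(features)
              raise ValueError(f"Framework '{return_tensors}' not recognized!")
      ```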
      Clean up test file · 74b3344f
      Sylvain Gugger authored
      Deberta_v2 tf (#13120) · 3efcfeab
      Kamal Raj authored
      * Deberta_v2 tf
      
      * added new line at the end of file, make style
      
      * +V2, typo
      
      * remove never executed branch of code
      
      * removed comment and fixed typo in URL filter
      
      * cleanup according to review comments
      
      * added #Copied from
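      For context, a minimal usage sketch of the new TensorFlow DeBERTa-v2 port. The checkpoint name is an assumption, and `from_pt=True` is used in case only PyTorch weights are published for it.

      ```python
      # Hedged usage sketch of the TF DeBERTa-v2 port added above.
      from transformers import DebertaV2Tokenizer, TFDebertaV2Model

      tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
      # from_pt=True converts PyTorch weights on the fly if no TF weights exist.
      model = TFDebertaV2Model.from_pretrained("microsoft/deberta-v2-xlarge", from_pt=True)

      inputs = tokenizer("DeBERTa-v2 now has a TensorFlow implementation.", return_tensors="tf")
      outputs = model(**inputs)
      print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
      ```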
      Add GPT2ForTokenClassification (#13290) · 41c55941
      tucan9389 authored
      
      
      * Add GPT2ForTokenClassification
      
      * Fix dropout exception for GPT2 NER
      
      * Remove sequence label in test
      
      * Change TokenClassifierOutput to TokenClassifierOutputWithPast
      
      * Fix for black formatter
      
      * Remove dummy
      
      * Update docs for GPT2ForTokenClassification
      
      * Fix check_inits ci fail
      
      * Update dummy_pt_objects after make fix-copies
      
      * Remove TokenClassifierOutputWithPast
      
      * Fix tuple input issue
      Co-authored-by: danielsejong55@gmail.com <danielsejong55@gmail.com>
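      A short usage sketch of the new token-classification head. The base "gpt2" checkpoint and `num_labels=5` are placeholders; the classification layer is randomly initialized and would need fine-tuning before its predictions mean anything.

      ```python
      # Illustrative usage of GPT2ForTokenClassification (e.g. for NER).
      import torch
      from transformers import GPT2TokenizerFast, GPT2ForTokenClassification

      tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
      model = GPT2ForTokenClassification.from_pretrained("gpt2", num_labels=5)

      inputs = tokenizer("Hugging Face is based in New York City", return_tensors="pt")
      with torch.no_grad():
          logits = model(**inputs).logits  # (batch, seq_len, num_labels)
      predictions = logits.argmax(dim=-1)  # one predicted label id per token
      ```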
      Tests fetcher tests (#13340) · 8b2de0e4
      Sylvain Gugger authored
      * Incorporate tests dependencies in tests_fetcher
      
      * Harder modif
      
      * Debug
      
      * Loop through all files
      
      * Last modules
      
      * Remove debug statement
  3. 30 Aug, 2021 8 commits
      Use DS callable API to allow hf_scheduler + ds_optimizer (#13216) · 42f359d0
      Olatunji Ruwase authored
      
      
      * Use DS callable API to allow hf_scheduler + ds_optimizer
      
      * Preserve backward-compatibility
      
      * Restore backward compatibility
      
      * Tweak arg positioning
      
      * Tweak arg positioning
      
      * bump the required version
      
      * Undo indent
      
      * Update src/transformers/trainer.py
      
      * style
      Co-authored-by: Stas Bekman <stas@stason.org>
      Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
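      The idea behind the commit above: when DeepSpeed owns the optimizer, the HF scheduler cannot be built up front, so a callable is handed to deepspeed.initialize and invoked once the DeepSpeed optimizer exists. A rough sketch of the pattern, with the config values and exact keyword arguments treated as assumptions rather than the actual Trainer integration:

      ```python
      # Rough sketch (assumptions, not the real Trainer code): pass a callable
      # lr_scheduler so the HF scheduler is built from the DS-created optimizer.
      # Run under the `deepspeed` launcher so distributed init succeeds.
      import torch
      import deepspeed
      from transformers import get_linear_schedule_with_warmup

      model = torch.nn.Linear(10, 2)  # stand-in model
      ds_config = {
          "train_batch_size": 8,
          "optimizer": {"type": "AdamW", "params": {"lr": 3e-5}},  # DeepSpeed-owned optimizer
      }

      def make_hf_scheduler(optimizer):
          # Built lazily, only after DeepSpeed has created its optimizer.
          return get_linear_schedule_with_warmup(
              optimizer, num_warmup_steps=100, num_training_steps=10_000
          )

      engine, optimizer, _, lr_scheduler = deepspeed.initialize(
          model=model,
          model_parameters=model.parameters(),
          config=ds_config,
          lr_scheduler=make_hf_scheduler,  # the "callable API" referenced above
      )
      ```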
      Add missing module __spec__ (#13321) · 35236b87
      Laura Hanu authored
      * added missing __spec__ to _LazyModule
      
      * test __spec__ is not None after module import
      
      * changed module_spec arg to be optional in _LazyModule
      
      * fix style issue
      
      * added module spec test to test_file_utils
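      A simplified, illustrative stand-in for the fix above: a lazily-importing module records a module spec so `module.__spec__` is not None after import. The class below is a sketch of the pattern, not the library's actual _LazyModule.

      ```python
      # Sketch of a lazy module that keeps its __spec__ (the point of the fix above).
      import importlib
      from types import ModuleType


      class LazyModule(ModuleType):
          def __init__(self, name, import_structure, module_spec=None):
              super().__init__(name)
              self._import_structure = import_structure
              # Record the spec so tooling that inspects module.__spec__ keeps working.
              self.__spec__ = module_spec

          def __getattr__(self, attr):
              # Import the submodule that defines `attr` only on first access.
              for submodule, names in self._import_structure.items():
                  if attr in names:
                      real = importlib.import_module(f"{self.__name__}.{submodule}")
                      return getattr(real, attr)
              raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
      ```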
      Fix AutoTokenizer when no fast tokenizer is available (#13336) · c4ecd234
      Sylvain Gugger authored
      * Fix AutoTokenizer when a tokenizer has no fast version
      
      * Add test
      albert flax (#13294) · 98e409ab
      Kamal Raj authored
      * albert flax
      
      * year -> 2021
      
      * docstring updated for flax
      
      * removed head_mask
      
      * removed from_pt
      
      * removed passing attention_mask to embedding layer
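      A minimal usage sketch for the new Flax ALBERT port; the albert-base-v2 checkpoint is assumed to ship Flax weights, consistent with the "removed from_pt" commit above.

      ```python
      # Hedged usage sketch of the Flax ALBERT port added above.
      from transformers import AlbertTokenizer, FlaxAlbertModel

      tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
      model = FlaxAlbertModel.from_pretrained("albert-base-v2")

      inputs = tokenizer("ALBERT now runs under Flax.", return_tensors="np")
      outputs = model(**inputs)
      print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
      ```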
      distilbert-flax (#13324) · 774760e6
      Kamal Raj authored
      * distilbert-flax
      
      * added missing self
      
      * docs fix
      
      * removed tied kernel extra init
      
      * updated docs
      
      * x -> hidden states
      
      * removed head_mask
      
      * removed from_pt, +FLAX
      
      * updated year
      Add LayoutLMv2 + LayoutXLM (#12604) · b6ddb08a
      NielsRogge authored
      
      
      * First commit
      
      * Make style
      
      * Fix dummy objects
      
      * Add Detectron2 config
      
      * Add LayoutLMv2 pooler
      
      * More improvements, add documentation
      
      * More improvements
      
      * Add model tests
      
      * Add clarification regarding image input
      
      * Improve integration test
      
      * Fix bug
      
      * Fix another bug
      
      * Fix another bug
      
      * Fix another bug
      
      * More improvements
      
      * Make more tests pass
      
      * Make more tests pass
      
      * Improve integration test
      
      * Remove gradient checkpointing and add head masking
      
      * Add integration test
      
      * Add LayoutLMv2ForSequenceClassification to the tests
      
      * Add LayoutLMv2ForQuestionAnswering
      
      * More improvements
      
      * More improvements
      
      * Small improvements
      
      * Fix _LazyModule
      
      * Fix fast tokenizer
      
      * Move sync_batch_norm to a separate method
      
      * Replace dummies by requires_backends
      
      * Move calculation of visual bounding boxes to separate method + update README
      
      * Add models to main init
      
      * First draft
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * Remove is_split_into_words
      
      * More improvements
      
      * Simplify tesseract - no use of pandas anymore
      
      * Add LayoutLMv2Processor
      
      * Update is_pytesseract_available
      
      * Fix bugs
      
      * Improve feature extractor
      
      * Fix bug
      
      * Add print statement
      
      * Add truncation of bounding boxes
      
      * Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer
      
      * Improve tokenizer tests
      
      * Make more tokenizer tests pass
      
      * Make more tests pass, add integration tests
      
      * Finish integration tests
      
      * More improvements
      
      * More improvements - update API of the tokenizer
      
      * More improvements
      
      * Remove support for VQA training
      
      * Remove some files
      
      * Improve feature extractor
      
      * Improve documentation and one more tokenizer test
      
      * Make quality and small docs improvements
      
      * Add batched tests for LayoutLMv2Processor, remove fast tokenizer
      
      * Add truncation of labels
      
      * Apply suggestions from code review
      
      * Improve processor tests
      
      * Fix failing tests and add suggestion from code review
      
      * Fix tokenizer test
      
      * Add detectron2 CI job
      
      * Simplify CI job
      
      * Comment out non-detectron2 jobs and specify number of processes
      
      * Add pip install torchvision
      
      * Add durations to see which tests are slow
      
      * Fix tokenizer test and make model tests smaller
      
      * First draft
      
      * Use setattr
      
      * Possible fix
      
      * Proposal with configuration
      
      * First draft of fast tokenizer
      
      * More improvements
      
      * Enable fast tokenizer tests
      
      * Make more tests pass
      
      * Make more tests pass
      
      * More improvements
      
      * Add padding to fast tokenizer
      
      * Make more tests pass
      
      * Make more tests pass
      
      * Make all tests pass for fast tokenizer
      
      * Make fast tokenizer support overflowing boxes and labels
      
      * Add support for overflowing_labels to slow tokenizer
      
      * Add support for fast tokenizer to the processor
      
      * Update processor tests for both slow and fast tokenizers
      
      * Add head models to model mappings
      
      * Make style & quality
      
      * Remove Detectron2 config file
      
      * Add configurable option to label all subwords
      
      * Fix test
      
      * Skip visual segment embeddings in test
      
      * Use ResNet-18 backbone in tests instead of ResNet-101
      
      * Proposal
      
      * Re-enable all jobs on CI
      
      * Fix installation of tesseract
      
      * Fix failing test
      
      * Fix index table
      
      * Add LayoutXLM doc page, first draft of code examples
      
      * Improve documentation a lot
      
      * Update expected boxes for Tesseract 4.0.0 beta
      
      * Use offsets to create labels instead of checking if they start with ##
      
      * Update expected boxes for Tesseract 4.1.1
      
      * Fix conflict
      
      * Make variable names cleaner, add docstring, add link to notebooks
      
      * Revert "Fix conflict"
      
      This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5.
      
      * Revert to make integration test pass
      
      * Apply suggestions from @LysandreJik's review
      
      * Address @patrickvonplaten's comments
      
      * Remove fixtures DocVQA in favor of dataset on the hub
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
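      A hedged usage sketch of the new LayoutLMv2Processor, which chains the feature extractor (OCR via pytesseract) and the tokenizer; the checkpoint is the public base checkpoint and "document.png" is a placeholder path.

      ```python
      # Hedged sketch: LayoutLMv2Processor runs OCR to get words + bounding boxes,
      # then encodes them together with the resized document image.
      from PIL import Image
      from transformers import LayoutLMv2Processor

      processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")

      image = Image.open("document.png").convert("RGB")  # placeholder path
      encoding = processor(image, return_tensors="pt")
      print(encoding.keys())  # input_ids, token_type_ids, attention_mask, bbox, image
      ```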
      [Slow tests] Disable Wav2Vec2 pretraining test for now (#13303) · a75db353
      Patrick von Platen authored
      
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * wav2vec2 pretraining
      Co-authored-by: Patrick von Platen <patrick@huggingface.co>
      correct (#13304) · 4362ee29
      Patrick von Platen authored
  4. 27 Aug, 2021 7 commits
  5. 26 Aug, 2021 9 commits
  6. 25 Aug, 2021 2 commits
  7. 24 Aug, 2021 1 commit
  8. 23 Aug, 2021 1 commit
      Make Flax GPT2 working with cross attention (#13008) · 2e20c0f3
      Yih-Dar authored
      
      
      * make flax gpt2 working with cross attention
      
      * Remove encoder->decoder projection layer
      
      * A draft (incomplete) for FlaxEncoderDecoderModel
      
      * Add the method from_encoder_decoder_pretrained + the docstrings
      
      * Fix the mistakes of using EncoderDecoderModel
      
      * Fix style
      
      * Add FlaxEncoderDecoderModel to the library
      
      * Fix cyclic imports
      
      * Add FlaxEncoderDecoderModel to modeling_flax_auto.py
      
      * Remove question comments
      
      * add tests for FlaxEncoderDecoderModel
      
      * add flax_encoder_decoder to the lists of ignored entries in check_repo.py
      
      * fix missing required positional arguments
      
      * Remove **kwargs when creating FlaxEncoderDecoderModel in from_encoder_decoder_pretrained()
      
      Also fix generation eos/pad tokens issue
      
      * Fix: Use sequences from the generated_output
      
      * Change a check from assert to raise ValueError
      
      * Fix examples and token ids issues
      
      * Fix missing all_cross_attentions when outputting tuple in modeling_gpt2
      
      * Remove the changes in configuration docstrings.
      
      * allow for bert 2 gpt2
      
      * make fix-copies
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Change remaining examples to bert2gpt2
      
      * Change the test to Bert2GPT2
      
      * Fix examples
      
      * Fix import
      
      * Fix unpack bug
      
      * Rename to FlaxEncoderDecoderModelTest and change the test to bert2gpt2
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Fix: NotImplentedError -> NotImplementedError
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * up
      
      * finalize
      Co-authored-by: ydshieh <ydshieh@user.noreply>
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
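      A short, hedged sketch of the bert2gpt2 pairing used in this PR's examples via `from_encoder_decoder_pretrained`; the input text and generation arguments are placeholders.

      ```python
      # Hedged sketch of FlaxEncoderDecoderModel with a BERT encoder and GPT-2 decoder.
      from transformers import BertTokenizer, GPT2Tokenizer, FlaxEncoderDecoderModel

      model = FlaxEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "gpt2")
      encoder_tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
      decoder_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

      inputs = encoder_tokenizer("The tower is 324 metres tall.", return_tensors="np")
      outputs = model.generate(
          inputs.input_ids,
          decoder_start_token_id=model.config.decoder.bos_token_id,
          pad_token_id=decoder_tokenizer.eos_token_id,
          max_length=32,
      )
      # Per the "Use sequences from the generated_output" commit, decode .sequences.
      print(decoder_tokenizer.batch_decode(outputs.sequences, skip_special_tokens=True))
      ```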