1. 12 Apr, 2021 2 commits
    • Sagemaker test docs update for framework upgrade (#11206) · f243a5ec
      Philipp Schmid authored
      * increased train_runtime for model parallelism
      
      * added documentation for framework upgrade
    • Add DeiT (PyTorch) (#11056) · 9f126097
      NielsRogge authored
      * First draft of deit
      
      * More improvements
      
      * Remove DeiTTokenizerFast from init
      
      * Conversion script works
      
      * Add DeiT to ViT conversion script
      
      * Add tests, add head model, add support for deit in vit conversion script
      
      * Update model checkpoint names
      
      * Update image_mean and image_std, set resample to bicubic
      
      * Improve docs
      
      * Docs improvements
      
      * Add DeiTForImageClassificationWithTeacher to init
      
      * Address comments by @sgugger
      
      * Improve feature extractors
      
      * Make fix-copies
      
      * Minor fixes
      
      * Address comments by @patil-suraj
      
      * All models uploaded
      
      * Fix tests
      
      * Remove labels argument from DeiTForImageClassificationWithTeacher
      
      * Fix-copies, style and quality
      
      * Fix tests
      
      * Fix typo
      
      * Multiple docs improvements
      
      * More docs fixes
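      The bullets above add DeiTFeatureExtractor and DeiTForImageClassificationWithTeacher and note that the teacher head takes no labels argument. A minimal usage sketch; the facebook/deit-base-distilled-patch16-224 checkpoint id and the local image path are assumptions, not taken from the commit message:

      ```python
      # Hedged sketch of the classes added in this PR; checkpoint id and image are assumptions.
      from PIL import Image
      from transformers import DeiTFeatureExtractor, DeiTForImageClassificationWithTeacher

      checkpoint = "facebook/deit-base-distilled-patch16-224"          # assumed checkpoint id
      feature_extractor = DeiTFeatureExtractor.from_pretrained(checkpoint)
      model = DeiTForImageClassificationWithTeacher.from_pretrained(checkpoint)

      image = Image.open("example.jpg").convert("RGB")                 # any local RGB image
      inputs = feature_extractor(images=image, return_tensors="pt")    # bicubic resize + normalization
      outputs = model(**inputs)                                        # no `labels` argument for this head
      print(model.config.id2label[outputs.logits.argmax(-1).item()])
      ```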
  2. 09 Apr, 2021 4 commits
  3. 08 Apr, 2021 6 commits
  4. 07 Apr, 2021 3 commits
  5. 06 Apr, 2021 3 commits
  6. 05 Apr, 2021 5 commits
  7. 01 Apr, 2021 2 commits
  8. 31 Mar, 2021 4 commits
  9. 30 Mar, 2021 6 commits
    • GPT Neo few fixes (#10968) · 83d38c9f
      Suraj Patil authored
      * fix checkpoint names
      
      * auto model
      
      * fix doc
    • fix big bird gpu test (#10967) · 7772ddb4
      Patrick von Platen authored
    • GPT Neo (#10848) · 86026437
      Suraj Patil authored
      
      
      * lets begin
      
      * boom boom
      
      * fix out proj in attn
      
      * fix attention
      
      * fix local attention
      
      * add tokenizer
      
      * fix imports
      
      * autotokenizer
      
      * fix checkpoint name
      
      * cleanup
      
      * more clean-up
      
      * more cleanup
      
      * output attentions
      
      * fix attn mask creation
      
      * fix imports
      
      * config doc
      
      * add tests
      
      * add slow tests
      
      * quality
      
      * add conversion script
      
      * copyright
      
      * typo
      
* another one bites the dust
      
      * fix attention tests
      
      * doc
      
      * add embed init in convert function
      
      * fix copies
      
      * remove tokenizer
      
      * enable caching
      
      * address review comments
      
      * improve config and create attn layer list internally
      
      * more consistent naming
      
      * init hf config from mesh-tf config json file
      
      * remove neo tokenizer from doc
      
      * handle attention_mask in local attn layer
      
      * attn_layers => attention_layers
      
      * add tokenizer_class in config
      
      * fix docstring
      
      * raise if len of attention_layers is not same as num_layers
      
      * remove tokenizer_class from config
      
      * more consistent naming
      
      * fix doc
      
      * fix checkpoint names
      
      * fp16 compat
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
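      Both GPT Neo entries above mention wiring the model into the auto classes, enabling caching, and fixing checkpoint names. A minimal generation sketch, assuming the EleutherAI/gpt-neo-1.3B checkpoint id (the exact names are not listed in the commit message):

      ```python
      # Hedged sketch: the checkpoint id is an assumption; GPT Neo reuses the GPT-2 tokenizer.
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
      model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

      prompt = "GPT Neo alternates global and local attention layers,"
      inputs = tokenizer(prompt, return_tensors="pt")
      output_ids = model.generate(**inputs, max_length=40, do_sample=True, use_cache=True)
      print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
      ```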
    • [WIP][Flax] Add general conversion script (#10809) · 8780caa3
      Patrick von Platen authored
      
      
      * save intermediate
      
      * finish first version
      
      * delete some more
      
      * improve import
      
      * fix roberta
      
      * Update src/transformers/modeling_flax_pytorch_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/modeling_flax_pytorch_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * small corrections
      
      * apply all comments
      
      * fix deterministic
      
      * make fix-copies
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
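      The conversion utilities above make it possible to load a PyTorch checkpoint directly into a Flax model. A minimal sketch, assuming the from_pt=True entry point and the roberta-base checkpoint (the PR itself only mentions fixing RoBERTa):

      ```python
      # Hedged sketch: convert PyTorch weights to Flax on the fly while loading.
      from transformers import FlaxRobertaModel, RobertaTokenizer

      tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
      model = FlaxRobertaModel.from_pretrained("roberta-base", from_pt=True)  # runs the new PT->Flax conversion

      inputs = tokenizer("Hello from Flax!", return_tensors="np")
      outputs = model(**inputs)   # JAX arrays computed from the converted weights
      ```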
    • Sagemaker test (#10925) · 604c0850
      Philipp Schmid authored
      * init
      
      * first working test
      
      * added todo for setup.py
      
      * working test for single node multi node ddp and smd
      
      * added tensorflow single node test
      
      * added directory for pytorch and tensorflow due to different requirements.txt
      
      * added directory for pytorch and tensorflow
      
      * added comment for run_glue until it is available
      
      * added output_dir to it
      
      * smaller dataset to make test running faster
      
      * adjust HP and script
      
      * adjusted parameter for tensorflow
      
      * refactored test scripts
      
      * adjusted make file
      
      * updated dlc container
      
      * commented in all tests
      
      * added both ecr images
      
      * added new master branches
      
      * debug
      
      * added new datasets version
      
      * init
      
      * strange rebase bug
      
      * removed changes
      
      * changed min version for tests to work
      
      * updated DLC
      
      * added model parallel test
      
      * removed test files
      
      * removed test files
      
* tested with new DLC
      
      * added correct sagemaker sdk version
      
      * adjust DLCs for official one
      
      * reworked tests
      
      * quality
      
* removed default profile, added documentation for it
      
      * added step in release for sagemaker tests
      
* reverted version for example script, removed duplicated script, and added install from master to requirements.txt
      
      * removed mistaken .DS_Stores from mac
      
      * fixed tests
      
      * added Sylvains feedback
      
      * make style
      
      * added lysandre's feedback
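      The tests above launch the example scripts on SageMaker DLCs for single-node, multi-node DDP/SMDataParallel, and model-parallel runs. A rough sketch of such a launch with the SageMaker Python SDK; the IAM role, container versions, instance settings, and hyperparameters are placeholders rather than values from this PR:

      ```python
      # Hedged sketch of a SageMaker HuggingFace training job similar to what these tests exercise.
      from sagemaker.huggingface import HuggingFace

      estimator = HuggingFace(
          entry_point="run_glue.py",
          source_dir="./examples/text-classification",
          role="arn:aws:iam::111122223333:role/SageMakerRole",               # placeholder IAM role
          instance_type="ml.p3.16xlarge",
          instance_count=2,                                                   # multi-node, as in the DDP/SMD tests
          transformers_version="4.4",
          pytorch_version="1.6",
          py_version="py36",
          distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
          hyperparameters={"model_name_or_path": "distilbert-base-uncased",
                           "output_dir": "/opt/ml/model"},
      )
      estimator.fit()
      ```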
    • BigBird (#10183) · 6dfd0272
      Vasudev Gupta authored
      
      
      * init bigbird
      
      * model.__init__ working, conversion script ready, config updated
      
      * add conversion script
      
      * BigBirdEmbeddings working :)
      
      * slightly update conversion script
      
      * BigBirdAttention working :) ; some bug in layer.output.dense
      
      * add debugger-notebook
      
      * forward() working for BigBirdModel :) ; replaced gelu with gelu_fast
      
* tf code adapted to torch till rand_attn in bigbird_block_sparse_attention; till now everything working :)
      
      * BigBirdModel working in block-sparse attention mode :)
      
      * add BigBirdForPreTraining
      
      * small fix
      
      * add tokenizer for BigBirdModel
      
      * fix config & hence modeling
      
      * fix base prefix
      
      * init testing
      
      * init tokenizer test
      
* pos_embed must be absolute, attn_type=original_full when add_cross_attn=True, nsp loss is optional in BigBirdForPreTraining, add assert statements
      
      * remove position_embedding_type arg
      
      * complete normal tests
      
      * add comments to block sparse attention
      
      * add attn_probs for sliding & global tokens
      
      * create fn for block sparse attn mask creation
      
      * add special tests
      
      * restore pos embed arg
      
      * minor fix
      
      * attn probs update
      
      * make big bird fully gpu friendly
      
      * fix tests
      
      * remove pruning
      
* correct tokenizer & minor fixes
      
      * update conversion script , remove norm_type
      
      * tokenizer-inference test add
      
      * remove extra comments
      
      * add docs
      
      * save intermediate
      
      * finish trivia_qa conversion
      
      * small update to forward
      
      * correct qa and layer
      
      * better error message
      
      * BigBird QA ready
      
      * fix rebased
      
* add trivia-qa debugger notebook
      
      * qa setup
      
      * fixed till embeddings
      
      * some issue in q/k/v_layer
      
      * fix bug in conversion-script
      
      * fixed till self-attn
      
      * qa fixed except layer norm
      
      * add qa end2end test
      
      * fix gradient ckpting ; other qa test
      
      * speed-up big bird a bit
      
      * hub_id=google
      
      * clean up
      
      * make quality
      
      * speed up einsum with bmm
      
      * finish perf improvements for big bird
      
      * remove wav2vec2 tok
      
      * fix tokenizer
      
      * include docs
      
      * correct docs
      
      * add helper to auto pad block size
      
      * make style
      
      * remove fast tokenizer for now
      
      * fix some
      
      * add pad test
      
      * finish
      
      * fix some bugs
      
      * fix another bug
      
      * fix buffer tokens
      
      * fix comment and merge from master
      
      * add comments
      
      * make style
      
      * commit some suggestions
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Fix typos
      
      * fix some more suggestions
      
      * add another patch
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fix copies
      
* another patch
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * update
      
      * update nit suggestions
      
      * make style
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
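      The BigBird entry adds block sparse attention with a helper that auto-pads inputs to the block size, plus an original_full fallback. A minimal usage sketch, assuming the google/bigbird-roberta-base checkpoint id (the bullets only say hub_id=google):

      ```python
      # Hedged sketch: checkpoint id is an assumption; attention_type mirrors the attn_type values above.
      from transformers import BigBirdModel, BigBirdTokenizer

      tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
      model = BigBirdModel.from_pretrained("google/bigbird-roberta-base", attention_type="block_sparse")

      long_text = "BigBird handles long sequences with block sparse attention. " * 64
      inputs = tokenizer(long_text, return_tensors="pt")   # padded internally to a multiple of the block size
      outputs = model(**inputs)
      print(outputs.last_hidden_state.shape)

      # For short inputs or when cross-attention is added, full attention can be selected instead:
      # model = BigBirdModel.from_pretrained("google/bigbird-roberta-base", attention_type="original_full")
      ```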
  10. 26 Mar, 2021 1 commit
  11. 25 Mar, 2021 1 commit
    • Layout lm tf 2 (#10636) · 4684bfc7
      Amir Tahmasbi authored
      
      
      * Added embeddings layer
      
      * Added layoutlm layers, main model, maskedlm and token classification classes
      
      * Added model classes to tf auto models
      
      * Added model to PT to TF conversion script
      
      * Added model to doc README
      
      * Added tests
      
      * Removed unused imports
      
      * Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
      
      * Made tests pass!
      
      * Fixed typos in imports and docs
      
      * Fixed a typo in embeddings layer
      
      * Removed imports
      
      * Fixed formatting issues, imports, tests
      
      * Added layoutlm layers, main model, maskedlm and token classification classes
      
      * Added model classes to tf auto models
      
      * Added model to PT to TF conversion script
      
      * Removed unused imports
      
      * Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
      
      * Made tests pass!
      
      * Fixed typos in imports and docs
      
      * Removed imports
      
      * Fixed small formatting issues
      
      * Removed duplicates import from main __init__.py
      
* Changed default arg to true for adding pooling layer to tf layoutlm
      
      * Fixed formatting issues
      
      * Style
      
      * Added copied from to classes copied from bert
      
      * Fixed doc strings examples to work with layoutlm inputs
      
      * Removed PyTorch reference in doc strings example
      
      * Added integration tests
      
      * Cleaned up initialization file
      
      * Updated model checkpoint identifiers
      
      * Fixed imports
      Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
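      The TF LayoutLM port above adds masked LM, sequence classification, and token classification heads. A minimal sequence-classification sketch, assuming the microsoft/layoutlm-base-uncased checkpoint and made-up bounding boxes (LayoutLM expects one 0-1000 normalized box per token):

      ```python
      # Hedged sketch: checkpoint id, label count, and bounding boxes are assumptions for illustration.
      import tensorflow as tf
      from transformers import LayoutLMTokenizer, TFLayoutLMForSequenceClassification

      tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
      model = TFLayoutLMForSequenceClassification.from_pretrained("microsoft/layoutlm-base-uncased", num_labels=2)

      words = ["Hello", "world"]
      word_boxes = [[637, 773, 693, 782], [698, 773, 733, 782]]   # made-up normalized coordinates

      encoding = tokenizer(" ".join(words), return_tensors="tf")
      # one box per token: a [CLS] box, one box per wordpiece, and a [SEP] box
      token_boxes = [[0, 0, 0, 0]] + word_boxes + [[1000, 1000, 1000, 1000]]
      bbox = tf.convert_to_tensor([token_boxes])

      outputs = model(input_ids=encoding["input_ids"], bbox=bbox,
                      attention_mask=encoding["attention_mask"],
                      token_type_ids=encoding["token_type_ids"])
      print(outputs.logits.shape)   # (1, 2)
      ```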
  12. 23 Mar, 2021 1 commit
  13. 22 Mar, 2021 1 commit
  14. 19 Mar, 2021 1 commit