1. 04 May, 2023 1 commit
  2. 03 May, 2023 1 commit
  3. 02 May, 2023 1 commit
  4. 01 May, 2023 1 commit
  5. 28 Apr, 2023 1 commit
  6. 27 Apr, 2023 2 commits
  7. 24 Apr, 2023 1 commit
  8. 23 Apr, 2023 1 commit
  9. 14 Apr, 2023 1 commit
  10. 12 Apr, 2023 1 commit
    • add model resources for CPMAnt (new) (#20906) · 523ca4e0
      pioliverse authored
      
      
      * resolve conflicts
      
      * rebase and make style
      
      * test
      
      * test
      
      * test
      
      * rebase and make style
      
      * rebase and make style
      
      * tests
      
      * tests
      
      * rewrite some functions
      
      * rebase and make style
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * fix some bugs & docstring
      
      * add models and tests
      
      * solve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * tests
      
      * resolve conflicts
      
      * resolve conflicts
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * fix some bugs & docstring
      
      * save resolution
      
      * make style
      
      * delete redefinition code
      
      * reformat function
      
      * reformat
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * tests
      
      * resolve conflicts
      
      * resolve conflicts
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * resolve conflicts
      
      * make style
      
      * fix bugs and refactor
      
      * modify docstrings and make style
      
      * unify import format in __init__.py
      
      * fix import-altclp bug
      
      * fix copies to update index.md
      
      * fix unused config parameters
      
      * fix unused config parameters
      
      * fix unused config parameters
      
      * update README_ja.md
      
      * dummy commit for unit test
      
      * fix attention mask
      
      * add CPMAntTokenizer&-Fast to auto-mapping
      
      * drop redundant changes in README_ko
      
      * fix defaults in docstring
      
      * fix use_cache and some docstring
      
      * add missing args in tokenizer
      
      * modify tester inheritance
      
      * add is_jieba_available
      
      * fix some bugs
      
      * make style and fix-copies
      
      * add doctests
      
      * skip integration tests
      
      * add is_jieba_available
      
      * fix bugs in common tests
      
      * adjust docstrings and make style
      
      * add argument docstring
      
      * adjust code to some specifications
      
      * make style and fix-copies
      
      * add fast tokenization test
      
      * dummy commit for unit test
      
      * dummy commit for unit test
      
      * dummy commit for unit test
      
      * normalize some comments and names
      
      * Bert->CPMAnt
      
      * camel names and drop redundant codes
      
      * make style and fix-copies
      
      * add CpmTokenizerFast _import_structure
      
      * drop cpmanttokenizerfast in model_doc
      
      * fix some problems
      
      * fix CPMAnt tokenization for common test
      
      * make style and fixup
      
      * fix copies and fixup
      
      * fix bugs in tokenization test
      
      * dummy commit for connection failure in unittest
      
      * fix copies
      
      * drop trailing comma
      
      * fix decorator in tests
      
      * dummy commit for connection failure in unittest
      
      ---------
      Co-authored-by: Gong Baitao <gongbaitao11@gmail.com>
  11. 10 Apr, 2023 2 commits
    • add GPTNeoXForSequenceClassification (#22671) · 6daa9cb5
      Sugawara authored
      * add GPTNeoXForSequenceClassification
      
      * move the labels to logits.device (ref: #22561)
      
      * fix
    • Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575) · e0921c6b
      Joel Lamy-Poirier authored
      
      
      * Add model with cli tool
      
      * Remove unwanted stuff
      
      * Add new code
      
      * Remove inference runner
      
      * Style
      
      * Fix checks
      
      * Test updates
      
      * make fixup
      
      * fix docs
      
      * fix doc
      
      * fix test
      
      * hopefully fix pipeline tests
      
      * refactor
      
      * fix CIs
      
      * add comment
      
      * rename to `GPTBigCodeForCausalLM`
      
      * correct readme
      
      * make fixup + docs
      
      * make fixup
      
      * fixes
      
      * fixes
      
      * Remove pruning
      
      * Remove import
      
      * Doc updates
      
      * More pruning removal
      
      * Combine copies
      
      * Single MQA implementation, remove kv cache pre-allocation and padding
      
      * Update doc
      
      * Revert refactor to match gpt2 style
      
      * Merge back key and value caches, fix some type hints
      
      * Update doc
      
      * Fix position ids with padding (PR 21080)
      
      * Add conversion script temporarily
      
      * Update conversion script
      
      * Remove checkpoint conversion
      
      * New model
      
      * Fix MQA test
      
      * Fix copies
      
      * try fix tests
      
      * FIX TEST!!
      
      * remove `DoubleHeadsModel`
      
      * add MQA tests
      
      * add slow tests
      
      * clean up
      
      * add CPU checker
      
      * final fixes
      
      * fixes
      
      - fix GPU issue
      - fixed slow tests
      - skip disk offload
      
      * fix final issue
      
      * Simplify and comment baddbmm fix
      
      * Remove unnecessary code
      
      * Transpose tweaks
      
      * Use beta=1 on cpu, improve tests
      
      ---------
      Co-authored-by: younesbelkada <younesbelkada@gmail.com>
  12. 03 Apr, 2023 1 commit
  13. 27 Mar, 2023 1 commit
    • [WIP]`NLLB-MoE` Adds the moe model (#22024) · 19ade242
      Arthur authored
      * Initial commit
      
      * update modeling code
      
      * update doc
      
      * add functions necessary
      
      * fix imports
      
      * revert changes
      
      * fixup
      
      * more styling to get going
      
      * remove standalone encoder
      
      * update code
      
      * styling
      
      * fix config and model
      
      * update code and some refactoring
      
      * make more tests pass
      
      * Adding NLLB-200 - MoE - 54.5B for no language left behind
      Fixes #21300
      
      * fix more common tests
      
      * style
      
      * update testing file
      
      * update
      
      * update
      
      * Router2 doc
      
      * update check config with sparse layer
      
      * add dummy router
      
      * update current conversion script
      
      * create on the fly conversion script
      
      * Fixup
      
      * style
      
      * style 2
      
      * fix empty return
      
      * fix return
      
      * Update default config sparse layers
      
      * easier to create sparse layers
      
      * update
      
      * update conversion script
      
      * update modeling
      
      * add to toctree
      
      * styling
      
      * make ruff happy
      
      * update docstring
      
      * update conversion script
      
      * update, will break tests but implementing top2
      
      * update
      
      * local groups are supported here
      
      * Support for local groups is now removed
      
      This is because it has to work with model parallelism that we do not support
      
      * finish simplification
      
      * Fix forward
      
      * style
      
      * fixup
      
      * Update modelling and test, refactoring
      
      * update tests
      
      * remove final layer_norm as it is done in the FF
      
      * routing works! Logits test added
      
      * nit in test
      
      * remove top1router
      
      * style
      
      * make sure sparse are tested. Had to change route_tokens a little bit
      
      * add support for unslip models when converting
      
      * fixup
      
      * style
      
      * update tests
      
      * update test
      
      * REFACTOR
      
      * encoder outputs match!
      
      * style
      
      * update testing
      
      * 🎉encoder and decoder logits match 🎉
      
      
      
      * styling
      
      * update tests
      
      * cleanup tests
      
      * fix router test and CIs
      
      * cleanup
      
      * cleanup test styling
      
      * fix tests
      
      * Finally the generation tests match!
      
      * cleanup
      
      * update test
      
      * style testing file
      
      * remove script
      
      * cleanup
      
      * more cleanup
      
      * nits
      
      * update
      
      * NLLB tokenizer is wrong and will be fixed soon
      
      * use LongTensors
      
      * update tests
      
      * revert some small changes
      
      * fix second expert sampling and batch prioritized routing
      
      * update tests
      
      * finish last tests
      
      * make ruff happy
      
      * update
      
      * ruff again
      
      * style
      
      * Update docs/source/en/model_doc/nllb-moe.mdx
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Updates based on review
      
      * style and fix import issue
      
      * nit
      
      * more nits
      
      * cleanup
      
      * styling
      
      * update test_seconde_expert_policy
      
      * fix name
      
      * last nit on the markdown examples
      
      ---------
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  14. 24 Mar, 2023 1 commit
    • Add Mega: Moving Average Equipped Gated Attention (#21766) · 57f25f4b
      Mitch Naylor authored
      
      
      * add mega file structure and plain pytorch version of mega source code
      
      * added config class with old naming conventions
      
      * filled in mega documentation
      
      * added config class and embeddings with optional token types
      
      * updated notes
      
      * starting the conversion process, deleted intermediate and added use_cache back to config
      
      * renamed config attributes in modeling_mega.py
      
      * checkpointing before refactoring incremental decoding functions
      
      * removed stateful incremental key/values for EMA and self-attention
      
      * refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask
      
      * MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement
      
      * more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention
      
      * bug fix in attention mask handling in MovingAverageGatedAttention
      
      * removed incremental state from GatedCrossAttention and removed IncrementalState class
      
      * finished gated cross attention and got MegaLayer working
      
      * fixed causal masking in mega decoder
      
      * fixed how padding and causal masks are passed through MegaLayer with and without k/v caching
      
      * finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids
      
      * added optional dense hidden layer for masked and causal LM classes
      
      * docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention
      
      * removed before_attn_fn in Mega class and updated docstrings and comments up to there
      
      * bug fix in MovingAverageGatedAttention masking
      
      * working conversion of MLM checkpoint in scratchpad script -- perfect matches
      
      * moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters
      
      * renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint
      
      * finished checkpoint conversion script
      
      * cleanup old class in mega config script
      
      * removed 'copied from' statements and passing integration tests
      
      * added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing
      
      * fixed tuple output of megamodel
      
      * all common tests passing after fixing issues in decoder, gradient retention, and initialization
      
      * added mega-specific tests, ready for more documentation and style checks
      
      * updated docstrings; checkpoint before style fixes
      
      * style and quality checks, fixed initialization problem in float_tensor, ready for PR
      
      * added mega to toctree
      
      * removed unnecessary arg in megaconfig
      
      * removed unused arg and fixed code samples with leftover roberta models
      
      * Apply suggestions from code review
      
      Applied all suggestions except the one renaming a class, as I'll need to update that throughout
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA
      
      * removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACT2FN, and added sequencenorm to layer norms
      
      * reformatted .forward() docstrings to match style and removed unused mask input in cross-attention
      
      * removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()
      
      * renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files
      
      * variable names in NFFN
      
      * manual Mega->MEGA changes in docs
      
      * Mega->MEGA in config auto
      
      * style and quality fixes
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments
      
      * commit before dealing with merge conflicts
      
      * made new attention activation functions available in ACT2FN and added generation test from OPT
      
      * style and quality in activations and tests
      
      * documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings
      
      * style and quality fixes after latest updates, before rotary position ids
      
      * causal mask in MegaBlock docstring + added missing device passing
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update README.md
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR
      
      * style and quality fixes + readme updates pointing to main
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  15. 20 Mar, 2023 1 commit
  16. 17 Mar, 2023 2 commits
  17. 16 Mar, 2023 1 commit
    • LLaMA Implementation (#21955) · 0041be5b
      Jason Phang authored
      
      
      * LLaMA
      
      * sharding and docs
      
      * tweak
      
      * black
      
      * inits
      
      * ruff
      
      * LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
      
      * init
      
      * no checkpoint
      
      * docs
      
      * ruff
      
      * type_vocab_size
      
      * tokenizer fixes
      
      * tokenizer fixes
      
      * Update tokenization_llama.py
      
      * Update tokenization_llama.py
      
      * Update configuration_llama.py
      
      * Update modeling_llama.py
      
      * tokenizer add_bos by default
      
      * licenses
      
      * remove decoder
      
      * norms and mlp
      
      * rope overhaul
      
      * tweaks
      
      * black
      
      * mention OPT implementation
      
      * off-by-one naming
      
      * typo
      
      * fix
      
      * tokenization fix and slicing bug
      
      * padding config
      
      * cleanup
      
      * black
      
      * update tests
      
      * undo typo
      
      * fix vocab caching logic
      
      * ruff
      
      * docbuilder
      
      * attn fix from BlackSamorez
      
      * initial feedback
      
      * typo
      
      * docs
      
      * llama case
      
      * llama case
      
      * load checkpoint docs
      
      * comment about tokenizer
      
      * tokenizer defaults
      
      * clear past_key_values if use_cache=False
      
      * last tweaks
      
      * last tweaks
      
      * last tweaks
      
      * last tweaks
      
      ---------
      Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
  18. 14 Mar, 2023 1 commit
  19. 13 Mar, 2023 1 commit
  20. 07 Mar, 2023 1 commit
  21. 06 Mar, 2023 1 commit
  22. 28 Feb, 2023 1 commit
  23. 27 Feb, 2023 1 commit
  24. 22 Feb, 2023 1 commit
  25. 20 Feb, 2023 2 commits
    • Add EfficientNet (#21563) · 49ab1623
      Alara Dirik authored
      * Add EfficientNet to transformers
    • add GPTSAN model (reopen) (#21291) · f56174ac
      tanreinama authored
      * add GPTSAN-Japanese
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN (update for review)
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * fix typo in comment text
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * fix document and comments
      
      * fix class name GPTSAN->GPTSan
      
      * fix import and test for tokenizer
  26. 15 Feb, 2023 1 commit
    • Add Ernie-M Model to huggingface (#21349) · 0c9c8472
      Susnato Dhar authored
      * config and tokenization(fast too) changed and ErnieEncoder added
      
      * Slow Tokenization Added
      
      * Tokenizer(slow) is now working and Fast Tokenizer removed
      
      * Added Config code
      
      * Added Base Model and utils
      
      * ErnieMModel is now working
      
      * All added except tests
      
      * All tests passed except ErnieUIEM
      
      * All tests passed
      
      * all fixes done
      
      * all fixes done
      
      * fixed MAP
      
      * fixed check_code_quality
      
      * fixed Build PR Documentation issue
      
      * Added changes(comments) and also updated to the latest upstream/main
      
      * Added fixup
      
      * Added # Copied comments
      
      * Added fixup
      
      * Added more comments and some nits
      
      * Added fixup
      
      * Fixed README_hd.md
      
      * Added more fixes
      
      * ErnieMTokenizer (being sentencepiece) protected and other docs edited
      
      * Added code_quality fix
      
      * Fixed for
      
      * Added more fix
      
      * modified AZ
      
      * ernie-m tokenization test added!
      
      * attention mask part fixed(with 0->self.config.pad_token_id)
      
      * applied make fixup
  27. 14 Feb, 2023 1 commit
  28. 13 Feb, 2023 1 commit
  29. 10 Feb, 2023 2 commits
  30. 06 Feb, 2023 1 commit
    • Update quality tooling for formatting (#21480) · 6f79d264
      Sylvain Gugger authored
      * Result of black 23.1
      
      * Update target to Python 3.7
      
      * Switch flake8 to ruff
      
      * Configure isort
      
      * Configure isort
      
      * Apply isort with line limit
      
      * Put the right black version
      
      * adapt black in check copies
      
      * Fix copies
  31. 02 Feb, 2023 1 commit
  32. 31 Jan, 2023 1 commit
    • Add DETA (#20983) · 5451f889
      NielsRogge authored
      * First draft
      
      * Add initial draft of conversion script
      
      * Convert all weights
      
      * Fix config
      
      * Add image processor
      
      * Fix DetaImageProcessor
      
      * Run make fix copies
      
      * Remove timm dependency
      
      * Fix dummy objects
      
      * Improve loss function
      
      * Remove conv_encoder attribute
      
      * Update conversion scripts
      
      * Improve postprocessing + docs
      
      * Fix copied from statements
      
      * Add tests
      
      * Improve postprocessing
      
      * Improve postprocessing
      
      * Update READMEs
      
      * More improvements
      
      * Fix rebase
      
      * Add is_torchvision_available
      
      * Add torchvision dependency
      
      * Fix typo and README
      
      * Fix bug
      
      * Add copied from
      
      * Fix style
      
      * Apply suggestions
      
      * Fix thanks to @ydshieh
      
      * Fix another dependency check
      
      * Simplify image processor
      
      * Add scipy
      
      * Improve code
      
      * Add threshold argument
      
      * Fix bug
      
      * Set default threshold
      
      * Improve integration test
      
      * Add another integration test
      
      * Update setup.py
      
      * Address review
      
      * Improve deformable attention function
      
      * Improve copied from
      
      * Use relative imports
      
      * Address review
      
      * Replace assertions
      
      * Address review
      
      * Update dummies
      
      * Remove dummies
      
      * Address comments, update READMEs
      
      * Remove custom kernel code
      
      * Add image processor tests
      
      * Add requires_backends
      
      * Add minor comment
      
      * Update scripts
      
      * Update organization name
      
      * Fix defaults, add doc tests
      
      * Add id2label for object 365
      
      * Fix tests
      
      * Update task guide
  33. 27 Jan, 2023 1 commit
    • Automated compatible models list for task guides (#21338) · 73a2ff69
      Maria Khalusova authored
      * initial commit. added tip placeholders and a script
      
      * removed unused imports, fixed paths
      
      * fixed generated links
      
      * make style
      
      * split language modeling doc into two: causal language modeling and masked language modeling
      
      * added check_task_guides.py to make fix-copies
      
      * review feedback addressed
  34. 25 Jan, 2023 1 commit
    • Documentation code sample fixes (#21302) · 23844941
      Maria Khalusova authored
      * Fixed the following:
      pipe -> pipeline
      out in pipe(data()) is a list of dict, not a dict
      
      * Fixed the TypeError: __init__() missing 1 required positional argument: 'key'
      
      * Added a tip: code sample requires additional libraries to run
      
      * Fixed custom config's name
      
      * added seqeval to the required libraries
      
      * fixed a missing dependency,
      fixed metric naming,
      added checkpoint to fix the datacollator
      
      * added checkpoint to fix the datacollator,
      added missing dependency
  35. 23 Jan, 2023 1 commit