1. 17 Mar, 2022 3 commits
  2. 16 Mar, 2022 5 commits
  3. 15 Mar, 2022 3 commits
  4. 14 Mar, 2022 4 commits
    • Francesco Saverio Zuppichini's avatar
      [WIP] Resnet (#15770) · e3008c67
      Francesco Saverio Zuppichini authored
      
      
      * first commit
      
      * ResNet model correctly implemented.
      
      basic modeling + weights conversion is done
      
      removed unused doc
      
      mdx file
      
      doc and conversion script
      
      added feature_extractor to auto
      
      test
      
      minor changes + style + quality
      
      doc
      
      test
      
      Delete process.yml
      
      A leftover from my attempt at running CircleCI locally
      
      * minor changes
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * new test format
      
      * minor changes from conversations
      
      * minor changes from conversations
      
      * make style + quality
      
      * re-added the tests
      
      * test + README
      
      * minor changes from conversations
      
      * error in README
      
      * make fix-copies
      
      * removed regression for classification head
      
      * make quality
      
      * fixed loss control flow
      
      * fixed loss control flow
      
      * resolved conversations
      
      * Apply suggestions from code review
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * READMEs
      
      * index.mdx
      
      * minor changes
      
      * updated tests and models
      
      * unused import
      
      * outputs
      
      * Update docs/source/model_doc/resnet.mdx
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * added embeddings_size
      
      * Apply suggestions from code review
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * conversation
      
      * added push to hub
      
      * test
      
      * embedding_size
      
      * make fix-copies
      
      * resolved conversations
      
      * CI
      
      * changed organization
      
      * minor changes
      
      * CI
      
      * minor changes
      
      * conversations
      
      * conversation
      
      * doc
      
      * tests
      
      * removed unused docstring
      
      * conversation
      
      * removed unused outputs
      
      * CI
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avatarNielsRogge <48327001+NielsRogge@users.noreply.github.com>
      e3008c67
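      The core idea behind the ResNet blocks this PR implements is the residual (shortcut) connection. A minimal pure-Python sketch of that idea follows; it is illustrative only and not the library's actual modeling code (the helper names are made up for this example):

      ```python
      def relu(xs):
          return [max(0.0, x) for x in xs]

      def linear(xs, weight, bias):
          # weight: list of rows, one row per output unit
          return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weight, bias)]

      def residual_block(xs, weight, bias):
          # output = relu(F(x) + x): the shortcut lets the input bypass F entirely
          transformed = linear(xs, weight, bias)
          return relu([t + x for t, x in zip(transformed, xs)])

      # With identity weights and zero bias, the block reduces to relu(2 * x)
      x = [1.0, -2.0, 3.0]
      identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
      print(residual_block(x, identity, [0.0, 0.0, 0.0]))  # [2.0, 0.0, 6.0]
      ```

      The shortcut is what makes very deep stacks of such blocks trainable, which is the design the commits above port into the library's modeling and conversion scripts.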
    • Yih-Dar's avatar
      Make TF pt-tf equivalence test more aggressive (#15839) · 923c35b5
      Yih-Dar authored
      
      
      * Make TF pt-tf equivalence test more aggressive
      
      * Fix for TFConvNextModelTest and TFTransfoXLModelTest
      
      * fix kwargs for outputs
      
      * clean-up
      
      * Add docstring for check_outputs()
      
      * remove: need to rename encoder-decoder
      
      * clean-up
      
      * send PyTorch things to the correct device
      
      * Add back the accidentally removed test case in test_pt_tf_model_equivalence()
      
      * Fix: change to tuple before calling check_outputs()
      
      * Fix: tfo could be a list
      
      * use to_tuple()
      
      * allow tfo only to be tuple or tensor
      
      * allow tfo to be list or tuple for now + style change
      
      * minor fix
      
      * remove np.copy and update comments
      
      * tfo -> tf_output, same for pt
      
      * Add more detailed comment
      
      * remove the incorrect comment
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      923c35b5
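      The check_outputs() logic this PR discusses recursively walks the (possibly nested) PyTorch and TensorFlow outputs after converting them with to_tuple(), and compares them element-wise. A minimal sketch of that comparison, using plain Python floats instead of real tensors (not the actual test code):

      ```python
      def max_abs_diff(pt_output, tf_output):
          # Recursively descend nested tuples/lists (what to_tuple() produces)
          # and return the largest element-wise absolute difference found.
          if isinstance(pt_output, (tuple, list)):
              return max(max_abs_diff(p, t) for p, t in zip(pt_output, tf_output))
          return abs(pt_output - tf_output)

      pt = (1.0, (2.0, [3.0, 4.0]))
      tf = (1.0, (2.1, [3.0, 3.8]))
      diff = max_abs_diff(pt, tf)
      print(diff)
      ```

      The real equivalence test asserts this maximum difference stays under a tight tolerance for every output of every model, which is what makes the test "more aggressive" than only checking a top-level tensor.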
    • Sanchit Gandhi's avatar
      Fix Loading of Flax(Speech)EncoderDecoderModel kwargs from PreTrained... · 2de99e6c
      Sanchit Gandhi authored
      Fix Loading of Flax(Speech)EncoderDecoderModel kwargs from PreTrained Encoder-Decoder Checkpoints (#16056)
      
      * Fix Loading of Flax(Speech)EncoderDecoderModel kwargs from PreTrained Encoder-Decoder Checkpoints
      
      * change wording
      2de99e6c
    • lewtun's avatar
      Add TFCamembertForCausalLM and ONNX integration test (#16073) · 6e1e88fd
      lewtun authored
      * Make Camembert great again!
      
      * Add Camembert to TensorFlow ONNX tests
      6e1e88fd
  5. 12 Mar, 2022 1 commit
    • Stas Bekman's avatar
      [Deepspeed] add support for bf16 mode (#14569) · 580dd87c
      Stas Bekman authored
      
      
      * [WIP] add support for bf16 mode
      
      * prep for bf16
      
      * prep for bf16
      
      * fix; zero2/bf16 is ok
      
      * check bf16 is available
      
      * test fixes
      
      * enable zero3_bf16
      
      * config files
      
      * docs
      
      * split stage_dtype; merge back to non-dtype-specific config file
      
      * fix doc
      
      * cleanup
      
      * cleanup
      
      * bfloat16 => bf16 to match the PR changes
      
      * s/zero_gather_fp16_weights_on_model_save/zero_gather_16bit_weights_on_model_save/; s/save_fp16_model/save_16bit_model/
      
      * test fixes/skipping
      
      * move
      
      * fix
      
      * Update docs/source/main_classes/deepspeed.mdx
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * backticks
      
      * cleanup
      
      * cleanup
      
      * cleanup
      
      * new version
      
      * add note about grad accum in bf16
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      580dd87c
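      The bf16 mode this PR adds is driven by the DeepSpeed config JSON, where a "bf16" section sits alongside the existing "fp16" one. A sketch of such a config, built in Python with illustrative values (not one of the PR's actual config files):

      ```python
      import json

      # Illustrative DeepSpeed config enabling bf16 with ZeRO stage 3
      # (the zero3_bf16 path mentioned above); values are examples only.
      ds_config = {
          "bf16": {"enabled": True},          # use bf16 instead of the "fp16" section
          "zero_optimization": {"stage": 3},
          "train_micro_batch_size_per_gpu": "auto",
      }
      print(json.dumps(ds_config, indent=2))
      ```

      Note the PR also renames the weight-gathering option from zero_gather_fp16_weights_on_model_save to zero_gather_16bit_weights_on_model_save, since the saved weights are no longer necessarily fp16.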
  6. 11 Mar, 2022 2 commits
    • Kevin Bondzio's avatar
      Add soft length regulation for sequence generation (#15245) · 9442b3ce
      Kevin Bondzio authored
      
      
      * add possibility to softly regulate length when using sampling method in model.generate() function
      
      * fix test config, fix formatting
      
      * fix rag integration, fix docstyling
      
      * fix wrong docstring
      
      * change param to tuple, add test
      
      * fix old param in rag_model, remove unused import
      
      * change test according to new param
      
      * fix formatting
      
      * fix test case
      
      * fix doc style
      
      * move start_length calculation to LogitsProcessor
      
      * remove unused import
      
      * fix small errors
      
      * fix test
      
      * Update src/transformers/generation_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/generation_utils.py
      
      * Update src/transformers/generation_utils.py
      
      * fix docstring, add type in rag model
      
      * fix docstrings
      
      * introduce seq_length variable for cleaner code
      
      * fix black formatting
      
      * add input_ids_seq_length to modeling_rag
      
      * add input_ids_seq_length to test
      
      * retrigger checks
      
      * retrigger checks
      Co-authored-by: Kevin Bondzio <kev@AIM-LAP-02.local>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Kevin Bondzio <kev@AIM-LAP-02.fritz.box>
      9442b3ce
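      The soft length regulation above is implemented as a logits processor that, once generation passes a chosen start index, scales the EOS token's score by an exponentially growing factor, so sampling tends to end "softly" instead of being cut off at max_length. A sketch of the idea, assuming a positive EOS score (the helper name is made up; it is not the library's class):

      ```python
      def soft_length_penalty(eos_score, cur_len, start_index, decay_factor):
          # Before start_index the score is untouched; after it, the EOS score
          # grows exponentially with the number of tokens past start_index.
          if cur_len <= start_index:
              return eos_score
          return eos_score * decay_factor ** (cur_len - start_index)

      # EOS becomes increasingly likely as the sequence grows past start_index=10
      print(soft_length_penalty(0.5, 10, 10, 1.5))  # 0.5 (penalty not yet active)
      print(soft_length_penalty(0.5, 14, 10, 1.5))  # 0.5 * 1.5**4 = 2.53125
      ```

      This matches the tuple parameter the PR settles on: a (start_index, decay_factor) pair passed to model.generate().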
    • Yih-Dar's avatar
      Fix a TF test name (LayoutLMModelTest) (#16061) · b6bdb943
      Yih-Dar authored
      
      
      * fix name
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      b6bdb943
  7. 10 Mar, 2022 6 commits
  8. 09 Mar, 2022 6 commits
  9. 08 Mar, 2022 5 commits
  10. 07 Mar, 2022 3 commits
  11. 04 Mar, 2022 2 commits
    • Francesco Saverio Zuppichini's avatar
      9932ee4b
    • Chan Woo Kim's avatar
      Constrained Beam Search [*With* Disjunctive Decoding] (#15761) · 5c6f57ee
      Chan Woo Kim authored
      
      
      * added classes to get started with constrained beam search
      
      * in progress, think i can directly force tokens now but not yet with the round robin
      
      * think now i have total control, now need to code the bank selection
      
      * technically works as desired, need to optimize and fix design choices leading to undesirable outputs
      
      * complete PR #1 without disjunctive decoding
      
      * removed incorrect tests
      
      * Delete k.txt
      
      * Delete test.py
      
      * Delete test.sh
      
      * revert changes to test scripts
      
      * genutils
      
      * full implementation with testing, no disjunctive yet
      
      * shifted docs
      
      * passing all tests realistically ran locally
      
      * removing accidentally included print statements
      
      * fixed source of error in initial PR test
      
      * fixing the get_device() vs device trap
      
      * fixed documentation docstrings about constrained_beam_search
      
      * fixed tests failing for Speech2TextModel's floating point inputs
      
      * fix cuda long tensor
      
      * added examples and testing for them, and found & fixed a bug in beam_search and constrained_beam_search
      
      * deleted accidentally added test halting code with assert False
      
      * code reformat
      
      * Update tests/test_generation_utils.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update tests/test_generation_utils.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update tests/test_generation_utils.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update tests/test_generation_utils.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update tests/test_generation_utils.py
      
      * fixing based on comments on PR
      
      * took out the testing code that should work but fails without the beam search modification; style changes
      
      * fixing comments issues
      
      * docstrings for ConstraintListState
      
      * typo in PhrasalConstraint docstring
      
      * docstrings improvements
      
      * finished adding what is sort of an opinionated implementation of disjunctive generation, but it revealed errors in inner beam search logic during testing.
      
      * fixed bug found in constrained beam search that used beam_idx that were not global across all the batches
      
      * disjunctive constraint working 100% correctly
      
      * passing all tests
      
      * Accidentally included mlruns
      
      * Update src/transformers/generation_beam_constraints.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/generation_beam_constraints.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * complete overhaul of type complexities and other nits
      
      * strict type checks in generate()
      
      * fixing second round of feedback by narsil
      
      * fixed failing generation test because of type check overhaul
      
      * generation test fail fix
      
      * fixing test fails
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      5c6f57ee
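      The disjunctive decoding this PR adds means a constraint is satisfied when ANY one of several candidate token sequences appears in the generated ids (e.g. forcing either "rain" or "raining"). A minimal after-the-fact sketch of that satisfaction check in pure Python; the library's DisjunctiveConstraint instead tracks progress incrementally during beam search, so treat the helpers below as illustrative only:

      ```python
      def contains_subsequence(ids, phrase):
          # True if the token list `phrase` occurs contiguously inside `ids`
          return any(ids[i:i + len(phrase)] == phrase for i in range(len(ids) - len(phrase) + 1))

      def disjunctive_satisfied(ids, alternatives):
          # Satisfied as soon as any one alternative phrase is present
          return any(contains_subsequence(ids, alt) for alt in alternatives)

      # e.g. alternatives could be the token ids for "rain" and for "raining"
      print(disjunctive_satisfied([2, 5, 9, 4], [[5, 9], [7]]))  # True
      print(disjunctive_satisfied([2, 4, 3], [[5, 9], [7]]))     # False
      ```

      During actual constrained beam search the beams are grouped into "banks" by how many constraint tokens they have already fulfilled, which is the round-robin bank selection the early commits above describe.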