1. 18 Jul, 2020 1 commit
      Xlnet outputs (#5881) · 13be4872
      Teven authored
      Slightly breaking change: this changes the behavior of `use_cache` in XLNet. If `use_cache` is True and `mem_len` is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time, `use_cache` is overridden and always True.
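      A minimal sketch of the described behavior, using the argument and attribute names from this commit and the model-outputs PR below (`use_cache`, `.logits`, `.mems`); later releases may name these differently:

      ```python
      from transformers import XLNetLMHeadModel, XLNetTokenizer

      tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
      model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")

      input_ids = tokenizer.encode("The quick brown fox", return_tensors="pt")

      # With use_cache=True and mem_len unset (the base config), the model now
      # returns mems, which play the role of GPT-2's `past` during generation.
      outputs = model(input_ids, use_cache=True)
      logits, mems = outputs.logits, outputs.mems

      # Next step: feed only the newly chosen token and reuse the cached mems.
      next_token = logits[:, -1:, :].argmax(dim=-1)
      outputs = model(next_token, mems=mems, use_cache=True)
      ```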
  2. 17 Jul, 2020 2 commits
      Revert "XLNet `use_cache` refactor (#5770)" (#5854) · 615be03f
      Teven authored
      This reverts commit 0b2da0e5.
      XLNet `use_cache` refactor (#5770) · 0b2da0e5
      Teven authored
      Slightly breaking change: this changes the behavior of `use_cache` in XLNet. If `use_cache` is True and `mem_len` is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time, `use_cache` is overridden and always True.
  3. 10 Jul, 2020 2 commits
      Change model outputs types to self-document outputs (#5438) · edfd82f5
      Sylvain Gugger authored
      * [WIP] Proposal for model outputs
      
      * All Bert models
      
      * Make CI green maybe?
      
      * Fix ONNX test
      
      * Isolate ModelOutput from pt and tf
      
      * Formatting
      
      * Add Electra models
      
      * Auto-generate docstrings from outputs
      
      * Add TF outputs
      
      * Add some BERT models
      
      * Revert TF side
      
      * Remove last traces of TF changes
      
      * Fail with a clear error message
      
      * Add Albert and work through Bart
      
      * Add CTRL and DistilBert
      
      * Formatting
      
      * Progress on Bart
      
      * Renames and finish Bart
      
      * Formatting
      
      * Fix last test
      
      * Add DPR
      
      * Finish Electra and add FlauBERT
      
      * Add GPT2
      
      * Add Longformer
      
      * Add MMBT
      
      * Add MobileBert
      
      * Add GPT
      
      * Formatting
      
      * Add Reformer
      
      * Add Roberta
      
      * Add T5
      
      * Add Transformer XL
      
      * Fix test
      
      * Add XLM + fix XLMForTokenClassification
      
      * Style + XLMRoberta
      
      * Add XLNet
      
      * Formatting
      
      * Add doc of return_tuple arg
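      A minimal sketch of what the self-documenting outputs look like in practice; attribute names follow the output classes added in this PR, and the `return_tuple` argument is the one documented in the last bullet:

      ```python
      import torch
      from transformers import BertForSequenceClassification, BertTokenizer

      tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
      model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

      input_ids = tokenizer.encode("Outputs are now self-documenting.", return_tensors="pt")
      labels = torch.tensor([1])

      # The forward pass now returns a typed output object whose fields can be
      # read by name instead of remembering tuple positions.
      outputs = model(input_ids, labels=labels)
      print(outputs.loss, outputs.logits.shape)

      # Positional access still works for code written against the old tuples;
      # the return_tuple argument documented in this PR gives back a plain tuple.
      loss, logits = outputs[0], outputs[1]
      ```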
      Improvements to PretrainedConfig documentation (#5642) · b2747af5
      Sylvain Gugger authored
      * Update PretrainedConfig doc
      
      * Formatting
      
      * Small fixes
      
      * Forgotten args and more cleanup
  4. 28 Jun, 2020 1 commit
  5. 10 Jun, 2020 1 commit
  6. 09 Jun, 2020 1 commit
      [All models] Extend config.output_attentions with output_attentions function arguments (#4538) · 6e603cb7
      Bharat Raghunathan authored
      
      
      * DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``
      
      * DOC: Apply Black Formatting
      
      * Fix errors where output_attentions was undefined
      
      * Remove output_attentions in classes per review
      
      * Fix regressions on tests having `output_attention`
      
      * Fix further regressions in tests relating to `output_attentions`
      
      Ensure proper propagation of `output_attentions` as a function parameter
      to all model subclasses
      
      * Fix more regressions in `test_output_attentions`
      
      * Fix issues with BertEncoder
      
      * Rename related variables to `output_attentions`
      
      * fix pytorch tests
      
      * fix bert and gpt2 tf
      
      * Fix most TF tests for `test_output_attentions`
      
      * Fix linter errors and more TF tests
      
      * fix conflicts
      
      * DOC: Apply Black Formatting
      
      * Fix errors where output_attentions was undefined
      
      * Remove output_attentions in classes per review
      
      * Fix regressions on tests having `output_attention`
      
      * fix conflicts
      
      * fix conflicts
      
      * fix conflicts
      
      * fix conflicts
      
      * fix pytorch tests
      
      * fix conflicts
      
      * fix conflicts
      
      * Fix linter errors and more TF tests
      
      * fix tf tests
      
      * make style
      
      * fix isort
      
      * improve output_attentions
      
      * improve tensorflow
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
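      A minimal sketch of the extended argument: `output_attentions` can now be passed per forward call instead of being fixed once in the config. The named `.attentions` field assumes the self-documenting outputs introduced in the PR above:

      ```python
      from transformers import BertModel, BertTokenizer

      tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
      model = BertModel.from_pretrained("bert-base-uncased")
      input_ids = tokenizer.encode("Attention weights on demand.", return_tensors="pt")

      # Previously this had to be set via config.output_attentions; now it can
      # be requested for a single call.
      outputs = model(input_ids, output_attentions=True)
      attentions = outputs.attentions  # one tensor per layer
      print(len(attentions), attentions[0].shape)  # num_layers, (batch, heads, seq, seq)
      ```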
  7. 02 Jun, 2020 1 commit
      Kill model archive maps (#4636) · d4c2cb40
      Julien Chaumond authored
      * Kill model archive maps
      
      * Fixup
      
      * Also kill model_archive_map for MaskedBertPreTrainedModel
      
      * Unhook config_archive_map
      
      * Tokenizers: align with model id changes
      
      * make style && make quality
      
      * Fix CI
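      With the archive maps gone, weights resolve from the model identifier rather than a hard-coded per-class URL map. A minimal sketch, assuming nothing beyond the standard `from_pretrained` calls:

      ```python
      from transformers import AutoConfig, AutoModel, AutoTokenizer

      # Files are now located from the model id instead of entries in the old
      # *_PRETRAINED_MODEL_ARCHIVE_MAP dictionaries.
      model_id = "bert-base-uncased"
      config = AutoConfig.from_pretrained(model_id)
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModel.from_pretrained(model_id)
      ```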
  8. 01 Jun, 2020 2 commits
  9. 01 May, 2020 1 commit
  10. 29 Apr, 2020 1 commit
      CDN urls (#4030) · 455c6390
      Julien Chaumond authored
      * [file_utils] use_cdn + documentation
      
      * Move to cdn. urls for weights
      
      * [urls] Hotfix for bert-base-japanese
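      A rough sketch of the URL scheme this introduces; the helper and hostnames below are illustrative assumptions, not the library's actual `file_utils` code:

      ```python
      # Hypothetical prefixes standing in for the S3 bucket and CDN endpoints.
      S3_PREFIX = "https://s3.amazonaws.com/models.huggingface.co/bert"
      CDN_PREFIX = "https://cdn.huggingface.co"

      def weights_url(model_id: str, filename: str, use_cdn: bool = True) -> str:
          """Illustration only: build a download URL, preferring the CDN for
          large weight files as described in the commit above."""
          prefix = CDN_PREFIX if use_cdn else S3_PREFIX
          return f"{prefix}/{model_id}/{filename}"

      print(weights_url("bert-base-uncased", "pytorch_model.bin"))
      ```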
  11. 23 Apr, 2020 1 commit
  12. 22 Apr, 2020 1 commit
      Trainer (#3800) · dd9d483d
      Julien Chaumond authored
      * doc
      
      * [tests] Add sample files for a regression task
      
      * [HUGE] Trainer
      
      * Feedback from @sshleifer
      
      * Feedback from @thomwolf + logging tweak
      
      * [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes
      
      * [glue] Use default max_seq_length of 128 like before
      
      * [glue] move DataTrainingArguments around
      
      * [ner] Change interface of InputExample, and align run_{tf,pl}
      
      * Re-align the pl scripts a little bit
      
      * ner
      
      * [ner] Add integration test
      
      * Fix language_modeling with API tweak
      
      * [ci] Tweak loss target
      
      * Don't break console output
      
      * amp.initialize: model must be on right device before
      
      * [multiple-choice] update for Trainer
      
      * Re-align to 827d6d6e
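      A minimal sketch of the new Trainer API this adds, with a toy dataset standing in for the GLUE data utilities; arguments are kept to `output_dir` and `num_train_epochs` since the full `TrainingArguments` surface has evolved since this commit:

      ```python
      import torch
      from torch.utils.data import Dataset
      from transformers import (
          AutoModelForSequenceClassification,
          AutoTokenizer,
          Trainer,
          TrainingArguments,
      )

      class ToyDataset(Dataset):
          """Tiny stand-in for the GLUE datasets used by the real examples."""
          def __init__(self, tokenizer):
              texts = ["a delightful film", "a complete mess"]
              self.encodings = tokenizer(texts, padding=True, truncation=True)
              self.labels = [1, 0]

          def __len__(self):
              return len(self.labels)

          def __getitem__(self, idx):
              item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
              item["labels"] = torch.tensor(self.labels[idx])
              return item

      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
      model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

      # Minimal Trainer wiring; the real examples add eval datasets and metrics.
      args = TrainingArguments(output_dir="./results", num_train_epochs=1)
      trainer = Trainer(model=model, args=args, train_dataset=ToyDataset(tokenizer))
      trainer.train()
      ```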
  13. 18 Apr, 2020 1 commit
  14. 14 Apr, 2020 1 commit
  15. 10 Apr, 2020 1 commit
      Add `run_glue_tpu.py` that trains models on TPUs (#3702) · 551b4505
      Jin Young Sohn authored
      * Initial commit to get BERT + run_glue.py on TPU
      
      * Add README section for TPU and address comments.
      
      * Cleanup TPU bits from run_glue.py (#3)
      
      TPU runner is currently implemented in:
      https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.
      
      We plan to upstream this directly into `huggingface/transformers`
      (either `master` or `tpu`) branch once it's been more thoroughly tested.
      
      * No need to call `xm.mark_step()` explicitly (#4)
      
      Since, for gradient accumulation, we're accumulating on batches from a
      `ParallelLoader` instance, which marks the step itself on next().
      
      * Resolve R/W conflicts from multiprocessing (#5)
      
      * Add XLNet in list of models for `run_glue_tpu.py` (#6)
      
      * Add RoBERTa to list of models in TPU GLUE (#7)
      
      * Add RoBERTa and DistilBert to list of models in TPU GLUE (#8)
      
      * Use barriers to reduce duplicate work/resources (#9)
      
      * Shard eval dataset and aggregate eval metrics (#10)
      
      * Shard eval dataset and aggregate eval metrics
      
      Also, instead of calling `eval_loss.item()` every time, do the summation
      with tensors on the device.
      
      * Change defaultdict to float
      
      * Reduce the pred, label tensors instead of metrics
      
      As brought up during review, some metrics like F1 cannot be aggregated
      via averaging. GLUE task metrics depend largely on the dataset, so we
      instead sync the prediction and label tensors so that the metrics can
      be computed accurately on those.
      
      * Only use tb_writer from master (#11)
      
      * Apply huggingface black code formatting
      
      * Style
      
      * Remove `--do_lower_case` as example uses cased
      
      * Add option to specify tensorboard logdir
      
      This is needed for our testing framework, which checks regressions
      against key metrics written by the summary writer.
      
      * Using configuration for `xla_device`
      
      * Prefix TPU specific comments.
      
      * num_cores clarification and namespace eval metrics
      
      * Cache features file under `args.cache_dir`
      
      Instead of under `args.data_dir`. This is needed as our test infra uses
      data_dir with a read-only filesystem.
      
      * Rename `run_glue_tpu` to `run_tpu_glue`
      Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
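      A rough sketch of the torch_xla training pattern the bullets above refer to (`ParallelLoader` marking the step, gradients reduced in the optimizer step); this is hedged against the torch_xla API, not the exact example script:

      ```python
      import torch_xla.core.xla_model as xm
      import torch_xla.distributed.parallel_loader as pl

      def train_one_epoch(model, loader, optimizer):
          device = xm.xla_device()
          model.to(device)
          # ParallelLoader pre-loads batches onto the TPU device and marks the
          # XLA step on each next(), so no explicit xm.mark_step() is needed.
          para_loader = pl.ParallelLoader(loader, [device])
          for batch in para_loader.per_device_loader(device):
              outputs = model(**batch)      # batches include labels
              loss = outputs[0]             # first element is the loss
              loss.backward()
              xm.optimizer_step(optimizer)  # all-reduce gradients, then step
              optimizer.zero_grad()
      ```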
  16. 31 Mar, 2020 1 commit
  17. 26 Mar, 2020 1 commit
      Add t5 to pipeline(task='summarization') (#3413) · 9c683ef0
      Patrick von Platen authored
      * solve conflicts
      
      * move warnings below
      
      * incorporate changes
      
      * add pad_to_max_length to pipelines
      
      * add bug fix for T5 beam search
      
      * add prefix patterns
      
      * make style
      
      * fix conflicts
      
      * adapt pipelines for task specific parameters
      
      * improve docstring
      
      * remove unused patterns
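      A minimal usage sketch of the new capability; the task prefix ("summarize: ") and generation parameters are handled by the pipeline, per the prefix-pattern and task-specific-parameter bullets above:

      ```python
      from transformers import pipeline

      # T5 can now be used through the summarization pipeline directly.
      summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small")

      article = (
          "The tower is 324 metres tall, about the same height as an 81-storey "
          "building, and was the tallest man-made structure in the world for 41 years."
      )
      print(summarizer(article, max_length=40, min_length=10))
      ```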
  18. 25 Mar, 2020 1 commit
  19. 20 Mar, 2020 1 commit
  20. 19 Mar, 2020 1 commit
  21. 16 Mar, 2020 1 commit
  22. 11 Mar, 2020 3 commits
  23. 09 Mar, 2020 2 commits
  24. 05 Mar, 2020 1 commit
  25. 24 Feb, 2020 1 commit
  26. 21 Feb, 2020 1 commit
      Improve special_token_id logic in run_generation.py and add tests (#2885) · fc38d4c8
      Patrick von Platen authored
      
      
      * improving generation
      
      * finalized special token behaviour for no_beam_search generation
      
      * solved modeling_utils merge conflict
      
      * solve merge conflicts in modeling_utils.py
      
      * add run_generation improvements from PR #2749
      
      * adapted language generation to not use hardcoded -1 if no padding token is available
      
      * remove the -1 removal as hard-coded -1s are not necessary anymore
      
      * add lightweight language generation testing for randomly initialized models - just checking whether no errors are thrown
      
      * add slow language generation tests for pretrained models using hardcoded output with pytorch seed
      
      * delete ipdb
      
      * check that all generated tokens are valid
      
      * renaming
      
      * renaming Generation -> Generate
      
      * make style
      
      * updated so that generate_beam_search has the same token behavior as generate_no_beam_search
      
      * consistent return format for run_generation.py
      
      * deleted pretrain lm generate tests -> will be added in another PR
      
      * cleaning of unused if statements and renaming
      
      * run_generate will always return an iterable
      
      * make style
      
      * consistent renaming
      
      * improve naming, make sure generate function always returns the same tensor, add docstring
      
      * add slow tests for all lmhead models
      
      * make style and improve example comments modeling_utils
      
      * better naming and refactoring in modeling_utils
      
      * changed fast random lm generation testing design to more general one
      
      * delete in old testing design in gpt2
      
      * correct old variable name
      
      * temporary fix for encoder_decoder lm generation tests - has to be updated when t5 is fixed
      
      * adapted all fast random generate tests to new design
      
      * better warning description in modeling_utils
      
      * better comment
      
      * better comment and error message
      Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
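      A minimal sketch of the pad-token behavior described above: GPT-2 ships without a padding token, so generation is given the EOS token as `pad_token_id` instead of relying on a hard-coded -1:

      ```python
      from transformers import GPT2LMHeadModel, GPT2Tokenizer

      tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")

      input_ids = tokenizer.encode("The special token handling", return_tensors="pt")

      # GPT-2 has no pad token; passing eos_token_id as pad_token_id keeps the
      # returned sequences valid without the old hard-coded -1 placeholder.
      output_ids = model.generate(
          input_ids,
          max_length=30,
          do_sample=True,
          pad_token_id=tokenizer.eos_token_id,
      )
      print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
      ```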
  27. 31 Jan, 2020 1 commit
  28. 24 Jan, 2020 2 commits
  29. 16 Jan, 2020 1 commit
  30. 13 Jan, 2020 3 commits
  31. 11 Jan, 2020 1 commit