- 20 Jan, 2021 1 commit
-
-
LSinev authored
-
- 19 Jan, 2021 10 commits
-
-
Sylvain Gugger authored
* Fix model templates and use less than 119 chars * Missing new line
-
Daniel Stancl authored
* Add decoder_head_mask for PyTorch T5 model
* Add decoder_head_mask args into T5Model and T5ForConditionalGeneration
* Slightly change the order of input args to be in accordance with the convention from BART-based models introduced within the PR #9569.
* Make style for modeling_t5.py
* Add decoder_head_mask for TF T5 models
* Separate head_mask and decoder_head_mask args in TF T5 models
* Slightly change the order of input args to follow convention of BART-based models updated in PR #9569
* Update test_forward_signature tests/test_modeling_tf_common.py w.r.t. the changed order of input args
* Add FutureWarnings for T5 and TFT5 models
* Add FutureWarnings for T5 and TFT5 models warning a user that input argument `head_mask` was split into two arguments - `head_mask` and `decoder_head_mask`
* Add default behaviour - `decoder_head_mask` is set to copy `head_mask`
* Fix T5 modeling and FutureWarning
* Make proper usage of head_mask and decoder_head_mask in cross_attention
* Fix conditions for raising FutureWarning
* Reformat FutureWarning in T5 modeling
* Refactor the warning message
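A minimal usage sketch of the new arguments (the checkpoint name is illustrative; per the commit, `decoder_head_mask` defaults to copying `head_mask` when omitted, and each mask has one row per layer and one column per head, with 1.0 keeping a head and 0.0 pruning it):
```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-small")  # illustrative checkpoint
tokenizer = T5Tokenizer.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
labels = tokenizer("Hallo", return_tensors="pt").input_ids

# One row per layer, one column per attention head.
head_mask = torch.ones(model.config.num_layers, model.config.num_heads)
decoder_head_mask = torch.ones(model.config.num_layers, model.config.num_heads)
decoder_head_mask[0, 0] = 0.0  # e.g. disable the first head of the first decoder layer

outputs = model(**inputs, labels=labels, head_mask=head_mask, decoder_head_mask=decoder_head_mask)
```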
-
Sylvain Gugger authored
* New run_seq2seq script
* Add tests
* Mark as slow
* Update examples/seq2seq/run_seq2seq.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
-
Julien Plu authored
* Fix Flaubert and XLM
* Fix Flaubert and XLM
* Apply style
-
max yue authored
File "/share/apps/anaconda3/envs/my_env/lib/python3.7/site-packages/transformers/integrations.py", line 419, in __init__ self._SummaryWriter = SummaryWriter UnboundLocalError: local variable 'SummaryWriter' referenced before assignment -
Yusuke Mori authored
* Update past_key_values in gpt2 (#9391)
* Update generation_utils, and rename some items
* Update modeling_gpt2 to avoid an error in gradient_checkpointing
* Remove 'reorder_cache' from util and add variations to XLNet, TransfoXL, GPT-2
* Change the location of '_reorder_cache' in modeling files
* Add '_reorder_cache' in modeling_ctrl
* Fix a bug of my last commit in CTRL
* Add '_reorder_cache' to GPT2DoubleHeadsModel
* Manage 'use_cache' in config of test_modeling_gpt2
* Clean up the doc string
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix the doc string (GPT-2, CTRL)
* improve gradient_checkpointing_behavior
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
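For context, `_reorder_cache` is the hook beam search calls to permute cached key/value states after selecting beams. A hedged, generic sketch of the idea only; the per-model implementations moved by this commit nest the cache tuples and pick the batch dimension differently:
```python
from typing import Tuple
import torch


def reorder_cache(past: Tuple[torch.Tensor, ...], beam_idx: torch.Tensor) -> Tuple[torch.Tensor, ...]:
    # For every layer's cached states, keep the entries belonging to the beams
    # that survived this decoding step (batch/beam axis assumed to be dim 0 here).
    return tuple(layer_past.index_select(0, beam_idx) for layer_past in past)
```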
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Patrick von Platen authored
-
Sergey Mkrtchyan authored
* Fix the attention_mask in DPRReaderTokenizer
* Add an integration test for DPRReader inference
* Run make style
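A hedged illustration of the invariant such a fix and integration test revolve around, not the tokenizer's actual code: the attention mask should be 1 on real tokens and 0 on padding positions of the padded `input_ids`.
```python
import torch

pad_token_id = 0  # illustrative value
input_ids = torch.tensor([[101, 2054, 2003, 102, 0, 0]])
attention_mask = (input_ids != pad_token_id).long()
# tensor([[1, 1, 1, 1, 0, 0]])
```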
-
- 18 Jan, 2021 2 commits
-
-
Daniel Stancl authored
* Add head_mask/decoder_head_mask for BART
  This branch implements head_mask and decoder_head_mask for BART-based models. Full list below:
  - BART
  - MBart
  - Blenderbot
  - BlenderbotSmall
  - Marian
  - Pegasus
  Everything is accompanied by updated testing.
* Fix test_headmasking for BART models
* Fix test_headmasking for BART-like models which have only 2 layers in each module. The condition
  ```
  self.assertNotEqual(attentions[1][..., 0, :, :].flatten().sum().item(), 0.0)
  ```
  is, therefore, invalid for encoder-decoder models considering the `head_mask`
  ```
  head_mask = torch.ones(
      self.model_tester.num_hidden_layers,
      self.model_tester.num_attention_heads,
      device=torch_device,
  )
  head_mask[0, 0] = 0
  head_mask[-1, :-1] = 0
  ```
  specified in the `test_headmasking` test/function.
* Adjust test_modeling_common.py to reflect T5 input args
* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
* make fix-copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Devrim authored
-
- 15 Jan, 2021 5 commits
-
-
Stas Bekman authored
-
Lysandre Debut authored
* Ignore lm_head decoder bias warning
* Revert "Ignore lm_head decoder bias warning"
  This reverts commit f25177a9da6ca898e351f46c8b1515971de5c670.
* predictions -> lm_head
-
Julien Plu authored
* Add warning
* Remove unused import
* Fix missing call
* Fix missing call
* Completely remove token_type_ids
* Apply style
* Remove unused import
* Update src/transformers/models/mpnet/modeling_tf_mpnet.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
Patrick von Platen authored
* fix tf led * remove loop file
-
Kiyoung Kim authored
This reverts commit 3f40070c.
-
- 14 Jan, 2021 6 commits
-
-
Sylvain Gugger authored
* Upstream (and rename) sortish sampler
* Use proper sampler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
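The "sortish" idea behind the upstreamed sampler, sketched as a hypothetical standalone helper (the Trainer's actual sampler class has its own interface): shuffle globally, then sort by length inside largish chunks so each batch holds similarly sized sequences and wastes little padding, while the chunk-level shuffle keeps the order random-ish.
```python
import random
from typing import List


def sortish_indices(lengths: List[int], batch_size: int, chunks_per_megabatch: int = 50) -> List[int]:
    # Shuffle all indices, then length-sort within each "megabatch" of
    # batch_size * chunks_per_megabatch examples.
    indices = list(range(len(lengths)))
    random.shuffle(indices)
    megabatch = batch_size * chunks_per_megabatch
    out: List[int] = []
    for start in range(0, len(indices), megabatch):
        chunk = indices[start:start + megabatch]
        out.extend(sorted(chunk, key=lambda i: lengths[i], reverse=True))
    return out
```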
-
Kiyoung Kim authored
* gradient accumulation for tftrainer
* label naming
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* label naming
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
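A minimal sketch of gradient accumulation in TensorFlow, independent of TFTrainer's actual internals: gradients from several micro-batches are summed into buffer variables and only applied every `accumulation_steps` steps.
```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model.build(input_shape=(None, 4))
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

accumulation_steps = 4
# One accumulation buffer per trainable variable.
accumulated = [tf.Variable(tf.zeros_like(v), trainable=False) for v in model.trainable_variables]


def train_step(x, y, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True)) / accumulation_steps
    grads = tape.gradient(loss, model.trainable_variables)
    for buf, g in zip(accumulated, grads):
        buf.assign_add(g)
    # Apply and reset only once every accumulation_steps micro-batches.
    if (step + 1) % accumulation_steps == 0:
        optimizer.apply_gradients(zip(accumulated, model.trainable_variables))
        for buf in accumulated:
            buf.assign(tf.zeros_like(buf))


x = tf.random.normal((8, 4))
y = tf.constant([0, 1, 0, 1, 0, 1, 0, 1])
for step in range(8):
    train_step(x, y, step)
```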
-
Lysandre Debut authored
-
Julien Plu authored
-
Julien Plu authored
* Compliancy with tf-nightly * Add more version + restore min version check
-
Sylvain Gugger authored
* Fix Trainer with a parallel model * More clean up
-
- 13 Jan, 2021 7 commits
-
-
Lysandre authored
-
Lysandre authored
-
Sylvain Gugger authored
* Fix data parallelism in Trainer
* Update src/transformers/training_args.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
LSinev authored
* make TopKLogitsWarper faster * make TopPLogitsWarper faster
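For context, top-k warping keeps only the k highest-scoring logits and pushes the rest to -inf before sampling. A hedged sketch of that operation, not the library's exact (now faster) implementation:
```python
import torch


def top_k_filter(logits: torch.Tensor, top_k: int, filter_value: float = -float("inf")) -> torch.Tensor:
    # Mask every logit that is smaller than the k-th largest one in its row.
    threshold = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
    return torch.where(logits < threshold, torch.full_like(logits, filter_value), logits)
```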
-
Lysandre Debut authored
-
Suraj Patil authored
* add model_input_names * fix test
-
Stas Bekman authored
* deepspeed integration
* style
* add test
* ds wants to do its own backward
* fp16 assert
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* for clarity extract what args are being passed to deepspeed
* introduce the concept of self.wrapped_model
* s/self.wrapped_model/self.model_wrapped/
* complete transition to self.wrapped_model / self.model
* fix
* doc
* give ds its own init
* add custom overrides, handle bs correctly
* fix test
* clean up model_init logic, fix small bug
* complete fix
* collapse --deepspeed_config into --deepspeed
* style
* start adding doc notes
* style
* implement hf2ds optimizer and scheduler configuration remapping
* oops
* call get_num_training_steps absolutely when needed
* workaround broken auto-formatter
* deepspeed_config arg is no longer needed - fixed in deepspeed master
* use hf's fp16 args in config
* clean
* start on the docs
* rebase cleanup
* finish up --fp16
* clarify the supported stages
* big refactor thanks to discovering deepspeed.init_distributed
* cleanup
* revert fp16 part
* add checkpoint-support
* more init ds into integrations
* extend docs
* cleanup
* unfix docs
* clean up old code
* imports
* move docs
* fix logic
* make it clear which file it's referring to
* document nodes/gpus
* style
* wrong format
* style
* deepspeed handles gradient clipping
* easier to read
* major doc rewrite
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs
* switch to AdamW optimizer
* style
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
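A hedged usage sketch of what the collapsed flag looks like from Python: a single `deepspeed` training argument points at the DeepSpeed JSON config. The config path is a placeholder; a real run needs that JSON file on disk and a distributed launcher.
```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fp16=True,                   # per the commit, HF's fp16 args are mapped into the DeepSpeed config
    deepspeed="ds_config.json",  # replaces the former --deepspeed_config flag; placeholder path
)
```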
-
- 12 Jan, 2021 7 commits
-
-
Sylvain Gugger authored
* Add target contextmanager and rework prepare_seq2seq_batch
* Fix tests, treat BART and Barthez
* Add last tokenizers
* Fix test
* Set src token before calling the superclass
* Remove special behavior for T5
* Remove needless imports
* Remove needless asserts
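A usage sketch of the pattern this introduces, assuming the target context manager is exposed as `as_target_tokenizer` and using an illustrative MarianMT checkpoint:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")  # illustrative checkpoint

src_texts = ["Hello world"]
tgt_texts = ["Hallo Welt"]

model_inputs = tokenizer(src_texts, return_tensors="pt", padding=True)
# Inside the context manager the tokenizer switches to its target-language settings.
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, return_tensors="pt", padding=True).input_ids
model_inputs["labels"] = labels
```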
-
Lysandre Debut authored
-
NielsRogge authored
* Add LayoutLMForSequenceClassification and integration tests
  Improve docs
  Add LayoutLM notebook to list of community notebooks
* Make style & quality
* Address comments by @sgugger, @patrickvonplaten and @LysandreJik
* Fix rebase with master
* Reformat in one line
* Improve code examples as requested by @patrickvonplaten
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
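A brief, hedged usage sketch of the new head; the label count and the dummy bounding boxes are only illustrative:
```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMForSequenceClassification

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForSequenceClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=2  # illustrative label count
)

encoding = tokenizer("Invoice total: 42", return_tensors="pt")
seq_len = encoding.input_ids.shape[1]
# LayoutLM also expects one normalized (x0, y0, x1, y1) box per token; dummy boxes here.
bbox = torch.zeros((1, seq_len, 4), dtype=torch.long)

outputs = model(**encoding, bbox=bbox, labels=torch.tensor([1]))
```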
-
Suraj Patil authored
* fix t5 fp16
-
Patrick von Platen authored
-
Patrick von Platen authored
* fix naming issues * better names
-
Patrick von Platen authored
* make templates ready
* make add_new_model_command_ready
* finish tf bart
* prepare tf mbart
* finish tf bart
* add tf mbart
* add marian
* prep pegasus
* add tf pegasus
* push blenderbot tf
* add blenderbot
* add blenderbot small
* clean-up
* make fix copy
* define blend bot tok
* fix
* up
* make style
* add to docs
* add copy statements
* overwrite changes
* improve
* fix docs
* finish
* fix last slow test
* fix missing git conflict line
* fix blenderbot
* up
* fix blenderbot small
* load changes
* finish copied from
* upload fix
-
- 11 Jan, 2021 2 commits
-
-
Stas Bekman authored
* round numbers
* style
* round only on logging
-
Julien Plu authored
-