- 16 Nov, 2020 1 commit
-
-
Sylvain Gugger authored
* Use the CI to identify failing tests * Remove from all examples and tests * More default switch * Fixes * More test fixes * More fixes * Last fixes hopefully * Use the CI to identify failing tests * Remove from all examples and tests * More default switch * Fixes * More test fixes * More fixes * Last fixes hopefully * Run on the real suite * Fix slow tests
-
- 13 Nov, 2020 1 commit
-
-
Lysandre Debut authored
* Model templates * TensorFlow * Remove pooler * CI * Tokenizer + Refactoring * Encoder-Decoder * Let's go testing * Encoder-Decoder in TF * Let's go testing in TF * Documentation * README * Fixes * Better names * Style * Update docs * Choose to skip either TF or PT * Code quality fixes * Add to testing suite * Update file path * Cookiecutter path * Update `transformers` path * Handle rebasing * Remove seq2seq from model templates * Remove s2s config * Apply Sylvain and Patrick comments * Apply suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Last fixes from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 04 Nov, 2020 1 commit
-
-
Sylvain Gugger authored
-
- 30 Oct, 2020 1 commit
-
-
Sylvain Gugger authored
* Finish the cleanup of the language-modeling examples * Update main README * Apply suggestions from code review Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Apply suggestions from code review Co-authored-by:
Thomas Wolf <thomwolf@users.noreply.github.com> * Propagate changes Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Thomas Wolf <thomwolf@users.noreply.github.com>
-
- 29 Oct, 2020 2 commits
-
-
Sylvain Gugger authored
* Add a template for example scripts and apply it to mlm * Formatting * Fix test * Add plm script * Add a template for example scripts and apply it to mlm * Formatting * Fix test * Add plm script * Add a template for example scripts and apply it to mlm * Formatting * Fix test * Add plm script * Styling
-
Santiago Castro authored
* Fix doc errors and typos across the board * Fix a typo * Fix the CI * Fix more typos * Fix CI * More fixes * Fix CI * More fixes * More fixes
-
- 28 Oct, 2020 1 commit
-
-
Sylvain Gugger authored
-
- 20 Oct, 2020 1 commit
-
-
Stas Bekman authored
* rename skip targets + docs * fix quotes * style * Apply suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * small improvements * fix Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 18 Oct, 2020 1 commit
-
-
Thomas Wolf authored
* splitting fast and slow tokenizers [WIP] * [WIP] splitting sentencepiece and tokenizers dependencies * update dummy objects * add name_or_path to models and tokenizers * prefix added to file names * prefix * styling + quality * spliting all the tokenizer files - sorting sentencepiece based ones * update tokenizer version up to 0.9.0 * remove hard dependency on sentencepiece
🎉 * and removed hard dependency on tokenizers🎉 * update conversion script * update missing models * fixing tests * move test_tokenization_fast to main tokenization tests - fix bugs * bump up tokenizers * fix bert_generation * update ad fix several tokenizers * keep sentencepiece in deps for now * fix funnel and deberta tests * fix fsmt * fix marian tests * fix layoutlm * fix squeezebert and gpt2 * fix T5 tokenization * fix xlnet tests * style * fix mbart * bump up tokenizers to 0.9.2 * fix model tests * fix tf models * fix seq2seq examples * fix tests without sentencepiece * fix slow => fast conversion without sentencepiece * update auto and bert generation tests * fix mbart tests * fix auto and common test without tokenizers * fix tests without tokenizers * clean up tests lighten up when tokenizers + sentencepiece are both off * style quality and tests fixing * add sentencepiece to doc/examples reqs * leave sentencepiece on for now * style quality split hebert and fix pegasus * WIP Herbert fast * add sample_text_no_unicode and fix hebert tokenization * skip FSMT example test for now * fix style * fix fsmt in example tests * update following Lysandre and Sylvain's comments * Update src/transformers/testing_utils.py Co-authored-by:Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/testing_utils.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 12 Oct, 2020 1 commit
-
-
Sylvain Gugger authored
-
- 05 Oct, 2020 2 commits
-
-
Sylvain Gugger authored
* Check and update model list in index.rst automatically * Check and update model list in index.rst automatically * Adapt template
-
Forrest Iandola authored
* configuration_squeezebert.py thin wrapper around bert tokenizer fix typos wip sb model code wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working set up squeezebert to use BertModelOutput when returning results. squeezebert documentation formatting allow head mask that is an array of [None, ..., None] docs docs cont'd path to vocab docs and pointers to cloud files (WIP) line length and indentation squeezebert model cards formatting of model cards untrack modeling_squeezebert_scratchpad.py update aws paths to vocab and config files get rid of stub of NSP code, and advise users to pretrain with mlm only fix rebase issues redo rebase of modeling_auto.py fix issues with code formatting more code format auto-fixes move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert tests for squeezebert modeling and tokenization fix typo move squeezebert before bert in modeling_auto.py to fix inheritance problem disable test_head_masking, since squeezebert doesn't yet implement head masking fix issues exposed by the test_modeling_squeezebert.py fix an issue exposed by test_tokenization_squeezebert.py fix issue exposed by test_modeling_squeezebert.py auto generated code style improvement issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head() update copyright resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask docs add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli autogenerated formatting tweaks integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings * tiny change to order of imports
-
- 24 Sep, 2020 1 commit
-
-
Sylvain Gugger authored
* Clean RAG docs and template docs * Fix typo * Better doc
-
- 15 Sep, 2020 1 commit
-
-
Stas Bekman authored
-
- 10 Sep, 2020 2 commits
-
-
Sylvain Gugger authored
-
Lysandre Debut authored
-
- 04 Sep, 2020 1 commit
-
-
Stas Bekman authored
* remove the implied defaults to :obj:`None` * fix bug in the original * replace to :obj:`True`, :obj:`False`
-
- 03 Sep, 2020 1 commit
-
-
Sylvain Gugger authored
-
- 02 Sep, 2020 1 commit
-
-
Stas Bekman authored
* [doc] typos fixed typos * Update README.md
-
- 26 Aug, 2020 1 commit
-
-
Lysandre authored
-
- 24 Aug, 2020 1 commit
-
-
Sylvain Gugger authored
* Run new isort * More changes * Update CI, CONTRIBUTING and benchmarks
-
- 13 Aug, 2020 1 commit
-
-
Stas Bekman authored
* cleanup torch unittests: part 2 * remove trailing comma added by isort, and which breaks flake * one more comma * revert odd balls * part 3: odd cases * more ["key"] -> .key refactoring * .numpy() is not needed * more unncessary .numpy() removed * more simplification
-
- 05 Aug, 2020 1 commit
-
-
Sylvain Gugger authored
* TF outputs and test on BERT * Albert to DistilBert * All remaining TF models except T5 * Documentation * One file forgotten * TF outputs and test on BERT * Albert to DistilBert * All remaining TF models except T5 * Documentation * One file forgotten * Add new models and fix issues * Quality improvements * Add T5 * A bit of cleanup * Fix for slow tests * Style
-
- 04 Aug, 2020 1 commit
-
-
Stas Bekman authored
* improve unit tests this is a sample of one test according to the request in https://github.com/huggingface/transformers/issues/5973 before I apply it to the rest * batch 1 * batch 2 * batch 3 * batch 4 * batch 5 * style * non-tf template * last deletion of check_loss_output
-
- 03 Aug, 2020 1 commit
-
-
Teven authored
* Fixed empty asserts * black-reformatted stragglers in templates * More code quality checks * Update src/transformers/convert_marian_to_pytorch.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/convert_marian_to_pytorch.py Co-authored-by:
Sam Shleifer <sshleifer@gmail.com> * removed unused line as per @sshleifer Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Sam Shleifer <sshleifer@gmail.com>
-
- 31 Jul, 2020 1 commit
-
-
Sylvain Gugger authored
* Use return_dict=True in all tests * Formatting
-
- 30 Jul, 2020 1 commit
-
-
Sylvain Gugger authored
* Switch from return_tuple to return_dict * Fix test * [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614) * Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests * AutoModels Tiny tweaks * Style * Final changes before merge * Re-order for simpler review * Final fixes * Addressing @sgugger's comments * Test MultipleChoice * Rework TF trainer (#6038) * Fully rework training/prediction loops * fix method name * Fix variable name * Fix property name * Fix scope * Fix method name * Fix tuple index * Fix tuple index * Fix indentation * Fix variable name * fix eval before log * Add drop remainder for test dataset * Fix step number + fix logging datetime * fix eval loss value * use global step instead of step + fix logging at step 0 * Fix logging datetime * Fix global_step usage * Fix breaking loop + logging datetime * Fix step in prediction loop * Fix step breaking * Fix train/test loops * Force TF at least 2.2 for the trainer * Use assert_cardinality to facilitate the dataset size computation * Log steps per epoch * Make tfds compliant with TPU * Make tfds compliant with TPU * Use TF dataset enumerate instead of the Python one * revert previous commit * Fix data_dir * Apply style * rebase on master * Address Sylvain's comments * Address Sylvain's and Lysandre comments * Trigger CI * Remove unused import * Switch from return_tuple to return_dict * Fix test * Add recent model Co-authored-by:Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Julien Plu <plu.julien@gmail.com>
-
- 24 Jul, 2020 1 commit
-
-
Sylvain Gugger authored
-
- 22 Jul, 2020 1 commit
-
-
Sam Shleifer authored
Co-authored-by:Julien Chaumond <chaumond@gmail.com>
-
- 28 Jun, 2020 1 commit
-
-
Sam Shleifer authored
* all save_pretrained methods mkdir if not os.path.exists
-
- 26 Jun, 2020 1 commit
-
-
Thomas Wolf authored
* remove references to old API in docstring - update data processors * style * fix tests - better type checking error messages * better type checking * include awesome fix by @LysandreJik for #5310 * updated doc and examples
-
- 15 Jun, 2020 1 commit
-
-
Anthony MOI authored
[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510) * Use tokenizers pre-tokenized pipeline * failing pretrokenized test * Fix is_pretokenized in python * add pretokenized tests * style and quality * better tests for batched pretokenized inputs * tokenizers clean up - new padding_strategy - split the files * [HUGE] refactoring tokenizers - padding - truncation - tests * style and quality * bump up requied tokenizers version to 0.8.0-rc1 * switched padding/truncation API - simpler better backward compat * updating tests for custom tokenizers * style and quality - tests on pad * fix QA pipeline * fix backward compatibility for max_length only * style and quality * Various cleans up - add verbose * fix tests * update docstrings * Fix tests * Docs reformatted * __call__ method documented Co-authored-by:
Thomas Wolf <thomwolf@users.noreply.github.com> Co-authored-by:
Lysandre <lysandre.debut@reseau.eseo.fr>
-
- 09 Jun, 2020 1 commit
-
-
Bharat Raghunathan authored
* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions`` * DOC: Apply Black Formatting * Fix errors where output_attentions was undefined * Remove output_attentions in classes per review * Fix regressions on tests having `output_attention` * Fix further regressions in tests relating to `output_attentions` Ensure proper propagation of `output_attentions` as a function parameter to all model subclasses * Fix more regressions in `test_output_attentions` * Fix issues with BertEncoder * Rename related variables to `output_attentions` * fix pytorch tests * fix bert and gpt2 tf * Fix most TF tests for `test_output_attentions` * Fix linter errors and more TF tests * fix conflicts * DOC: Apply Black Formatting * Fix errors where output_attentions was undefined * Remove output_attentions in classes per review * Fix regressions on tests having `output_attention` * fix conflicts * fix conflicts * fix conflicts * fix conflicts * fix pytorch tests * fix conflicts * fix conflicts * Fix linter errors and more TF tests * fix tf tests * make style * fix isort * improve output_attentions * improve tensorflow Co-authored-by:Patrick von Platen <patrick.v.platen@gmail.com>
-
- 02 Jun, 2020 1 commit
-
-
Julien Chaumond authored
* Kill model archive maps * Fixup * Also kill model_archive_map for MaskedBertPreTrainedModel * Unhook config_archive_map * Tokenizers: align with model id changes * make style && make quality * Fix CI
-
- 29 Apr, 2020 1 commit
-
-
Julien Chaumond authored
* [file_utils] use_cdn + documentation * Move to cdn. urls for weights * [urls] Hotfix for bert-base-japanese
-
- 18 Apr, 2020 1 commit
-
-
Thomas Wolf authored
* First pass on utility classes and python tokenizers * finishing cleanup pass * style and quality * Fix tests * Updating following @mfuntowicz comment * style and quality * Fix Roberta * fix batch_size/seq_length inBatchEncoding * add alignement methods + tests * Fix OpenAI and Transfo-XL tokenizers * adding trim_offsets=True default for GPT2 et RoBERTa * style and quality * fix tests * add_prefix_space in roberta * bump up tokenizers to rc7 * style * unfortunately tensorfow does like these - removing shape/seq_len for now * Update src/transformers/tokenization_utils.py Co-Authored-By:
Stefan Schweter <stefan@schweter.it> * Adding doc and docstrings * making flake8 happy Co-authored-by:
Stefan Schweter <stefan@schweter.it>
-
- 16 Apr, 2020 1 commit
-
-
Sam Shleifer authored
* Delete some copy pasted code
-
- 08 Apr, 2020 1 commit
-
- 04 Apr, 2020 1 commit
-
-
Julien Chaumond authored
-
- 24 Mar, 2020 1 commit
-
-
Julien Chaumond authored
-