"docs/vscode:/vscode.git/clone" did not exist on "b90e29d52cfe94b1995cc5254f700e776b866d2d"
- 10 Nov, 2020 4 commits
-
-
Julien Chaumond authored
-
Lysandre Debut authored
* Patch token classification pipeline
* Some added tests for TokenClassificationArgumentHandler (#8366)

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
-
Julien Chaumond authored
* fix typo
* rm use_cdn & references, and implement new hf_bucket_url
* I'm pretty sure we don't need to `read` this file
* same here
* [BIG] file_utils.networking: do not gobble up errors anymore
* Fix CI
* Apply suggestions from code review
* Tiny doc tweak
* Add doc + pass kwarg everywhere
* Add more tests and explain (cc @sshleifer, let me know if better)
* Also implement revision in pipelines, in the case where we're passing a task name or a string model identifier
* Fix CI
* Fix CI
* [hf_api] new methods + command line implem
* make style
* Final endpoints post-migration
* Fix post-migration
* Py3.6 compat (cc @stefan-it; thank you @stas00)

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
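A minimal sketch of how the `revision` argument described in this entry might be used when building a pipeline from a string model identifier; the model name and revision value below are illustrative assumptions, not part of the commit:

```python
from transformers import pipeline

# Hypothetical usage: pin the underlying model to a specific revision
# (branch name, tag, or commit hash) instead of the default branch.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # illustrative model id
    revision="main",  # could also be a tag or a full commit sha
)
print(classifier("Pinning a model revision makes results reproducible."))
```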
-
Teven authored
* Move XLNet memory length FutureWarning
* isort
* style
* Changed default XLNet memory length
-
- 09 Nov, 2020 10 commits
-
-
Stas Bekman authored
* add a multi-gpu job for all example tests
* run only ported tests
* rename
* explain why env is re-activated on each step
* mark all unported/checked tests with @require_torch_non_multigpu_but_fix_me
* style
* Apply suggestions from code review

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
Sylvain Gugger authored
-
Sam Shleifer authored
-
Patrick von Platen authored
* add training tests
* correct longformer
* fix docs
* fix some tests
* fix some more train tests
* remove ipdb
* fix multiple edge case model training
* fix funnel and prophetnet
* clean gpt models
* undo renaming of albert
-
Sylvain Gugger authored
-
Stas Bekman authored
* fairseq broke chkpt data - fixing that
* style
* support older bpecodes filenames - specifically "code" in iwslt14
-
Stas Bekman authored
* support lowercase tokenizer
* fix arg pos
-
Shashank Gupta authored
-
Philip May authored
* add evaluate doc
* fix style with utils/style_doc
* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Stas Bekman authored
-
- 08 Nov, 2020 2 commits
-
-
Jonathan Chang authored
-
Manav Rathod authored
-
- 07 Nov, 2020 1 commit
-
-
Jonathan Chang authored
* Fix DataCollatorForWholeWordMask
* Replace all tensorize_batch in data_collator.py
-
- 06 Nov, 2020 2 commits
-
-
Patrick von Platen authored
-
Yossi Synett authored
[All Seq2Seq models + CLM models that can be used with EncoderDecoder] Add cross-attention weights to outputs (#8071)
* Output cross-attention with the decoder attention output
* Update src/transformers/modeling_bert.py
* add cross-attention for t5 and bart as well
* fix tests
* correct typo in docs
* apply Sylvain's and Sam's comments
* correct typo

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
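A sketch of how the new cross-attention outputs might be read back, assuming the output field is named `cross_attentions` as the entry above describes; the BART checkpoint is an illustrative choice:

```python
from transformers import BartTokenizer, BartForConditionalGeneration

# Illustrative checkpoint; any seq2seq model covered by the change should behave similarly.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

inputs = tokenizer("Cross-attention weights are now part of the outputs.", return_tensors="pt")
outputs = model(**inputs, output_attentions=True, return_dict=True)

# Expected: one tensor per decoder layer with shape
# (batch_size, num_heads, target_length, source_length).
print(len(outputs.cross_attentions), outputs.cross_attentions[0].shape)
```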
-
- 05 Nov, 2020 3 commits
-
-
Stas Bekman authored
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
Sylvain Gugger authored
* Make Trainer evaluation handle dynamic seq_length
* Document behavior.
* Fix test
* Better fix
* Fixes for realsies this time
* Address review comments
* Without forgetting to save...
-
Guillaume Filion authored
* Output global_attentions in Longformer models
* make style
* small refactoring
* fix tests
* make fix-copies
* add for tf as well
* remove comments in test
* make fix-copies
* make style
* add docs
* make docstring pretty

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
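A usage sketch of the new output described above, assuming the field is named `global_attentions`; the checkpoint and the choice of which token gets global attention are illustrative:

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

# Illustrative checkpoint.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("Longformer now also returns global attention weights.", return_tensors="pt")
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1  # give the first token global attention

outputs = model(
    **inputs,
    global_attention_mask=global_attention_mask,
    output_attentions=True,
    return_dict=True,
)
# Expected: one global-attention tensor per layer, alongside the usual local attentions.
print(len(outputs.global_attentions), len(outputs.attentions))
```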
-
- 04 Nov, 2020 3 commits
-
-
Sylvain Gugger authored
* Clean up data collators and datasets
* Apply suggestions from code review
* Remove needless clone

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
Nicolas Patry authored
The issue is that with the previous code we would have the following:

```python
qa_pipeline = (...)
qa_pipeline(question="Where was he born ?", context="")
-> IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```

The goal here is to improve this to actually return a ValueError wherever possible. While at it, I tried to simplify QuestionArgumentHandler's code to make it smaller and more compact while keeping backward compat.
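A small sketch of the improved behaviour described above; relying on the pipeline's default model and the exact wording of the error message are assumptions:

```python
from transformers import pipeline

# With the fix, an empty context should surface a ValueError that callers can
# handle explicitly, instead of an opaque IndexError.
qa_pipeline = pipeline("question-answering")
try:
    qa_pipeline(question="Where was he born?", context="")
except ValueError as err:
    print(f"Rejected invalid input: {err}")
```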
-
Stas Bekman authored
Fixing:

```
src/transformers/tokenization_blenderbot.py:163: DeprecationWarning: invalid escape sequence \s
  token = re.sub("\s{2,}", " ", token)
```
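For reference, a minimal sketch of the kind of change that silences this warning: expressing the pattern as a raw string.

```python
import re

# "\s{2,}" in a normal string literal triggers the invalid-escape DeprecationWarning;
# a raw-string pattern describes the same regex without the warning.
token = "too   many    spaces"
token = re.sub(r"\s{2,}", " ", token)
print(token)  # -> "too many spaces"
```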
-
- 03 Nov, 2020 8 commits
-
-
Ceyda Cinarel authored
* Bug fix: the NER pipeline shouldn't group separate entities of the same type
* Style fix
* [Bug Fix] Shouldn't group entities that are both 'B' even if they are the same type: (B-type1 B-type1) != (B-type1 I-type1)
* [Bug Fix] Add an option `ignore_subwords` to ignore subsequent ##wordpieces in predictions, because some models train on only the first token of a word and not on the subsequent wordpieces (the BERT NER default), so it makes sense to do the same thing at inference time. The simplest fix is to just group the subwords with the first wordpiece.
  [TODO] How to handle ignored scores? Just set them to 0 and calculate a zero-invariant mean?
  [TODO] Handle a different wordpiece_prefix than ##? Possible approaches: get it from the tokenizer (but currently most tokenizers don't have a wordpiece_prefix property) or have an _is_subword(token) helper.
* [Feature add] Added an option for `skip_special_tokens`, because it was harder to remove them after grouping.
* [Additional Changes] Remove the B/I prefix on returned grouped_entities
* [Feature Request/TODO] Return indexes?
* [Bug TODO] Can't use a fast tokenizer with grouped_entities ('BertTokenizerFast' object has no attribute 'convert_tokens_to_string')
* Use offset_mapping to fix the [UNK] token problem
* Ignore the score for subwords
* Modify the ner_pipeline tests
* Change the ner_pipeline ignore_subwords default to True
* Add an ner_pipeline ignore_subwords=False test case
* Fix the offset_mapping index
* Fix style again
* Change the is_subword and convert_tokens_to_string logic
* Merge tests with the new test structure, change test names, remove old tests
* Add NER tests for fast tokenizers (fast tokenizers have convert_tokens_to_string)
* Fix the incorrect merge

Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
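A usage sketch of the options discussed in this entry; the argument names follow the PR text, and the model identifier is an illustrative assumption:

```python
from transformers import pipeline

ner = pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",  # illustrative model id
    grouped_entities=True,   # merge B-/I- pieces of the same entity
    ignore_subwords=True,    # keep only the first wordpiece of each word
)
print(ner("Hugging Face is based in New York City."))
```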
-
Stas Bekman authored
* make it possible to invoke conftest.py in both test suites without crashing on having the same option added
* perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts
* add `pytest --make-reports` to all CIs (and artifacts)
* fix
-
Sylvain Gugger authored
* Add DataCollatorForTokenClassification and clean tests
* Make quality
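A minimal sketch of how the new collator might be used; the tokenizer choice and the toy features are illustrative assumptions:

```python
from transformers import AutoTokenizer, DataCollatorForTokenClassification

# The collator pads input_ids and labels to a common length within each batch
# (label positions added by padding are typically set to -100, i.e. ignored).
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
collator = DataCollatorForTokenClassification(tokenizer)

features = [
    {"input_ids": [101, 7592, 102], "labels": [0, 1, 0]},
    {"input_ids": [101, 7592, 2088, 999, 102], "labels": [0, 1, 2, 0, 0]},
]
batch = collator(features)
print(batch["input_ids"].shape, batch["labels"].shape)
```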
-
Philip May authored
* improve documentation of training_args.py: do_train, do_eval, do_predict
* fix line too long
* fix style with black on training_args.py
* Update src/transformers/training_args.py
* fix line length with utils/style_doc
* black reformatting

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
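A minimal sketch of the flags whose documentation this entry improves; the values are illustrative, and the comment reflects the usual reading that these booleans are consumed by training scripts rather than by the Trainer itself:

```python
from transformers import TrainingArguments

# do_train / do_eval / do_predict are plain booleans on TrainingArguments,
# typically read by example scripts to decide which phases to run.
args = TrainingArguments(
    output_dir="out",
    do_train=True,
    do_eval=True,
    do_predict=False,
)
print(args.do_train, args.do_eval, args.do_predict)
```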
-
Stas Bekman authored
-
guillaume-be authored
* Updated ConversationalPipeline to work with encoder-decoder models (e.g. BlenderBot)
* Addition of integration test for EncoderDecoder conversation model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
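A usage sketch against the API as of the versions this change targets; the specific BlenderBot checkpoint is an illustrative assumption:

```python
from transformers import pipeline, Conversation

# Illustrative encoder-decoder checkpoint; after this change, such models can
# back the conversational pipeline.
chatbot = pipeline("conversational", model="facebook/blenderbot-400M-distill")

conversation = Conversation("What is the best way to learn a new language?")
conversation = chatbot(conversation)
print(conversation.generated_responses[-1])
```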
-
Nicolas Patry authored
* [FIX] TextGenerationPipeline is currently broken, most likely due to #8180. What's missing is a multi- vs. single-string handler at the beginning of the pipeline; there was also no testing of this pipeline.
* Fixing the Conversational tests too.
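A small sketch of the behaviour the fix restores: the pipeline should accept both a single prompt and a list of prompts (the model choice is illustrative):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("The text generation pipeline", max_length=20))
print(generator(["First prompt", "Second prompt"], max_length=20))
```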
-
Patrick von Platen authored
* first draft
* show design proposition for new generate method
* up
* make better readable
* make first version
* gpt2 tests pass
* make beam search for gpt2 work
* add first encoder-decoder code
* delete typo
* make t5 work
* save intermediate
* make bart work with beam search
* finish beam search bart / t5
* add default kwargs
* make more tests pass
* fix no bad words sampler
* some fixes and tests for all distribution processors
* fix test
* fix rag slow tests
* merge to master
* add nograd to generate
* make all slow tests pass
* speed up generate
* fix edge case bug
* small fix
* correct typo
* add type hints and docstrings
* fix typos in tests
* add beam search tests
* add tests for beam scorer
* fix test rag
* finish beam search tests
* move generation tests into a separate file
* fix generation tests
* more tests
* add aggressive generation tests
* fix tests
* add gpt2 sample test
* add more docstrings
* add more docs
* finish doc strings
* apply some more of Sylvain's and Sam's comments
* fix some typos
* make fix-copies
* apply Lysandre's and Sylvain's comments
* final corrections on examples
* small fix for reformer
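A sketch of the public surface this refactor preserves: beam search is still requested through the usual `generate()` arguments (the T5 checkpoint and generation settings below are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)
# num_beams > 1 selects beam search under the refactored generate().
output_ids = model.generate(
    inputs["input_ids"],
    num_beams=4,
    max_length=40,
    no_repeat_ngram_size=2,
    early_stopping=True,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```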
-
- 02 Nov, 2020 7 commits
-
-
Stas Bekman authored
-
Patrick von Platen authored
-
Lysandre Debut authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Santiago Castro authored
-
Nicolas Patry authored
* Some work to fix the behaviour of DefaultArgumentHandler by removing it.
* Fix argument checking in specific pipelines.
-