- 06 Nov, 2020 8 commits
-
-
Yifan Peng authored
-
smanjil authored
* model details * Apply suggestions from code review Co-authored-by:Julien Chaumond <chaumond@gmail.com>
-
Jiaxin Pei authored
-
Stefan Schweter authored
-
Manuel Romero authored
-
Manuel Romero authored
* Model card: CodeBERT fine-tuned for Insecure Code Detection * Update model_cards/mrm8488/codebert-base-finetuned-detect-insecure-code/README.md Co-authored-by:Julien Chaumond <chaumond@gmail.com>
-
Manuel Romero authored
-
Stas Bekman authored
* use decorator * remove hardcoded paths * make the test use more data and do real quality tests * shave off 10 secs * add --eval_beams 2, reformat * reduce train size, use smaller custom dataset
-
- 05 Nov, 2020 10 commits
-
-
Leandro von Werra authored
Co-authored-by:Sam Shleifer <sshleifer@gmail.com>
-
Stas Bekman authored
Co-authored-by:Sam Shleifer <sshleifer@gmail.com>
-
Sylvain Gugger authored
* Make Trainer evaluation handle dynamic seq_length * Document behavior. * Fix test * Better fix * Fixes for realsies this time * Address review comments * Without forgetting to save...
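One way to picture the dynamic seq_length problem: evaluation batches padded to different lengths cannot be stacked until they share one length. The sketch below is only an illustration on plain Python lists. The helper name `pad_and_concatenate` and the padding value `-100` are assumptions made for this example, not details taken from the commit.

```python
from typing import List


def pad_and_concatenate(batches: List[List[List[int]]], padding_index: int = -100) -> List[List[int]]:
    """Right-pad every row to the longest row seen, then flatten the batches.

    Hypothetical helper: the name and the default padding value are assumptions.
    """
    # Longest sequence across all batches decides the common length.
    max_len = max(len(row) for batch in batches for row in batch)
    return [
        row + [padding_index] * (max_len - len(row))
        for batch in batches
        for row in batch
    ]


# Two batches whose rows have different sequence lengths.
merged = pad_and_concatenate([[[1, 2]], [[3, 4, 5]]])
```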
-
Guillaume Filion authored
* Output global_attentions in Longformer models * make style * small refactoring * fix tests * make fix-copies * add for tf as well * remove comments in test * make fix-copies * make style * add docs * make docstring pretty Co-authored-by:patrickvonplaten <patrick.v.platen@gmail.com>
-
Sam Shleifer authored
-
Bobby Donchev authored
* change TokenClassificationTask class methods to static methods Since the methods of TokenClassificationTask do not use self, we should probably switch them to static methods. Also, since the class TokenClassificationTask does not contain a constructor, it is currently unusable as is. Switching to static methods removes the need to document the intent of the broken class. The get_labels and read_examples_from_file methods still ought to be implemented by subclasses: static method definitions are inherited unchanged, which means they can be overridden, similar to other class methods. * Trigger Build Co-authored-by:Lysandre <lysandre.debut@reseau.eseo.fr>
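The point about static methods still being overridable can be sketched as follows. This is a minimal stand-in, assuming simplified signatures: only the class name and the two method names come from the commit message, while the `NER` subclass, the arguments, and the label set are hypothetical.

```python
from typing import List


class TokenClassificationTask:
    """Simplified stand-in for the base class named in the commit message."""

    @staticmethod
    def get_labels(path: str) -> List[str]:
        # Subclasses are expected to provide this.
        raise NotImplementedError

    @staticmethod
    def read_examples_from_file(data_dir: str, mode: str) -> List[str]:
        # Subclasses are expected to provide this.
        raise NotImplementedError


class NER(TokenClassificationTask):
    """Hypothetical subclass that overrides one static method."""

    @staticmethod
    def get_labels(path: str) -> List[str]:
        # Illustrative fixed label set; a real task would likely read `path`.
        return ["O", "B-PER", "I-PER"]


# No instance (and therefore no constructor) is needed to call the override.
labels = NER.get_labels("")
```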
-
Guillem García Subies authored
-
Patrick von Platen authored
-
Patrick von Platen authored
-
Yifan Peng authored
* Create README.md * Update README.md * Apply suggestions from code review Co-authored-by:Kevin Canwen Xu <canwenxu@126.com> Co-authored-by:Julien Chaumond <chaumond@gmail.com>
-
- 04 Nov, 2020 13 commits
-
-
Sylvain Gugger authored
* Clean up data collators and datasets * Apply suggestions from code review Co-authored-by:Lysandre Debut <lysandre@huggingface.co> * Remove needless clone Co-authored-by:Lysandre Debut <lysandre@huggingface.co>
-
Manuel Romero authored
-
Sylvain Gugger authored
* Try -j option * Try other thing * Bigger machine * Test lower sphinx version * Remove trailing space
-
Victor SANH authored
* adding model cards for distil models * forgot the languages
-
Nicolas Patry authored
- The issue is that with the previous code we would get the following:
```python
qa_pipeline = (...)
qa_pipeline(question="Where was he born ?", context="")
# -> IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```
The goal here is to improve this so that a ValueError is raised wherever possible. While at it, I tried to simplify QuestionArgumentHandler's code to make it smaller and more compact while keeping backward compatibility.
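The idea of failing early with a ValueError, rather than letting an empty context reach the model, might look roughly like the check below. It is a sketch only: `validate_qa_inputs` is a hypothetical helper written for this example and is not the QuestionArgumentHandler code.

```python
from typing import Dict


def validate_qa_inputs(question: str, context: str) -> Dict[str, str]:
    """Reject empty or non-string inputs before any tensor work happens.

    Hypothetical helper, not part of the library.
    """
    if not isinstance(question, str) or not question.strip():
        raise ValueError("`question` must be a non-empty string")
    if not isinstance(context, str) or not context.strip():
        raise ValueError("`context` must be a non-empty string")
    return {"question": question, "context": context}


try:
    validate_qa_inputs(question="Where was he born ?", context="")
    error_message = None
except ValueError as err:
    # The caller now sees a descriptive error instead of an IndexError.
    error_message = str(err)
```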
-
Branden Chan authored
* update deepset/roberta-base-squad2 to v2 * Update model_cards/deepset/roberta-base-squad2/README.md Co-authored-by:Julien Chaumond <chaumond@gmail.com>
-
Manuel Romero authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Patrick von Platen authored
* fix greedy generate test * delete ipdb
-
Pengzhi Gao authored
-
Stas Bekman authored
Fixing:
```
src/transformers/tokenization_blenderbot.py:163: DeprecationWarning: invalid escape sequence \s
  token = re.sub("\s{2,}", " ", token)
```
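The usual remedy for this warning is to write the pattern as a raw string, so that Python does not try to interpret `\s` as a string escape. A minimal sketch, assuming a standalone helper (the function name `collapse_whitespace` is hypothetical; only the `re.sub` call mirrors the line quoted in the warning):

```python
import re


def collapse_whitespace(token: str) -> str:
    # The r"" prefix passes the backslash through to the regex engine unchanged,
    # so no "invalid escape sequence" warning is emitted at compile time.
    return re.sub(r"\s{2,}", " ", token)


cleaned = collapse_whitespace("hello   world")
```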
-
- 03 Nov, 2020 9 commits
-
-
Ceyda Cinarel authored
* Bug fix: NER pipeline shouldn't group separate entities of same type
* style fix
* [Bug Fix] Shouldn't group entities that are both 'B' even if they are same type (B-type1 B-type1) != (B-type1 I-type1)
[Bug Fix] add an option `ignore_subwords` to ignore subsequent ##wordpieces in predictions. Because some models train on only the first token of a word and not on the subsequent wordpieces (BERT NER default). So it makes sense doing the same thing at inference time. The simplest fix is to just group the subwords with the first wordpiece.
[TODO] how to handle ignored scores? just set them to 0 and calculate zero invariant mean ?
[TODO] handle different wordpiece_prefix ## ? possible approaches: get it from tokenizer? but currently most tokenizers dont have a wordpiece_prefix property? have an _is_subword(token)
[Feature add] added option to `skip_special_tokens`. Cause It was harder to remove them after grouping.
[Additional Changes] remove B/I prefix on returned grouped_entities
[Feature Request/TODO] Return indexes?
[Bug TODO] can't use fast tokenizer with grouped_entities ('BertTokenizerFast' object has no attribute 'convert_tokens_to_string')
* use offset_mapping to fix [UNK] token problem
* ignore score for subwords
* modify ner_pipeline test
* modify ner_pipeline test
* modify ner_pipeline test
* ner_pipeline change ignore_subwords default to true
* add ner_pipeline ignore_subword=False test case
* fix offset_mapping index
* fix style again duh
* change is_subword and convert_tokens_to_string logic
* merge tests with new test structure
* change test names
* remove old tests
* ner tests for fast tokenizer
* fast tokenizers have convert_tokens_to_string
* Fix the incorrect merge
Co-authored-by:Ceyda Cinarel <snu-ceyda@users.noreply.github.com> Co-authored-by:Lysandre Debut <lysandre@huggingface.co> Co-authored-by:Lysandre <lysandre.debut@reseau.eseo.fr>
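The grouping rule described above, where (B-type1 B-type1) yields two entities while (B-type1 I-type1) yields one, can be sketched on plain (word, tag) pairs. This is an illustration under simplifying assumptions, not the pipeline's code: the function name `group_entities`, the input shape, and the output keys are chosen for this example, and scores and subword handling are left out.

```python
from typing import Dict, List, Optional, Tuple


def group_entities(tagged: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Group BIO-tagged words; a 'B' tag always opens a new entity."""
    groups: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for word, tag in tagged:
        if tag == "O":
            # Outside any entity: close whatever group was open.
            current = None
            continue
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and current is not None and current["entity_group"] == entity_type:
            # Continuation of the open entity of the same type.
            current["word"] += " " + word
        else:
            # 'B' tag, type change, or a stray 'I': start a new entity.
            # The B/I prefix is dropped from the returned group label.
            current = {"entity_group": entity_type, "word": word}
            groups.append(current)
    return groups


entities = group_entities(
    [("New", "B-LOC"), ("York", "I-LOC"), ("Paris", "B-LOC"), ("is", "O")]
)
```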
-
Stas Bekman authored
* make it possible to invoke testconf.py in both test suites without crashing on having the same option added * perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts * add `pytest --make-reports` to all CIs (and artifacts) * fix
-
Sylvain Gugger authored
* Add DataCollatorForTokenClassification and clean tests * Make quality
-
Philip May authored
* improve documentation of training_args.py - do_train - do_eval - do_predict * fix line too long * fix style with black on training_args.py * Update src/transformers/training_args.py Co-authored-by:Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by:Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by:Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix line length with utils/style_doc * black reformatting Co-authored-by:Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Sylvain Gugger authored
-
Patrick von Platen authored
-
Stas Bekman authored
Co-authored-by:Sam Shleifer <sshleifer@gmail.com>
-
Stas Bekman authored
-
Lysandre authored
-