- 01 Aug, 2020 1 commit
-
-
Faiaz Rahman authored
-
- 31 Jul, 2020 3 commits
-
-
Sylvain Gugger authored
* Harmonize both Trainers' APIs
* Fix test
* main_process -> process_zero
-
Paul O'Leary McCann authored
* Replace mecab-python3 with fugashi

  This replaces mecab-python3 with fugashi for Japanese tokenization. I am the maintainer of both projects. Both projects are MeCab wrappers, so the underlying C++ code is the same. fugashi is the newer wrapper and doesn't use SWIG, so for basic use of the MeCab API it's easier to use.

  This code ensures the use of a version of ipadic installed via pip, which should make versioning and tracking down issues easier.

  fugashi has wheels for Windows, OSX, and Linux, which will help with issues installing old versions of mecab-python3 on Windows. Compared to mecab-python3, fugashi doesn't use SWIG, so it doesn't require a C++ runtime to be installed on Windows.

  In adding this change I removed some code dealing with the `cursor`, `token_start`, and `token_end` variables. These variables didn't seem to be used for anything; it is unclear to me why they were there.

  I ran the tests and they passed, though I couldn't figure out how to run the slow tests (`--runslow` gave an error) and didn't try testing with TensorFlow.
* Style fix
* Remove unused variable (forgot to delete this)
* Adapt doc with install instructions
* Fix typo

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
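For readers unfamiliar with fugashi, a minimal sketch of MeCab tokenization through fugashi with the pip-installed ipadic dictionary follows; the sample sentence is illustrative and not taken from the commit.

    # Minimal sketch, assuming `pip install fugashi ipadic`; the sample text is illustrative.
    import fugashi
    import ipadic

    # GenericTagger points fugashi at the ipadic dictionary published on PyPI,
    # so the dictionary version is pinned by pip rather than a system install.
    tagger = fugashi.GenericTagger(ipadic.MECAB_ARGS)

    text = "日本語のテキストを分かち書きします。"
    tokens = [word.surface for word in tagger(text)]
    print(tokens)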
-
Funtowicz Morgan authored
* Add onnxruntime transformers optimization support
* Added Optimization section in ONNX/ONNXRuntime documentation.
* Improve note reference
* Fixing imports order.
* Add warning about the different levels of optimization between torch and tf export.
* Address @LysandreJik wording suggestion
* Address @LysandreJik wording suggestion
* Always optimize model before quantization for maximum performance.
* Address comments on the documentation.
* Improve TensorFlow optimization message as suggested by @yufenglee
* Removed --optimize parameter
* Warn the user about the current quantization limitation when the model is larger than 2GB.
* Trigger CI for last check
* Small change in print for the optimization section.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
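A hedged sketch of the kind of offline graph optimization onnxruntime provides for transformer models, which this commit wires into the export flow; the file paths and model settings (num_heads, hidden_size) are illustrative assumptions.

    # Sketch only: optimize an exported ONNX graph with onnxruntime's transformer
    # optimizer before quantization. Paths and model dimensions are assumptions.
    from onnxruntime.transformers import optimizer

    # Fuses attention/LayerNorm/GELU subgraphs for faster inference.
    optimized = optimizer.optimize_model(
        "onnx/bert-base-cased.onnx",
        model_type="bert",
        num_heads=12,
        hidden_size=768,
    )
    optimized.save_model_to_file("onnx/bert-base-cased-optimized.onnx")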
-
- 30 Jul, 2020 4 commits
-
-
Sylvain Gugger authored
* Start doc tokenizers
* Tokenizer documentation
* Start doc tokenizers
* Tokenizer documentation
* Formatting after rebase
* Formatting after merge
* Update docs/source/main_classes/tokenizer.rst
* Address comment
* Update src/transformers/tokenization_utils_base.py
* Address Thom's comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
-
guillaume-be authored
* initial commit for pipeline implementation: addition of input processing and history concatenation
* Conversation pipeline tested and working for single & multiple conversation inputs
* Added docstrings for dialogue pipeline
* Addition of dialogue pipeline integration tests
* Delete test_t5.py
* Fixed max code length
* Updated styling
* Fixed test broken by formatting tools
* Removed unused import
* Added unit test for DialoguePipeline
* Fixed Tensorflow compatibility
* Fixed multi-framework support using framework flag
* Fixed docstring; added `min_length_for_response` as an initialization parameter; renamed `*args` to `conversations`, `conversations` being a `Conversation` or a `List[Conversation]`; updated truncation to truncate entire segments of conversations instead of cutting in the middle of a user/bot input
* Renamed pipeline from dialogue to conversational; removed hardcoded default value of 1000 and use config.max_length instead; added `append_response` and `set_history` methods to the Conversation class to avoid direct field mutation; fixed bug in history truncation method
* Updated ConversationalPipeline to accept only active conversations (otherwise a ValueError is raised)
* Simplified input tensor conversion
* Updated attention_mask value for Tensorflow compatibility
* Updated last dialogue reference to conversational & fixed integration tests
* Fixed conflict with master
* Updates following review comments
* Updated formatting
* Added Conversation and ConversationalPipeline to the library __init__, added docstrings for Conversation, added both to the docs
* Update src/transformers/pipelines.py: updated docstring following review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
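A minimal usage sketch of the ConversationalPipeline API introduced here; the prompts are illustrative and the model is whatever the "conversational" task resolves to by default.

    # Sketch of the Conversation / ConversationalPipeline API added by this PR.
    from transformers import Conversation, pipeline

    conversational = pipeline("conversational")

    conversation = Conversation("Can you recommend a good book?")
    conversation = conversational(conversation)      # appends the bot response
    print(conversation.generated_responses[-1])

    # A follow-up turn reuses the same Conversation, so history is concatenated.
    conversation.add_user_input("Something shorter, please.")
    conversation = conversational(conversation)
    print(conversation.generated_responses[-1])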
-
Sylvain Gugger authored
* Switch from return_tuple to return_dict
* Fix test
* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests (#5614)
  * Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests
  * AutoModels Tiny tweaks
  * Style
  * Final changes before merge
  * Re-order for simpler review
  * Final fixes
  * Addressing @sgugger's comments
  * Test MultipleChoice
* Rework TF trainer (#6038)
  * Fully rework training/prediction loops
  * fix method name
  * Fix variable name
  * Fix property name
  * Fix scope
  * Fix method name
  * Fix tuple index
  * Fix tuple index
  * Fix indentation
  * Fix variable name
  * fix eval before log
  * Add drop remainder for test dataset
  * Fix step number + fix logging datetime
  * fix eval loss value
  * use global step instead of step + fix logging at step 0
  * Fix logging datetime
  * Fix global_step usage
  * Fix breaking loop + logging datetime
  * Fix step in prediction loop
  * Fix step breaking
  * Fix train/test loops
  * Force TF at least 2.2 for the trainer
  * Use assert_cardinality to facilitate the dataset size computation
  * Log steps per epoch
  * Make tfds compliant with TPU
  * Make tfds compliant with TPU
  * Use TF dataset enumerate instead of the Python one
  * revert previous commit
  * Fix data_dir
  * Apply style
  * rebase on master
  * Address Sylvain's comments
  * Address Sylvain's and Lysandre's comments
  * Trigger CI
  * Remove unused import
* Switch from return_tuple to return_dict
* Fix test
* Add recent model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
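From the user side, the return_dict switch means model outputs can be addressed by name instead of tuple index; a minimal sketch, with the model name and input chosen for illustration only.

    # Sketch of the return_dict behaviour; model name and input are illustrative.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

    inputs = tokenizer("Hello, world!", return_tensors="pt")

    # With return_dict=True the forward pass returns a ModelOutput whose fields
    # are accessed by name rather than by position.
    outputs = model(**inputs, return_dict=True)
    print(outputs.logits.shape)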
-
Oren Amsalem authored
a = tokenizer.encode("we got a <extra_id_99>", return_tensors='pt', add_special_tokens=True)
print(a)
> tensor([[ 62, 530, 3, 9, 32000]])

a = tokenizer.encode("we got a <extra_id_100>", return_tensors='pt', add_special_tokens=True)
print(a)
> tensor([[ 62, 530, 3, 9, 3, 2, 25666, 834, 23, 26, 834, 2915, 3155]])
-
- 29 Jul, 2020 2 commits
-
-
Funtowicz Morgan authored
* Added capability to quantize a model while exporting through ONNX. We do not support multiple extensions.
* Reformat files
* More quality
* Ensure test_generate_identified_name compares the same object types
* Added documentation everywhere on ONNX exporter
* Use pathlib.Path instead of plain-old string
* Use f-strings everywhere
* Use the correct parameters for black formatting
* Use Python 3 super() style.
* Use packaging.version to ensure the installed onnxruntime version matches requirements
* Fixing imports sorting order.
* Missing raise(s)
* Added quantization documentation
* Fix some spelling.
* Fix bad list header format

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
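A hedged sketch of exporting and then quantizing a model with the conversion utilities this commit extends; the exact function names and signatures in transformers.convert_graph_to_onnx are assumptions about that era of the script, so treat this as illustrative rather than definitive.

    # Sketch only: export a model to ONNX, then produce a quantized copy.
    # Function names/signatures are assumptions about convert_graph_to_onnx.
    from pathlib import Path

    from transformers.convert_graph_to_onnx import convert, quantize

    onnx_path = Path("onnx/bert-base-cased.onnx")

    # Export the PyTorch model graph to ONNX.
    convert(framework="pt", model="bert-base-cased", output=onnx_path, opset=11)

    # Write a dynamically quantized model next to the original export.
    quantized_path = quantize(onnx_path)
    print(f"Quantized model written to {quantized_path}")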
-
Funtowicz Morgan authored
* Move torchscript and add ONNX documentation under model_export
* Let's follow guidelines by the gurus: renamed torchscript.rst to serialization.rst
* Remove previously introduced tree element
* WIP doc
* ONNX documentation
* Fix invalid link
* Improve spelling
* Final wording pass

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
-
- 27 Jul, 2020 1 commit
-
-
Xin Wen authored
Add '-' to make the reference to Transformer-XL more accurate and formal.
-
- 24 Jul, 2020 1 commit
-
-
Sylvain Gugger authored
* Document TF modeling utils
* Document all model utils
-
- 22 Jul, 2020 1 commit
-
-
Sylvain Gugger authored
-
- 21 Jul, 2020 1 commit
-
-
Sylvain Gugger authored
* Update doc to new model outputs
* Fix outputs in quicktour
-
- 20 Jul, 2020 1 commit
-
-
Sylvain Gugger authored
-
- 14 Jul, 2020 1 commit
-
-
Joe Davison authored
-
- 13 Jul, 2020 2 commits
-
-
Stas Bekman authored
* implement FlaubertForTokenClassification as a subclass of XLMForTokenClassification
* fix mapping order
* add the doc
* add common tests
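A hedged sketch of the subclassing pattern this commit describes: Flaubert shares the XLM architecture, so the token-classification head is reused and essentially only the config class changes (the released class may carry extra docstring and init details).

    # Sketch of the pattern: reuse XLM's token-classification head for Flaubert.
    from transformers import FlaubertConfig, XLMForTokenClassification


    class FlaubertForTokenClassification(XLMForTokenClassification):
        config_class = FlaubertConfig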
-
Stas Bekman authored
-
- 10 Jul, 2020 2 commits
-
-
Sylvain Gugger authored
* Document model outputs
* Update docs/source/main_classes/output.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
Sylvain Gugger authored
* Update PretrainedConfig doc
* Formatting
* Small fixes
* Forgotten args and more cleanup
-
- 09 Jul, 2020 2 commits
-
-
Sylvain Gugger authored
-
Lysandre Debut authored
-
- 08 Jul, 2020 1 commit
-
-
Stas Bekman authored
-
- 07 Jul, 2020 4 commits
-
-
Joe Davison authored
* add first draft ppl guide
* upload imgs
* expand on strides
* ref typo
* rm superfluous past var
* add tokenization disclaimer
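The stride idea the guide expands on is a sliding window over a long text that scores only the new tokens of each window, so every token is predicted with (bounded) left context; a hedged sketch follows, with the model, stride, and text as illustrative assumptions.

    # Sketch of strided (sliding-window) perplexity evaluation; GPT-2, the
    # stride of 512, and the text are illustrative assumptions.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

    text = "A long evaluation text goes here. " * 200
    encodings = tokenizer(text, return_tensors="pt")

    max_length = model.config.n_positions  # 1024 for GPT-2
    stride = 512
    seq_len = encodings.input_ids.size(1)

    nlls = []
    prev_end_loc = 0
    for begin_loc in range(0, seq_len, stride):
        end_loc = min(begin_loc + max_length, seq_len)
        trg_len = end_loc - prev_end_loc  # tokens actually scored in this window
        input_ids = encodings.input_ids[:, begin_loc:end_loc]
        target_ids = input_ids.clone()
        target_ids[:, :-trg_len] = -100   # earlier tokens serve as context only

        with torch.no_grad():
            outputs = model(input_ids, labels=target_ids)
            nlls.append(outputs[0] * trg_len)  # outputs[0] is the mean NLL

        prev_end_loc = end_loc
        if end_loc == seq_len:
            break

    ppl = torch.exp(torch.stack(nlls).sum() / seq_len)
    print(ppl)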
-
Sam Shleifer authored
Improve unit tests for finetuning, especially w.r.t. testing frozen parameters.
Fix freeze_embeds for T5.
Add streamlit to setup.cfg.
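A hedged, standalone illustration of what freezing embeddings amounts to for T5; the freeze_embeds helper in the seq2seq example scripts does something along these lines (T5 keeps one shared embedding referenced by encoder and decoder).

    # Sketch only: exclude T5's embedding parameters from gradient updates.
    from transformers import T5ForConditionalGeneration

    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    def freeze_params(module):
        for p in module.parameters():
            p.requires_grad = False

    freeze_params(model.shared)                 # shared input/output embeddings
    for part in (model.encoder, model.decoder):
        freeze_params(part.embed_tokens)        # per-stack references to the embedding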
-
Suraj Patil authored
* fix model_doc links
* update model links
-
Quentin Lhoest authored
* beginning of dpr modeling
* wip
* implement forward
* remove biencoder + better init weights
* export dpr model to embed model for nlp lib
* add new api
* remove old code
* make style
* fix dumb typo
* don't load bert weights
* docs
* docs
* style
* move the `k` parameter
* fix init_weights
* add pretrained configs
* minor
* update config names
* style
* better config
* style
* clean code based on PR comments
* change Dpr to DPR
* fix config
* switch encoder config to a dict
* style
* inheritance -> composition
* add messages in assert statements
* add dpr reader tokenizer
* one tokenizer per model
* fix base_model_prefix
* fix imports
* typo
* add convert script
* docs
* change tokenizers conf names
* style
* change tokenizers conf names
* minor
* minor
* fix wrong names
* minor
* remove unused convert functions
* rename convert script
* use return_tensors in tokenizers
* remove n_questions dim
* move generate logic to tokenizer
* style
* add docs
* docs
* quality
* docs
* add tests
* style
* add tokenization tests
* DPR full tests
* Stay true to the attention mask building
* update docs
* missing param in bert input docs
* docs
* style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
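A hedged usage sketch of the two DPR encoders this PR adds, embedding a question and a passage and scoring them with an inner product; the facebook/dpr-* checkpoint names are assumptions about the published weights.

    # Sketch of DPR question/context encoding; checkpoint names are assumptions.
    import torch
    from transformers import (
        DPRContextEncoder,
        DPRContextEncoderTokenizer,
        DPRQuestionEncoder,
        DPRQuestionEncoderTokenizer,
    )

    q_name = "facebook/dpr-question_encoder-single-nq-base"
    ctx_name = "facebook/dpr-ctx_encoder-single-nq-base"

    q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained(q_name)
    q_encoder = DPRQuestionEncoder.from_pretrained(q_name)
    ctx_tokenizer = DPRContextEncoderTokenizer.from_pretrained(ctx_name)
    ctx_encoder = DPRContextEncoder.from_pretrained(ctx_name)

    question = "What is dense passage retrieval?"
    passage = "Dense passage retrieval encodes questions and passages in the same vector space."

    # The first element of each output is the pooled embedding.
    q_emb = q_encoder(**q_tokenizer(question, return_tensors="pt"))[0]
    ctx_emb = ctx_encoder(**ctx_tokenizer(passage, return_tensors="pt"))[0]

    # Relevance is scored as the inner product of the two embeddings.
    score = torch.matmul(q_emb, ctx_emb.T)
    print(score)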
-
- 06 Jul, 2020 4 commits
-
-
Lysandre authored
-
Lysandre authored
-
Arnav Sharma authored
-
ELanning authored
-
- 02 Jul, 2020 2 commits
-
-
Sylvain Gugger authored
* Work on tokenizer summary
* Finish tutorial
* Link to it
* Apply suggestions from code review
* Add vocab definition

Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
George Ho authored
-
- 01 Jul, 2020 4 commits
-
-
Patrick von Platen authored
* fix conflicts
* fix
* happy rebasing
-
Patrick von Platen authored
-
Sylvain Gugger authored
* Trigger CI
* Fix dropdown bug in searches
-
Sylvain Gugger authored
-
- 30 Jun, 2020 2 commits
-
-
Sylvain Gugger authored
* Documentation for the Trainer API
* Address review comments
* Address comments
-
Sylvain Gugger authored
* How to share model cards
* Switch the two options
* Fix bad copy/cut
* Julien's suggestion
-
- 29 Jun, 2020 1 commit
-
-
Lysandre Debut authored
* Doc for v3.0.0
* Update docs/source/_static/js/custom.js
* Update docs/source/_static/js/custom.js

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-