Commits · f16540fcbae9324da8d6ba3d022b542a401c4380 · chenpangpang / transformers

22 Apr, 2020 1 commit

Pipeline for Text Generation: GenerationPipeline (#3758) · f16540fc

Lorenzo Ampil authored Apr 22, 2020



* Add GenerationPipeline

* Fix parameter names

* Correct parameter __call__ parameters

* Add model type attribute and correct function calls for prepare_input

* Take out trailing commas from init attributes

* Remove unnecessary tokenization line

* Implement support for multiple text inputs

* Apply generation support for multiple input text prompts

* Take out tensor coersion

* Take out batch index

* Add text prompt to return sequence

* Squeeze token tensore before decoding

* Return only a single list of sequences if only one prompt was used

* Correct results variable name

* Add GenerationPipeline to SUPPORTED_TASKS with the alias , initalized w GPT2

* Registedred AutoModelWithLMHead for both pt and t

* Update docstring for GenerationPipeline

* Add kwargs parameter to mode.generate

* Take out kwargs parameter after all

* Add generation pipeline example in pipeline docstring

* Fix max length by squeezing tokens tensor

* Apply ensure_tensor_on_device to pytorch tensor

* Include generation step in torch.no_grad

* Take out input from prepare_xlm_input and set 'en' as default xlm_language

* Apply framework specific encoding during prepare_input

* Format w make style

* Move GenerationPipeline import to follow proper import sorting

* Take out training comma from generation dict

* Apply requested changes

* Change name to TextGenerationPipeline

* Apply TextGenerationPipeline rename to __init___

* Changing alias to

* Set input mapping as input to ensure_tensor_on_device

* Fix assertion placement

* Add test_text_generation

* Add TextGenerationPipeline to PipelineCommonTests

* Take out whitespace

* Format __init__ w black

* Fix __init__ style

* Forman __init___

* Add line to end of __init__

* Correct model tokenizer set for test_text_generation

* Ensure to return list of list, not list of string (to pass test)

* Limit test models to only 3 to limit runtime to address circleCI timeout error

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Remove argument docstring, __init__, add additional __call__ arguments, and reformat results to list of dict

* Fix blank result list

* Add TextGenerationPipeline to pipelines.rst

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix typos from adding PADDING_TEXT_TOKEN_LENGTH

* Fix incorrectly moved result list

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Add back generation line and make style

* Take out blank whitespace

* Apply new alis, text-generation, to test_pipelines

* Fix text generation alias in test

* Update src/transformers/pipelines.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

f16540fc

18 Apr, 2020 1 commit

Cleanup fast tokenizers integration (#3706) · 827d6d6e

Thomas Wolf authored Apr 18, 2020



* First pass on utility classes and python tokenizers

* finishing cleanup pass

* style and quality

* Fix tests

* Updating following @mfuntowicz comment

* style and quality

* Fix Roberta

* fix batch_size/seq_length inBatchEncoding

* add alignement methods + tests

* Fix OpenAI and Transfo-XL tokenizers

* adding trim_offsets=True default for GPT2 et RoBERTa

* style and quality

* fix tests

* add_prefix_space in roberta

* bump up tokenizers to rc7

* style

* unfortunately tensorfow does like these - removing shape/seq_len for now

* Update src/transformers/tokenization_utils.py
Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Adding doc and docstrings

* making flake8 happy
Co-authored-by: Stefan Schweter <stefan@schweter.it>

827d6d6e

17 Mar, 2020 1 commit

Add Summarization to Pipelines (#3128) · 38a555a8

Sam Shleifer authored Mar 17, 2020

* passing

* Undo stupid chg

* docs

* undo rename

* delete-cruft

* only import if you have torch

* Dont rely on dict ordering

* Fix dict ordering upstream

* docstring link

* docstring link

* remove trailing comma for 3.5 compat

* new name

* delegate kwarging

* Update kwargs

38a555a8

02 Mar, 2020 1 commit

Pipeline doc (#3055) · d3eb7d23

Lysandre Debut authored Mar 02, 2020

* Pipeline doc initial commit

* pipeline abstraction

* Remove modelcard argument from pipeline

* Task-specific pipelines can be instantiated with no model or tokenizer

* All pipelines doc

d3eb7d23

07 Feb, 2020 1 commit
- [examples] rename run_lm_finetuning to run_language_modeling · 42f08e59
  Julien Chaumond authored Feb 06, 2020
  
  42f08e59
23 Jan, 2020 1 commit
- TF ALBERT + TF Utilities + Fix warnings · 3922a249
  Lysandre authored Jan 15, 2020
  
  3922a249
06 Jan, 2020 2 commits
- GPU text generation: mMoved the encoded_prompt to correct device · 81d6841b
  alberduris authored Dec 31, 2019
  
  81d6841b
- Moved the encoded_prompts to correct device · dd4df80f
  alberduris authored Dec 31, 2019
  
  dd4df80f
05 Dec, 2019 2 commits
- Patch evaluation for impossible values + cleanup · 9ecd83da
  LysandreJik authored Dec 05, 2019
  
  9ecd83da
- Add few tests on the TF optimization file with some info in the documentation. Complete the README. · 9200a759
  Julien Plu authored Dec 05, 2019
  
  9200a759
04 Dec, 2019 1 commit
- Documentation · 7a035199
  LysandreJik authored Dec 04, 2019
  
  7a035199
27 Nov, 2019 1 commit
- add XnliProcessor to doc · d75d49a5
  VictorSanh authored Nov 05, 2019
  
  d75d49a5
14 Nov, 2019 1 commit
- update the examples, docs and template · 2276bf69
  Rémi Louf authored Nov 14, 2019
  
  2276bf69
26 Sep, 2019 8 commits
- [doc] pytorch_transformers -> transformers · 927904bc
  LysandreJik authored Sep 26, 2019
  
  927904bc
- Various small doc fixes · 8349d757
  LysandreJik authored Sep 25, 2019
  
  8349d757
- Example usage · fb056494
  LysandreJik authored Sep 25, 2019
  
  fb056494
- Updated doc for `InputExample` and `InputFeatures` · 36f592cc
  LysandreJik authored Sep 25, 2019
  
  36f592cc
- Changed processor documentation architecture. Added documentation for GLUE · ad4a393e
  LysandreJik authored Sep 25, 2019
  
  ad4a393e
- GLUE processors · c4ac7a76
  LysandreJik authored Sep 25, 2019
  
  c4ac7a76
- TF models added to documentation · 4acd87ff
  LysandreJik authored Sep 25, 2019
  
  4acd87ff
- [BIG] pytorch-transformers => transformers · 31c23bd5
  thomwolf authored Sep 26, 2019
  
  31c23bd5
04 Aug, 2019 2 commits
- updating docs - adding few tests to tokenizers · 00132b7a
  thomwolf authored Aug 04, 2019
  
  00132b7a
- big doc update [WIP] · 009273db
  thomwolf authored Aug 04, 2019
  
  009273db