1. 30 Nov, 2020 1 commit
    • Add T5 Encoder for Feature Extraction (#8717) · 40ecaf0c
      Ahmed Elnaggar authored
      
      
      * Add T5 Encoder class for feature extraction
      
      * fix T5 encoder add_start_docstrings indent
      
      * update init with T5 encoder
      
      * update init with TFT5ModelEncoder
      
      * remove TFT5ModelEncoder
      
      * change T5ModelEncoder order in init
      
      * add T5ModelEncoder to transformers init
      
      * clean T5ModelEncoder
      
      * update init with TFT5ModelEncoder
      
      * add TFModelEncoder for Tensorflow
      
      * update init with TFT5ModelEncoder
      
      * Update src/transformers/models/t5/modeling_t5.py
      
      change output from Seq2SeqModelOutput to BaseModelOutput
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * remove encoder_outputs
      
      1. remove encoder_outputs from the function call.
      2. remove the encoder_outputs if statement.
      3. remove isinstance from return_dict.
      
      * Authorize missing decoder keys
      
      * remove unnecessary input parameters
      
      remove past_key_values and use_cache
      
      * remove use_cache
      
      remove use_cache from the forward method
      
      * add docstring for T5 encoder
      
      add docstring for T5 encoder with T5_ENCODER_INPUTS_DOCSTRING
      
      * change return_dict to dot access
      
      * add T5_ENCODER_INPUTS_DOCSTRING for TF T5
      
      * change TFT5Encoder output type to BaseModelOutput
      
      * remove unnecessary parameters for TFT5Encoder
      
      * remove unnecessary if statement
      
      * add import BaseModelOutput
      
      * fix BaseModelOutput typo to TFBaseModelOutput
      
      * update T5 doc with T5ModelEncoder
      
      * add T5ModelEncoder to tests
      
      * finish pytorch
      
      * finish docs and mt5
      
      * add mtf to init
      
      * fix init
      
      * remove n_positions
      
      * finish PR
      
      * Update src/transformers/models/mt5/modeling_mt5.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/models/t5/modeling_t5.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/models/t5/modeling_tf_t5.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/models/mt5/modeling_tf_mt5.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * make style
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
  2. 17 Nov, 2020 1 commit
    • Reorganize repo (#8580) · c89bdfbe
      Sylvain Gugger authored
      * Put models in subfolders
      
      * Styling
      
      * Fix imports in tests
      
      * More fixes in test imports
      
      * Sneaky hidden imports
      
      * Fix imports in doc files
      
      * More sneaky imports
      
      * Finish fixing tests
      
      * Fix examples
      
      * Fix path for copies
      
      * More fixes for examples
      
      * Fix dummy files
      
      * More fixes for example
      
      * More model import fixes
      
      * Is this why you're unhappy GitHub?
      
      * Fix imports in convert command
  3. 10 Nov, 2020 1 commit
  4. 30 Oct, 2020 1 commit
    • Ci test tf super slow (#8007) · 10f8c636
      Lysandre Debut authored
      * Test TF GPU CI
      
      * Change cache
      
      * Fix missing torch requirement
      
      * Fix some model tests
      
      
      Style
      
      * LXMERT
      
      * MobileBERT
      
      * Longformer skip test
      
      * XLNet
      
      * The rest of the tests
      
      * RAG goes OOM in multi gpu setup
      
      * YAML test files
      
      * Last fixes
      
      * Skip doctests
      
      * Fill mask tests
      
      * Yaml files
      
      * Last test fix
      
      * Style
      
      * Update cache
      
      * Change ONNX tests to slow + use tiny model
  5. 19 Oct, 2020 1 commit
    • [RAG] Propagating of n_docs as parameter to all RagModel's related functions (#7891) · 0193c829
      Lalit Pagaria authored
      
      
      * Propagating n_docs as parameter to all RagModel's related functions that defaults to self.config.n_docs
      
      * Making n_docs parameter's default value to None in marginalize function
      
      * Fixing code quality issues
      
      * Handle the special case when generator is of T5PreTrainedModel instance type. T5PreTrainedModel does not have n_docs as parameter
      
      * T5PreTrainedModel does not have n_docs as parameter
      
      * Addressing review comment
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Correcting comment by addressing review comment
      
      * Adding assert statement verifying that n_docs is correctly set. n_docs should be the same for both retriever and generator.
      
      * Fixing flake8 reported issue
      
      * Correcting test datasets for rag
      
      * Using doc_scores instead of context_input_ids to check assert as in RagSequenceForGeneration context_input_ids can be null
      
      * doc_scores' second dimension holds the number of retrieved docs
      
      * Changing assert comment
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
  6. 18 Oct, 2020 1 commit
    • [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a
      Thomas Wolf authored
      * splitting fast and slow tokenizers [WIP]
      
      * [WIP] splitting sentencepiece and tokenizers dependencies
      
      * update dummy objects
      
      * add name_or_path to models and tokenizers
      
      * prefix added to file names
      
      * prefix
      
      * styling + quality
      
      * splitting all the tokenizer files - sorting sentencepiece based ones
      
      * update tokenizer version up to 0.9.0
      
      * remove hard dependency on sentencepiece 🎉
      
      * and removed hard dependency on tokenizers 🎉
      
      
      
      * update conversion script
      
      * update missing models
      
      * fixing tests
      
      * move test_tokenization_fast to main tokenization tests - fix bugs
      
      * bump up tokenizers
      
      * fix bert_generation
      
      * update and fix several tokenizers
      
      * keep sentencepiece in deps for now
      
      * fix funnel and deberta tests
      
      * fix fsmt
      
      * fix marian tests
      
      * fix layoutlm
      
      * fix squeezebert and gpt2
      
      * fix T5 tokenization
      
      * fix xlnet tests
      
      * style
      
      * fix mbart
      
      * bump up tokenizers to 0.9.2
      
      * fix model tests
      
      * fix tf models
      
      * fix seq2seq examples
      
      * fix tests without sentencepiece
      
      * fix slow => fast conversion without sentencepiece
      
      * update auto and bert generation tests
      
      * fix mbart tests
      
      * fix auto and common test without tokenizers
      
      * fix tests without tokenizers
      
      * clean up tests lighten up when tokenizers + sentencepiece are both off
      
      * style quality and tests fixing
      
      * add sentencepiece to doc/examples reqs
      
      * leave sentencepiece on for now
      
      * style quality split herbert and fix pegasus
      
      * WIP Herbert fast
      
      * add sample_text_no_unicode and fix herbert tokenization
      
      * skip FSMT example test for now
      
      * fix style
      
      * fix fsmt in example tests
      
      * update following Lysandre and Sylvain's comments
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  7. 25 Sep, 2020 1 commit
  8. 24 Sep, 2020 1 commit
  9. 22 Sep, 2020 1 commit
    • RAG (#6813) · c754c41c
      Ola Piktus authored
      * added rag WIP
      
      * path fix
      
      * Formatting / renaming prior to actual work
      
      * added rag WIP
      
      * path fix
      
      * Formatting / renaming prior to actual work
      
      * added rag WIP
      
      * path fix
      
      * Formatting / renaming prior to actual work
      
      * added rag WIP
      
      * Formatting / renaming prior to actual work
      
      * First commit
      
      * improve comments
      
      * Retrieval evaluation scripts
      
      * refactor to include modeling outputs + MPI retriever
      
      * Fix rag-token model + refactor
      
      * Various fixes + finetuning logic
      
      * use_bos fix
      
      * Retrieval refactor
      
      * Finetuning refactoring and cleanup
      
      * Add documentation and cleanup
      
      * Remove set_up_rag_env.sh file
      
      * Fix retrieval with HF index
      
      * Fix import errors
      
      * Fix quality errors
      
      * Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867
      
      
      
      * fix quality
      
      * Fix RAG Sequence generation
      
      * minor cleanup plus initial tests
      
      * fix test
      
      * fix tests 2
      
      * Comments fix
      
      * post-merge fixes
      
      * Improve readme + post-rebase refactor
      
      * Extra dependencies for tests
      
      * Fix tests
      
      * Fix tests 2
      
      * Refactor test requirements
      
      * Fix tests 3
      
      * Post-rebase refactor
      
      * rename nlp->datasets
      
      * RAG integration tests
      
      * add tokenizer to slow integration test and allow retriever to run on cpu
      
      * add tests; fix position ids warning
      
      * change structure
      
      * change structure
      
      * add from encoder generator
      
      * save working solution
      
      * make all integration tests pass
      
      * add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained
      
      * don't save paths
      
      * delete unnecessary imports
      
      * pass config to AutoTokenizer.from_pretrained for Rag tokenizers
      
      * init wiki_dpr only once
      
      * hardcode legacy index and passages paths (todo: add the right urls)
      
      * finalize config
      
      * finalize retriever api and config api
      
      * LegacyIndex index download refactor
      
      * add dpr to autotokenizer
      
      * make from pretrained more flexible
      
      * fix ragfortokengeneration
      
      * small name changes in tokenizer
      
      * add labels to models
      
      * change default index name
      
      * add retrieval tests
      
      * finish token generate
      
      * align test with previous version and make all tests pass
      
      * add tests
      
      * finalize tests
      
      * implement Thom's suggestions
      
      * add first version of test
      
      * make first tests work
      
      * make retriever platform agnostic
      
      * naming
      
      * style
      
      * add legacy index URL
      
      * docstrings + simple retrieval test for distributed
      
      * clean model api
      
      * add doc_ids to retriever's outputs
      
      * fix retrieval tests
      
      * finish model outputs
      
      * finalize model api
      
      * fix generate problem for rag
      
      * fix generate for other models
      
      * fix some tests
      
      * save intermediate
      
      * set generate to default
      
      * big refactor generate
      
      * delete rag_api
      
      * correct pip faiss install
      
      * fix auto tokenization test
      
      * fix faiss install
      
      * fix test
      
      * move the distributed logic to examples
      
      * model page
      
      * docs
      
      * finish tests
      
      * fix dependencies
      
      * fix import in __init__
      
      * Refactor eval_rag and finetune scripts
      
      * start docstring
      
      * add psutil to test
      
      * fix tf test
      
      * move require torch to top
      
      * fix retrieval test
      
      * align naming
      
      * finish automodel
      
      * fix repo consistency
      
      * test ragtokenizer save/load
      
      * add rag model output docs
      
      * fix ragtokenizer save/load from pretrained
      
      * fix tokenizer dir
      
      * remove torch in retrieval
      
      * fix docs
      
      * fix finetune scripts
      
      * finish model docs
      
      * finish docs
      
      * remove auto model for now
      
      * add require torch
      
      * remove solved todos
      
      * integrate Sylvain's suggestions
      
      * Sam's comments
      
      * correct mistake on purpose
      
      * improve README
      
      * Add generation test cases
      
      * fix rag token
      
      * clean token generate
      
      * fix test
      
      * add note to test
      
      * fix attention mask
      
      * add t5 test for rag
      
      * Fix handling prefix in finetune.py
      
      * don't overwrite index_name
      Co-authored-by: Patrick Lewis <plewis@fb.com>
      Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
      Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
      Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
      Co-authored-by: Your Name <you@example.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>