"src/include/utility.hpp" did not exist on "08c7f74391d7a0fb02028c3ee7e83383f7751a3c"
  1. 13 Oct, 2020 2 commits
  2. 09 Oct, 2020 5 commits
  3. 08 Oct, 2020 1 commit
    • Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove... · 9aeacb58
      Thomas Wolf authored
      
      Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)
      
      * [WIP] SP tokenizers
      
      * fixing tests for T5
      
      * WIP tokenizers
      
      * serialization
      
      * update T5
      
      * WIP T5 tokenization
      
      * slow to fast conversion script
      
      * Refactoring to move tokenizer implementations inside transformers
      
      * Adding gpt - refactoring - quality
      
      * WIP adding several tokenizers to the fast world
      
      * WIP Roberta - moving implementations
      
      * update to dev4 switch file loading to in-memory loading
      
      * Updating and fixing
      
      * advancing on the tokenizers - updating do_lower_case
      
      * style and quality
      
      * moving forward with tokenizers conversion and tests
      
      * MBart, T5
      
      * dumping the fast version of transformer XL
      
      * Adding to autotokenizers + style/quality
      
      * update init and space_between_special_tokens
      
      * style and quality
      
      * bump up tokenizers version
      
      * add protobuf
      
      * fix pickle Bert JP with Mecab
      
      * fix newly added tokenizers
      
      * style and quality
      
      * fix bert japanese
      
      * fix funnel
      
      * limit tokenizer warning to one occurrence
      
      * clean up file
      
      * fix new tokenizers
      
      * fast tokenizers deep tests
      
      * WIP adding all the special fast tests on the new fast tokenizers
      
      * quick fix
      
      * adding more fast tokenizers in the fast tests
      
      * all tokenizers in fast version tested
      
      * Adding BertGenerationFast
      
      * bump up setup.py for CI
      
      * remove BertGenerationFast (too early)
      
      * bump up tokenizers version
      
      * Clean old docstrings
      
      * Typo
      
      * Update following Lysandre comments
      Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
      9aeacb58
  4. 07 Oct, 2020 2 commits
  5. 06 Oct, 2020 2 commits
  6. 05 Oct, 2020 5 commits
    • The toggle actually sticks (#7586) · 818c294f
      Lysandre Debut authored
      818c294f
    • Check and update model list in index.rst automatically (#7527) · b2b7fc78
      Sylvain Gugger authored
      * Check and update model list in index.rst automatically
      
      * Check and update model list in index.rst automatically
      
      * Adapt template
      b2b7fc78
    • docs(pretrained_models): fix num parameters (#7575) · 0d79de73
      Amine Abdaoui authored
      
      
      * docs(pretrained_models): fix num parameters
      
      * fix(pretrained_models): correct typo
      Co-authored-by: Amin <amin.geotrend@gmail.com>
      0d79de73
    • SqueezeBERT architecture (#7083) · 02ef825b
      Forrest Iandola authored
      * configuration_squeezebert.py
      
      thin wrapper around bert tokenizer
      
      fix typos
      
      wip sb model code
      
      wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working
      
      set up squeezebert to use BertModelOutput when returning results.
      
      squeezebert documentation
      
      formatting
      
      allow head mask that is an array of [None, ..., None]
      
      docs
      
      docs cont'd
      
      path to vocab
      
      docs and pointers to cloud files (WIP)
      
      line length and indentation
      
      squeezebert model cards
      
      formatting of model cards
      
      untrack modeling_squeezebert_scratchpad.py
      
      update aws paths to vocab and config files
      
      get rid of stub of NSP code, and advise users to pretrain with mlm only
      
      fix rebase issues
      
      redo rebase of modeling_auto.py
      
      fix issues with code formatting
      
      more code format auto-fixes
      
      move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert
      
      tests for squeezebert modeling and tokenization
      
      fix typo
      
      move squeezebert before bert in modeling_auto.py to fix inheritance problem
      
      disable test_head_masking, since squeezebert doesn't yet implement head masking
      
      fix issues exposed by the test_modeling_squeezebert.py
      
      fix an issue exposed by test_tokenization_squeezebert.py
      
      fix issue exposed by test_modeling_squeezebert.py
      
      auto generated code style improvement
      
      issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()
      
      update copyright
      
      resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask
      
      docs
      
      add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli
      
      autogenerated formatting tweaks
      
      integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings
      
      * tiny change to order of imports
      02ef825b
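The SqueezeBERT notes about moving squeezebert before bert in the auto mappings describe a general ordering problem: if a lookup walks an ordered mapping and returns the first entry that matches via `isinstance`, a subclass must be listed before its parent or it will never be reached. The sketch below illustrates that under stated assumptions. The classes, the `lookup` helper, and the mapping contents are hypothetical stand-ins, not the actual code in `modeling_auto.py` or `tokenization_auto.py`.

```python
from collections import OrderedDict


# Hypothetical stand-ins for illustration only (not the real transformers classes).
class BertConfig:
    pass


class SqueezeBertConfig(BertConfig):  # the subclass relationship is what matters here
    pass


def lookup(mapping, config):
    """Return the first model name whose config class matches via isinstance."""
    for config_class, model_name in mapping.items():
        if isinstance(config, config_class):
            return model_name
    raise ValueError("unrecognized config")


# Parent listed first: a SqueezeBertConfig instance also matches BertConfig,
# so the SqueezeBERT entry is shadowed.
parent_first = OrderedDict(
    [(BertConfig, "BertModel"), (SqueezeBertConfig, "SqueezeBertModel")]
)

# Subclass listed first: each config resolves to its own entry.
subclass_first = OrderedDict(
    [(SqueezeBertConfig, "SqueezeBertModel"), (BertConfig, "BertModel")]
)

print(lookup(parent_first, SqueezeBertConfig()))    # BertModel (shadowed)
print(lookup(subclass_first, SqueezeBertConfig()))  # SqueezeBertModel
print(lookup(subclass_first, BertConfig()))         # BertModel
```

Listing the more specific class first keeps the lookup a simple linear scan. An alternative would be an exact `type(config) is config_class` comparison, at the cost of no longer matching user-defined subclasses.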
    • Cleanup documentation for BART, Marian, MBART and Pegasus (#7523) · e2c935f5
      Sylvain Gugger authored
      * Cleanup documentation for BART, Marian, MBART and Pegasus
      
      * Cleanup documentation for BART, Marian, MBART and Pegasus
      e2c935f5
  7. 01 Oct, 2020 2 commits
  8. 30 Sep, 2020 3 commits
  9. 29 Sep, 2020 2 commits
  10. 28 Sep, 2020 5 commits
  11. 24 Sep, 2020 3 commits
  12. 23 Sep, 2020 2 commits
  13. 22 Sep, 2020 5 commits
    • RAG (#6813) · c754c41c
      Ola Piktus authored
      * added rag WIP
      
      * path fix
      
      * Formatting / renaming prior to actual work
      
      * added rag WIP
      
      * path fix
      
      * Formatting / renaming prior to actual work
      
      * added rag WIP
      
      * path fix
      
      * Formatting / renaming prior to actual work
      
      * added rag WIP
      
      * Formatting / renaming prior to actual work
      
      * First commit
      
      * improve comments
      
      * Retrieval evaluation scripts
      
      * refactor to include modeling outputs + MPI retriever
      
      * Fix rag-token model + refactor
      
      * Various fixes + finetuning logic
      
      * use_bos fix
      
      * Retrieval refactor
      
      * Finetuning refactoring and cleanup
      
      * Add documentation and cleanup
      
      * Remove set_up_rag_env.sh file
      
      * Fix retrieval with HF index
      
      * Fix import errors
      
      * Fix quality errors
      
      * Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867
      
      
      
      * fix quality
      
      * Fix RAG Sequence generation
      
      * minor cleanup plus initial tests
      
      * fix test
      
      * fix tests 2
      
      * Comments fix
      
      * post-merge fixes
      
      * Improve readme + post-rebase refactor
      
      * Extra dependencies for tests
      
      * Fix tests
      
      * Fix tests 2
      
      * Refactor test requirements
      
      * Fix tests 3
      
      * Post-rebase refactor
      
      * rename nlp->datasets
      
      * RAG integration tests
      
      * add tokenizer to slow integration test and allow retriever to run on cpu
      
      * add tests; fix position ids warning
      
      * change structure
      
      * change structure
      
      * add from encoder generator
      
      * save working solution
      
      * make all integration tests pass
      
      * add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained
      
      * don't save paths
      
      * delete unnecessary imports
      
      * pass config to AutoTokenizer.from_pretrained for Rag tokenizers
      
      * init wiki_dpr only once
      
      * hardcode legacy index and passages paths (todo: add the right urls)
      
      * finalize config
      
      * finalize retriever api and config api
      
      * LegacyIndex index download refactor
      
      * add dpr to autotokenizer
      
      * make from pretrained more flexible
      
      * fix ragfortokengeneration
      
      * small name changes in tokenizer
      
      * add labels to models
      
      * change default index name
      
      * add retrieval tests
      
      * finish token generate
      
      * align test with previous version and make all tests pass
      
      * add tests
      
      * finalize tests
      
      * implement thoms suggestions
      
      * add first version of test
      
      * make first tests work
      
      * make retriever platform agnostic
      
      * naming
      
      * style
      
      * add legacy index URL
      
      * docstrings + simple retrieval test for distributed
      
      * clean model api
      
      * add doc_ids to retriever's outputs
      
      * fix retrieval tests
      
      * finish model outputs
      
      * finalize model api
      
      * fix generate problem for rag
      
      * fix generate for other models
      
      * fix some tests
      
      * save intermediate
      
      * set generate to default
      
      * big refactor generate
      
      * delete rag_api
      
      * correct pip faiss install
      
      * fix auto tokenization test
      
      * fix faiss install
      
      * fix test
      
      * move the distributed logic to examples
      
      * model page
      
      * docs
      
      * finish tests
      
      * fix dependencies
      
      * fix import in __init__
      
      * Refactor eval_rag and finetune scripts
      
      * start docstring
      
      * add psutil to test
      
      * fix tf test
      
      * move require torch to top
      
      * fix retrieval test
      
      * align naming
      
      * finish automodel
      
      * fix repo consistency
      
      * test ragtokenizer save/load
      
      * add rag model output docs
      
      * fix ragtokenizer save/load from pretrained
      
      * fix tokenizer dir
      
      * remove torch in retrieval
      
      * fix docs
      
      * fix finetune scripts
      
      * finish model docs
      
      * finish docs
      
      * remove auto model for now
      
      * add require torch
      
      * remove solved todos
      
      * integrate sylvains suggestions
      
      * sams comments
      
      * correct mistake on purpose
      
      * improve README
      
      * Add generation test cases
      
      * fix rag token
      
      * clean token generate
      
      * fix test
      
      * add note to test
      
      * fix attention mask
      
      * add t5 test for rag
      
      * Fix handling prefix in finetune.py
      
      * don't overwrite index_name
      Co-authored-by: Patrick Lewis <plewis@fb.com>
      Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
      Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
      Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
      Co-authored-by: Your Name <you@example.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
      c754c41c
    • Documentation version · 6e21f242
      Lysandre authored
      6e21f242
    • Release: v3.2.0 · 3ebb1b3a
      Lysandre authored
      3ebb1b3a
    • is_pretokenized -> is_split_into_words (#7236) · 21ca1480
      Sylvain Gugger authored
      * is_pretokenized -> is_split_into_words
      
      * Fix tests
      21ca1480
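A keyword rename such as `is_pretokenized` -> `is_split_into_words` is often paired with a transition period in which the old name is still accepted but triggers a warning. The log entry itself only records the rename, so the following is a minimal sketch of that general pattern, not the change made in #7236. The `encode` function and its whitespace-splitting behaviour are hypothetical.

```python
import warnings


def encode(text_or_words, is_split_into_words=False, **kwargs):
    """Hypothetical encoder showing one way to keep accepting a renamed keyword."""
    if "is_pretokenized" in kwargs:
        # Old keyword: warn, then forward its value to the new keyword.
        warnings.warn(
            "`is_pretokenized` is deprecated; use `is_split_into_words` instead.",
            FutureWarning,
        )
        is_split_into_words = kwargs.pop("is_pretokenized")
    if kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    # Input already split into words: use it as-is. Otherwise split on whitespace.
    if is_split_into_words:
        return list(text_or_words)
    return text_or_words.split()


print(encode("hello world"))                                  # ['hello', 'world']
print(encode(["hello", "world"], is_split_into_words=True))   # ['hello', 'world']
```

Callers using the old keyword keep working during the transition, and the warning points them to the new name.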
    • Add LayoutLM Model (#7064) · cd9a0585
      Minghao Li authored
      
      
      * first version
      
      * finish test docs readme model/config/tokenization class
      
      * apply make style and make quality
      
      * fix layoutlm GitHub link
      
      * fix conflict in index.rst and add layoutlm to pretrained_models.rst
      
      * fix bug in test_parents_and_children_in_mappings
      
      * reformat modeling_auto.py and tokenization_auto.py
      
      * fix bug in test_modeling_layoutlm.py
      
      * Update docs/source/model_doc/layoutlm.rst
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update docs/source/model_doc/layoutlm.rst
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * remove inh, add tokenizer fast, and update some doc
      
      * copy and rename necessary class from modeling_bert to modeling_layoutlm
      
      * Update src/transformers/configuration_layoutlm.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/configuration_layoutlm.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/configuration_layoutlm.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/configuration_layoutlm.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/modeling_layoutlm.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/modeling_layoutlm.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/modeling_layoutlm.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * add mish to activations.py, import ACT2FN and import logging from utils
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      cd9a0585
  14. 20 Sep, 2020 1 commit