1. 17 Jul, 2024 1 commit
  2. 26 Jun, 2024 1 commit
  3. 21 Jun, 2024 1 commit
    • SPLIT PR: add user defined symbols and control symbols (#31305) · 1e79eade
      Ita Zaporozhets authored
      * PR SPLIT: moving original changes for adding user defined symbols
      
      * adding gemma test and generalizing gemma converter
      
      * ruff
      
      * update common test
      
      * update serialization test
      
      * deberta v2 test updates, as the rust version adds '.' as a user-added token, so a space is not added
      
      * removing commented lines
      
      * applying feedback: only add user added_tokens, and check piece.type instead of trainer_spec for user_defined_symbols
      
      * add comment referencing sentencepiece
      1e79eade
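      A sketch of the mechanism this PR generalizes (assuming the `sentencepiece` package ships its proto module; `tokenizer.model` is an illustrative path): user defined symbols are detected from each piece's `type` field rather than from `trainer_spec`, then re-registered on the converted tokenizer as added tokens.

      ```python
      # Sketch only, not the PR's exact converter code.
      from sentencepiece import sentencepiece_model_pb2 as model_pb2

      proto = model_pb2.ModelProto()
      with open("tokenizer.model", "rb") as f:  # illustrative path
          proto.ParseFromString(f.read())

      # check piece.type instead of trainer_spec.user_defined_symbols
      user_defined = [
          p.piece
          for p in proto.pieces
          if p.type == model_pb2.ModelProto.SentencePiece.USER_DEFINED
      ]
      # a converter would then do roughly:
      #   tokenizer.add_tokens([AddedToken(t, normalized=False) for t in user_defined])
      ```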
  4. 24 May, 2024 1 commit
  5. 22 May, 2024 1 commit
  6. 15 Apr, 2024 1 commit
  7. 05 Apr, 2024 1 commit
  8. 25 Mar, 2024 1 commit
  9. 20 Mar, 2024 1 commit
  10. 19 Mar, 2024 1 commit
    • [`GemmaConverter`] use user_defined_symbols (#29473) · 2f9a3edb
      Arthur authored
      * use user_defined_symbols
      
      * fixup
      
      * nit
      
      * add a very robust test
      
      * make sure all models are tested with the `pretrained_tokenizer_to_test`
      
      * should we make sure we test all of them?
      
      * merge
      
      * remove the id
      
      * fix test
      
      * update
      
      * oopsies
      
      * oops
      
      * fixup
      
      * fix copies check
      
      * remove `pretrained_tokenizer_to_test`
      2f9a3edb
  11. 14 Mar, 2024 1 commit
    • Allow apply_chat_template to pass kwargs to the template and support a dict of templates (#29658) · 48fbab73
      Matt authored
      * Allow apply_chat_template to pass kwargs to the template
      
      * Fix priority for template_kwargs
      
      * Fix docstring
      
      * style fix
      
      * Add the option for the model to have a dict of templates
      
      * Error message cleanup
      
      * Add test for chat template dicts
      
      * Simplify the chat template dict test and apply it to all tokenizers in self.get_tokenizers()
      
      * Save chat template dicts as lists with fixed key names
      
      * Add test for serialization/reloading
      
      * Add require_jinja just to be safe, even though I don't think we use it
      48fbab73
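      A sketch of the two additions above (the checkpoint name and template keys are hypothetical):

      ```python
      from transformers import AutoTokenizer

      tok = AutoTokenizer.from_pretrained("my-org/chat-model")  # hypothetical checkpoint
      messages = [{"role": "user", "content": "Hi!"}]

      # 1) Extra kwargs are now forwarded to the Jinja template, so a template
      #    can reference e.g. {{ persona }} directly.
      tok.apply_chat_template(messages, tokenize=False, persona="a helpful pirate")

      # 2) chat_template may now be a dict of named templates; select one by name.
      tok.chat_template = {"default": "...", "rag": "..."}  # illustrative templates
      tok.apply_chat_template(messages, chat_template="rag", tokenize=False)
      ```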
  12. 13 Mar, 2024 1 commit
  13. 16 Feb, 2024 1 commit
  14. 16 Jan, 2024 1 commit
  15. 16 Nov, 2023 1 commit
    • [`Styling`] stylify using ruff (#27144) · 651408a0
      Arthur authored
      
      * try to stylify using ruff
      
      * might need to remove these changes?
      
      * use ruff format and ruff check
      
      * use isinstance instead of type comparison
      
      * use # fmt: skip
      
      * use # fmt: skip
      
      * nits
      
      * some styling changes
      
      * update ci job
      
      * nits isinstance
      
      * more files update
      
      * nits
      
      * more nits
      
      * small nits
      
      * check and format
      
      * revert wrong changes
      
      * actually use formatter instead of checker
      
      * nits
      
      * well docbuilder is overwriting this commit
      
      * revert notebook changes
      
      * try to nuke docbuilder
      
      * style
      
      * fix feature extraction test
      
      * remove `indent-width = 4`
      
      * fixup
      
      * more nits
      
      * update the ruff version that we use
      
      * style
      
      * nuke docbuilder styling
      
      * leave the print for detected changes
      
      * nits
      
      * Remove file I/O
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
      
      * style
      
      * nits
      
      * revert notebook changes
      
      * Add # fmt skip when possible
      
      * Add # fmt skip when possible
      
      * Fix
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * Nits
      
      * more fixes
      
      * fix tapas
      
      * Another way to skip
      
      * Recommended way
      
      * Fix two more files
      
      * Remove asynch
      
      ---------
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
      651408a0
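      Several commits above reach for `# fmt: skip`; a small illustration of the pattern (ruff's formatter, like black, honours the comment):

      ```python
      # Without the trailing comment the formatter would reflow this hand-aligned
      # literal; `# fmt: skip` exempts exactly this statement.
      EXPECTED_INPUT_IDS = [
          [101, 1996, 4248,  2829,  102,    0,   0],
          [101, 2059, 2009, 22901, 1010, 3280, 102],
      ]  # fmt: skip
      ```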
  16. 18 Oct, 2023 1 commit
    • [`Tokenizer`] Fix slow and fast serialization (#26570) · ef7e9369
      Arthur authored
      * fix
      
      * last attempt
      
      * current work
      
      * fix forward compatibility
      
      * save all special tokens
      
      * current state
      
      * revert additional changes
      
      * updates
      
      * remove tokenizer.model
      
      * add a test and the fix
      
      * nit
      
      * revert one more break
      
      * fix typefield issue
      
      * quality
      
      * more tests
      
      * fix fields for FC
      
      * more nits?
      
      * new additional changes
      
      * how
      
      * some updates
      
      * simplify all
      
      * more nits
      
      * revert some things to original
      
      * nice
      
      * nits
      
      * a small hack
      
      * more nits
      
      * ahhaha
      
      * fixup
      
      * update
      
      * make test run on ci
      
      * use subtesting
      
      * update
      
      * Update .circleci/create_circleci_config.py
      
      * updates
      
      * fixup
      
      * nits
      
      * replace typo
      
      * fix the test
      
      * nits
      
      * update
      
      * None max diff pls
      
      * a partial fix
      
      * had to revert one thing
      
      * test the fast
      
      * updates
      
      * fixup
      
      * and more nits
      
      * more fixes
      
      * update
      
      * Oopsy 👁
      
      * nits
      
      * fix marian
      
      * on our way to heaven
      
      * Update src/transformers/models/t5/tokenization_t5.py
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * fixup
      
      * Update src/transformers/tokenization_utils_fast.py
      Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
      
      * fix phobert
      
      * skip some things, test more
      
      * nits
      
      * fixup
      
      * fix deberta
      
      * update
      
      * update
      
      * more updates
      
      * skip one test
      
      * more updates
      
      * fix camembert
      
      * can't test this one
      
      * more good fixes
      
      * kind of a major update
      
      - separate what is only done in fast into the fast init, and refactor
      - add_token(AddedToken(..., special = True)) ignores it in fast
      - better loading
      
      * fixup
      
      * more fixups
      
      * fix pegasus and mpnet
      
      * remove skipped tests
      
      * fix phoneme tokenizer if self.verbose
      
      * fix individual models
      
      * update common tests
      
      * update testing files
      
      * all over again
      
      * nits
      
      * skip test for markup lm
      
      * fixups
      
      * fix order of addition in fast by sorting the added tokens decoder
      
      * proper defaults for deberta
      
      * correct default for fnet
      
      * nits on add tokens, string initialized to special if special
      
      * skip irrelevant herbert tests
      
      * main fixes
      
      * update test added_tokens_serialization
      
      * the fix for bart-like models and class instantiating
      
      * update bart
      
      * nit!
      
      * update idefics test
      
      * fix whisper!
      
      * some fixup
      
      * fixups
      
      * revert some of the wrong changes
      
      * fixup
      
      * fixup
      
      * skip marian
      
      * skip the correct tests
      
      * skip for tf and flax as well
      
      ---------
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
      ef7e9369
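      The end state of the fix above, sketched as a serialization round trip (assumes the `t5-small` checkpoint and a transformers version that exposes `added_tokens_decoder`):

      ```python
      from transformers import AutoTokenizer

      tok = AutoTokenizer.from_pretrained("t5-small")

      # added tokens, with their lstrip/rstrip/normalized/special flags, are now
      # exposed and saved consistently for both slow and fast tokenizers
      print(list(tok.added_tokens_decoder.items())[:2])

      tok.save_pretrained("/tmp/t5-roundtrip")
      reloaded = AutoTokenizer.from_pretrained("/tmp/t5-roundtrip")
      assert tok.get_added_vocab() == reloaded.get_added_vocab()
      ```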
  17. 06 Oct, 2023 1 commit
  18. 18 Sep, 2023 1 commit
    • 🚨🚨 🚨🚨 [`Tokenizer`] attempt to fix add_token issues 🚨🚨 🚨🚨 (#23909) · 2da88537
      Arthur authored
      
      * fix test for bart. Order is correct now let's skip BPEs
      
      * phew
      
      * styling
      
      * fix bert....
      
      * slow refactoring
      
      * current updates
      
      * massive refactoring
      
      * update
      
      * NICE!
      
      * update to see where I am at
      
      * updates
      
      * update
      
      * update
      
      * revert
      
      * updates
      
      * updates
      
      * start supporting legacy_save
      
      * styling
      
      * big update
      
      * revert some changes
      
      * nits
      
      * nniiiiiice
      
      * small fixes
      
      * kinda fix t5 with new behaviour
      
      * major update
      
      * fixup
      
      * fix copies
      
      * today's updates
      
      * fix byt5
      
      * update
      
      * update
      
      * update
      
      * updates
      
      * update vocab size test
      
      * Barthez does not need the fairseq offset ids
      
      * super call must be after
      
      * call super
      
      * move all super init
      
      * move other super init
      
      * fixup
      
      * nits
      
      * more fixes
      
      * nits
      
      * more fixes
      
      * nits
      
      * more fix
      
      * remove useless files
      
      * ouch all of them are affected
      
      * and more!
      
      * small improvements
      
      * no more sanitize token
      
      * more changes around unique no split tokens
      
      * partially fix more things
      
      * keep legacy save but add warning
      
      * so... more fixes
      
      * updates
      
      * guess deberta tokenizer could be nuked
      
      * fixup
      
      * fixup did some bad things
      
      * nuke it if it breaks
      
      * remove prints and pretrain fast from slow with new format.
      
      * fixups
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * phew
      
      * nit
      
      * by default specials should not be normalized?
      
      * update
      
      * remove breakpoint
      
      * updates
      
      * a lot of updates
      
      * fixup
      
      * fixes: revert some changes to match fast
      
      * small nits
      
      * that makes it cleaner
      
      * fix camembert accordingly
      
      * update
      
      * some less breaking changes
      
      * update
      
      * fixup
      
      * fix byt5 and whisper mostly
      
      * some more fixes, canine's byte vocab
      
      * fix gpt2
      
      * fix most of the perceiver tests (4 left)
      
      * fix layout lmv3
      
      * fixup
      
      * fix copies for gpt2 style
      
      * make sure to only warn once
      
      * fix perceiver and gpt2 tests
      
      * some more backward compatibility: also read the special tokens map, because some people still use it
      
      * fixup
      
      * add else when reading
      
      * nits
      
      * fresh updates
      
      * fix copies
      
      * will this make everything faster?
      
      * fixes
      
      * more fixes
      
      * update
      
      * more fixes
      
      * fixup
      
      * is the source of truth right?
      
      * sorry camembert for the troubles
      
      * current updates
      
      * fixup
      
      * update led
      
      * update
      
      * fix regression
      
      * fix single word
      
      * more model specific fixes
      
      * fix t5 tests
      
      * fixup
      
      * more comments
      
      * update
      
      * fix nllb
      
      * rstrip removed
      
      * small fixes
      
      * better handle additional_special_tokens and vocab sizes
      
      * fixing
      
      * styling
      
      * fix 4 / 21
      
      * fixup
      
      * fix nllb's tests
      
      * some fixes
      
      * fix t5
      
      * fixes
      
      * style
      
      * fix canine tests
      
      * damn this is nice
      
      * nits
      
      * m2m100 nit
      
      * fixups
      
      * fixes!
      
      * fixup
      
      * stash
      
      * fix merge
      
      * revert bad change
      
      * fixup
      
      * correct order for code Llama
      
      * fix speecht5 post merge
      
      * styling
      
      * revert source of 11 fails
      
      * small nits
      
      * all changes in one go
      
      * fnet hack
      
      * fix 2 more tests
      
      * update based on main branch of tokenizers
      
      * fixup
      
      * fix VITS issues
      
      * more fixes
      
      * fix mgp test
      
      * fix camembert issues
      
      * oops, camembert still has 2 failing tests
      
      * mluke fixes
      
      * decode fixes
      
      * small nits
      
      * nits
      
      * fix llama and vits
      
      * fix camembert
      
      * small nits
      
      * more fixes when initialising a fast tokenizer from a slow one, etc.
      
      * fix one of the last test
      
      * fix CPM tokenizer test
      
      * fixups
      
      * fix pop2piano
      
      * fixup
      
      * Change tokenizers required version
      
      * "tokenizers>=0.14,<0.15", don't forget the upper bound
      
      * fix musicgen tests and PreTrainedTokenizerFast
      
      * fix owlvit and all
      
      * update t5
      
      * fix 800 red
      
      * fix tests
      
      * fix the fix of the fix of t5
      
      * styling
      
      * documentation nits
      
      * cache _added_tokens_encoder
      
      * fixups
      
      * Nit
      
      * fix red tests
      
      * one last nit!
      
      * make eveything a lot simpler
      
      * Now it's over 😉
      
      * few small nits
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * updates that work for now
      
      * tests that should not be skipped / changed and fixed next
      
      * fixup
      
      * i am ashamed
      
      * push the fix
      
      * update
      
      * fixups
      
      * nits
      
      * fix added_tokens_encoder
      
      * fix canine test
      
      * fix pegasus vocab
      
      * fix transfoXL
      
      * fixup
      
      * whisper needs to be fixed for train new
      
      * pegasus nits
      
      * more pegasus fixes
      
      * minor update
      
      * better error message in failed test
      
      * fix whisper failing test
      
      * fix whisper failing test
      
      * fix pegasus
      
      * fixup
      
      * fix **** pegasus
      
      * reset things
      
      * remove another file
      
      * attempts to fix the strange custom encoder and offset
      
      * nits here and there
      
      * update
      
      * fixup
      
      * nit
      
      * fix the whisper test
      
      * nits nits
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * updates based on review
      
      * some small update to potentially remove
      
      * nits
      
      * import lru_cache
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * move warning to `from_pretrained`
      
      * update tests results now that the special tokens are always added
      
      ---------
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      2da88537
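      A sketch of the behaviour this refactor converges on: an `AddedToken` carries its own handling flags, and the slow (Python) tokenizers now honour them the same way the Rust ones do (`<ctrl>` is an illustrative token):

      ```python
      from transformers import AddedToken, AutoTokenizer

      tok = AutoTokenizer.from_pretrained("gpt2", use_fast=False)
      tok.add_tokens(
          [AddedToken("<ctrl>", lstrip=False, rstrip=False, normalized=False)],
          special_tokens=True,
      )

      # the added token is matched as a single unit, never split by the BPE
      print(tok.tokenize("hello <ctrl> world"))
      ```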
  19. 14 Sep, 2023 1 commit
    • Overhaul Conversation class and prompt templating (#25323) · 866df66f
      Matt authored
      
      * First commit while I figure this out
      
      * make fixup
      
      * Remove unused method
      
      * Store prompt attrib
      
      * Fix prompt argument for tests
      
      * Make same changes in fast tokenizer
      
      * Remove global prompts from fast tokenizer too
      
      * stash commit
      
      * stash commit
      
      * Migrate PromptConfig to its True Final Location
      
      * Replace Conversation entirely with the new class
      
      * Import/dependency fixes
      
      * Import/dependency fixes
      
      * Change format for lots of default prompts
      
      * More default prompt fixups
      
      * Revert llama old methods so we can compare
      
      * Fix some default configs
      
      * Fix some default configs
      
      * Fix misspelled kwarg
      
      * Fixes for Blenderbot
      
      * make fixup
      
      * little rebase cleanup
      
      * Add basic documentation
      
      * Quick doc fix
      
      * Truncate docstring for now
      
      * Add handling for the case when messages is a single string
      
      * Quick llama merges
      
      * Update conversational pipeline and tests
      
      * Add a couple of legacy properties for backward compatibility
      
      * More legacy handling
      
      * Add docstring for build_conversation_input_ids
      
      * Restructure PromptConfig
      
      * Let's start T E M P L A T I N G
      
      * Refactor all default configs to use templates instead
      
      * Revert changes to the special token properties since we don't need them anymore
      
      * More class templates
      
      * Make the sandbox even sandier
      
      * Everything replaced with pure templating
      
      * Remove docs for PromptConfig
      
      * Add testing and optional requirement boilerplate
      
      * Fix imports and make fixup
      
      * Fix LLaMA tests and add Conversation docstring
      
      * Finally get LLaMA working with the template system
      
      * Finally get LLaMA working with the template system
      
      * make fixup
      
      * make fixup
      
      * fmt-off for the long lists of test tokens
      
      * Rename method to apply_chat_template for now
      
      * Start on documentation
      
      * Make chat_template a property that reads through to the default if it's not set
      
      * Expand docs
      
      * Expand chat templating doc some more
      
      * trim/lstrip blocks by default and update doc
      
      * Few doc tweaks
      
      * rebase cleanup
      
      * Clarify docstring
      
      * rebase cleanup
      
      * rebase cleanup
      
      * make fixup
      
      * Quick doc edit
      
      * Reformat the standard template to match ChatML
      
      * Re-add PEFT check
      
      * Update docs/source/en/chat_templating.md
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Add apply_chat_template to the tokenizer doc
      
      * make fixup
      
      * Add doc links
      
      * Fix chat links
      
      * Fix chat links
      
      * Explain system messages in the doc
      
      * Add chat template test
      
      * Proper save-loading for chat template attribute
      
      * Add test skips for layout models
      
      * Remove _build_conversation_input_ids, add default_chat_template to code_llama
      
      * Make sure all LLaMA models are using the latest template
      
      * Remove default_system_prompt block in code_llama because it has no default prompt
      
      * Update ConversationPipeline preprocess
      
      * Add correct #Copied from links to the default_chat_templates
      
      * Remove unneeded type checking line
      
      * Add a dummy mark_processed method
      
      * Reorganize Conversation to have **deprecated_kwargs
      
      * Update chat_templating.md
      
      * Quick fix to LLAMA tests
      
      * Small doc tweaks
      
      * Add proper docstrings and "copied from" statements to all default chat templates
      
      * Merge use_default_system_prompt support for code_llama too
      
      * Improve clarity around self.chat_template
      
      * Docstring fix
      
      * Fix blenderbot default template
      
      * More doctest fix
      
      * Break out some tokenizer kwargs
      
      * Update doc to explain default templates
      
      * Quick tweaks to tokenizer args
      
      * Cleanups for tokenizer args
      
      * Add note about caching
      
      * Quick tweak to the chat-templating doc
      
      * Update the LLaMA template with error checking and correct system message embedding
      
      * make fixup
      
      * make fixup
      
      * add requires_jinja
      
      * Cleanup to expected output formatting
      
      * Add caching
      
      * Fix typo in llama default template
      
      * Update LLaMA tests
      
      * Update documentation
      
      * Improved legacy handling in the Conversation class
      
      * Update Jinja template with proper error handling
      
      * Quick bugfix
      
      * Proper exception raising
      
      * Change caching behaviour so it doesn't try to pickle an entire Jinja env
      
      * make fixup
      
      * rebase cleanup
      
      ---------
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      866df66f
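      One commit above reformats the standard template to match ChatML; a minimal template in that style (a sketch, not the exact default the PR ships, assuming `tok` is any tokenizer):

      ```python
      # chat_template is plain Jinja stored on the tokenizer
      tok.chat_template = (
          "{% for message in messages %}"
          "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}"
          "{% endfor %}"
      )

      print(tok.apply_chat_template([{"role": "user", "content": "Hello!"}], tokenize=False))
      # <|im_start|>user
      # Hello!<|im_end|>
      ```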
  20. 18 Aug, 2023 1 commit
    • [`split_special_tokens`] Add support for `split_special_tokens` argument to encode (#25081) · 30b3c46f
      Arthur authored
      * draft changes
      
      * update and add tests
      
      * styling for now
      
      * move test
      
      * path to usable model
      
      * update test
      
      * small update
      
      * update bert-based tokenizers
      
      * don't use kwargs for _tokenize
      
      * fix copies
      
      * update
      
      * update test for special tokenizers
      
      * fixup
      
      * skip two tests
      
      * remove pdb breakpoint()
      
      * wowo
      
      * rewrite custom tests
      
      * nits
      
      * revert change in target keys
      
      * fix markup lm
      
      * update documentation of the argument
      30b3c46f
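      The new argument in action (assumes `bert-base-uncased`):

      ```python
      from transformers import AutoTokenizer

      tok = AutoTokenizer.from_pretrained("bert-base-uncased")

      # default: "[CLS]" in the input text is matched as one special token
      print(tok.tokenize("[CLS] hello"))

      # split_special_tokens=True: the same text is treated as ordinary text,
      # so "[CLS]" is broken into plain word pieces instead of one special id
      print(tok("[CLS] hello", split_special_tokens=True)["input_ids"])
      ```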
  21. 27 Jun, 2023 1 commit
  22. 15 Jun, 2023 1 commit
  23. 10 May, 2023 1 commit
  24. 24 Apr, 2023 1 commit
  25. 03 Apr, 2023 1 commit
    • Fix llama tokenizer (#22402) · c0f99b4d
      Arthur authored
      * draft
      
      * update tokenization llama and conversion script
      
      * more updates
      
      * initial commit
      
      * style
      
      * default pad to None
      
      * draft tokenization tests
      
      * update test
      
      * update tokenization tests
      
      * nits
      
      * update
      
      * versioning test
      
      * major fix
      
      * fix more tests
      
      * finish fixing special masks
      
      * last nit
      
      * more nits
      
      * add encode decode tests
      
      * add more
      
      * fix token type ids
      
      * style
      c0f99b4d
  26. 29 Mar, 2023 1 commit
  27. 09 Mar, 2023 1 commit
  28. 07 Feb, 2023 1 commit
    • Cleanup quality (#21493) · 67d07487
      Sylvain Gugger authored
      * Remove mentions of flake8/isort
      
      * Clean up inits
      
      * Deal with all other inits
      
      * Last special rule for dummy files
      67d07487
  29. 06 Feb, 2023 1 commit
    • Update quality tooling for formatting (#21480) · 6f79d264
      Sylvain Gugger authored
      * Result of black 23.1
      
      * Update target to Python 3.7
      
      * Switch flake8 to ruff
      
      * Configure isort
      
      * Configure isort
      
      * Apply isort with line limit
      
      * Put the right black version
      
      * adapt black in check copies
      
      * Fix copies
      6f79d264
  30. 02 Nov, 2022 1 commit
    • 🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775) · 9f9ddcc2
      Ben Eyal authored
      
      * Add test for SentencePiece not adding special tokens to strings
      
      * Add SentencePieceStringConversionMixin to fix issue 15003
      
      * Fix conversion from tokens to string for most SentencePiece tokenizers
      
      Tokenizers fixed:
      - AlbertTokenizer
      - BarthezTokenizer
      - CamembertTokenizer
      - FNetTokenizer
      - M2M100Tokenizer
      - MBart50Tokenizer
      - PegasusTokenizer
      - Speech2TextTokenizer
      
      * Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab
      
      * Fix DebertaV2Tokenizer
      
      * Ignore LayoutXLMTokenizer in SentencePiece string conversion test
      
      * Run 'make style' and 'make quality'
      
      * Clean convert_tokens_to_string test
      
      Instead of explicitly ignoring LayoutXLMTokenizer in the test,
      override the test in LayoutLMTokenizationTest and do nothing in it.
      
      * Remove commented out code
      
      * Improve robustness of convert_tokens_to_string test
      
      Instead of comparing lengths of re-tokenized text and input_ids,
      check that converting all special tokens to string yields a string
      with all special tokens.
      
      * Inline and remove SentencePieceStringConversionMixin
      
      The convert_tokens_to_string method is now implemented
      in each relevant SentencePiece tokenizer.
      
      * Run 'make style' and 'make quality'
      
      * Revert removal of space in convert_tokens_to_string
      
      * Remove redundant import
      
      * Revert test text to original
      
      * Uncomment the lowercasing of the reverse_text variable
      
      * Mimic Rust tokenizer behavior for tokenizers
      
      - Albert
      - Barthez
      - Camembert
      - MBart50
      - T5
      
      * Fix accidentally skipping test in wrong tokenizer
      
      * Add test for equivalent Rust and slow tokenizer behavior
      
      * Override _decode in BigBirdTokenizer to mimic Rust behavior
      
      * Override _decode in FNetTokenizer to mimic Rust behavior
      
      * Override _decode in XLNetTokenizer to mimic Rust behavior
      
      * Remove unused 're' import
      
      * Update DebertaV2Tokenizer to mimic Rust tokenizer
      
      * Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.
      
      * Ignore problematic tests in Deberta V2
      
      * Add comment on why the Deberta V2 tests are skipped
      9f9ddcc2
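      The pattern the final commits inline into each SentencePiece tokenizer, sketched (close to, but not necessarily verbatim, the shipped code):

      ```python
      def convert_tokens_to_string(self, tokens):
          """Decode pieces through sentencepiece, but splice added special
          tokens back in verbatim instead of letting sp_model drop them."""
          current_sub_tokens, out_string = [], ""
          prev_is_special = False
          for token in tokens:
              if token in self.all_special_tokens:
                  # flush the pending pieces, then emit the special token itself
                  if not prev_is_special:
                      out_string += " "
                  out_string += self.sp_model.decode(current_sub_tokens) + token
                  prev_is_special = True
                  current_sub_tokens = []
              else:
                  current_sub_tokens.append(token)
                  prev_is_special = False
          out_string += self.sp_model.decode(current_sub_tokens)
          return out_string.strip()
      ```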
  31. 25 Oct, 2022 1 commit
  32. 14 Oct, 2022 1 commit
  33. 27 Sep, 2022 1 commit
  34. 16 Sep, 2022 2 commits
  35. 15 Sep, 2022 1 commit
  36. 29 Aug, 2022 1 commit
  37. 24 Aug, 2022 1 commit
    • add warning to let the user know that the `__call__` method is faster than `encode` + `pad` for a fast tokenizer (#18693) · 6667b0d7
      SaulLu authored
      
      * add warning to let the user know that `encode` + `pad` is slower than `__call__` for a fast tokenizer
      
      * user warnings
      
      * fix layoutlmv2
      
      * fix layout*
      
      * change warnings into logger.warning
      6667b0d7
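      The two paths side by side (assumes `bert-base-uncased`); the warning this PR adds fires on the slower one:

      ```python
      from transformers import AutoTokenizer

      tok = AutoTokenizer.from_pretrained("bert-base-uncased")
      batch = ["short", "a somewhat longer sentence"]

      # slower: encode each text in Python, then pad afterwards
      encoded = [tok.encode(t) for t in batch]
      padded = tok.pad({"input_ids": encoded}, padding=True)

      # faster: one __call__ tokenizes and pads the whole batch in a single pass
      padded = tok(batch, padding=True)
      ```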
  38. 05 Aug, 2022 1 commit
    • Use new huggingface_hub tools for downloading models (#18438) · 5cd40323
      Sylvain Gugger authored
      * Draft new cached_file
      
      * Initial draft for config and model
      
      * Small fixes
      
      * Fix first batch of tests
      
      * Look in cache when internet is down
      
      * Fix last tests
      
      * Bad black, not fixing all quality errors
      
      * Make diff less
      
      * Implement change for TF and Flax models
      
      * Add tokenizer and feature extractor
      
      * For compatibility with main
      
      * Add utils to move the cache and auto-do it at first use.
      
      * Quality
      
      * Deal with empty commit shas
      
      * Deal with empty etag
      
      * Address review comments
      5cd40323
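      A sketch of the new helper (an internal utility, so the exact signature may differ between versions):

      ```python
      from transformers.utils import cached_file

      # resolves the file through the local HF cache, downloading only when
      # needed; when the network is down, the cached copy is used instead
      config_path = cached_file("bert-base-uncased", "config.json")
      print(config_path)
      ```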
  39. 01 Aug, 2022 1 commit