1. 18 Oct, 2023 1 commit
    • Arthur's avatar
      [`Tokenizer`] Fix slow and fast serialization (#26570) · ef7e9369
      Arthur authored
      * fix
      
      * last attempt
      
      * current work
      
      * fix forward compatibility
      
      * save all special tokens
      
      * current state
      
      * revert additional changes
      
      * updates
      
      * remove tokenizer.model
      
      * add a test and the fix
      
      * nit
      
      * revert one more break
      
      * fix typefield issue
      
      * quality
      
      * more tests
      
      * fix fields for FC
      
      * more nits?
      
      * new additional changes
      
      * how
      
      * some updates
      
      * simplify all
      
      * more nits
      
      * revert some things to original
      
      * nice
      
      * nits
      
      * a small hack
      
      * more nits
      
      * ahhaha
      
      * fixup
      
      * update
      
      * make test run on ci
      
      * use subtesting
      
      * update
      
      * Update .circleci/create_circleci_config.py
      
      * updates
      
      * fixup
      
      * nits
      
      * replace typo
      
      * fix the test
      
      * nits
      
      * update
      
      * None max dif pls
      
      * a partial fix
      
      * had to revert one thing
      
      * test the fast
      
      * updates
      
      * fixup
      
      * and more nits
      
      * more fixes
      
      * update
      
      * Oupsy 👁
      
      
      
      * nits
      
      * fix marian
      
      * on our way to heaven
      
      * Update src/transformers/models/t5/tokenization_t5.py
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * fixup
      
      * Update src/transformers/tokenization_utils_fast.py
      Co-authored-by: default avatarLeo Tronchon <leo.tronchon@gmail.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: default avatarLeo Tronchon <leo.tronchon@gmail.com>
      
      * fix phobert
      
      * skip some things, test more
      
      * nits
      
      * fixup
      
      * fix deberta
      
      * update
      
      * update
      
      * more updates
      
      * skip one test
      
      * more updates
      
      * fix camembert
      
      * can't test this one
      
      * more good fixes
      
      * kind of a major update
      
      - seperate what is only done in fast in fast init and refactor
      - add_token(AddedToken(..., speicla = True)) ignores it in fast
      - better loading
      
      * fixup
      
      * more fixups
      
      * fix pegasus and mpnet
      
      * remove skipped tests
      
      * fix phoneme tokenizer if self.verbose
      
      * fix individual models
      
      * update common tests
      
      * update testing files
      
      * all over again
      
      * nits
      
      * skip test for markup lm
      
      * fixups
      
      * fix order of addition in fast by sorting the added tokens decoder
      
      * proper defaults for deberta
      
      * correct default for fnet
      
      * nits on add tokens, string initialized to special if special
      
      * skip irrelevant herbert tests
      
      * main fixes
      
      * update test added_tokens_serialization
      
      * the fix for bart like models and class instanciating
      
      * update bart
      
      * nit!
      
      * update idefix test
      
      * fix whisper!
      
      * some fixup
      
      * fixups
      
      * revert some of the wrong chanegs
      
      * fixup
      
      * fixup
      
      * skip marian
      
      * skip the correct tests
      
      * skip for tf and flax as well
      
      ---------
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      Co-authored-by: default avatarLeo Tronchon <leo.tronchon@gmail.com>
      ef7e9369
  2. 18 Sep, 2023 1 commit
    • Arthur's avatar
      🚨🚨 🚨🚨 [`Tokenizer`] attemp to fix add_token issues🚨🚨 🚨🚨 (#23909) · 2da88537
      Arthur authored
      
      
      * fix test for bart. Order is correct now let's skip BPEs
      
      * ouf
      
      * styling
      
      * fix bert....
      
      * slow refactoring
      
      * current updates
      
      * massive refactoring
      
      * update
      
      * NICE!
      
      * update to see where I am at
      
      * updates
      
      * update
      
      * update
      
      * revert
      
      * updates
      
      * updates
      
      * start supporting legacy_save
      
      * styling
      
      * big update
      
      * revert some changes
      
      * nits
      
      * nniiiiiice
      
      * small fixes
      
      * kinda fix t5 with new behaviour
      
      * major update
      
      * fixup
      
      * fix copies
      
      * today's updates
      
      * fix byt5
      
      * upfate
      
      * update
      
      * update
      
      * updates
      
      * update vocab size test
      
      * Barthez does not use not need the fairseq offset ids
      
      * super calll must be after
      
      * calll super
      
      * move all super init
      
      * move other super init
      
      * fixup
      
      * nits
      
      * more fixes
      
      * nits
      
      * more fixes
      
      * nits
      
      * more fix
      
      * remove useless files
      
      * ouch all of them are affected
      
      * and more!
      
      * small imporvements
      
      * no more sanitize token
      
      * more changes around unique no split tokens
      
      * partially fix more things
      
      * keep legacy save but add warning
      
      * so... more fixes
      
      * updates
      
      * guess deberta tokenizer could be nuked
      
      * fixup
      
      * fixup did some bad things
      
      * nuke it if it breaks
      
      * remove prints and pretrain fast from slow with new format.
      
      * fixups
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fiou
      
      * nit
      
      * by default specials should not be normalized?
      
      * update
      
      * remove brakpoint
      
      * updates
      
      * a lot of updates
      
      * fixup
      
      * fixes revert some changes to match fast
      
      * small nits
      
      * that makes it cleaner
      
      * fix camembert accordingly
      
      * update
      
      * some lest breaking changes
      
      * update
      
      * fixup
      
      * fix byt5 and whisper mostly
      
      * some more fixes, canine's byte vocab
      
      * fix gpt2
      
      * fix most of the perceiver tests (4 left)
      
      * fix layout lmv3
      
      * fixup
      
      * fix copies for gpt2 style
      
      * make sure to only warn once
      
      * fix perciever and gpt2 tests
      
      * some more backward compatibility: also read special tokens map because some ppl use it........////.....
      
      * fixup
      
      * add else when reading
      
      * nits
      
      * fresh updates
      
      * fix copies
      
      * will this make everything faster?
      
      * fixes
      
      * more fixes
      
      * update
      
      * more fixes
      
      * fixup
      
      * is the source of truth right?
      
      * sorry camembert for the troubles
      
      * current updates
      
      * fixup
      
      * update led
      
      * update
      
      * fix regression
      
      * fix single word
      
      * more model specific fixes
      
      * fix t5 tests
      
      * fixup
      
      * more comments
      
      * update
      
      * fix nllb
      
      * rstrip removed
      
      * small fixes
      
      * better handle additional_special_tokens and vocab sizes
      
      * fixing
      
      * styling
      
      * fix 4 / 21
      
      * fixup
      
      * fix nlbb's tests
      
      * some fixes
      
      * fix t5
      
      * fixes
      
      * style
      
      * fix canine tests
      
      * damn this is nice
      
      * nits
      
      * m2m100 nit
      
      * fixups
      
      * fixes!
      
      * fixup
      
      * stash
      
      * fix merge
      
      * revert bad change
      
      * fixup
      
      * correct order for code Llama
      
      * fix speecht5 post merge
      
      * styling
      
      * revert source of 11 fails
      
      * small nits
      
      * all changes in one go
      
      * fnet hack
      
      * fix 2 more tests
      
      * update based on main branch of tokenizers
      
      * fixup
      
      * fix VITS issues
      
      * more fixes
      
      * fix mgp test
      
      * fix camembert issues
      
      * oups camembert still has 2 failing tests
      
      * mluke fixes
      
      * decode fixes
      
      * small nits
      
      * nits
      
      * fix llama and vits
      
      * fix camembert
      
      * smal nits
      
      * more fixes when initialising a fast from a slow and etc
      
      * fix one of the last test
      
      * fix CPM tokenizer test
      
      * fixups
      
      * fix pop2piano
      
      * fixup
      
      * ️ Change tokenizers required version ️
      
      * ️ Change tokenizers required version ️
      
      * "tokenizers>=0.14,<0.15", don't forget smaller than
      
      * fix musicgen tests and pretraiendtokenizerfast
      
      * fix owlvit and all
      
      * update t5
      
      * fix 800 red
      
      * fix tests
      
      * fix the fix of the fix of t5
      
      * styling
      
      * documentation nits
      
      * cache _added_tokens_encoder
      
      * fixups
      
      * Nit
      
      * fix red tests
      
      * one last nit!
      
      * make eveything a lot simpler
      
      * Now it's over 😉
      
      
      
      * few small nits
      
      * Apply suggestions from code review
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * updates that work for now
      
      * tests that should no be skipped / changed and fixed next
      
      * fixup
      
      * i am ashamed
      
      * pushe the fix
      
      * update
      
      * fixups
      
      * nits
      
      * fix added_tokens_encoder
      
      * fix canine test
      
      * fix pegasus vocab
      
      * fix transfoXL
      
      * fixup
      
      * whisper needs to be fixed for train new
      
      * pegasus nits
      
      * more pegasus fixes
      
      * minor update
      
      * better error message in failed test
      
      * fix whisper failing test
      
      * fix whisper failing test
      
      * fix pegasus
      
      * fixup
      
      * fix **** pegasus
      
      * reset things
      
      * remove another file
      
      * attempts to fix the strange custome encoder and offset
      
      * nits here and there
      
      * update
      
      * fixup
      
      * nit
      
      * fix the whisper test
      
      * nits nits
      
      * Apply suggestions from code review
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * updates based on review
      
      * some small update to potentially remove
      
      * nits
      
      * import rlu cache
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * move warning to `from_pretrained`
      
      * update tests results now that the special tokens are always added
      
      ---------
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      2da88537
  3. 24 May, 2023 1 commit
    • Matt's avatar
      Better TF docstring types (#23477) · f8b25744
      Matt authored
      * Rework TF type hints to use | None instead of Optional[] for tf.Tensor
      
      * Rework TF type hints to use | None instead of Optional[] for tf.Tensor
      
      * Don't forget the imports
      
      * Add the imports to tests too
      
      * make fixup
      
      * Refactor tests that depended on get_type_hints
      
      * Better test refactor
      
      * Fix an old hidden bug in the test_keras_fit input creation code
      
      * Fix for the Deit tests
      f8b25744
  4. 10 Feb, 2023 1 commit
  5. 06 Feb, 2023 1 commit
    • Sylvain Gugger's avatar
      Update quality tooling for formatting (#21480) · 6f79d264
      Sylvain Gugger authored
      * Result of black 23.1
      
      * Update target to Python 3.7
      
      * Switch flake8 to ruff
      
      * Configure isort
      
      * Configure isort
      
      * Apply isort with line limit
      
      * Put the right black version
      
      * adapt black in check copies
      
      * Fix copies
      6f79d264
  6. 03 May, 2022 1 commit
    • Yih-Dar's avatar
      Move test model folders (#17034) · 19420fd9
      Yih-Dar authored
      
      
      * move test model folders (TODO: fix imports and others)
      
      * fix (potentially partially) imports (in model test modules)
      
      * fix (potentially partially) imports (in tokenization test modules)
      
      * fix (potentially partially) imports (in feature extraction test modules)
      
      * fix import utils.test_modeling_tf_core
      
      * fix path ../fixtures/
      
      * fix imports about generation.test_generation_flax_utils
      
      * fix more imports
      
      * fix fixture path
      
      * fix get_test_dir
      
      * update module_to_test_file
      
      * fix get_tests_dir from wrong transformers.utils
      
      * update config.yml (CircleCI)
      
      * fix style
      
      * remove missing imports
      
      * update new model script
      
      * update check_repo
      
      * update SPECIAL_MODULE_TO_TEST_MAP
      
      * fix style
      
      * add __init__
      
      * update self-scheduled
      
      * fix add_new_model scripts
      
      * check one way to get location back
      
      * python setup.py build install
      
      * fix import in test auto
      
      * update self-scheduled.yml
      
      * update slack notification script
      
      * Add comments about artifact names
      
      * fix for yolos
      Co-authored-by: default avatarydshieh <ydshieh@users.noreply.github.com>
      19420fd9