- 29 Mar, 2023 12 commits
-
-
Sylvain Gugger authored
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Sabine authored
-
jeffhataws authored
This reverts commit fd81746dbec5f17c8285a0fdc72ca4b4c025cc33.
-
Younes Belkada authored
fix slow test
-
Sylvain Gugger authored
Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" (#22444) Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) (#21627)" This reverts commit bad83008.
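For reference, the two scalings named in the reverted commit title give the same pre-softmax attention scores, since (q / sqrt(dim_per_head)) · k.T = (q · k.T) / sqrt(dim_per_head). Below is a minimal pure-Python sketch of that identity. It assumes toy matrices and hypothetical helper names; nothing here is taken from the model code, and it says nothing about why the change was reverted.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def scores_scale_q(q, k, dim_per_head):
    """Scale the query rows first, then take q_scaled . k^T."""
    scale = 1.0 / math.sqrt(dim_per_head)
    q_scaled = [[x * scale for x in row] for row in q]
    return [[dot(q_row, k_row) for k_row in k] for q_row in q_scaled]


def scores_scale_product(q, k, dim_per_head):
    """Take q . k^T first, then scale every score."""
    scale = 1.0 / math.sqrt(dim_per_head)
    return [[dot(q_row, k_row) * scale for k_row in k] for q_row in q]


def scores_match(a, b, tol=1e-9):
    """True if two score matrices agree element-wise within tol."""
    return all(
        math.isclose(x, y, rel_tol=tol, abs_tol=tol)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


# Toy inputs (illustrative values only): 2 queries, 3 keys, dim_per_head = 4.
q = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
k = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 2.0], [0.0, 0.5, 0.5, 0.0]]

same = scores_match(scores_scale_q(q, k, 4), scores_scale_product(q, k, 4))
```

In exact arithmetic the two results are identical; in floating point they can differ by rounding, which is why the comparison uses a tolerance.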
-
Yih-Dar authored
Fix some tiny model creation issues Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Sylvain Gugger authored
-
Younes Belkada authored
* add conditional generation * add comments
-
Younes Belkada authored
* fix bnb failing test * fix * fix * fixup
-
Nolwenn Bernard authored
Fixes #22429
-
Arthur authored
* add draft changes * fix failing wav2vec * style * make sure that the argument is saved + add tests * style * fixup * update test * default clean_up_tokenization_spaces to False for Bloom and Llama * Update code based on review Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com> * style * quality --------- Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>
-
- 28 Mar, 2023 4 commits
-
-
Joao Gante authored
Fix docs and doctests
-
Jeff Rasley authored
* ensure causal_mask is created directly on device * add copy tag to opt, update bart implementation * add device to all _make_causal_mask copies * formatting fixes * more manual fixes due to unlinked versions of _prepare_decoder_attention_mask
-
fpgaminer authored
Fix bug in perplexity guide calculations and update perplexity numbers.
-
dependabot[bot] authored
Bump redis in /examples/research_projects/decision_transformer Bumps [redis](https://github.com/redis/redis-py) from 4.1.4 to 4.5.3. - [Release notes](https://github.com/redis/redis-py/releases) - [Changelog](https://github.com/redis/redis-py/blob/master/CHANGES) - [Commits](https://github.com/redis/redis-py/compare/v4.1.4...v4.5.3) --- updated-dependencies: - dependency-name: redis dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 27 Mar, 2023 17 commits
-
-
Kshiteej K authored
* [neptune] fix checkpoint bug with relative out_dir * update imports * reformat with black * check neptune without imports * fix typing-related issue * run black on code * use os.path.sep instead of raw \ * simplify imports and remove type annotation * make ruff happy * apply review suggestions --------- Co-authored-by: Aleksander Wojnarowicz <alwojnarowicz@gmail.com>
-
Arthur authored
* Initial commit * update modeling code * update doc * add functions necessary * fix imports * revert changes * fixup * more styling to get going * remove standalone encoder * update code * styling * fix config and model * update code and some refactoring * make more tests pass * Adding NLLB-200 - MoE - 54.5B for no language left behind Fixes #21300 * fix more common tests * style * update testing file * update * update * Router2 doc * update check config with sparse layer * add dummy router * update current conversion script * create on the fly conversion script * Fixup * style * style 2 * fix empty return * fix return * Update default config sparse layers * easier to create sparse layers * update * update conversion script * update modeling * add to toctree * styling * make ruff happy * update docstring * update conversion script * update, will break tests but implementing top2 * update * ❗ local groups are supported here * ⚠️ Support for local groups is now removed ⚠️ This is because it has to work with model parallelism that we do not support * finish simplification * Fix forward * style * fixup * Update modelling and test, refactoring * update tests * remove final layer_norm as it is done in the FF * routing works! Logits test added * nit in test * remove top1router * style * make sure sparse are tested. Had to change route_tokens a little bit * add support for unslip models when converting * fixup * style * update tests * update test * REFACTOR * encoder outputs match! * style * update testing * 🎉 encoder and decoder logits match 🎉 * styling * update tests * cleanup tests * fix router test and CIs * cleanup * cleanup test styling * fix tests * Finally the generation tests match! * cleanup * update test * style testing file * remove script * cleanup * more cleanup * nits * update * NLLB tokenizer is wrong and will be fixed soon * use LongTensors * update tests * revert some small changes * fix second expert sampling and batch prioritized routing * update tests * finish last tests * make ruff happy * update * ruff again * style * Update docs/source/en/model_doc/nllb-moe.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Updates based on review * style and fix import issue * nit * more nits * cleanup * styling * update test_seconde_expert_policy * fix name * last nit on the markdown examples --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Sylvain Gugger authored
-
Donny Greenberg authored
* Add initial remote hardware auto-setup docs * Fix a few typos and clarify some language * Add missing dependency * Update self-hosted launch script with Sylvain's comments. * Formatting. * Trigger CI * Style
-
Joao Gante authored
missing None check
-
Joao Gante authored
-
NielsRogge authored
* First draft * Fix integration test * Remove script * Fix test and typos * Fix one more test * Skip tied embeddings test * Remove line * Address comments
-
Sylvain Gugger authored
* Report safetensors version in transformers-cli env * Styling * Trigger CI maybe
-
Younes Belkada authored
for rg to be `False`
-
Joao Gante authored
-
Nathan Fradet authored
* seq2seq trainer and training arguments accepting GenerationConfig arg * seq2seq Trainer and training arguments docstring fixes * Update training_args_seq2seq.py docstring Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Fixing trainer_seq2seq.py docstring Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * seq2seq trainer: legacy gen args back & GenerationConfig created at init * Seq2seq trainer: fix in case gen_config.max_new_tokens is None Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * seq2seq trainer: adding legacy arg retrocompatibility * seq2seq trainer and training arguments accepting GenerationConfig arg * seq2seq Trainer and training arguments docstring fixes * Update training_args_seq2seq.py docstring Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Fixing trainer_seq2seq.py docstring Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * seq2seq trainer: legacy gen args back & GenerationConfig created at init * Seq2seq trainer: fix in case gen_config.max_new_tokens is None Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * seq2seq trainer: adding legacy arg retrocompatibility * seq2seq trainer: evaluate and predict untouched * Apply suggestions from code review Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * seq2seq trainer: adding init args, keeping IDEs hints --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Vladislav Sokolovskii authored
* Wav2Vec2ProcessorWithLM can return N best hypotheses now Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com> * Wav2Vec2ProcessorWithLM n_best cannot be None Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Batch decoding can return N best hypotheses now batch_decode was extended with the same functionality as decode function, N best hypotheses per sample can be returned Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com> --------- Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com> Co-authored-by: Vladislav Sokolovskii <vladislav@parrothq.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
кѳѳsнī authored
balanced 8bit memory
-
Sylvain Gugger authored
-
Nicola Procopio authored
* updated toctree * added and translated mdx documents
-
Charlie-Bell authored
Edited one line in src/transformers/generation/utils.py. Changed dist.world_size() to dist.get_world_size() since world_size() doesn't exist in torch.distributed.
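A minimal sketch of the call this commit switches to. `torch.distributed.get_world_size()` is the real accessor; the wrapper name `safe_world_size` and the fallback value of 1 are assumptions made for illustration and are not part of the patch.

```python
def safe_world_size() -> int:
    """Return the distributed world size, or 1 outside a distributed run.

    Hypothetical helper: the guard and the fallback are illustrative only.
    """
    try:
        import torch.distributed as dist
    except ImportError:
        # PyTorch is not installed, so treat this as a single process.
        return 1
    # get_world_size() requires an initialized default process group.
    if dist.is_available() and dist.is_initialized():
        return dist.get_world_size()
    return 1


world_size = safe_world_size()
```

The guard matters because calling `dist.get_world_size()` before `init_process_group` raises an error instead of returning 1.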
-
Joao Gante authored
* missing cmake * more cmake
-
- 24 Mar, 2023 7 commits
-
-
Stas Bekman authored
* [safetensors] don't use in pt<1.10 * better fix
-
Sylvain Gugger authored
-
Stas Bekman authored
-
Shubhamai authored
* [WIP] flax resnet * added pretrained flax models, results reproducible * Added pretrained flax models, results reproducible * working on tests * no real code change, just some comments * [flax] adding support for batch norm layers * fixing bugs related to pt+flax integration * removing loss from modeling flax output class * fixing classifier tests * fixing comments, model output * cleaning comments * review changes * review changes * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * renaming Flax to PyTorch --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
Joao Gante authored
-
Samuel Bubán authored
* Improve error message * Fix consistency
-
Sylvain Gugger authored
* Pin tensorflow-text to go with tensorflow * Make it more convenient to pin TensorFlow * setup doesn't like f-strings
-