- 07 Aug, 2024 7 commits
-
-
Joao Gante authored
* logits * words
-
Jonathan Rahn authored
`https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextGenerationPipeline.__call__` `generate_kwargs (dict, optional) — Additional keyword arguments to pass along to the generate method of the model (see the generate method corresponding to your framework here).` link in "here" doesnt work
-
Aymeric Roucher authored
* Allow optional use of grammars to constrain generation
-
Bill Zhou authored
-
append-only authored
* enable xla fsdp * add acceleration version check for xla fsdp
-
Raushan Turganbay authored
* gemma2 fallback to dynamic cache * Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * raise error and don't fallback to dynamic cache * prev will break most forward calls/tests * Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update * fix copies ---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Raushan Turganbay authored
* draft bart with new cache * add cache for decoder-only models * revert utils * modify docstring * revert bart * minor fixes * fix copies (not related) * revert tests * remove enc-dec related code * remove bloom * remove opt (enc-dec) * update docstring * git, codegen, gpt_neo, gpt_neox, gptj * clean up * copied from statements * revert * tmp * update warning msg * forgot git * add more flags * run-slow git,codegen,gpt_neo,gpt_neox,gptj * add cache flag to VLMs * remove files * style * video LLMs also need a flag * style * llava will go in another PR * style * [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics * Update src/transformers/models/gpt_neo/modeling_gpt_neo.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * copy from * deprecate until v4.45 and warn if not training * nit * fix test * test static cache * add more tests and fix models * fix copies * return sliding window mask * run slow tests & fix + codestyle * one more falcon fix for alibi ---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 06 Aug, 2024 17 commits
-
-
HyunJi Shin authored
* docs: ko: tasks/image_to_image.md * feat: nmt draft * fix: manual edits * fix: resolve suggestions Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com> * fix: handle remaining suggestions Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com> ---------
Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
-
boyunJang authored
* docs: ko: tasks/idefics.md * feat: nmt draft * fix: manual edits * fix: resolve suggestions Co-authored-by: Chaewon Song <chaewon1019@ewhain.net> Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com> ---------
Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com>
-
timdalxx authored
* docs: ko: tasks/mask_generation.md * feat: nmt draft * fix : toc local * fix : manual edits * fix : ko-toctree * fix: resolve suggestions Co-authored-by: boyunJang <gobook1234@naver.com> Co-authored-by: Chaewon Song <chaewon1019@ewhain.net> * fix: resolve suggestions Co-authored-by: boyunJang <gobook1234@naver.com> Co-authored-by: Chaewon Song <chaewon1019@ewhain.net> * fix: resolve suggestions * fix: resolve suggestions * fix: resolve suggestions ---------
Co-authored-by: boyunJang <gobook1234@naver.com>
Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
-
Matthew Douglas authored
Revert "fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit (#32276)" (#32477) * Revert "fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit (#32276)" This reverts commit 62c60a30. We uncovered an issue with this change that caused our training runs to hang. * `is_torchdynamo_compiling` -- cast a wide exception net (#32476) * cast a wide net * make fix-copies with a few manual changes * add copied from ---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
-
Joao Gante authored
* cast a wide net * make fix-copies with a few manual changes * add copied from
-
Arthur Zucker authored
-
Chris Toukmaji authored
Update nllb.md
-
Zach Mueller authored
* Migrate import checks to secondary accelerate calls * better errs too * Revert, just keep the import checks + remove accelerate-specific things * Rm extra' * Empty commit for ci * Small nits * Final
-
Pablo Montalvo authored
* add new model like * draft cuda forward - mismatched keys (sharding on conv1) * match keys successfully * fix split * get generation/forward running (wrong gens, norm?) * update * some refactoring * fixes * works up until copy to cache * fix * update * NON WORKING VERSION * version that work? * nit * fix config * fix conversion script * working cuda forward * nit * update * simplification * make mamba slow simple work * no einops * todo * fix style * no einops * update fix no einsum * nit * remove einops * bug: scan_output differs strongly * add rms norm option * fix fast + slow generation with and w/o cache * draft integration tests * remove a big chunk of the einsum * fix slow, fast generations, without any einsum * fix copies * fix structure * fix up modeling and tests * fix tests * clamping is indeed worse * recover mamba2 cache test * fix copies * no cache position (yet) * fix tf tests * fix matmul for generate * fixup * skip cache tests for now * [run-slow]mamba2 * tune out hidden states for padding * test batched generation * propagate attention mask changes * fix past length * fix integration test * style * address comments * update readme * add mamba2 version check * fix tests * [run-slow]mamba2 * skip edge tests * [run-slow]mamba2 * last fixup * [run-slow]mamba2 * update README ---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
-
Joao Gante authored
-
Ao Tang authored
* Add nemotron support * fix inference * add unit test * add layernorm1p as a class to avoid meta device mismatch * test fixed * Add copied_from statements * remove pretraining_tp args * remove nemotronlayernorm * force LN computation done in FP32 * remove nemotrontokenizer and use llamatokenizer * license update * add option for kv_channels for minitron8b * remove assert * o_proj fixed * o_proj reshape * add gated_proj option * typo * remove todos * fix broken test after merging latest main * remove nezha/nat after merging main * change default config to 15b model * add nemo conversion script * rename conversion script * remove gate_proj option * pr comment resolved * fix unit test * rename kv_channels to head_dim * resolve PR issue * add nemotron md * fix broken tests * refactor rope for nemotron * test fix * remove linearscaling * whitespace and import * fix some copied-from * code style fix * reformatted * add position_embedding to nemotronattention * rope refactor to only use config, copied-from fix * format * Run make fix-copies * nemotron md with autodoc * doc fix * fix order * pass check_config_docstrings.py * fix config_attributes * remove all llama BC related code * Use PreTrainedTokenizerFast * ruff check examples * conversion script update * add nemotron to toctree
-
Joao Gante authored
deps_2
-
Francisco Kurucz authored
-
Pavel Iakubovskii authored
* BLIP preprocess * BIT preprocess * BRIDGETOWER preprocess * CHAMELEON preprocess * CHINESE_CLIP preprocess * CONVNEXT preprocess * DEIT preprocess * DONUT preprocess * DPT preprocess * FLAVA preprocess * EFFICIENTNET preprocess * FUYU preprocess * GLPN preprocess * IMAGEGPT preprocess * INSTRUCTBLIPVIDEO preprocess * VIVIT preprocess * ZOEDEPTH preprocess * VITMATTE preprocess * VIT preprocess * VILT preprocess * VIDEOMAE preprocess * VIDEOLLAVA * TVP processing * TVP fixup * SWIN2SR preprocess * SIGLIP preprocess * SAM preprocess * RT-DETR preprocess * PVT preprocess * POOLFORMER preprocess * PERCEIVER preprocess * OWLVIT preprocess * OWLV2 preprocess * NOUGAT preprocess * MOBILEVIT preprocess * MOBILENETV2 preprocess * MOBILENETV1 preprocess * LEVIT preprocess * LAYOUTLMV2 preprocess * LAYOUTLMV3 preprocess * Add test * Update tests
-
Fanli Lin authored
* add flash attention check * fix * fix * add the missing marker * bug fix * add one more * remove order * add one more
-
Prakarsh Kaushik authored
fix: add new llava like model bug
-
Raushan Turganbay authored
* draft * updates * works? * try adding python example in hidden section * another try * how do i render python * format as html code? * Update docs/source/en/kv_cache.md Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update docs/source/en/kv_cache.md Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update docs/source/en/kv_cache.md Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update docs/source/en/kv_cache.md Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update docs/source/en/kv_cache.md Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * one more small update * should render hidden section now * add outputs * fix links * check links * update all links * update with offloaded cache * all cache is importable, so they appear in docs * fix copies * docstring... ---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
-
- 05 Aug, 2024 10 commits
-
-
Francisco Kurucz authored
-
amyeroberts authored
* Respect the config's attn if set * Update test - can override in from_config * Fix
-
Sai-Suraj-27 authored
Fixed tokenizer tests for luke, mluke models.
-
Abdi authored
* fix: persist embedding type of MBartConditonalGeneration after resize * fix: persist embedding type of BartConditonalGeneration after resize
-
Francisco Kurucz authored
-
Nicholas Broad authored
I always forget what typical values are, and I have to look at the paper every time. This will be a helpful reminder.
-
Ita Zaporozhets authored
* save total_vocab_size = vocab_size + user added tokens to speed up operation * updating length when added_tokens_decoder is set * add test len(tokenizer)
-
Raushan Turganbay authored
fix phi
-
TechInterMezzo authored
* fix: SeamlessM4TFeatureExtractor stride remainder * Added attention mask size test * Reran ruff for style correction
-
dependabot[bot] authored
Bump keras in /examples/research_projects/decision_transformer Bumps [keras](https://github.com/keras-team/keras) from 2.8.0 to 2.13.1. - [Release notes](https://github.com/keras-team/keras/releases) - [Commits](https://github.com/keras-team/keras/compare/v2.8.0...v2.13.1) --- updated-dependencies: - dependency-name: keras dependency-type: direct:production ...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 03 Aug, 2024 2 commits
-
-
Xueshen Liu authored
MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. (#31500) * Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (no auto position_ids generation could be unsafe) * fix typo [:-1] to [:, -1] * to meet formatting requirement * to meet formatting requirement * remove white space * MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue. * propagate to starcoder2, phi3, mixtral and qwen2 * update qwen2_moe
-
Shaopeng Fu authored
fix: (issue #32124) Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`. (#32157) fix: Exception raised when running .
-
- 02 Aug, 2024 3 commits
-
-
Sanchit Gandhi authored
* up * style * stopping
-
Joao Gante authored
tests! :D
-
Raushan Turganbay authored
nits
-
- 01 Aug, 2024 1 commit
-
-
Zach Mueller authored
* Test this zach * Test for improper init w/o zero3 * Move back * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Get rid of stars in warning * Make private * Make clear ---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-