- 05 Jun, 2024 1 commit
-
-
Cyril Vallez authored
* Fix contrastive_search for new cache structure, and improve performance by removing inneficient torch.stack(torch.split(x, top_k, dim=0)) * Fix _contrastive_search for non-standard cache using ellipsis slicing * Fix all outputs.logits memory leaks for all decoding strategies! * Fix small error in _contrastive_search() * Make all necessary change and revert for the new class * Apply coding style * Remove pipes in type hints for compatibility * correct type hint * apply style * Use DynamicCache by default and solve conflicts * Fix rebase issues * Add `_supports_dynamic_cache_class` in models for models that support DynamicCache but not other caches to make DynamicCache the default for more models * Create generation config to return legacy format by default, or to choose not to * style * Fix case when use_cache is False * Remove default DynamicCache in assiste_decoding if assistant_model does not support it + fix _seen_tokens when cropping cache * Update prepare_inputs_for_generation() for case with empty DynamicCache * Correct return of args in _assisted_decoding * Remove EfficientDynamicCache as it is no longer needed * Correct mistake in generation config * Move cache logic of assisted decoding to AssistedCandidateGenerator.__init__ * change DynamicCache function names from "split" to "batch_split" for readability + apply coding style * Remove `_supports_dynamic_cache_class` attribute after rebase * Correct missing line lost in conflict resolution during rebasing * Add special case for Jamba * Fix jamba test * Coding style * coding style * Correct missing import in rebasing * Simplify _validate_model_kwargs based on removal of _supports_dynamic_cache attribute * Simplify code paths in _contrastive_search * coding style * Update docstrings of cache methods * Update prepare_inputs_for_generation() -> past_key_values are always Cache objects
-
- 03 Jun, 2024 1 commit
-
-
Arthur authored
* fixes * fix-copies
-
- 31 May, 2024 1 commit
-
-
Arthur authored
* current working example! * commit regex and result file * update * nit * push the conversion file * oups * roadmap and nits * attempt diffs for 3 files * persimmon * nit * add diff file that is the same as the modeling_llama.py * fix rope nits * updates * updates with converted versions * give some breathing space to the code * delete * update * update * push the actual result * update regex patterns * update regex patterns * fix some issues * fix some issues * fix some issues * updates * updates * updates * updates * updates * revert changes done to llama * updates * update gemma * updates * oups * current state * current state * update * ouiiii * nit * clear diffs * nit * fixup * update * doc
馃殌 *馃敟 * for now use gemma * deal with comments * style * handle funtions * deal with assigns * todos * process inheritage * keep decorators? *馃 * deal with duplicates * fixup * correctly remove duplicate code * run ruff post script * ruff deals pretty well with imports, let's leave it to him * ah maybe not lol * for now remove all imports from child. * nit * conversion of llama * okay * convert starcoder2 * synch with main * update llama diff * updates * https://docs.astral.sh/ruff/rules/redefined-while-unused/ fixes the imports, bit needs later version of ruff * updates * okay actual state * non zero exit * update! * revert unrelated * remove other diff files * updates * cleanup * update * less diff! * stash * current updates * updates * No need for call * finished fining deps * update * current changes * current state * current state * new status * nit * finally * fixes * nits * order is now expected * use logger info instead of prints * fixup * up * nit * update * nits * update * correct merge * update * update * update * add warning * update caution message * update * better merging strategy * copy class statements :wink * fixups * nits * update * Apply suggestions from code review Co-authored-by:amyeroberts <22614925+amyeroberts@users.noreply.github.com> * nits * smaller header * do cleanup some stuff * even simpler header? * fixup * updates * ruff * update examples * nit * TODO * state * OUUUUUUF * current state * nits * final state * add a readme * fixup * remove diff llama * fix * nit * dummy noy funny * ruff format tests src utils --check * everless diffs * less diffs and fix test * fixes * naming nit? * update converter and add supper example * nits * updated for function signatures * update * update * add converted dummies * autoformat * single target assign fix * fixup * fix some imports * fixes * don't push them * `# noqa: F841` --------- Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
- 23 May, 2024 1 commit
-
-
Raushan Turganbay authored
* clean-up * Update src/transformers/cache_utils.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/cache_utils.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/cache_utils.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * Update tests/quantization/quanto_integration/test_quanto.py Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/generation/configuration_utils.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * more suggestions * mapping if torch available * run tests & add 'support_quantized' flag * fix jamba test * revert, will be fixed by another PR * codestyle * HQQ and versatile cache classes * final update * typo * make tests happy --------- Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
-
- 22 May, 2024 1 commit
-
-
Arthur authored
* update ruff version * fix research projects * Empty * Fix errors --------- Co-authored-by:Lysandre <lysandre@huggingface.co>
-
- 20 May, 2024 3 commits
-
-
Longjie Zheng authored
* first version * fix sliding window * fix style * add sliding window cache * fix style * address comments * fix test * fix style * move sliding window check inside cache init * revert changes on irrelevant files & add comment on SlidingWindowCache * address comments & fix style fix style * update causal mask * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * [run-slow] llama * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * revert CI from a10 to t4 * wrap up
-
Benjamin Warner authored
* add torch.compile dynamic support * Add SDPA dynamic shapes compile test & improve SDPA comment * comment consistency
-
Joseph Enguehard authored
* Add MistralForTokenClassification * Add tests and docs * Add token classification for Mixtral and Qwen2 * Save llma for token classification draft * Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2 * Formatting * Add token classification support for Qwen2Moe model * Add dropout layer to each ForTokenClassification model * Add copied from in tests * Update src/transformers/models/llama/modeling_llama.py Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Propagate suggested changes * Style --------- Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
-
- 17 May, 2024 1 commit
-
-
amyeroberts authored
* Remove deprecated logic and warnings * Add back some code that seems to be important... * Let's just add all he nllb stuff back; removing it is a bit more involved * Remove kwargs * Remove more kwargs
-
- 16 May, 2024 2 commits
-
-
Yih-Dar authored
* fix * [run-slow] gemma * add test * add `test_compile_static_cache` * fix * style * remove subprocess * use attribute * fix * style * update * [run-slow] dbrx,gemma,jetmoe,phi3,recurrent_gemma --------- Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
Joao Gante authored
* jamba cache * new flag * generate exception
-
- 15 May, 2024 1 commit
-
-
Edoardo Cetin authored
* Fix llama model forward function with attention=True, same-length encoded sequence. * Fix style * propagate fix to modeling_cohere, gemma, dbrx, and olmo (which copy the same sdpa masking logic from llama) * Fix style * ignore unnecessary sdpa mask converter when output_attentions=True * add tests checking sdpa and eager outputs match when output_attentions=True * Split if statements in two lines Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fix formatting * Add fix to new jetmoe model * Add missing output_attentions argument to jetmoe mask creation --------- Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 14 May, 2024 1 commit
-
-
Pablo Montalvo authored
* add new model like * add state dict slicing + new model config * update palma config and weights, passes vision activations * fix * update * reorder loading/unpacking * clean up * add debug statements * change device * fix * debugging * fix noncausal mask * fixup sdpa + causal mask * fix activation function * remove debug before changing modeling file * add variants * debug attention mask in generate * revert to non-debug sdpa * revert gemma modifications * add custom language modeling * use Processor * add language modeling file to init * try thin wrapper around generate * Update * update mask * breakpoints galore * remove conflict * switch to left-padding * add incomplete model doc * add paligemma global files * batch rename paligemma * make generation match outputs and captioning * style * style * remove copied from + doc * remove more copied from * remove copy from projector * minor fix * update config and style * add readme - dummy * CORRECT image captioning * moving to args * add siglip proper + fix merging image + text features * take update_causal_mask from upstream * remove breakpoint * leverage AutoModel * fix input_ids slicing * make siglip head conditional * remove encoder_decoder value * remove unneeded modeling file * add commented 4d attention mask * FIXED generation with 4D mask * Update src/transformers/models/siglip/modeling_siglip.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix left padding detection * shuffle order of verifications * fix missing labels for training * fix * vectorize merging of features, improve slicing * improve testing before conversion * handle merging in processor * image token index depends on checkpoint * add variants, save processor too * save processors, base tokenizer off spm file * expand model embeddings due to additional image token * pass image processing args * add convert rgb to siglip processor * add \n token separately * fix tokenizer and prompts * fix docstrings * change to camel * fix casing * debug pos_ids and sdpa * pass and use cache_position * add flag for newline tokenization * Update src/transformers/models/paligemma/processing_paligemma.py Co-authored-by:
Merve Noyan <merveenoyan@gmail.com> * simplify conversion script * add copied from * add precision to conversion script * Update src/transformers/models/paligemma/modeling_paligemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * clean up * Shift attention mask from `1:` After discussion with @molbap * add docs, fix quality * quality, tied weights inheritance, and logits/label alignment * fix more tests * pass attn_implementation to language model correctly * add SiglipVisionTransformer to no split modules * skip paligemma test for sdpa dispatch to flash * skip incompatible tests * quality * [broken archive maps] * Apply suggestions - remove archive lists - style - take shape of inputs_embeds for batch Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/utils/dummy_pt_objects.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * simplify conversion script * add suggestions * add suggestions * add copied from * fix * move labels out * revert * fix * remove placeholder labels if None * use cache_position * fix quality + docstrings * fix quality * fix paligemma 4d gemma mask incompatibility * fix config docstring * fix query and attn_mask dtype --------- Co-authored-by:
ArthurZucker <arthur.zucker@gmail.com> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by:
Merve Noyan <merveenoyan@gmail.com> Co-authored-by:
Pedro Cuenca <pedro@huggingface.co>
-
- 13 May, 2024 1 commit
-
-
Poedator authored
* 4d mask fixes * Update custom 4D mask logic * test moved to mixin * extra tests 4d mask * upd 4d mask and StaticCache handling * added Mask4DTestHard to mistral tests * post-rebase fixes * test fixes for StaticCache * make fix-copies * upd 1 after #30476 * fix common tests * rm elif attention_mask.dim() == 4: * tests combined, fixed, mixtral supported * bigbird style chg reverted * rm if attention_mask.dim() == 2 * modeling_llama formatting chg --------- Co-authored-by:Joao Gante <joao@huggingface.co>
-
- 09 May, 2024 1 commit
-
-
Raushan Turganbay authored
kv_cache is no longer a model attribute
-
- 08 May, 2024 1 commit
-
-
Joao Gante authored
-
- 02 May, 2024 1 commit
-
-
Michael Benayoun authored
-
- 01 May, 2024 1 commit
-
-
Pedro Cuenca authored
* Gemma: only display act. warning when necessary This is a nit PR, but I was confused. I got the warning even after I had changed `hidden_act` to `gelu_pytorch_tanh`, telling me that I was using the "legacy" `gelu_pytorch_tanh`. Another option is to keep the warning but change the message to say something like "`hidden_act` is ignored, please use `hidden_activation` instead. Setting Gemma's activation function to `gelu_pytorch_tanh`". * Change message, and set `config.hidden_activation`
-
- 30 Apr, 2024 1 commit
-
-
Joao Gante authored
-
- 29 Apr, 2024 1 commit
-
-
Benjamin Warner authored
* Reenable SDPA's FA2 during training with torch.compile * fix Olmo's SDPA FA2 dispatching too * update formatting * improved SDPA comment * formatting and explanatory comment * is_causal if statement to one-liner
-
- 22 Apr, 2024 1 commit
-
-
Arthur authored
* nit to make sure cache positions are not sliced * fix other models * nit * style
-
- 18 Apr, 2024 2 commits
-
-
Younes Belkada authored
FIX: Fixes unexpected behaviour for Llava / LLama & AWQ Fused modules + revert #30070 at the same time (#30317) * Update awq.py * style * revert felix PR * fix * add felix comments
-
- 17 Apr, 2024 1 commit
-
-
fxmarty authored
* tentatively re-enable FA2 + SDPA * better comment * _ignore_causal_mask_sdpa as staticmethod * type hints * use past_seen_tokens instead * enable copied from for sdpa * ruff * llama simplifications on review * remove unnecessary self.is_causal check * fix copies * cleaning * precise message * better doc * add test * simplify * Update src/transformers/models/llama/modeling_llama.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/llama/modeling_llama.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/llama/modeling_llama.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * style --------- Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 05 Apr, 2024 1 commit
-
-
Michael Benayoun authored
* [WIP] fix fx * [WIP] fix fx * [WIP] fix fx * [WIP] fix fx * [WIP] fix fx * Apply changes to other models
-
- 30 Mar, 2024 1 commit
-
-
TechxGenus authored
fix awq quant
-
- 28 Mar, 2024 1 commit
-
-
Arthur authored
* fi xbc? * nit
-
- 22 Mar, 2024 1 commit
-
-
Arthur authored
nit
-
- 21 Mar, 2024 1 commit
-
-
Joao Gante authored
* always convert the mask * rebase and fix copies
-
- 20 Mar, 2024 1 commit
-
-
Arthur authored
* attempt to fix * the actual fix that works with compilation! * this? * temporary update * nit? * dispatcg to memory efficient? * update both models that have static cache support * fix copies fix compile * make sure fix * fix cohere and gemma * fix beams? * nit * slipped through the cracks * nit * nits * update * fix-copies * skip failing tests * nits
-
- 19 Mar, 2024 2 commits
-
-
Joao Gante authored
* partial 4d masks * Apply suggestions from code review Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
Arthur authored
* gelu_pytorch_tanh * Force config.hidden_act to be approx gelu * Gemma bug fixes * force_use_exact_gelu * Update configuration_gemma.py * Update modeling_gemma.py * update * update for simpler handling * nit * nit * fixpup * update * also update the jax modeling! * add `"gelu_pytorch_tanh": partial(nn.gelu, approximate=True),` * fixup * fix order * act vs act_fn --------- Co-authored-by:Daniel Han <danielhanchen@gmail.com>
-
- 14 Mar, 2024 1 commit
-
-
Joao Gante authored
-
- 13 Mar, 2024 1 commit
-
-
Joao Gante authored
-
- 08 Mar, 2024 1 commit
-
-
liangjs authored
* fix stablelm dropout argument type error * fix docs of _flash_attention_forward * fix all docs of _flash_attention_forward * fix docs of _flash_attention_forward in starcoder2 --------- Co-authored-by:oliang <oliang@tencent.com>
-
- 06 Mar, 2024 2 commits
-
-
Park Jun authored
* Fix: Disable torch.autocast in RotaryEmbedding of Gemma and LLaMa for MPS devices * Update src/transformers/models/gemma/modeling_gemma.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update llama ang gemma rope use cpu in mps device --------- Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Glen Taggart authored
Substantially reduce memory usage in _update_causal_mask for large batches by using .expand instead of .repeat [needs tests+sanity check] (#29413) * try to fix gemma mem use * fix: handle attention mask dim==2 case * remove logits=logits.float() * clean up + add llama * apply formatting * readability edit: swap order of items being multiplied * revert change unrelated to PR * revert black autoformat * switch to one .to * Accept style edits Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 01 Mar, 2024 1 commit
-
-
Arthur authored
* use the generation config 馃珷 * fixup
-
- 28 Feb, 2024 2 commits
-
-
fxmarty authored
* better unmask imple * comment * typo * bug report pytorch * cleanup * fix import * add back example * retrigger ci * come on
-
jiqing-feng authored
Co-authored-by:Joao Gante <joao@huggingface.co>
-