- 23 Jul, 2024 1 commit
Joao Gante authored
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

- 14 Jul, 2024 1 commit
Joao Gante authored
* tmp commit
* shorter
* nit
* explicit kwargs
* propagate changes
* mass propagation with a few manual touches (let's see how CI behaves)
* fix cacheless case
* Update src/transformers/generation/utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* make fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

- 11 Jul, 2024 1 commit
Arthur authored
* dumb commit
* nit
* update
* something like this
* unpack in modeling utils
* safe import
* oops
* update
* nits
* diff convert gemma
* update
* start propagating
* update other modeling code as well
* update for sliding window models
* nits
* more init cleanups
* styling
* fixup
* noice
* pass fixup
* typo typing_extension -> typing_extensions
* torch.nn.functionnal -> torch.nn.functional
* add to import structure
* unpack
* simplify a bit more for this first version
* nit
* update
* update
* nit
* ease the import of `Unpack`
* remove useless `use_sliding_window`
* no qua please
* protect import?
* style
* [run-slow]
* [run slow] llama,gemma,mistral,mixtral
* remove extra kwargs
* fix llama
* address review comments
* apply diff_model_converter to modeling_gemma.py
* remove cache_position 1
* remove cache_position 2
* some cleaning
* refactor gemma2 as well
* apply review comments
* rename file to modeling_flash_attention_utils.py
* siglip refactor
* remove dead code
* is the hub down?
* still down?
* fix siglip
* fix gemma2
* fatal: Could not read from remote repository.
* fix typo in softcap implem
* flaky
* Failed: Timeout >120.0s
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>

- 07 Jun, 2024 1 commit
Cyril Vallez authored
* Fix jetmoe model
* Remove skip-tests

- 23 May, 2024 1 commit
Benjamin Warner authored
add torch.compile dynamic support

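As background, a minimal sketch of what dynamic-shape compilation looks like in user code (an illustration under our own assumptions, not this commit's diff; `attention_scores` is a made-up stand-in):

```python
import torch

def attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # Toy stand-in for a transformer op whose sequence length varies per call.
    return torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)

# dynamic=True asks the compiler for shape-polymorphic graphs, so the changing
# sequence lengths below reuse one compiled artifact instead of recompiling.
compiled = torch.compile(attention_scores, dynamic=True)

for seq_len in (8, 16, 32):
    q = torch.randn(1, seq_len, 64)
    print(compiled(q, q).shape)
```
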
- 16 May, 2024 1 commit
Yih-Dar authored
* fix
* [run-slow] gemma
* add test
* add `test_compile_static_cache`
* fix
* style
* remove subprocess
* use attribute
* fix
* style
* update
* [run-slow] dbrx,gemma,jetmoe,phi3,recurrent_gemma
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

- 15 May, 2024 1 commit
Edoardo Cetin authored
* Fix llama model forward function with attention=True, same-length encoded sequence.
* Fix style
* propagate fix to modeling_cohere, gemma, dbrx, and olmo (which copy the same sdpa masking logic from llama)
* Fix style
* ignore unnecessary sdpa mask converter when output_attentions=True
* add tests checking sdpa and eager outputs match when output_attentions=True
* Split if statements in two lines Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix formatting
* Add fix to new jetmoe model
* Add missing output_attentions argument to jetmoe mask creation
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

- 14 May, 2024 1 commit
Yikang Shen authored
* init jetmoe code
* update archive maps
* remove flax import
* fix import error
* update README
* ruff fix
* update readme
* fix
* update config
* fix issue
* merge files
* fix model bug
* fix test
* auto fix
* model size
* add comments
* fix form
* add flash attention support
* fix attention head number
* fix init
* fix support list
* sort auto mapping
* fix test
* fix docs
* update test
* fix test
* fix test
* change variable name
* fix config
* fix init
* update format
* clean code
* fix config
* fix config
* change default config
* update config
* fix issues
* update format
* update config argument
* update format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* change to mixtral aux loss
* change to cache_position
* debug
* fix bugs
* debug
* fix format
* fix format
* fix copy
* fix format
* fix format
* fix sort
* fix sort
* fix sort
* add copy comment
* add copy from
* remove debug code
* revert readme update
* add copy
* debug
* remove debug code
* fix flash attention
* add comments
* clean code
* clean format
* fix format
* fix format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* change variable name
* add copied from
* fix variable name
* remove deprecated functions
* sync to llama implementation
* fix format
* fix copy
* fix format
* update format
* remove repr
* add comment for moe weight
* fix copy
* Update src/transformers/models/jetmoe/configuration_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add comments and reformat config
* fix format
* fix format
* fix format
* update test
* update doc string in config
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update config doc
* update attention cache
* fix format
* fix copy
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

- 13 May, 2024 1 commit
Poedator authored
* 4d mask fixes
* Update custom 4D mask logic
* test moved to mixin
* extra tests 4d mask
* upd 4d mask and StaticCache handling
* added Mask4DTestHard to mistral tests
* post-rebase fixes
* test fixes for StaticCache
* make fix-copies
* upd 1 after #30476
* fix common tests
* rm elif attention_mask.dim() == 4:
* tests combined, fixed, mixtral supported
* bigbird style chg reverted
* rm if attention_mask.dim() == 2
* modeling_llama formatting chg
---------
Co-authored-by: Joao Gante <joao@huggingface.co>

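For orientation, a minimal sketch of passing a custom 4D attention mask to a causal LM (our illustration, not this commit's test code; the checkpoint name is an assumption, and in the versions following this change the 4D mask is expected in additive form, 0.0 where attention is allowed and dtype-min where blocked):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "HuggingFaceTB/SmolLM-135M"  # assumed checkpoint; any Llama-style model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

ids = tok("The quick brown fox", return_tensors="pt").input_ids
n = ids.shape[1]

# Shape (batch, 1, query_len, kv_len). Here we hand-build an ordinary causal
# mask, but any pattern (e.g. packed sequences that must not attend across
# document boundaries) can be encoded the same way.
mask = torch.full((1, 1, n, n), torch.finfo(model.dtype).min)
mask = mask.triu(diagonal=1)  # 0.0 on and below the diagonal, -inf-like above

out = model(input_ids=ids, attention_mask=mask)
print(out.logits.shape)
```
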
- 09 May, 2024 1 commit
Raushan Turganbay authored
kv_cache is no longer a model attribute

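For context, a hedged sketch of the general direction this points in (our illustration, not this commit's diff; the checkpoint name is an assumption): the key-value cache is created by the caller and threaded through the forward call rather than stored on the model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

name = "HuggingFaceTB/SmolLM-135M"  # assumed small Llama-style checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("Hello", return_tensors="pt")
cache = DynamicCache()  # caller-owned cache object, not a model attribute
with torch.no_grad():
    out = model(**inputs, past_key_values=cache, use_cache=True)
print(type(out.past_key_values).__name__)  # the cache comes back in the output
```
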
- 08 May, 2024 1 commit
Joao Gante authored

- 03 May, 2024 1 commit
Mayank Mishra authored
* add bias
* fix quality

- 02 May, 2024 1 commit
Michael Benayoun authored

- 30 Apr, 2024 1 commit
Joao Gante authored

- 29 Apr, 2024 1 commit
Benjamin Warner authored
* Reenable SDPA's FA2 during training with torch.compile
* fix Olmo's SDPA FA2 dispatching too
* update formatting
* improved SDPA comment
* formatting and explanatory comment
* is_causal if statement to one-liner

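For readers unfamiliar with the dispatch in question, a rough sketch under our own assumptions (not the repo's code; `sdpa_attention` is illustrative) of how an SDPA layer keeps flash-attention eligibility by passing `is_causal` as a plain boolean instead of materializing a mask:

```python
import torch
import torch.nn.functional as F

def sdpa_attention(q, k, v, attention_mask=None):
    # When no explicit mask is needed, pass attn_mask=None with is_causal=True:
    # this leaves SDPA free to pick its flash-attention kernel. Computing the
    # flag as a one-liner (rather than data-dependent branching) also stays
    # friendly to torch.compile graph capture.
    is_causal = attention_mask is None and q.shape[2] > 1
    return F.scaled_dot_product_attention(
        q, k, v, attn_mask=attention_mask, is_causal=is_causal
    )

q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq, head_dim)
print(sdpa_attention(q, k, v).shape)
```
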
- 22 Apr, 2024 1 commit
Arthur authored
* nit to make sure cache positions are not sliced
* fix other models
* nit
* style

- 18 Apr, 2024 2 commits
Younes Belkada authored
FIX: Fixes unexpected behaviour for Llava / Llama & AWQ Fused modules + revert #30070 at the same time (#30317)
* Update awq.py
* style
* revert felix PR
* fix
* add felix comments

- 17 Apr, 2024 1 commit
fxmarty authored
* tentatively re-enable FA2 + SDPA
* better comment
* _ignore_causal_mask_sdpa as staticmethod
* type hints
* use past_seen_tokens instead
* enable copied from for sdpa
* ruff
* llama simplifications on review
* remove unnecessary self.is_causal check
* fix copies
* cleaning
* precise message
* better doc
* add test
* simplify
* Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* style
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

- 05 Apr, 2024 1 commit
Michael Benayoun authored
* [WIP] fix fx
* [WIP] fix fx
* [WIP] fix fx
* [WIP] fix fx
* [WIP] fix fx
* Apply changes to other models

- 30 Mar, 2024 1 commit
TechxGenus authored
fix awq quant

- 28 Mar, 2024 1 commit
Arthur authored
* fix bc?
* nit

- 21 Mar, 2024 1 commit
Joao Gante authored
* always convert the mask
* rebase and fix copies

- 20 Mar, 2024 1 commit
Arthur authored
* attempt to fix
* the actual fix that works with compilation!
* this?
* temporary update
* nit?
* dispatch to memory efficient?
* update both models that have static cache support
* fix copies fix compile
* make sure fix
* fix cohere and gemma
* fix beams?
* nit
* slipped through the cracks
* nit
* nits
* update
* fix-copies
* skip failing tests
* nits

- 19 Mar, 2024 1 commit
Joao Gante authored
* partial 4d masks
* Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

- 14 Mar, 2024 1 commit
Joao Gante authored

- 13 Mar, 2024 1 commit
Joao Gante authored

- 08 Mar, 2024 1 commit
liangjs authored
* fix stablelm dropout argument type error
* fix docs of _flash_attention_forward
* fix all docs of _flash_attention_forward
* fix docs of _flash_attention_forward in starcoder2
---------
Co-authored-by: oliang <oliang@tencent.com>

- 07 Mar, 2024 1 commit
Joao Gante authored

- 06 Mar, 2024 2 commits
Park Jun authored
* Fix: Disable torch.autocast in RotaryEmbedding of Gemma and LLaMa for MPS devices
* Update src/transformers/models/gemma/modeling_gemma.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update llama and gemma rope use cpu in mps device
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

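As a rough illustration of the pattern (a sketch under our own assumptions, not the committed code; `rope_cos_sin` is a made-up helper), the rotary tables are computed with autocast disabled so they stay in float32 even inside a mixed-precision region, falling back to the cpu autocast context on MPS:

```python
import torch

def rope_cos_sin(position_ids: torch.Tensor, inv_freq: torch.Tensor):
    device_type = position_ids.device.type
    # autocast support on "mps" is limited, so fall back to the cpu context
    device_type = "cpu" if device_type == "mps" else device_type
    with torch.autocast(device_type=device_type, enabled=False):
        freqs = torch.outer(position_ids.float().flatten(), inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()

inv_freq = 1.0 / (10000 ** (torch.arange(0, 64, 2).float() / 64))
cos, sin = rope_cos_sin(torch.arange(16)[None, :], inv_freq)
print(cos.shape, cos.dtype)  # torch.Size([16, 64]) torch.float32
```
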
Glen Taggart authored
Substantially reduce memory usage in _update_causal_mask for large batches by using .expand instead of .repeat [needs tests+sanity check] (#29413)
* try to fix gemma mem use
* fix: handle attention mask dim==2 case
* remove logits=logits.float()
* clean up + add llama
* apply formatting
* readability edit: swap order of items being multiplied
* revert change unrelated to PR
* revert black autoformat
* switch to one .to
* Accept style edits Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

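To make the memory argument concrete, a small standalone sketch (ours, not the PR's code): `.expand` returns a broadcasted view that shares the single mask's storage, while `.repeat` materializes one copy per batch element.

```python
import torch

batch, seq = 8, 1024
# Additive causal mask: -inf above the diagonal, 0 elsewhere.
causal = torch.full((seq, seq), float("-inf")).triu(diagonal=1)

# .repeat allocates a full copy of the mask for every batch element.
repeated = causal[None, None, :, :].repeat(batch, 1, 1, 1)
print(repeated.nelement() * repeated.element_size())  # 33554432 bytes (~32 MiB)

# .expand returns a view: no per-batch allocation at all.
expanded = causal[None, None, :, :].expand(batch, 1, seq, seq)
print(expanded.data_ptr() == causal.data_ptr())  # True: shared storage
```

The catch the PR title's "needs tests+sanity check" hints at: an expanded view aliases the same memory across the batch, so downstream code must not write into the mask in place without copying first.
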
- 01 Mar, 2024 2 commits
Arthur authored
* use the generation config
* fixup

Leon Engländer authored
* LlamaForQuestionAnswering self.transformer->self.model
* fix "Copied from" string
* Llama QA model: set base_model_prefix = "transformer"

- 28 Feb, 2024 4 commits
fxmarty authored
* better unmask imple
* comment
* typo
* bug report pytorch
* cleanup
* fix import
* add back example
* retrigger ci
* come on

jiqing-feng authored
Co-authored-by: Joao Gante <joao@huggingface.co>

Daniel Han authored
* Update modeling_llama.py Llama - Force float32 since bfloat16 loses precision on long contexts
* Update modeling_llama.py
* Update modeling_gemma.py Fix RoPE and logits.float()
* @torch.no_grad()
* @torch.no_grad()
* Cos, Sin to float32
* cos, sin to float32
* Update src/transformers/models/gemma/modeling_gemma.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Resolve PR conflicts
* Fix RoPE for llama
* Revert "Fix RoPE for llama" This reverts commit b860a22dab9bb01cd15cb9a3220abeaefad3e458.
* Fix RoPE for llama
* RoPE device
* Autocast device type
* RoPE
* RoPE isinstance
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

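A quick numerical illustration (ours, not the PR's) of why the cos/sin tables are kept in float32: bfloat16 has only 8 significant bits, so nearby long-context positions collapse onto the same representable value and the rotary angles drift.

```python
import torch

positions = torch.arange(4090, 4096, dtype=torch.float32)
print(positions)                     # six distinct consecutive positions
print(positions.to(torch.bfloat16))  # several collapse to the same bf16 value

inv_freq = 1.0 / (10000 ** (torch.arange(0, 8, 2, dtype=torch.float32) / 8))
angles32 = torch.outer(positions, inv_freq)
angles16 = torch.outer(positions.to(torch.bfloat16), inv_freq.to(torch.bfloat16))
# The resulting cos table error is visible, not just rounding noise:
print((angles32.cos() - angles16.float().cos()).abs().max())
```
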
Arthur authored
* remove control flow
* update gptneox
* update ....
* nits
* Actually let's just break. Otherwise we are silently failing which imo is not optimal
* version BC
* fix tests
* fix eager causal
* nit
* add a test
* style
* nits
* nits
* more nits for the test
* update and fix
* make sure cuda graphs are not skipped
* read token is needed for meta llama
* update!
* fixup
* compile test should be slow
* fix the fix copies
* style

- 27 Feb, 2024 1 commit
Andrei Panferov authored
Cleaner Cache `dtype` and `device` extraction for CUDA graph generation for quantizers compatibility (#29079)
* input_layernorm as the beacon of hope
* cleaner dtype extraction
* AQLM + CUDA graph test
* is available check
* shorter text test

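For context on "input_layernorm as the beacon of hope", a hedged sketch of the idea (all names here are illustrative, not the repo's code): when allocating a static cache for a quantized model, a layer that quantizers never touch, such as an input layernorm, is a safer source of `dtype`/`device` than the possibly integer-packed linear weights.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(64)  # stays floating point after quantization
        self.q_proj = nn.Linear(64, 64)          # may be swapped for packed int weights

block = Block().half()
# Simulate a quantizer replacing the projection with integer-packed weights:
block.q_proj.weight = nn.Parameter(
    torch.zeros(64, 16, dtype=torch.int32), requires_grad=False
)

# Deriving the cache dtype/device from the layernorm avoids picking up int32:
dtype = block.input_layernorm.weight.dtype
device = block.input_layernorm.weight.device
cache = torch.zeros(1, 8, 128, 64, dtype=dtype, device=device)
print(cache.dtype)  # torch.float16
```
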
- 26 Feb, 2024 1 commit
fxmarty authored
use torch.bool instead of torch.int64

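As an aside on why the dtype matters, a small sketch (our illustration): a boolean mask costs 1 byte per element versus 8 for int64, and SDPA accepts boolean masks directly.

```python
import torch

seq = 2048
mask_i64 = torch.ones(1, 1, seq, seq, dtype=torch.int64)
mask_bool = mask_i64.to(torch.bool)

print(mask_i64.element_size())   # 8 bytes per element
print(mask_bool.element_size())  # 1 byte per element

# scaled_dot_product_attention takes a bool attn_mask as-is
# (True = attend, False = masked), no float conversion needed.
q = k = v = torch.randn(1, 4, seq, 32)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask_bool)
print(out.shape)
```
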
- 23 Feb, 2024 1 commit
Alessandro Palla authored
* Fix issue 29206
* Fix style