- 23 Jul, 2024 1 commit
Joao Gante authored
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 14 Jul, 2024 1 commit
Joao Gante authored
* tmp commit * shorter * nit * explicit kwargs * propagate changes * mass propagation with a few manual touches (let's see how CI behaves) * fix cacheless case
* Update src/transformers/generation/utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* make fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 11 Jul, 2024 1 commit
Arthur authored
* dumb commit * nit * update * something like this * unpack in modeling utils * safe import * oups * update * nits * diff convert gemma * update * start propagating * udpate other modeling code as well * update for sliding window models * nits * more init cleanups * styling * fixup * noice * pass fixup * typo typing_extension -> typing_extensions * torch.nn.functionnal -> torch.nn.functional * add to import structure * unpack * simplify a bit more for this first version * nut * update * update * nit * ease the import of `Unpack` * remove useless `use_sliding_window` * no qua please * protect import? * style * [run-slow] * [run slow] llama,gemma,mistral,mixtral * remove extra kwargs * fix llama * address review comments * apply diff_model_converter to modeling_gemma.py * remove cache_position 1 * remove cache_position 2 * some cleaning * refactor gemma2 as well * apply review comments * rename file to modeling_flash_attention_utils.py * siglip refactor * remove dead code * is the hub down? * still down? * fix siglip * fix gemma2 * fatal: Could not read from remote repository. * fix typo in softcap implem * flacky * Failed: Timeout >120.0s
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
-
- 07 Jun, 2024 1 commit
Cyril Vallez authored
* Fix jetmoe model * Remove skip-tests
-
- 23 May, 2024 1 commit
Benjamin Warner authored
add torch.compile dynamic support
-
- 16 May, 2024 1 commit
Yih-Dar authored
* fix * [run-slow] gemma * add test * add `test_compile_static_cache` * fix * style * remove subprocess * use attribute * fix * style * update * [run-slow] dbrx,gemma,jetmoe,phi3,recurrent_gemma
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 15 May, 2024 1 commit
Edoardo Cetin authored
* Fix llama model forward function with attention=True, same-length encoded sequence. * Fix style * propagate fix to modeling_cohere, gemma, dbrx, and olmo (which copy the same sdpa masking logic from llama) * Fix style * ignore unnecessary sdpa mask converter when output_attentions=True * add tests checking sdpa and eager outputs match when output_attentions=True
* Split if statements in two lines Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix formatting * Add fix to new jetmoe model * Add missing output_attentions argument to jetmoe mask creation
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 14 May, 2024 1 commit
Yikang Shen authored
* init jetmoe code * update archive maps * remove flax import * fix import error * update README * ruff fix * update readme * fix * update config * fix issue * merge files * fix model bug * fix test * auto fix * model size * add comments * fix form * add flash attention support * fix attention head number * fix init * fix support list * sort auto mapping * fix test * fix docs * update test * fix test * fix test * change variable name * fix config * fix init * update format * clean code * fix config * fix config * change default config * update config * fix issues * update formate * update config argument * update format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* change to mixtral aux loss * change to cache_position * debug * fix bugs * debug * fix format * fix format * fix copy * fix format * fix format * fix sort * fix sort * fix sort * add copy comment * add copy from * remove debug code * revert readme update * add copy * debug * remove debug code * fix flash attention * add comments * clean code * clean format * fix format * fix format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* change variable name * add copied from * fix variable name * remove deprecated functinos * sync to llama implementation * fix format * fix copy * fix format * update format * remove repr * add comment for moe weight * fix copy
* Update src/transformers/models/jetmoe/configuration_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add comments and reformat config * fix format * fix format * fix format * update test * update doc string in config
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update config doc * update attention cache * fix format * fix copy
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
-