Commits · e96c1de1913c307fddcb3e5881388a6dbb5b00b1 · chenpangpang / transformers

11 Dec, 2023 3 commits

Skip `UnivNetModelTest::test_multi_gpu_data_parallel_forward` (#27912) · e96c1de1
Yih-Dar authored Dec 11, 2023
```
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
e96c1de1
[BEiT] Fix test (#27934) · 8d8970ef
NielsRogge authored Dec 11, 2023
```
Fix test
```
8d8970ef

[DETA] fix backbone freeze/unfreeze function (#27843) · 235be085

Sangbum Daniel Choi authored Dec 11, 2023



* [DETA] fix freeze/unfreeze function

* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add freeze/unfreeze test case in DETA

* fix type

* fix typo 2

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

235be085

09 Dec, 2023 3 commits

Fix typo (#27918) · df5c5c62
Brendan Fahy authored Dec 09, 2023

df5c5c62

[integration] Update Ray Tune integration for Ray 2.7 (#26499) · 5fa66df3

Justin Yu authored Dec 09, 2023



* fix tune integration for ray 2.7+
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* add version check for ray tune backend availability
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* missing import
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* pin min version instead
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* address comments
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* some fixes
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* fix unnecessary final checkpoint
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* fix lint
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* dep table fix
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* fix lint
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

---------
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

5fa66df3

[CLAP] Replace hard-coded batch size to enable dynamic ONNX export (#27790) · ffd426ee
Joshua Lochner authored Dec 09, 2023
```
* [CLAP] Replace hard-coded batch size to enable dynamic ONNX export

* Add back docstring
```
ffd426ee

08 Dec, 2023 18 commits

F.scaled_dot_product_attention support (#26572) · 80377eb0

fxmarty authored Dec 08, 2023



* add sdpa

* wip

* cleaning

* add ref

* yet more cleaning

* and more :)

* wip llama

* working llama

* add output_attentions=True support

* bigcode sdpa support

* fixes

* gpt-bigcode support, require torch>=2.1.1

* add falcon support

* fix conflicts falcon

* style

* fix attention_mask definition

* remove output_attentions from attnmaskconverter

* support whisper without removing any Copied from statement

* fix mbart default to eager renaming

* fix typo in falcon

* fix is_causal in SDPA

* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained

* add warnings when falling back on the manual implementation

* precise doc

* wip replace _flash_attn_enabled by config.attn_implementation

* fix typo

* add tests

* style

* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace

* obey to config.attn_implementation if a config is passed in from_pretrained

* fix is_torch_sdpa_available when torch is not installed

* remove dead code

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bart/modeling_bart.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove duplicate pretraining_tp code

* add dropout in llama

* precise comment on attn_mask

* add fmt: off for _unmask_unattended docstring

* precise num_masks comment

* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion

* cleanup modeling_utils

* backward compatibility

* fix style as requested

* style

* improve documentation

* test pass

* style

* add _unmask_unattended tests

* skip meaningless tests for idefics

* hard_check SDPA requirements when specifically requested

* standardize the use if XXX_ATTENTION_CLASSES

* fix SDPA bug with mem-efficient backend on CUDA when using fp32

* fix test

* rely on SDPA is_causal parameter to handle the causal mask in some cases

* fix FALCON_ATTENTION_CLASSES

* remove _flash_attn_2_enabled occurences

* fix test

* add OPT to the list of supported flash models

* improve test

* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test

* remove remaining _flash_attn_2_enabled occurence

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/perf_infer_gpu_one.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove use_attn_implementation

* fix docstring & slight bug

* make attn_implementation internal (_attn_implementation)

* typos

* fix tests

* deprecate use_flash_attention_2=True

* fix test

* add back llama that was removed by mistake

* fix tests

* remove _flash_attn_2_enabled occurences bis

* add check & test that passed attn_implementation is valid

* fix falcon torchscript export

* fix device of mask in tests

* add tip about torch.jit.trace and move bt doc below sdpa

* fix parameterized.expand order

* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there

* update sdpaattention class with the new cache

* Update src/transformers/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bark/modeling_bark.py

* address review comments

* WIP torch.jit.trace fix. left: test both eager & sdpa

* add test for torch.jit.trace for both eager/sdpa

* fix falcon with torch==2.0 that needs to use sdpa

* fix doc

* hopefully last fix

* fix key_value_length that has no default now in mask converter

* is it flacky?

* fix speculative decoding bug

* tests do pass

* fix following #27907

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

80377eb0

Generate: SinkCache can handle iterative prompts (#27907) · ce0bbd51
Joao Gante authored Dec 08, 2023

ce0bbd51
fix typo in image_processing_blip.py Wwhether -> Whether (#27899) · 94c76538
zhc7 authored Dec 09, 2023

94c76538

[Doc] Spanish translation of pad_truncation.md (#27890) · d6c3a3f1

Aaron Jimenez authored Dec 08, 2023

* Add pad_truncation to es/_toctree.yml

* Add pad_truncation.md to es/

* Translated first two paragraph

* Translated paddig argument section

* Translated truncation argument section

* Translated final paragraphs

* Translated table

* Fixed typo in the table of en/pad_truncation.md

* Run make style | Fix a word

* Add Padding (relleno) y el Truncation (truncamiento) in the final paragraphs

* Fix relleno and truncamiento words

d6c3a3f1

Allow `resume_from_checkpoint` to handle `auto_find_batch_size` (#27568) · 6757ed28

Zach Mueller authored Dec 08, 2023



* Fuffill request

* Add test

* Better test

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Better test

* Better test

* MOre comments

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

6757ed28

fix llava (#27909) · aa7ab98e

Arthur authored Dec 08, 2023

* fix llava

* nits

* attention_mask was forgotten

* nice

* :)

* fixup

aa7ab98e

Llama conversion script: adjustments for Llama Guard (#27910) · e0b617d1
Pedro Cuenca authored Dec 08, 2023

e0b617d1

Fix 2 tests in `FillMaskPipelineTests` (#27889) · e3669375

Yih-Dar authored Dec 08, 2023



* fix

* fix

* fix

* fix

* fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

e3669375

Fix `notification_service.py` (#27903) · 79e76559

Yih-Dar authored Dec 08, 2023



* fix

* fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

79e76559

mark `test_initialization` as flaky in 2 model tests (#27906) · 3b720ad9
Yih-Dar authored Dec 08, 2023
```
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
3b720ad9
Fix CLAP converting script (#27153) · 7f07c356
Yoach Lacombe authored Dec 08, 2023
```
* update converting script

* make style
```
7f07c356

Fix remaining issues in beam score calculation (#27808) · b31905d1

Xin Qiu authored Dec 08, 2023

* Fix issues in add and is_done for BeamHypotheses

* make newly added arguments optional for better compatibility

* Directly use cur_len as generated_len, add note for retrocompatibility

* update test expectation

* make cur_len represents the length of the entire sequence including the decoder prompt

* remove redundant if/else in testing

b31905d1

Fix beam score calculation issue for Tensorflow version (#27814) · 3ac9945e

Xin Qiu authored Dec 08, 2023

* Fix beam score calculation issue for tensorflow version

* fix transition score computation error

* make cur_len represent the entire sequence length including decoder prompt

3ac9945e

fix: non-atomic checkpoint save (#27820) · 4c5ed1d0
Jonathon Belotti authored Dec 08, 2023

4c5ed1d0
Added passing parameters to "reduce_lr_on_plateau" scheduler (#27860) · fe8d1302
Charbel Abi Daher authored Dec 08, 2023

fe8d1302

Fix: Raise informative exception when `prefix_allowed_tokens_fn` return empty... · 56be5e80

Saibo-creator authored Dec 08, 2023


Fix: Raise informative exception when `prefix_allowed_tokens_fn` return empty set of tokens (#27797)
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

56be5e80

[

⚠

️ removed a default argument] Make `AttentionMaskConverter` compatible with... · 307a7d0b

fxmarty authored Dec 08, 2023

[⚠️ removed a default argument] Make `AttentionMaskConverter` compatible with `torch.compile(..., fullgraph=True)` (#27868)

* remove bugged torch.float32 default

* add test

* fix tests

* fix test

* fix doc

307a7d0b

Generate: New `Cache` abstraction and Attention Sinks support (#26681) · 633215ba

Tom Aarsen authored Dec 08, 2023

* Draft version of new KV Caching

This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly

* Address numerous PR suggestions

1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used is generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
6. Separate key_cache and value_cache.

Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.

* Implement the SinkCache through backward+forward rotations

* Integrate (Sink)Cache with Llama FA2

* Set use_legacy_cache=True as default, allows for test passes

* Move from/to_legacy_cache to ...Model class

* Undo unnecessary newline change

* Remove copy utility from deprecated OpenLlama

* Match import style

* manual rebase with main

* Cache class working with generate (#1)

* Draft version of new KV Caching

This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453

) to be easily implemented
in a third-party or in transformers directly

* Address numerous PR suggestions

Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.

* Integrate (Sink)Cache with Llama FA2

* Move from/to_legacy_cache to ...Model class

* Undo unnecessary newline change

* Match import style

* working generate

* Add tests; Simplify code; Apply changes to Mistral and Persimmon

* fix rebase mess

* a few more manual fixes

* last manual fix

* propagate changes to phi

* upgrade test

* add use_legacy_cache docstring; beef up tests

* reintroduce unwanted deletes

---------
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>

* move import

* add default to model_kwargs.get('use_legacy_cache')

* correct failing test

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* apply PR suggestions

* fix failing test

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>

* PR comments

* tmp commit

* add docstrings

* more tests, more docstrings, add to docs

* derp

* tmp commit

* tmp dbg

* more dbg

* fix beam search bug

* cache can be a list of tuples in some models

* fix group beam search

* all but sinkcache integration tests

* fix sink cache and add hard integration test

* now also compatible with input_embeds input

* PR comments

* add Cache support to Phi+FA2

* make fixup

---------
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

633215ba

07 Dec, 2023 16 commits

Translate `model_doc` files from `clip` to `cpm` to JP (#27774) · 0ea42ef0

Rockerz authored Dec 08, 2023



* Add models

* Add more models

* Update docs/source/ja/model_doc/convnextv2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/convbert.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/codegen.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update translation errors and author names

* link update

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

0ea42ef0

Updates the distributed CPU training documentation to add instructions for... · 79b79ae2

Dina Suehiro Jones authored Dec 07, 2023

Updates the distributed CPU training documentation to add instructions for running on a Kubernetes cluster (#27780)

* Updates the Distributed CPU documentation to add a Kubernetes example

* Small edits

* Fixing link

* Adding missing new lines

* Minor edits

* Update to include Dockerfile snippet

* Add comment about tuning env var

* Updates based on review comments

79b79ae2

[docs] Custom semantic segmentation dataset (#27859) · f7595760
Steven Liu authored Dec 07, 2023
```
* custom dataset

* fix link

* feedback
```
f7595760
Generate: All logits processors are documented and have examples (#27796) · 58e7f9bb
Joao Gante authored Dec 07, 2023
```
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
```
58e7f9bb

Fix TF loading PT safetensors when weights are tied (#27490) · 47500b1d

Matt authored Dec 07, 2023



* Un-skip tests

* Add aliasing support to tf_to_pt_weight_rename

* Refactor tf-to-pt weight rename for simplicity

* Patch mobilebert

* Let us pray that the transfo-xl one works

* Add XGLM rename

* Expand the test to see if we can get more models to break

* Expand the test to see if we can get more models to break

* Fix MPNet (it was actually an unrelated bug)

* Fix MPNet (it was actually an unrelated bug)

* Add speech2text fix

* Update src/transformers/modeling_tf_pytorch_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/mobilebert/modeling_tf_mobilebert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update to always return a tuple from tf_to_pt_weight_rename

* reformat

* Add a couple of missing tuples

* Remove the extra test for tie_word_embeddings since it didn't cause any unexpected failures anyway

* Revert changes to modeling_tf_mpnet.py

* Skip MPNet test and add explanation

* Add weight link for BART

* Add TODO to clean this up a bit

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

47500b1d

Show new failing tests in a more clear way in slack report (#27881) · 9f1f11a2
Yih-Dar authored Dec 07, 2023
```
* fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
9f1f11a2
Fix device of masks in tests (#27887) · c99f2547
fxmarty authored Dec 07, 2023
```
fix device of mask in tests
```
c99f2547
update version of warning notification for `get_default_device` to v4.38 (#27848) · fc71e815
Hz, Ji authored Dec 07, 2023

fc71e815

update `create_model_card` to properly save peft details when using Trainer with PEFT (#27754) · 5324bf9c

Sourab Mangrulkar authored Dec 07, 2023



* update `create_model_card` to properly save peft details when using Trainer with PEFT

* nit

* Apply suggestions from code review
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

5324bf9c

Allow `# Ignore copy` (#27328) · 52746922

Yih-Dar authored Dec 07, 2023



* fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

52746922

[`Llava`] Add Llava to transformers (#27662) · 44b5506d

Younes Belkada authored Dec 07, 2023

* add model like

* logits match

* minor fixes

* fixes

* up

* up

* add todo

* llava processor

* keep the processor simple

* add conversion script

* fixup

* fix copies

* up

* add to index

* fix config + logits

* fix

* refactor

* more refactor

* more refactor

* fix copies

* add authors

* v1 tests

* add `LlavaProcessor` in init

* remove unneeded import

* up

* up

* docs

* up

* fix CI

* fix CI

* add attention  mask in test

* make fixup

* remove the vision model

* that' s the dirty way to do it

* nits

* nits

* updates

* add more tests

* add input tests

* fixup

* more styling

* nits

* updates amd cleanup

* fixup the generation expected results

* fix the testing script

* some cleanup and simplification which does not work yet but almost there!

* make correct dispatch operations

* vectorize works for batch of images and text

* last todos

* nits

* update test and modeling code

* remove useless function for now

* fix few issues

* fix generation

* some nits

* add bakllava

* nits

* remove duplicated code

* finis merge

* cleanup

* missed this line

* fill the todos

* add left padding offset

* add left and rignt padding logic

* bool to properly index

* make sure

* more cleanups

* batch is fixed 😉



* add correct device for tensor creation

* fix some dtype missmatch

* ruff

* update conversion script

* Update src/transformers/__init__.py

* fa 2 support + fix conversion script

* more

* correct reshaping

* fix test dict

* fix copies by ignoring

* fix nit

* skip clip vision model

* fixup

* fixup

* LlavaForVisionText2Text -> LlavaForCausalLM

* update

* fix

* raise correct errors

* fix

* docs

* nuke for now

* nits here and there

* fixup

* fix remaining tests

* update LlavaForConditionalGeneration instead of CausalLM

* fixups

* pipeline support

* slow and piepline tests

* supports batch

* nits

* cleanup

* fix first integration tests

* add pad token where needed

* correct etsts

* fixups

* update pipeline testr

* fix quality

* nits

* revert unneeded change

* nit

* use BatchFeature

* from ...feature_extraction_utils import BatchFeature

* nits

* nits

* properly update

* more f*** nits

* fix copies

* comment

* keep slow test slow

* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add piepline example

* add pixel values in docstrign

* update pr doctest

* fix

* fix slow tests

* remove hack

* fixup

* small note

* forward contrib credits from PR25789

* forward contrib credits from original implementation and work

* add arthur

* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Lysandre Debut <hi@lysand.re>

* update docstring

* nit

* move to not doctested because of timeout issues

* fixup

* add description

* more

* fix-copies

* fix docs

* add beam search

* add more comments

* add typehints on processor

* add speedup plot

* update slow tests and docs

* push test

* push batched test

* fix batched generation with different number of images

* remove benchmark due to a bug

* fix test

* fix copies

* add gcolab demo

---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>

44b5506d

fix: fix gradient accumulate step for learning rate (#27667) · 0410a29a
Phuc Van Phan authored Dec 07, 2023

0410a29a

[`FA-2`] Add Flash Attention to `Phi` (#27661) · f84d85ba

Susnato Dhar authored Dec 07, 2023

* add FA and modify doc file

* test_flash_attn_2_generate_padding_right test overwritten

* comment

* modify persimmon modeling file

* added speedup graph

* more changes

f84d85ba

[i18n-fr] Translate autoclass tutorial to French (#27659) · 06f56168

Nolwenn Bernard authored Dec 07, 2023

* Translation of autoclass tutorial

* Update totree to keep only tutorial section

* Translate title toctree

* Fix typos

* Update review comments

06f56168

Fix bug of _prepare_4d_attention_mask (#27847) · 4d806dba
jiqing-feng authored Dec 07, 2023
```
* use _prepare_4d_attention_mask

* fix comment
```
4d806dba

Add Llama Flax Implementation (#24587) · 75336c17

Alex McKinney authored Dec 07, 2023

* Copies `modeling_flax_gpt_neo.py` to start

* MLP Block. WIP Attention and Block

* Adds Flax implementation of `LlamaMLP`
Validated with in-file test.
Some slight numeric differences, but assuming it isn't an issue

* Adds `FlaxLlamaRMSNorm` layer
`flax.linen` includes `RMSNorm` layer but not necessarily in all
versions. Hence, we add in-file.

* Adds FlaxLlamaAttention
Copied from GPT-J as it has efficient caching implementation as well as
rotary embeddings.
Notice numerically different, but not by a huge amount. Needs
investigating

* Adds `FlaxLlamaDecoderLayer`
numerically inaccurate, debugging..

* debugging rotary mismatch
gptj uses interleaved whilst llama uses contiguous
i think they match now but still final result is wrong.
maybe drop back to just debugging attention layer?

* fixes bug with decoder layer
still somewhat numerically inaccurate, but close enough for now

* adds markers for what to implement next
the structure here diverges a lot from the PT version.
not a big fan of it, but just get something working for now

* implements `FlaxLlamaBlockCollection`]
tolerance must be higher than expected, kinda disconcerting

* Adds `FlaxLlamaModule`
equivalent PyTorch model is `LlamaModel`
yay! a language model🤗

* adds `FlaxLlamaForCausalLMModule`
equivalent to `LlamaForCausalLM`
still missing returning dict or tuple, will add later

* start porting pretrained wrappers
realised it probably needs return dict as a prereq

* cleanup, quality, style

* readds `return_dict` and model output named tuples

* (tentatively) pretrained wrappers work 🔥

* fixes numerical mismatch in `FlaxLlamaRMSNorm`
seems `jax.lax.rsqrt` does not match `torch.sqrt`.
manually computing `1 / jax.numpy.sqrt` results in matching values.

* [WIP] debugging numerics

* numerical match
I think issue was accidental change of backend. forcing CPU fixes test.
We expect some mismatch on GPU.

* adds in model and integration tests for Flax Llama
summary of failing:
- mul invalid combination of dimensions
- one numerical mismatch
- bf16 conversion (maybe my local backend issue)
- params are not FrozenDict

* adds missing TYPE_CHECKING import and `make fixup`

* adds back missing docstrings
needs review on quality of docstrings, not sure what is required.
Furthermore, need to check if `CHECKPOINT_FOR_DOC` is valid. See TODO

* commenting out equivalence test as can just use common

* debugging

* Fixes bug where mask and pos_ids were swapped in pretrained models
This results in all tests passing now 🔥



* cleanup of modeling file

* cleanup of test file

* Resolving simpler review comments

* addresses more minor review comments

* fixing introduced pytest errors from review

* wip additional slow tests

* wip tests
need to grab a GPU machine to get real logits for comparison
otherwise, slow tests should be okay

* `make quality`, `make style`

* adds slow integration tests
- checking logits
- checking hidden states
- checking generation outputs

* `make fix-copies`

* fix mangled function following `make fix-copies`

* adds missing type checking imports

* fixes missing parameter checkpoint warning

* more finegrained 'Copied from' tags
avoids issue of overwriting `LLAMA_INPUTS_DOCSTRING`

* swaps import guards
??? how did these get swapped initially?

* removing `inv_freq` again as pytorch version has now removed

* attempting to get CI to pass

* adds doc entries for llama flax models

* fixes typo in __init__.py imports

* adds back special equivalence tests
these come from the gpt neo flax tests. there is special behaviour for these models that needs to override the common version

* overrides tests with dummy to see if CI passes
need to fill in these tests later

* adds my contribution to docs

* `make style; make quality`

* replaces random masking with fixed to work with flax version

* `make quality; make style`

* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* updates `x`->`tensor` in `rotate_half`

* addresses smaller review comments

* Update docs/source/en/model_doc/llama.md
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adds integration test class

* adds `dtype` to rotary embedding to cast outputs

* adds type to flax llama rotary layer

* `make style`

* `make fix-copies`

* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* applies suggestions from review

* Update modeling_flax_llama.py

* `make fix-copies`

* Update tests/models/llama/test_modeling_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fixes shape mismatch in FlaxLlamaMLP

* applies some suggestions from reviews

* casts attn output logits to f32 regardless of dtype

* adds attn bias using `LlamaConfig.attention_bias`

* adds Copied From comments to Flax Llama test

* mistral and persimmon test change -copy from llama

* updates docs index

* removes Copied from in tests

it was preventing `make fix-copies` from succeeding

* quality and style

* ignores FlaxLlama input docstring

* adds revision to `_CHECKPOINT_FOR_DOC`

* repo consistency and quality

* removes unused import

* removes copied from from Phi test

now diverges from llama tests following FlaxLlama changes

* adds `_REAL_CHECKPOINT_FOR_DOC`

* removes refs from pr tests

* reformat to make ruff happy

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

75336c17