1. 16 Dec, 2023 1 commit
  2. 15 Dec, 2023 3 commits
  3. 14 Dec, 2023 5 commits
    • Matt's avatar
      Replace build() with build_in_name_scope() for some TF tests (#28046) · 3060899b
      Matt authored
      Replace build() with build_in_name_scope() for some tests
      3060899b
    • Matt's avatar
      Proper build() methods for TF (#27794) · 050e0b44
      Matt authored
      * Add a convenience method for building in your own name scope
      
      * Second attempt at auto layer building
      
      * Revert "Second attempt at auto layer building"
      
      This reverts commit e03a3aaecf9ec41a805582b83cbdfe3290a631be.
      
      * Attempt #3
      
      * Revert "Attempt #3"
      
      This reverts commit b9df7a0857560d29b5abbed6127d9e9eca77cf47.
      
      * Add missing attributes that we're going to need later
      
      * Add some attributes we're going to need later
      
      * A fourth attempt! Feel the power flow through you!
      
      * Revert "A fourth attempt! Feel the power flow through you!"
      
      This reverts commit 6bf4aaf3875d6f28485f50187617a4c616c8aff7.
      
      * Add more values we'll need later
      
      * TF refactor that we'll need later
      
      * Revert "TF refactor that we'll need later"
      
      This reverts commit ca07202fb5b7b7436b893baa8d688b4f348ea7b9.
      
      * Revert "Revert "TF refactor that we'll need later""
      
      This reverts commit 1beb0f39f293ed9c27594575e1c849aadeb15c13.
      
      * make fixup
      
      * Attempt five!
      
      * Revert "Attempt five!"
      
      This reverts commit 3302207958dfd0374b0447a51c06eea51a506044.
      
      * Attempt six - this time don't add empty methods
      
      * Revert "Attempt six - this time don't add empty methods"
      
      This reverts commit 67d60129be75416b6beb8f47c7d38d77b18d79bb.
      
      * Attempt seven - better base model class detection!
      
      * Revert "Attempt seven - better base model class detection!"
      
      This reverts commit 5f14845e92ea0e87c598da933bfbfee10f553bc9.
      
      * Another attribute we'll need later
      
      * Try again with the missing attribute!
      
      * Revert "Try again with the missing attribute!"
      
      This reverts commit 760c6f30c5dffb3e04b0e73c34a77d1882a0fef7.
      
      * This is the attempt that will pierce the heavens!
      
      * Revert "This is the attempt that will pierce the heavens!"
      
      This reverts commit c868bb657de057aca7a5260350a3f831fc4dfee6.
      
      * Attempt seven - snag list is steadily decreasing
      
      * Revert "Attempt seven - snag list is steadily decreasing"
      
      This reverts commit 46fbd975deda64429bfb3e5fac4fc0370c00d316.
      
      * Attempt eight - will an empty snag list do it?
      
      * Revert "Attempt eight - will an empty snag list do it?"
      
      This reverts commit 7c8a3c2b083253649569e9877e02054ae5cec67b.
      
      * Fixes to Hubert issues that cause problems later
      
      * Trying again with Conv1D/SeparableConv fixes
      
      * Revert "Trying again with Conv1D/SeparableConv fixes"
      
      This reverts commit 55092bca952bc0f750aa1ffe246a640bf1e2036e.
      
      * Apply the build shape fixes to Wav2Vec2 as well
      
      * One more attempt!
      
      * Revert "One more attempt!"
      
      This reverts commit 5ac3e4cb01b9458cc93312873725f9444ae7261c.
      
      * Another attempt!
      
      * Revert "Another attempt!"
      
      This reverts commit ea16d890e019d7de8792a3b8e72f3b1c02adae50.
      
      * Let's see how many failures we get without the internal build method
      
      * Fix OpenAI
      
      * Fix MobileBERT
      
      * (Mostly) fix GroupVIT
      
      * Fix BLIP
      
      * One more BLIP fix
      
      * One more BLIP fix!
      
      * Fix Regnet
      
      * Finally fully fix GroupViT
      
      * Fix Data2Vec and add the new AdaptivePool
      
      * Fix Segformer
      
      * Fix Albert
      
      * Fix Deberta/DebertaV2
      
      * Fix XLM
      
      * Actually fix XLM
      
      * Fix Flaubert
      
      * Fix lxmert
      
      * Fix Resnet
      
      * Fix ConvBERT
      
      * Fix ESM
      
      * Fix Convnext / ConvnextV2
      
      * Fix SAM
      
      * Fix Efficientformer
      
      * Fix LayoutLMv3
      
      * Fix speech_to_text
      
      * Fix mpnet and mobilevit
      
      * Fix Swin
      
      * Fix CTRL
      
      * Fix CVT
      
      * Fix DPR
      
      * Fix Wav2Vec2
      
      * Fix T5
      
      * Fix Hubert
      
      * Fix GPT2
      
      * Fix Whisper
      
      * Fix DeiT
      
      * Fix the encoder-decoder / dual-encoder classes
      
      * make fix-copies
      
      * build in name scope
      
      * Fix summarization test
      
      * Fix tied weight names for BART + Blenderbot
      
      * Fix tied weight name building
      
      * Fix to TFESM weight building
      
      * Update TF SAM
      
      * Expand all the shapes out into Big Boy Shapes
      050e0b44
    • Yoach Lacombe's avatar
      Fix languages covered by M4Tv2 (#28019) · bb1d0d0d
      Yoach Lacombe authored
      
      
      * correct language assessment  + add tests
      
      * Update src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * make style + simplify and enrich test
      
      ---------
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      bb1d0d0d
    • Joao Gante's avatar
    • Joao Gante's avatar
      9e5c28c5
  4. 13 Dec, 2023 4 commits
  5. 11 Dec, 2023 9 commits
  6. 08 Dec, 2023 10 commits
    • fxmarty's avatar
      F.scaled_dot_product_attention support (#26572) · 80377eb0
      fxmarty authored
      
      
      * add sdpa
      
      * wip
      
      * cleaning
      
      * add ref
      
      * yet more cleaning
      
      * and more :)
      
      * wip llama
      
      * working llama
      
      * add output_attentions=True support
      
      * bigcode sdpa support
      
      * fixes
      
      * gpt-bigcode support, require torch>=2.1.1
      
      * add falcon support
      
      * fix conflicts falcon
      
      * style
      
      * fix attention_mask definition
      
      * remove output_attentions from attnmaskconverter
      
      * support whisper without removing any Copied from statement
      
      * fix mbart default to eager renaming
      
      * fix typo in falcon
      
      * fix is_causal in SDPA
      
      * check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained
      
      * add warnings when falling back on the manual implementation
      
      * precise doc
      
      * wip replace _flash_attn_enabled by config.attn_implementation
      
      * fix typo
      
      * add tests
      
      * style
      
      * add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
      
      * obey to config.attn_implementation if a config is passed in from_pretrained
      
      * fix is_torch_sdpa_available when torch is not installed
      
      * remove dead code
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bart/modeling_bart.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove duplicate pretraining_tp code
      
      * add dropout in llama
      
      * precise comment on attn_mask
      
      * add fmt: off for _unmask_unattended docstring
      
      * precise num_masks comment
      
      * nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
      
      * cleanup modeling_utils
      
      * backward compatibility
      
      * fix style as requested
      
      * style
      
      * improve documentation
      
      * test pass
      
      * style
      
      * add _unmask_unattended tests
      
      * skip meaningless tests for idefics
      
      * hard_check SDPA requirements when specifically requested
      
      * standardize the use if XXX_ATTENTION_CLASSES
      
      * fix SDPA bug with mem-efficient backend on CUDA when using fp32
      
      * fix test
      
      * rely on SDPA is_causal parameter to handle the causal mask in some cases
      
      * fix FALCON_ATTENTION_CLASSES
      
      * remove _flash_attn_2_enabled occurences
      
      * fix test
      
      * add OPT to the list of supported flash models
      
      * improve test
      
      * properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
      
      * remove remaining _flash_attn_2_enabled occurence
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/perf_infer_gpu_one.md
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove use_attn_implementation
      
      * fix docstring & slight bug
      
      * make attn_implementation internal (_attn_implementation)
      
      * typos
      
      * fix tests
      
      * deprecate use_flash_attention_2=True
      
      * fix test
      
      * add back llama that was removed by mistake
      
      * fix tests
      
      * remove _flash_attn_2_enabled occurences bis
      
      * add check & test that passed attn_implementation is valid
      
      * fix falcon torchscript export
      
      * fix device of mask in tests
      
      * add tip about torch.jit.trace and move bt doc below sdpa
      
      * fix parameterized.expand order
      
      * move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
      
      * update sdpaattention class with the new cache
      
      * Update src/transformers/configuration_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bark/modeling_bark.py
      
      * address review comments
      
      * WIP torch.jit.trace fix. left: test both eager & sdpa
      
      * add test for torch.jit.trace for both eager/sdpa
      
      * fix falcon with torch==2.0 that needs to use sdpa
      
      * fix doc
      
      * hopefully last fix
      
      * fix key_value_length that has no default now in mask converter
      
      * is it flacky?
      
      * fix speculative decoding bug
      
      * tests do pass
      
      * fix following #27907
      
      ---------
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      80377eb0
    • Joao Gante's avatar
      ce0bbd51
    • Zach Mueller's avatar
      Allow `resume_from_checkpoint` to handle `auto_find_batch_size` (#27568) · 6757ed28
      Zach Mueller authored
      
      
      * Fuffill request
      
      * Add test
      
      * Better test
      
      * Apply suggestions from code review
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Better test
      
      * Better test
      
      * MOre comments
      
      ---------
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      6757ed28
    • Yih-Dar's avatar
      Fix 2 tests in `FillMaskPipelineTests` (#27889) · e3669375
      Yih-Dar authored
      
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      ---------
      Co-authored-by: default avatarydshieh <ydshieh@users.noreply.github.com>
      e3669375
    • Yih-Dar's avatar
    • Xin Qiu's avatar
      Fix remaining issues in beam score calculation (#27808) · b31905d1
      Xin Qiu authored
      * Fix issues in add and is_done for BeamHypotheses
      
      * make newly added arguments optional for better compatibility
      
      * Directly use cur_len as generated_len, add note for retrocompatibility
      
      * update test expectation
      
      * make cur_len represents the length of the entire sequence including the decoder prompt
      
      * remove redundant if/else in testing
      b31905d1
    • Jonathon Belotti's avatar
      4c5ed1d0
    • Saibo-creator's avatar
      Fix: Raise informative exception when `prefix_allowed_tokens_fn` return empty... · 56be5e80
      Saibo-creator authored
      
      Fix: Raise informative exception when `prefix_allowed_tokens_fn` return empty set of tokens (#27797)
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      56be5e80
    • fxmarty's avatar
      [️ removed a default argument] Make `AttentionMaskConverter` compatible with... · 307a7d0b
      fxmarty authored
      [️ removed a default argument] Make `AttentionMaskConverter` compatible with `torch.compile(..., fullgraph=True)` (#27868)
      
      * remove bugged torch.float32 default
      
      * add test
      
      * fix tests
      
      * fix test
      
      * fix doc
      307a7d0b
    • Tom Aarsen's avatar
      Generate: New `Cache` abstraction and Attention Sinks support (#26681) · 633215ba
      Tom Aarsen authored
      * Draft version of new KV Caching
      
      This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
      / StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
      in a third-party or in transformers directly
      
      * Address numerous PR suggestions
      
      1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
      2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
      3. Remove __bool__ and __getitem__ magic as they're confusing.
      4. past_key_values.update(key, value, idx) now returns key, value.
      5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used is generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
      6. Separate key_cache and value_cache.
      
      Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
      
      * Implement the SinkCache through backward+forward rotations
      
      * Integrate (Sink)Cache with Llama FA2
      
      * Set use_legacy_cache=True as default, allows for test passes
      
      * Move from/to_legacy_cache to ...Model class
      
      * Undo unnecessary newline change
      
      * Remove copy utility from deprecated OpenLlama
      
      * Match import style
      
      * manual rebase with main
      
      * Cache class working with generate (#1)
      
      * Draft version of new KV Caching
      
      This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
      / StreamingLLM (https://arxiv.org/abs/2309.17453
      
      ) to be easily implemented
      in a third-party or in transformers directly
      
      * Address numerous PR suggestions
      
      1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
      2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
      3. Remove __bool__ and __getitem__ magic as they're confusing.
      4. past_key_values.update(key, value, idx) now returns key, value.
      5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used is generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
      6. Separate key_cache and value_cache.
      
      Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
      
      * Integrate (Sink)Cache with Llama FA2
      
      * Move from/to_legacy_cache to ...Model class
      
      * Undo unnecessary newline change
      
      * Match import style
      
      * working generate
      
      * Add tests; Simplify code; Apply changes to Mistral and Persimmon
      
      * fix rebase mess
      
      * a few more manual fixes
      
      * last manual fix
      
      * propagate changes to phi
      
      * upgrade test
      
      * add use_legacy_cache docstring; beef up tests
      
      * reintroduce unwanted deletes
      
      ---------
      Co-authored-by: default avatarTom Aarsen <Cubiegamedev@gmail.com>
      
      * move import
      
      * add default to model_kwargs.get('use_legacy_cache')
      
      * correct failing test
      
      * Apply suggestions from code review
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * apply PR suggestions
      
      * fix failing test
      
      * Apply suggestions from code review
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: default avatarTom Aarsen <37621491+tomaarsen@users.noreply.github.com>
      
      * PR comments
      
      * tmp commit
      
      * add docstrings
      
      * more tests, more docstrings, add to docs
      
      * derp
      
      * tmp commit
      
      * tmp dbg
      
      * more dbg
      
      * fix beam search bug
      
      * cache can be a list of tuples in some models
      
      * fix group beam search
      
      * all but sinkcache integration tests
      
      * fix sink cache and add hard integration test
      
      * now also compatible with input_embeds input
      
      * PR comments
      
      * add Cache support to Phi+FA2
      
      * make fixup
      
      ---------
      Co-authored-by: default avatarJoao Gante <joao@huggingface.co>
      Co-authored-by: default avatarJoao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      633215ba
  7. 07 Dec, 2023 6 commits
    • Matt's avatar
      Fix TF loading PT safetensors when weights are tied (#27490) · 47500b1d
      Matt authored
      
      
      * Un-skip tests
      
      * Add aliasing support to tf_to_pt_weight_rename
      
      * Refactor tf-to-pt weight rename for simplicity
      
      * Patch mobilebert
      
      * Let us pray that the transfo-xl one works
      
      * Add XGLM rename
      
      * Expand the test to see if we can get more models to break
      
      * Expand the test to see if we can get more models to break
      
      * Fix MPNet (it was actually an unrelated bug)
      
      * Fix MPNet (it was actually an unrelated bug)
      
      * Add speech2text fix
      
      * Update src/transformers/modeling_tf_pytorch_utils.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/mobilebert/modeling_tf_mobilebert.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update to always return a tuple from tf_to_pt_weight_rename
      
      * reformat
      
      * Add a couple of missing tuples
      
      * Remove the extra test for tie_word_embeddings since it didn't cause any unexpected failures anyway
      
      * Revert changes to modeling_tf_mpnet.py
      
      * Skip MPNet test and add explanation
      
      * Add weight link for BART
      
      * Add TODO to clean this up a bit
      
      ---------
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      47500b1d
    • fxmarty's avatar
      Fix device of masks in tests (#27887) · c99f2547
      fxmarty authored
      fix device of mask in tests
      c99f2547
    • Yih-Dar's avatar
      Allow `# Ignore copy` (#27328) · 52746922
      Yih-Dar authored
      
      
      * fix
      
      ---------
      Co-authored-by: default avatarydshieh <ydshieh@users.noreply.github.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      52746922
    • Younes Belkada's avatar
      [`Llava`] Add Llava to transformers (#27662) · 44b5506d
      Younes Belkada authored
      * add model like
      
      * logits match
      
      * minor fixes
      
      * fixes
      
      * up
      
      * up
      
      * add todo
      
      * llava processor
      
      * keep the processor simple
      
      * add conversion script
      
      * fixup
      
      * fix copies
      
      * up
      
      * add to index
      
      * fix config + logits
      
      * fix
      
      * refactor
      
      * more refactor
      
      * more refactor
      
      * fix copies
      
      * add authors
      
      * v1 tests
      
      * add `LlavaProcessor` in init
      
      * remove unneeded import
      
      * up
      
      * up
      
      * docs
      
      * up
      
      * fix CI
      
      * fix CI
      
      * add attention  mask in test
      
      * make fixup
      
      * remove the vision model
      
      * that' s the dirty way to do it
      
      * nits
      
      * nits
      
      * updates
      
      * add more tests
      
      * add input tests
      
      * fixup
      
      * more styling
      
      * nits
      
      * updates amd cleanup
      
      * fixup the generation expected results
      
      * fix the testing script
      
      * some cleanup and simplification which does not work yet but almost there!
      
      * make correct dispatch operations
      
      * vectorize works for batch of images and text
      
      * last todos
      
      * nits
      
      * update test and modeling code
      
      * remove useless function for now
      
      * fix few issues
      
      * fix generation
      
      * some nits
      
      * add bakllava
      
      * nits
      
      * remove duplicated code
      
      * finis merge
      
      * cleanup
      
      * missed this line
      
      * fill the todos
      
      * add left padding offset
      
      * add left and rignt padding logic
      
      * bool to properly index
      
      * make sure
      
      * more cleanups
      
      * batch is fixed 😉
      
      
      
      * add correct device for tensor creation
      
      * fix some dtype missmatch
      
      * ruff
      
      * update conversion script
      
      * Update src/transformers/__init__.py
      
      * fa 2 support + fix conversion script
      
      * more
      
      * correct reshaping
      
      * fix test dict
      
      * fix copies by ignoring
      
      * fix nit
      
      * skip clip vision model
      
      * fixup
      
      * fixup
      
      * LlavaForVisionText2Text -> LlavaForCausalLM
      
      * update
      
      * fix
      
      * raise correct errors
      
      * fix
      
      * docs
      
      * nuke for now
      
      * nits here and there
      
      * fixup
      
      * fix remaining tests
      
      * update LlavaForConditionalGeneration instead of CausalLM
      
      * fixups
      
      * pipeline support
      
      * slow and piepline tests
      
      * supports batch
      
      * nits
      
      * cleanup
      
      * fix first integration tests
      
      * add pad token where needed
      
      * correct etsts
      
      * fixups
      
      * update pipeline testr
      
      * fix quality
      
      * nits
      
      * revert unneeded change
      
      * nit
      
      * use BatchFeature
      
      * from ...feature_extraction_utils import BatchFeature
      
      * nits
      
      * nits
      
      * properly update
      
      * more f*** nits
      
      * fix copies
      
      * comment
      
      * keep slow test slow
      
      * Update src/transformers/models/llava/processing_llava.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * add piepline example
      
      * add pixel values in docstrign
      
      * update pr doctest
      
      * fix
      
      * fix slow tests
      
      * remove hack
      
      * fixup
      
      * small note
      
      * forward contrib credits from PR25789
      
      * forward contrib credits from original implementation and work
      
      * add arthur
      
      * Update src/transformers/models/llava/processing_llava.py
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * update docstring
      
      * nit
      
      * move to not doctested because of timeout issues
      
      * fixup
      
      * add description
      
      * more
      
      * fix-copies
      
      * fix docs
      
      * add beam search
      
      * add more comments
      
      * add typehints on processor
      
      * add speedup plot
      
      * update slow tests and docs
      
      * push test
      
      * push batched test
      
      * fix batched generation with different number of images
      
      * remove benchmark due to a bug
      
      * fix test
      
      * fix copies
      
      * add gcolab demo
      
      ---------
      Co-authored-by: default avatarArthur Zucker <arthur.zucker@gmail.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: default avatarshauray8 <shauray8@users.noreply.github.com>
      Co-authored-by: default avatarhaotian-liu <haotian-liu@users.noreply.github.com>
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      44b5506d
    • Susnato Dhar's avatar
      [`FA-2`] Add Flash Attention to `Phi` (#27661) · f84d85ba
      Susnato Dhar authored
      * add FA and modify doc file
      
      * test_flash_attn_2_generate_padding_right test overwritten
      
      * comment
      
      * modify persimmon modeling file
      
      * added speedup graph
      
      * more changes
      f84d85ba
    • Alex McKinney's avatar
      Add Llama Flax Implementation (#24587) · 75336c17
      Alex McKinney authored
      * Copies `modeling_flax_gpt_neo.py` to start
      
      * MLP Block. WIP Attention and Block
      
      * Adds Flax implementation of `LlamaMLP`
      Validated with in-file test.
      Some slight numeric differences, but assuming it isn't an issue
      
      * Adds `FlaxLlamaRMSNorm` layer
      `flax.linen` includes `RMSNorm` layer but not necessarily in all
      versions. Hence, we add in-file.
      
      * Adds FlaxLlamaAttention
      Copied from GPT-J as it has efficient caching implementation as well as
      rotary embeddings.
      Notice numerically different, but not by a huge amount. Needs
      investigating
      
      * Adds `FlaxLlamaDecoderLayer`
      numerically inaccurate, debugging..
      
      * debugging rotary mismatch
      gptj uses interleaved whilst llama uses contiguous
      i think they match now but still final result is wrong.
      maybe drop back to just debugging attention layer?
      
      * fixes bug with decoder layer
      still somewhat numerically inaccurate, but close enough for now
      
      * adds markers for what to implement next
      the structure here diverges a lot from the PT version.
      not a big fan of it, but just get something working for now
      
      * implements `FlaxLlamaBlockCollection`]
      tolerance must be higher than expected, kinda disconcerting
      
      * Adds `FlaxLlamaModule`
      equivalent PyTorch model is `LlamaModel`
      yay! a language model🤗
      
      * adds `FlaxLlamaForCausalLMModule`
      equivalent to `LlamaForCausalLM`
      still missing returning dict or tuple, will add later
      
      * start porting pretrained wrappers
      realised it probably needs return dict as a prereq
      
      * cleanup, quality, style
      
      * readds `return_dict` and model output named tuples
      
      * (tentatively) pretrained wrappers work 🔥
      
      * fixes numerical mismatch in `FlaxLlamaRMSNorm`
      seems `jax.lax.rsqrt` does not match `torch.sqrt`.
      manually computing `1 / jax.numpy.sqrt` results in matching values.
      
      * [WIP] debugging numerics
      
      * numerical match
      I think issue was accidental change of backend. forcing CPU fixes test.
      We expect some mismatch on GPU.
      
      * adds in model and integration tests for Flax Llama
      summary of failing:
      - mul invalid combination of dimensions
      - one numerical mismatch
      - bf16 conversion (maybe my local backend issue)
      - params are not FrozenDict
      
      * adds missing TYPE_CHECKING import and `make fixup`
      
      * adds back missing docstrings
      needs review on quality of docstrings, not sure what is required.
      Furthermore, need to check if `CHECKPOINT_FOR_DOC` is valid. See TODO
      
      * commenting out equivalence test as can just use common
      
      * debugging
      
      * Fixes bug where mask and pos_ids were swapped in pretrained models
      This results in all tests passing now 🔥
      
      
      
      * cleanup of modeling file
      
      * cleanup of test file
      
      * Resolving simpler review comments
      
      * addresses more minor review comments
      
      * fixing introduced pytest errors from review
      
      * wip additional slow tests
      
      * wip tests
      need to grab a GPU machine to get real logits for comparison
      otherwise, slow tests should be okay
      
      * `make quality`, `make style`
      
      * adds slow integration tests
      - checking logits
      - checking hidden states
      - checking generation outputs
      
      * `make fix-copies`
      
      * fix mangled function following `make fix-copies`
      
      * adds missing type checking imports
      
      * fixes missing parameter checkpoint warning
      
      * more finegrained 'Copied from' tags
      avoids issue of overwriting `LLAMA_INPUTS_DOCSTRING`
      
      * swaps import guards
      ??? how did these get swapped initially?
      
      * removing `inv_freq` again as pytorch version has now removed
      
      * attempting to get CI to pass
      
      * adds doc entries for llama flax models
      
      * fixes typo in __init__.py imports
      
      * adds back special equivalence tests
      these come from the gpt neo flax tests. there is special behaviour for these models that needs to override the common version
      
      * overrides tests with dummy to see if CI passes
      need to fill in these tests later
      
      * adds my contribution to docs
      
      * `make style; make quality`
      
      * replaces random masking with fixed to work with flax version
      
      * `make quality; make style`
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * updates `x`->`tensor` in `rotate_half`
      
      * addresses smaller review comments
      
      * Update docs/source/en/model_doc/llama.md
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * adds integration test class
      
      * adds `dtype` to rotary embedding to cast outputs
      
      * adds type to flax llama rotary layer
      
      * `make style`
      
      * `make fix-copies`
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * applies suggestions from review
      
      * Update modeling_flax_llama.py
      
      * `make fix-copies`
      
      * Update tests/models/llama/test_modeling_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * fixes shape mismatch in FlaxLlamaMLP
      
      * applies some suggestions from reviews
      
      * casts attn output logits to f32 regardless of dtype
      
      * adds attn bias using `LlamaConfig.attention_bias`
      
      * adds Copied From comments to Flax Llama test
      
      * mistral and persimmon test change -copy from llama
      
      * updates docs index
      
      * removes Copied from in tests
      
      it was preventing `make fix-copies` from succeeding
      
      * quality and style
      
      * ignores FlaxLlama input docstring
      
      * adds revision to `_CHECKPOINT_FOR_DOC`
      
      * repo consistency and quality
      
      * removes unused import
      
      * removes copied from from Phi test
      
      now diverges from llama tests following FlaxLlama changes
      
      * adds `_REAL_CHECKPOINT_FOR_DOC`
      
      * removes refs from pr tests
      
      * reformat to make ruff happy
      
      ---------
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      75336c17
  8. 05 Dec, 2023 2 commits
    • Yih-Dar's avatar
    • Arindam Jati's avatar
      [Time series] Add PatchTSMixer (#26247) · b242d0f2
      Arindam Jati authored
      
      
      * patchtsmixer initial commit
      
      * x,y->context_values,target_values, unittest addded
      
      * cleanup code
      
      * minor
      
      * return hidden states
      
      * model tests, partial integration tests
      
      * ettm notebook temporary
      
      * minor
      
      * config mask bug fix, tests updated
      
      * final ETT notebooks
      
      * add selfattn
      
      * init
      
      * added docstrings
      
      * PatchTSMixerForPretraining -> PatchTSMixerForMaskPretraining
      
      * functionality tests added
      
      * add start and input docstrings
      
      * docstring edits
      
      * testcase edits
      
      * minor changes
      
      * docstring error fixed
      
      * ran make fixup
      
      * finalize integration tests and docs
      
      * minor
      
      * cleaned gitignore
      
      * added dataclass decorator, ran black formatter
      
      * ran ruff
      
      * formatting
      
      * add slow decorator
      
      * renamed in_Channel to input_size and default to 1
      
      * shorten dataclass names
      
      * use smaller model for testing
      
      * moved the 3 heads to the modeling file
      
      * use scalers instead of revin
      
      * support forecast_channel_indices
      
      * fix regression scaling
      
      * undo reg. scaling
      
      * removed unneeded classes
      
      * forgot missing
      
      * add more layers
      
      * add copied positional_encoding
      
      * use patchmask from patchtst
      
      * removed dependency on layers directory
      
      * formatting
      
      * set seed
      
      * removed unused imports
      
      * fixed forward signature test
      
      * adding distributional head for PatchTSMixerForecasting
      
      * add generate to forecast
      
      * testcases for generate
      
      * add generate and distributional head for regression
      
      * raise Exception for negative values for neg binominal distribution
      
      * formatting changes
      
      * remove copied from patchtst and add TODO for test passing
      
      * make copies
      
      * doc edits
      
      * minor changes
      
      * format issues
      
      * minor changes
      
      * minor changes
      
      * format docstring
      
      * change some class names to PatchTSMixer + class name
      
      Transpose to PatchTSMixerTranspose
      GatedAttention to PatchTSMixerGatedAttention
      
      * change NormLayer to PatchTSMixerNormLayer
      
      * change MLP to PatchTSMixerMLP
      
      * change PatchMixer to PatchMixerBlock, FeatureMixer to FeatureMixerBlock
      
      * change ChannelFeatureMixer to ChannelFeatureMixerBlock
      
      * change PatchMasking to PatchTSMixerMasking
      
      * change Patchify to PatchTSMixerPatchify
      
      * list to `list`
      
      * fix docstrings
      
      * formatting
      
      * change bs to batch_size, edit forecast_masking
      
      * edit random_masking
      
      * change variable name and update docstring in PatchTSMixerMasking
      
      * change variable name and update docstring in InjectScalerStatistics4D
      
      * update forward call in PatchTSMixerTranspose
      
      * change variable name and update docstring in PatchTSMixerNormLayer
      
      * change variable name and update docstring in PatchTSMixerMLP
      
      * change variable name and update docstring in ChannelFeatureMixerBlock
      
      * formatting
      
      * formatting issues
      
      * docstring issue
      
      * fixed observed_mask type in docstrings
      
      * use FloatTensor type
      
      * formatting
      
      * fix rescaling issue in forecasting, fixed integration tests
      
      * add docstring from decorator
      
      * fix docstring
      
      * Update README.md
      Co-authored-by: default avatarNielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py
      Co-authored-by: default avatarNielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py
      Co-authored-by: default avatarNielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py
      Co-authored-by: default avatarNielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py
      Co-authored-by: default avatarNielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * PatchTSMixerChannelFeatureMixerBlock
      
      * formatting
      
      * ForPretraining
      
      * use num_labels instead of n_classes
      
      * remove commented out code
      
      * docstring fixed
      
      * nn.functional used instead of one letter F
      
      * x_tmp renamed
      
      * one letter variable x removed from forward calls
      
      * one letter variable y removed
      
      * remove commented code
      
      * rename patch_size, in_channels, PatchTSMixerBackbone
      
      * add config to heads
      
      * add config to heads tests
      
      * code reafactoring to use config instead of passing individual params
      
      * Cdocstring fixes part 1
      
      * docstring fixes part 2
      
      * removed logger.debug
      
      * context_values -> past_values
      
      * formatting changes
      
      * pe -> positional_encoding
      
      * removed unused target variable
      
      * self.mode logic fixed
      
      * formatting change
      
      * edit docstring and var name
      
      * change n_targets to num_targets
      
      * rename input_size to num_input_channels
      
      * add head names with prefix PatchTSMixer
      
      * edit docstring in PatchTSMixerForRegression
      
      * fix var name change in testcases
      
      * add PatchTSMixerAttention
      
      * return dict for all exposed classes, test cases added
      
      * format
      
      * move loss function to forward call
      
      * make style
      
      * adding return dict/tuple
      
      * make repo-consistency
      
      * remove flatten mode
      
      * code refactoring
      
      * rename data
      
      * remove PatchTSMixer and keep only PatchTSMixerEncoder
      
      * docstring fixes
      
      * removed unused code
      
      * format
      
      * format
      
      * remove contiguous and formatting changes
      
      * remove model description from config
      
      * replace asserts with ValueError
      
      * remove nn.Sequential from PatchTSMixerNormLayer
      
      * replace if-else with map
      
      * remove all nn.Sequential
      
      * format
      
      * formatting
      
      * fix gradient_checkpointing error after merge, and formatting
      
      * make fix-copies
      
      * remove comments
      
      * reshape
      
      * doesnt support gradient checkpointing
      
      * corect Patchify
      
      * masking updates
      
      * batchnorm copy from
      
      * format checks
      
      * scaler edits
      
      * remove comments
      
      * format changes
      
      * remove self.config
      
      * correct class PatchTSMixerMLP(nn.Module):
      
      * makr fix
      
      * doc updates
      
      * fix-copies
      
      * scaler class correction
      
      * doc edits
      
      * scaler edits
      
      * update readme with links
      
      * injectstatistics add
      
      * fix-copies
      
      * add norm_eps option to LayerNorm
      
      * format changes
      
      * fix copies
      
      * correct make copies
      
      * use parametrize
      
      * fix doc string
      
      * add docs to toctree
      
      * make style
      
      * doc segmenting
      
      * docstring edit
      
      * change forecast to prediction
      
      * edit doc
      
      * doc edits
      
      * remove PatchTSMixerTranspose
      
      * add PatchTSMixerPositionalEncoding and init position_enc
      
      * remove positional_encoding
      
      * edit forecast_masking, remove forecast_mask_ratios
      
      * fix broken code
      
      * var rename target_values -> future_values
      
      * num_features -> d_model
      
      * fix broken code after master merge
      
      * repo consistency
      
      * use postional embedding
      
      * prediction_logits -> prediction_outputs, make fix-copies
      
      * uncommented @slow
      
      * minor changes
      
      * loss first in tuple
      
      * tuple and dict same ordering
      
      * style edits
      
      * minor changes
      
      * dict/tuple consistent enablement
      
      * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix formatting
      
      * formatting
      
      * usage tip
      
      * test on cpu only
      
      * add sample usage
      
      * change PatchTSMixerForClassification to PatchTSMixerForTimeSeriesClassification
      
      * push changes
      
      * fix copies
      
      * std scaling set to default True case
      
      * minor changes
      
      * stylechanges
      
      ---------
      Co-authored-by: default avatarArindam Jati <arindam.jati@ibm.com>
      Co-authored-by: default avatarvijaye12 <vijaye12@in.ibm.com>
      Co-authored-by: default avatarKashif Rasul <kashif.rasul@gmail.com>
      Co-authored-by: default avatarnnguyen <nnguyen@us.ibm.com>
      Co-authored-by: default avatarvijaye12 <vijaykr.e@gmail.com>
      Co-authored-by: default avatarNielsRogge <48327001+NielsRogge@users.noreply.github.com>
      Co-authored-by: default avatarNam Nguyen <namctin@gmail.com>
      Co-authored-by: default avatarWesley Gifford <79663411+wgifford@users.noreply.github.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      b242d0f2