- 12 Aug, 2024 2 commits
-
-
Raushan Turganbay authored
* fix check * add tests * [run-slow] llama, gemma2 * oops, whisper actually runs but needed some special treatment
-
Younes Belkada authored
* v1 - working version * fix * fix * fix * fix * rename to correct name * fix title * fixup * rename files * fix * add copied from on tests * rename to `FalconMamba` everywhere and fix bugs * fix quantization + accelerate * fix copies * add `torch.compile` support * fix tests * fix tests and add slow tests * copies on config * merge the latest changes * fix tests * add few lines about instruct * Apply suggestions from code review Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix tests --------- Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 09 Aug, 2024 1 commit
-
-
Arthur authored
no empty revision
-
- 08 Aug, 2024 3 commits
-
-
Pablo Montalvo authored
* I think inputs_embeds has ndim == 3 * fix sequence length catch * add generate test * [run-slow]olmo, persimmon, gemma, gemma2, qwen2, llama * skip whisper * fix bart test * more fixes
-
Yunfei Chu authored
* add qwen2audio * Update check_repo.py * fix style * fix test * fix style * add model size * Qwen2AudioEncoderModel->Qwen2AudioEncoder; add copy info * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by:
Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by:
Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by:
Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * switch the attention_mask and the feature_attention_mask * add to PRIVATE_MODELS in check_repo.py; add to MODEL_NAMES_TO_IGNORE in check_table.py * fix initialization * update chat_template * fix consistency issue after copy * add docstrings to _merge_input_ids_with_audio_features * add copied from to prepare_inputs_for_generation * add more details to docs * rm comment * add init_std * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by:
Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by:
Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by:
Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by:
Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * update * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update tests * rm ignore_index * update processor * rm ffmpeg_read * Update tests/models/qwen2_audio/test_modeling_qwen2_audio.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update * typo * [run_slow] qwen2_audio * [run_slow] qwen2_audio * [run_slow] qwen2_audio * fix quality * [run_slow] qwen2_audio * [run_slow] qwen2_audio * [run_slow] qwen2_audio * add official model --------- Co-authored-by:
Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
Sangbum Daniel Choi authored
* fix typo * uniform kwargs * make style * add comments * remove return_tensors * remove common_kwargs from processor since it propagates * make style * return_token_type_ids to True * revert the default imagekwargs since does not accept any value in the image processro * revert processing_utils.py * make style * add molbap's commit * fix typo * fix common processor * remain * Revert "add molbap's commit" This reverts commit a476c6ee88318ce40d73ea31e2dc2d4faa8ae410. * add unsync PR * revert * make CI happy * nit * import annotationformat
-
- 06 Aug, 2024 5 commits
-
-
Pablo Montalvo authored
* add new model like * draft cuda forward - mismatched keys (sharding on conv1) * match keys successfully * fix split * get generation/forward running (wrong gens, norm?) * :update * some refactoring * fixes * works up until copy to cache * fix * update * NON WORKING VERSION * version that work? * nit * fix config * fix conversion script * working cuda forward * nit * update * simplifcation * make mamba slow simple work * no einops * todo * fix style * no einops * update fix no einsum * nit * remove einops * bug: scan_output differs strongly * add rms norm option * fix fast + slow generation with and w/o cache
✔ * draft integration tests * remove a big chunk of the einsum * fix slow, fast generations, without any einsum * fix copies * fix structure * fix up modeling and tests * fix tests * clamping is indeed worse * recover mamba2 cache test * fix copies * no cache position (yet) * fix tf tests * fix matmul for generate * fixup * skip cache tests for now * [run-slow]mamba2 * tune out hidden states for padding * test batched generation * propagate attention mask changes * fix past length * fix integration test * style * address comments * update readme * add mamba2 version check * fix tests * [run-slow]mamba2 * skip edge tests * [run-slow]mamba2 * last fixup * [run-slow]mamba2 * update README --------- Co-authored-by:Arthur Zucker <arthur.zucker@gmail.com>
-
Ao Tang authored
* Add nemotron support * fix inference * add unit test * add layernorm1p as a class to avoid meta device mismatch * test fixed * Add copied_from statements * remove pretraining_tp args * remove nemotronlayernorm * force LN computation done in FP32 * remove nemotrontokenizer and use llamatokenizer * license update * add option for kv_channels for minitron8b * remove assert * o_proj fixed * o_proj reshape * add gated_proj option * typo * remove todos * fix broken test after merging latest main * remove nezha/nat after meging main * chnage default config to 15b model * add nemo conversion script * rename conversion script * remove gate_proj option * pr comment resolved * fix unit test * rename kv_channels to head_dim * resolve PR issue * add nemotron md * fix broken tests * refactor rope for nemotron * test fix * remove linearscaling * whitespace and import * fix some copied-from * code style fix * reformatted * add position_embedding to nemotronattention * rope refactor to only use config, copied-from fix * format * Run make fix-copies * nemotron md with autodoc * doc fix * fix order * pass check_config_docstrings.py * fix config_attributes * remove all llama BC related code * Use PreTrainedTokenizerFast * ruff check examples * conversion script update * add nemotron to toctree
-
Francisco Kurucz authored
-
Pavel Iakubovskii authored
* BLIP preprocess * BIT preprocess * BRIDGETOWER preprocess * CHAMELEON preprocess * CHINESE_CLIP preprocess * CONVNEXT preprocess * DEIT preprocess * DONUT preprocess * DPT preprocess * FLAVA preprocess * EFFICIENTNET preprocess * FUYU preprocess * GLPN preprocess * IMAGEGPT preprocess * INTRUCTBLIPVIDEO preprocess * VIVIT preprocess * ZOEDEPTH preprocess * VITMATTE preprocess * VIT preprocess * VILT preprocess * VIDEOMAE preprocess * VIDEOLLAVA * TVP processing * TVP fixup * SWIN2SR preprocess * SIGLIP preprocess * SAM preprocess * RT-DETR preprocess * PVT preprocess * POOLFORMER preprocess * PERCEIVER preprocess * OWLVIT preprocess * OWLV2 preprocess * NOUGAT preprocess * MOBILEVIT preprocess * MOBILENETV2 preprocess * MOBILENETV1 preprocess * LEVIT preprocess * LAYOUTLMV2 preprocess * LAYOUTLMV3 preprocess * Add test * Update tests
-
Fanli Lin authored
* add flash attention check * fix * fix * add the missing marker * bug fix * add one more * remove order * add one more
-
- 05 Aug, 2024 4 commits
-
-
Sai-Suraj-27 authored
Fixed tokenizertests for luke, mluke models.
-
Abdi authored
* fix: persist embedding type of MBartConditonalGeneration after resize * fix: persist embedding type of BartConditonalGeneration after resize
-
Raushan Turganbay authored
fix phi
-
TechInterMezzo authored
* fix: SeamlessM4TFeatureExtractor stride remainder * Added attention mask size test * Reran ruff for style correction
-
- 01 Aug, 2024 2 commits
-
-
Lunwen He authored
* Remove size check between attn_weights and kv_seq_len * add unit tests
-
Sanchit Gandhi authored
* [whisper] compile compatibility with long-form decoding * clarify comment * fix after rebase * finalise * fix bsz * fix cache split * remove contiguous * style * finish * update doc * prevent cuda graph trace
-
- 31 Jul, 2024 4 commits
-
-
fxmarty authored
* draft * apply changes to all relevant archs * rerun ci - check_docstrings.py failing? * fix docstring * move 2D->4D mask creation to modeling file * repo consistency * fix the batch size = 1 case - calling contiguous is not enough * nit * style * propagate to gemma/gemma-2 * prepare inputs for gemma generation * implement test and tiny fix in gemma2 * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix copies * ci pass * fix gemma's test_compile_static_cache tests * flacky * retrigger ci --------- Co-authored-by:
sanchit-gandhi <sanchit@huggingface.co> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
amyeroberts authored
* Fix FA2 call for Perciever layer * [run_slow] idefics2 * [run_slow] idefics2 * [run_slow] idefics2 * Fix up * [run_slow] idefics2 * [run_slow] idefics2 * [run_slow] idefics2
-
Joao Gante authored
fix
💩 -
Raushan Turganbay authored
* enable flash-attn & static cache * this works, not the prev * fix for sliding window layers * not needed anymore
-
- 30 Jul, 2024 1 commit
-
-
Joshua Lochner authored
* Remove user-defined tokens which can be obtained through merges * Remove debug line * formatting * Refactor spm slow -> fast converter * revert unnecessary refactor * set comprehension * remove test files * Use `vocab_scores` * Always replace spiece underline with space in decode * we no longer need token filtering * Add save fast load slow unit test * Remove tokenizers version check * Remove duplicate code * Make `<start_of_turn>` and `<end_of_turn>` special tokens * Bias merge priority with length if score is the same * Add unit test for merge priority * CI
-
- 29 Jul, 2024 2 commits
-
-
Kamil Akesbi authored
* fix _fix_key in PreTrainedModel * fix _find_longest_common_sequence * add test * remove result.json * nit * update test
-
Joao Gante authored
* mvp * added test (a few models need fixes) * fix a few test cases * test nits * harder test
😈 * revert changes in stablelm * test with improved condition * add todo * tmp commit * merged with main * nits * add todo * final corrections * add docs for generation compilation * docs nits * add tip * PR suggestions * add more details to the compilation docs * fix cache positions * cache is now init in generate; update docs * tag test as flaky * docs * post rebase make fixup and other nits * remove unintended changes * whisper (encoder-decoder) not supported * move token default updates to ; add tests for token defaults * push changes * manual rebase * chameleon doesn't support this * fix test_static_cache_mha_mqa_gqa (broken in another PR) * docs: dynamic is better with end-to-end compilation
-
- 26 Jul, 2024 2 commits
-
-
Sai-Suraj-27 authored
* Refactored to remove un-necessary object base class. * small fix.
-
Raushan Turganbay authored
* llava w/o images * tests
-
- 25 Jul, 2024 3 commits
-
-
Yih-Dar authored
* fix * [test_all] trigger full CI --------- Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
Kashif Rasul authored
fix E721 warnings
-
Sanchit Gandhi authored
* [whisper] fix short-form output type * add test * make style * update long-form tests * fixes * last fix * finalise test
-
- 24 Jul, 2024 3 commits
-
-
Sai-Suraj-27 authored
Replaced deprecated unittest method with the correct one.
-
Matt authored
* No more default chat templates * Add the template to the GPT-SW3 tests since it's not available by default now * Fix GPT2 test * Fix Bloom test * Fix Bloom test * Remove default templates again
-
Joao Gante authored
* relaxed rope check * lets also accept rope_type=None, defaulting to the original implementation * type and rope_type can coexist
-
- 23 Jul, 2024 8 commits
-
-
Sai-Suraj-27 authored
* Updated ruff version and fixed the required code accorindg to the latest version. * Updated ruff version and fixed the required code accorindg to the latest version. * Added noqa directive to ignore 1 error shown by ruff
-
Sanchit Gandhi authored
Revert "Incorrect Whisper long-form decoding timestamps (#32003)" This reverts commit cd48553f.
-
Amit Garg authored
* renamed phi3 rope_scaling type * fixed trailing whitespaces * fixed test * added warning * fixed format
-
Merve Noyan authored
--------- Co-authored-by:Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>
-
Joao Gante authored
Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
mig-mfreitas authored
* Add YaRN and Dynamic-YaRN RoPE Scaling Methods YaRN (Yet another RoPE extension method) combines the NTK-By-Parts Interpolation and Attention Scaling methods, improving upon existing RoPE interpolation methods for longer context window sizes. Fine-tuned models maintain their original performance across benchmarks while enabling efficient extrapolation and transfer learning for quicker convergence, especially in compute-limited environments. We implement YaRN and Dynamic-YaRN for the following list of models: - LLaMA - Falcon - GPT-NeoX - Olmo - Persimmon - Phi - StableLM - OpenLLaMA New unit tests are added to assert YaRN's correct behavior on both short and long sequence inputs. For more details, please refer to https://arxiv.org/abs/2309.00071 . Co-authored-by:
Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt> * Refactor YaRN implementation for LLaMA Iterate on YaRN implementation for LLaMA and remove diff from remaining models for increased PR modularity. This commit includes the following changes: - Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries - Remove unnecessary attributes ('extrapolation_factor' and 'finetuned') from YaRN classes - Inherit 'forward' method in YaRN classes from superclass - Rename 'yarn' method to 'compute_yarn_scaling' - Extend YaRN tests with further assertions - Fix style inconsistencies Co-authored-by:
Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt> * Refactor Tensor Building Logic for YaRN - Comply with the the tensor building logic introduced in #30743 - Add referencing to the optimized Attention Factor equation - Remove Dynamic YaRN for a more agile deployment Co-authored-by:
mig-mfreitas <mig-mfreitas@users.noreply.github.com> * remove unwanted file --------- Co-authored-by:
Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt> Co-authored-by:
mig-mfreitas <mig-mfreitas@users.noreply.github.com> Co-authored-by:
Joao Gante <joao@huggingface.co>
-
Anton Vlasjuk authored
* fix mask creation of gpt2 and gpt_neox caused by me * forgot the reshape of masks when shape > 2 * add tests for gpt neox and gpt2 * nit on a comment
-
Sanchit Gandhi authored
* [whisper integration] use parquet dataset for testing * propagate to others * more propagation * last one
-