"docs/vscode:/vscode.git/clone" did not exist on "0927bfd002f2691059125b7fb8f6e0fc081de695"
- 01 Aug, 2024 6 commits
-
-
OsamaS99 authored
* fixed hybrid cache init, added test * Fix Test Typo --------- Co-authored-by:Aaron Haag <aaron.haag@siemens.com>
-
Nikos Karampatziakis authored
* Initial implementation of OffloadedCache * enable usage via cache_implementation * Address feedback, add tests, remove legacy methods. * Remove flash-attn, discover synchronization bugs, fix bugs * Prevent usage in CPU only mode * Add a section about offloaded KV cache to the docs * Fix typos in docs * Clarifications and better explanation of streams
-
Omar Salman authored
* Fix conflicting key in init kwargs in PreTrainedTokenizerBase * Update code to check for callable key in save_pretrained * Apply PR suggestions * Invoke CI * Updates based on PR suggestion
-
Ita Zaporozhets authored
-
Lunwen He authored
* Remove size check between attn_weights and kv_seq_len * add unit tests
-
Sanchit Gandhi authored
* [whisper] compile compatibility with long-form decoding * clarify comment * fix after rebase * finalise * fix bsz * fix cache split * remove contiguous * style * finish * update doc * prevent cuda graph trace
-
- 31 Jul, 2024 4 commits
-
-
fxmarty authored
* draft * apply changes to all relevant archs * rerun ci - check_docstrings.py failing? * fix docstring * move 2D->4D mask creation to modeling file * repo consistency * fix the batch size = 1 case - calling contiguous is not enough * nit * style * propagate to gemma/gemma-2 * prepare inputs for gemma generation * implement test and tiny fix in gemma2 * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix copies * ci pass * fix gemma's test_compile_static_cache tests * flacky * retrigger ci --------- Co-authored-by:
sanchit-gandhi <sanchit@huggingface.co> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
amyeroberts authored
* Fix FA2 call for Perciever layer * [run_slow] idefics2 * [run_slow] idefics2 * [run_slow] idefics2 * Fix up * [run_slow] idefics2 * [run_slow] idefics2 * [run_slow] idefics2
-
Joao Gante authored
fix
馃挬 -
Raushan Turganbay authored
* enable flash-attn & static cache * this works, not the prev * fix for sliding window layers * not needed anymore
-
- 30 Jul, 2024 1 commit
-
-
Joshua Lochner authored
* Remove user-defined tokens which can be obtained through merges * Remove debug line * formatting * Refactor spm slow -> fast converter * revert unnecessary refactor * set comprehension * remove test files * Use `vocab_scores` * Always replace spiece underline with space in decode * we no longer need token filtering * Add save fast load slow unit test * Remove tokenizers version check * Remove duplicate code * Make `<start_of_turn>` and `<end_of_turn>` special tokens * Bias merge priority with length if score is the same * Add unit test for merge priority * CI
-
- 29 Jul, 2024 5 commits
-
-
Guang Yang authored
-
Sanchit Gandhi authored
* [pipeline] fix padding for 1-d tensors * add test * make style * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by:
Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py --------- Co-authored-by:
Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>
-
Kamil Akesbi authored
* fix _fix_key in PreTrainedModel * fix _find_longest_common_sequence * add test * remove result.json * nit * update test
-
Joao Gante authored
* mvp * added test (a few models need fixes) * fix a few test cases * test nits * harder test
馃槇 * revert changes in stablelm * test with improved condition * add todo * tmp commit * merged with main * nits * add todo * final corrections * add docs for generation compilation * docs nits * add tip * PR suggestions * add more details to the compilation docs * fix cache positions * cache is now init in generate; update docs * tag test as flaky * docs * post rebase make fixup and other nits * remove unintended changes * whisper (encoder-decoder) not supported * move token default updates to ; add tests for token defaults * push changes * manual rebase * chameleon doesn't support this * fix test_static_cache_mha_mqa_gqa (broken in another PR) * docs: dynamic is better with end-to-end compilation -
Raushan Turganbay authored
* bloom dynamic cache * bloom follows standard cache format * no skips for bloom anymore * use cache position when possible * clean up * codestyle * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * pr comments * isinstance fix * address comments * make musicgen test happy * [run-slow] bloom --------- Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
- 26 Jul, 2024 5 commits
-
-
Raushan Turganbay authored
* fix * fix prev test (half of failures) * [run-slow] llama, gemma2 * [run-slow] llama, gemma2
-
Fanli Lin authored
[tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` (#32039) * add flash attention check * fix * fix
-
Sai-Suraj-27 authored
* Refactored to remove un-necessary object base class. * small fix.
-
Raushan Turganbay authored
* llava w/o images * tests
-
Raushan Turganbay authored
* fix * move changes to prompt lookup * add test * set eos in assistant model * style * fix flakiness * changes for new `main` * Update tests/generation/test_utils.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/generation/test_utils.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add comment to explain --------- Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
- 25 Jul, 2024 3 commits
-
-
Yih-Dar authored
* fix * [test_all] trigger full CI --------- Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
Kashif Rasul authored
fix E721 warnings
-
Sanchit Gandhi authored
* [whisper] fix short-form output type * add test * make style * update long-form tests * fixes * last fix * finalise test
-
- 24 Jul, 2024 5 commits
-
-
Sai-Suraj-27 authored
Replaced deprecated unittest method with the correct one.
-
Matt authored
* No more default chat templates * Add the template to the GPT-SW3 tests since it's not available by default now * Fix GPT2 test * Fix Bloom test * Fix Bloom test * Remove default templates again
-
Penut Chen authored
* support gguf fp16 * support gguf bf16 with pytorch * add gguf f16 test * remove bf16
-
Joao Gante authored
* relaxed rope check * lets also accept rope_type=None, defaulting to the original implementation * type and rope_type can coexist
-
amyeroberts authored
Remove conversation pipeline tests
-
- 23 Jul, 2024 11 commits
-
-
Sai-Suraj-27 authored
* Updated ruff version and fixed the required code accorindg to the latest version. * Updated ruff version and fixed the required code accorindg to the latest version. * Added noqa directive to ignore 1 error shown by ruff
-
RhuiDih authored
* add DataCollatorBatchFlattening * Update data_collator.py * change name * new FA2 flow if position_ids is provided * add comments * minor fix * minor fix data collator * add test cases for models * add test case for data collator * remove extra code * formating for ruff check and check_repo.py * ruff format ruff format tests src utils * custom_init_isort.py
-
Sanchit Gandhi authored
Revert "Incorrect Whisper long-form decoding timestamps (#32003)" This reverts commit cd48553f.
-
Amit Garg authored
* renamed phi3 rope_scaling type * fixed trailing whitespaces * fixed test * added warning * fixed format
-
Merve Noyan authored
--------- Co-authored-by:Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>
-
Ita Zaporozhets authored
* gguf conversion forces add_prefix_space=False for llama3, this is not required and forces from_slow, which fails. changing to None + test * typo * clean test
-
Joao Gante authored
Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
bayllama authored
* Change resize_token_embeddings to make it return same Class that is passed to it * Add explanatory comment as requested in review * Add explanatory comments for add resizing function in lxmert * Add comment for padding_idx and moving _resize_bias in lxmert to LxmertForPreTraining --------- Co-authored-by:
Prashanth Sateesh <prasatee@Prashanths-MBP.attlocal.net> Co-authored-by:
Prashanth Sateesh <prasatee@Prashanths-MacBook-Pro.local>
-
mig-mfreitas authored
* Add YaRN and Dynamic-YaRN RoPE Scaling Methods YaRN (Yet another RoPE extension method) combines the NTK-By-Parts Interpolation and Attention Scaling methods, improving upon existing RoPE interpolation methods for longer context window sizes. Fine-tuned models maintain their original performance across benchmarks while enabling efficient extrapolation and transfer learning for quicker convergence, especially in compute-limited environments. We implement YaRN and Dynamic-YaRN for the following list of models: - LLaMA - Falcon - GPT-NeoX - Olmo - Persimmon - Phi - StableLM - OpenLLaMA New unit tests are added to assert YaRN's correct behavior on both short and long sequence inputs. For more details, please refer to https://arxiv.org/abs/2309.00071 . Co-authored-by:
Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt> * Refactor YaRN implementation for LLaMA Iterate on YaRN implementation for LLaMA and remove diff from remaining models for increased PR modularity. This commit includes the following changes: - Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries - Remove unnecessary attributes ('extrapolation_factor' and 'finetuned') from YaRN classes - Inherit 'forward' method in YaRN classes from superclass - Rename 'yarn' method to 'compute_yarn_scaling' - Extend YaRN tests with further assertions - Fix style inconsistencies Co-authored-by:
Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt> * Refactor Tensor Building Logic for YaRN - Comply with the the tensor building logic introduced in #30743 - Add referencing to the optimized Attention Factor equation - Remove Dynamic YaRN for a more agile deployment Co-authored-by:
mig-mfreitas <mig-mfreitas@users.noreply.github.com> * remove unwanted file --------- Co-authored-by:
Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt> Co-authored-by:
mig-mfreitas <mig-mfreitas@users.noreply.github.com> Co-authored-by:
Joao Gante <joao@huggingface.co>
-
Anton Vlasjuk authored
* fix mask creation of gpt2 and gpt_neox caused by me * forgot the reshape of masks when shape > 2 * add tests for gpt neox and gpt2 * nit on a comment
-
Sanchit Gandhi authored
* [whisper integration] use parquet dataset for testing * propagate to others * more propagation * last one
-