- 06 Aug, 2024 4 commits
-
-
Pavel Iakubovskii authored
* BLIP preprocess * BIT preprocess * BRIDGETOWER preprocess * CHAMELEON preprocess * CHINESE_CLIP preprocess * CONVNEXT preprocess * DEIT preprocess * DONUT preprocess * DPT preprocess * FLAVA preprocess * EFFICIENTNET preprocess * FUYU preprocess * GLPN preprocess * IMAGEGPT preprocess * INSTRUCTBLIPVIDEO preprocess * VIVIT preprocess * ZOEDEPTH preprocess * VITMATTE preprocess * VIT preprocess * VILT preprocess * VIDEOMAE preprocess * VIDEOLLAVA * TVP processing * TVP fixup * SWIN2SR preprocess * SIGLIP preprocess * SAM preprocess * RT-DETR preprocess * PVT preprocess * POOLFORMER preprocess * PERCEIVER preprocess * OWLVIT preprocess * OWLV2 preprocess * NOUGAT preprocess * MOBILEVIT preprocess * MOBILENETV2 preprocess * MOBILENETV1 preprocess * LEVIT preprocess * LAYOUTLMV2 preprocess * LAYOUTLMV3 preprocess * Add test * Update tests
-
Fanli Lin authored
* add flash attention check * fix * fix * add the missing marker * bug fix * add one more * remove order * add one more
-
Prakarsh Kaushik authored
fix: bug when adding a new LLaVA-like model
-
Raushan Turganbay authored
* draft * updates * works? * try adding python example in hidden section * another try * how do I render python * format as html code? * Update docs/source/en/kv_cache.md Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> * Update docs/source/en/kv_cache.md Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> * Update docs/source/en/kv_cache.md Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> * Update docs/source/en/kv_cache.md Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> * Update docs/source/en/kv_cache.md Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> * one more small update * should render hidden section now * add outputs * fix links * check links * update all links * update with offloaded cache * all caches are importable, so they appear in docs * fix copies * docstring... --------- Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com>
-
- 05 Aug, 2024 10 commits
-
-
Francisco Kurucz authored
-
amyeroberts authored
* Respect the config's attn if set * Update test - can override in from_config * Fix
-
Sai-Suraj-27 authored
Fixed tokenizer tests for luke, mluke models.
-
Abdi authored
* fix: persist embedding type of MBartConditionalGeneration after resize * fix: persist embedding type of BartConditionalGeneration after resize
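A hedged way to exercise the behaviour this fix targets (the checkpoint is illustrative, and "embedding type" is read here as the class of the embedding module): after resizing, the input embeddings should keep their original class instead of silently falling back to a plain nn.Embedding.

```python
# Illustrative check only; the checkpoint is a placeholder and the assertion
# reflects one reading of "embedding type" (the embedding module's class).
from transformers import MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
embed_cls_before = type(model.get_input_embeddings())

model.resize_token_embeddings(model.config.vocab_size + 8)

# The resized embedding should still be the same module class as before.
assert type(model.get_input_embeddings()) is embed_cls_before
```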
-
Francisco Kurucz authored
-
Nicholas Broad authored
I always forget what typical values are, and I have to look at the paper every time. This will be a helpful reminder.
-
Ita Zaporozhets authored
* save total_vocab_size = vocab_size + user added tokens to speed up operation * updating length when added_tokens_decoder is set * add test len(tokenizer)
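A minimal sketch of the idea (a toy class with hypothetical names, not the real PreTrainedTokenizer internals): cache the combined size once and keep it in sync whenever added tokens change, so len(tokenizer) no longer has to rebuild the merged vocabulary.

```python
# Toy sketch only; attribute names are illustrative, not the actual tokenizer internals.
class ToyTokenizer:
    def __init__(self, vocab, added_tokens_decoder=None):
        self.vocab = vocab
        self.added_tokens_decoder = dict(added_tokens_decoder or {})
        # Cache the total size once instead of recomputing it on every len() call.
        self.total_vocab_size = len(self.vocab) + len(self.added_tokens_decoder)

    def add_tokens(self, tokens):
        for tok in tokens:
            if tok not in self.vocab and tok not in self.added_tokens_decoder.values():
                new_id = len(self.vocab) + len(self.added_tokens_decoder)
                self.added_tokens_decoder[new_id] = tok
        # Keep the cached length in sync whenever user-added tokens change.
        self.total_vocab_size = len(self.vocab) + len(self.added_tokens_decoder)

    def __len__(self):
        # O(1) lookup instead of merging base and added vocabularies.
        return self.total_vocab_size


tok = ToyTokenizer({"hello": 0, "world": 1})
tok.add_tokens(["<special>"])
assert len(tok) == 3
```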
-
Raushan Turganbay authored
fix phi
-
TechInterMezzo authored
* fix: SeamlessM4TFeatureExtractor stride remainder * Added attention mask size test * Reran ruff for style correction
-
dependabot[bot] authored
Bump keras in /examples/research_projects/decision_transformer Bumps [keras](https://github.com/keras-team/keras) from 2.8.0 to 2.13.1. - [Release notes](https://github.com/keras-team/keras/releases) - [Commits](https://github.com/keras-team/keras/compare/v2.8.0...v2.13.1) --- updated-dependencies: - dependency-name: keras dependency-type: direct:production ... Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 03 Aug, 2024 2 commits
-
-
Xueshen Liu authored
MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. (#31500) * Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (no auto position_ids generation could be unsafe) * fix typo [:-1] to [:, -1] * to meet formatting requirement * to meet formatting requirement * remove white space * MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue. * propagate to starcoder2, phi3, mixtral and qwen2 * update qwen2_moe
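Roughly the shape of the change described above, paraphrased rather than copied from modeling_mixtral.py: moving the "+ 1" inside the parentheses lets the whole expression be guarded with a None check on position_ids.

```python
from typing import Optional

import torch


def compute_rotary_seq_len(kv_seq_len: int, position_ids: Optional[torch.Tensor]) -> int:
    # Paraphrased sketch of the fix, not the exact modeling code.
    # Before, the "+ 1" sat outside max():
    #   rotary_seq_len = max(kv_seq_len, position_ids[:, -1].max().item()) + 1
    # which left no clean fallback for position_ids=None. With "+ 1" inside the
    # parentheses, the whole expression can be guarded:
    return (
        max(kv_seq_len, position_ids[:, -1].max().item() + 1)
        if position_ids is not None
        else kv_seq_len
    )


print(compute_rotary_seq_len(8, torch.tensor([[0, 1, 2, 3]])))  # 8
print(compute_rotary_seq_len(8, None))                          # 8, no crash on None
```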
-
Shaopeng Fu authored
fix: (issue #32124) Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`. (#32157)
-
- 02 Aug, 2024 3 commits
-
-
Sanchit Gandhi authored
* up * style * stopping
-
Joao Gante authored
tests! :D
-
Raushan Turganbay authored
nits
-
- 01 Aug, 2024 13 commits
-
-
Zach Mueller authored
* Test this zach * Test for improper init w/o zero3 * Move back * Apply suggestions from code review Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Get rid of stars in warning * Make private * Make clear --------- Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
OsamaS99 authored
* fixed hybrid cache init, added test * Fix Test Typo --------- Co-authored-by: Aaron Haag <aaron.haag@siemens.com>
-
Joao Gante authored
-
Nikos Karampatziakis authored
* Initial implementation of OffloadedCache * enable usage via cache_implementation * Address feedback, add tests, remove legacy methods. * Remove flash-attn, discover synchronization bugs, fix bugs * Prevent usage in CPU only mode * Add a section about offloaded KV cache to the docs * Fix typos in docs * Clarifications and better explanation of streams
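An illustrative way to opt into the offloaded KV cache through generate, assuming the cache_implementation hook mentioned above; the checkpoint and flags are placeholders, and the exact supported values should be checked against the released docs.

```python
# Illustrative usage sketch; the checkpoint and kwargs are assumptions, not the canonical example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="cuda")

inputs = tokenizer("Offloading the KV cache to CPU helps when", return_tensors="pt").to(model.device)

# Per-layer key/value tensors live on CPU and are prefetched back to the GPU as
# each layer runs, trading some latency for a much smaller GPU memory footprint.
out = model.generate(**inputs, max_new_tokens=64, cache_implementation="offloaded")
print(tokenizer.decode(out[0], skip_special_tokens=True))
```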
-
Omar Salman authored
* Fix conflicting key in init kwargs in PreTrainedTokenizerBase * Update code to check for callable key in save_pretrained * Apply PR suggestions * Invoke CI * Updates based on PR suggestion
-
Viktor Scherbakov authored
empty list in defaults
-
Ita Zaporozhets authored
-
Hanna Yukhymenko authored
* Remove TPU device map for saving tokenizer config * Update tokenization_utils_base.py * Fix error msg when passing non-string device into tokenizer * Fix error message for non-string tokenizer device * Print out tokenizer device type in error msg * Update tokenization_utils_base.py
-
nv-guomingz authored
Co-authored-by: Guoming Zhang <37257613+nv-guomingz@users.noreply.github.com>
-
Lunwen He authored
* Remove size check between attn_weights and kv_seq_len * add unit tests
-
Sanchit Gandhi authored
* [whisper] compile compatibility with long-form decoding * clarify comment * fix after rebase * finalise * fix bsz * fix cache split * remove contiguous * style * finish * update doc * prevent cuda graph trace
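A hedged sketch of what compile compatibility enables for long-form (>30 s) transcription; the checkpoint, compile mode, and generate flags below are illustrative assumptions rather than the exact setup from the commit.

```python
# Illustrative sketch; checkpoint, compile settings, and flags are assumptions.
import torch
from transformers import AutoProcessor, WhisperForConditionalGeneration

processor = AutoProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small").to("cuda")

# A static cache keeps decoder shapes fixed so the compiled graph can be reused
# across the chunks of a long-form transcription.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead")

audio = torch.zeros(16_000 * 60)  # placeholder: 60 s of silence at 16 kHz
inputs = processor(
    audio.numpy(), sampling_rate=16_000, return_tensors="pt",
    truncation=False, padding="longest", return_attention_mask=True,
).to("cuda")

pred_ids = model.generate(**inputs, return_timestamps=True)
print(processor.batch_decode(pred_ids, skip_special_tokens=True))
```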
-
Sanchit Gandhi authored
-
Raushan Turganbay authored
cache class flag
-
- 31 Jul, 2024 8 commits
-
-
Ricardo authored
-
Sai-Suraj-27 authored
* Fixed staticmethods with self as first argument. * Fixed staticmethods with self as first argument. * Fixed staticmethods with self as first argument. * Fixed staticmethods with self as first argument.
-
fxmarty authored
* draft * apply changes to all relevant archs * rerun ci - check_docstrings.py failing? * fix docstring * move 2D->4D mask creation to modeling file * repo consistency * fix the batch size = 1 case - calling contiguous is not enough * nit * style * propagate to gemma/gemma-2 * prepare inputs for gemma generation * implement test and tiny fix in gemma2 * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix copies * ci pass * fix gemma's test_compile_static_cache tests * flaky * retrigger ci --------- Co-authored-by:
sanchit-gandhi <sanchit@huggingface.co> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Aymeric Roucher authored
Fix error when streaming agent run to gradio with non-string tool arguments
-
Joao Gante authored
-
amyeroberts authored
* Fix FA2 call for Perceiver layer * [run_slow] idefics2 * [run_slow] idefics2 * [run_slow] idefics2 * Fix up * [run_slow] idefics2 * [run_slow] idefics2 * [run_slow] idefics2
-
Joao Gante authored
fix 💩
-
Raushan Turganbay authored
* enable flash-attn & static cache * this works, not the prev * fix for sliding window layers * not needed anymore
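An illustrative combination of the two features mentioned in this commit; the checkpoint is a placeholder and flash attention additionally requires a supported GPU plus the flash-attn package.

```python
# Illustrative sketch; the checkpoint and dtype are assumptions, not the tested model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b"  # placeholder for a model with sliding-window attention layers
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="cuda",
)

inputs = tokenizer("Static cache plus flash attention:", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32, cache_implementation="static")
print(tokenizer.decode(out[0], skip_special_tokens=True))
```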
-