1. 07 Aug, 2024 1 commit
    • Cache: new Cache format in decoder-only models (#31421) · a30c865f
      Raushan Turganbay authored
      
      
      * draft bart with new cache
      
      * add cache for decoder-only models
      
      * revert utils
      
      * modify docstring
      
      * revert bart
      
      * minor fixes
      
      * fix copies (not related)
      
      * revert tests
      
      * remove enc-dec related code
      
      * remove bloom
      
      * remove opt (enc-dec)
      
      * update docstring
      
      * git, codegen, gpt_neo, gpt_neox, gptj
      
      * clean up
      
      * copied from statements
      
      * revert
      
      * tmp
      
      * update warning msg
      
      * forgot git
      
      * add more flags
      
      * run-slow git,codegen,gpt_neo,gpt_neox,gptj
      
      * add cache flag to VLMs
      
      * remove files
      
      * style
      
      * video LLMs also need a flag
      
      * style
      
      * llava will go in another PR
      
      * style
      
      * [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics
      
      * Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * copy from
      
      * deprecate until v4.45 and warn if not training
      
      * nit
      
      * fix test
      
      * test static cache
      
      * add more tests and fix models
      
      * fix copies
      
      * return sliding window mask
      
      * run slow tests & fix + codestyle
      
      * one more falcon fix for alibi
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      a30c865f
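      The commit above moves decoder-only models (GIT, CodeGen, GPT-Neo, GPT-NeoX, GPT-J, Falcon, and a few VLMs) from the legacy tuple-of-tuples `past_key_values` to the Cache classes, with the tuple format kept working and deprecated until v4.45. A minimal sketch of how the new interface is typically exercised, assuming a small GPT-NeoX-style checkpoint as a placeholder; `DynamicCache` and its legacy-conversion helpers live in `transformers.cache_utils`:

          import torch
          from transformers import AutoModelForCausalLM, AutoTokenizer
          from transformers.cache_utils import DynamicCache

          model_id = "EleutherAI/pythia-70m"  # placeholder GPT-NeoX-style checkpoint
          tokenizer = AutoTokenizer.from_pretrained(model_id)
          model = AutoModelForCausalLM.from_pretrained(model_id)

          inputs = tokenizer("Hello", return_tensors="pt")

          # Pass an explicit Cache object instead of the legacy tuple of (key, value) tensors.
          cache = DynamicCache()
          outputs = model(**inputs, past_key_values=cache, use_cache=True)

          # During the deprecation window the two formats stay interconvertible.
          legacy_tuples = cache.to_legacy_cache()
          cache_again = DynamicCache.from_legacy_cache(legacy_tuples)

      The commit message also mentions a StaticCache test and a sliding-window mask path; those use the same Cache interface and are not shown here.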
  2. 06 Aug, 2024 17 commits
  3. 05 Aug, 2024 10 commits
  4. 03 Aug, 2024 2 commits
    • MixtralFlashAttention2: put "plus 1" inside parentheses when calculating... · 621fb3c0
      Xueshen Liu authored
      MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. (#31500)
      
      * Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (no auto position_ids generation could be unsafe)
      
      * fix typo [:-1] to [:, -1]
      
      * to meet formatting requirement
      
      * to meet formatting requirement
      
      * remove white space
      
      * MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue.
      
      * propagate to starcoder2, phi3, mixtral and qwen2
      
      * update qwen2_moe
      621fb3c0
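      The fix above concerns the sequence length handed to the rotary embedding in MixtralFlashAttention2 (and, per the last commits, the same pattern in Starcoder2, Phi-3, Qwen2 and Qwen2-MoE): with the "+ 1" outside max(), the whole expression dereferences position_ids and cannot fall back when it is None. A hedged sketch of the reasoning, using a standalone helper name rather than the library's actual code layout:

          from typing import Optional
          import torch

          def rotary_seq_len(kv_seq_len: int, position_ids: Optional[torch.Tensor]) -> int:
              # Old shape of the code: max(kv_seq_len, position_ids[:, -1].max().item()) + 1
              # -> raises when position_ids is None, and a naive guard leaves a stray "+ 1".
              if position_ids is None:
                  return kv_seq_len
              # "[:, -1]" (last position of each batch row) is the typo fix noted above.
              return max(kv_seq_len, position_ids[:, -1].max().item() + 1)

          # e.g. rotary_seq_len(8, torch.arange(8).unsqueeze(0)) == 8
          #      rotary_seq_len(8, None) == 8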
    • fix: (issue #32124) Exception raised when running... · 7c31d05b
      Shaopeng Fu authored
      fix: (issue #32124) Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`. (#32157)
      
      fix: Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`.
      7c31d05b
  5. 02 Aug, 2024 3 commits
  6. 01 Aug, 2024 7 commits