- 28 Mar, 2024 1 commit
-
-
Minseo Kang authored
-
- 27 Mar, 2024 9 commits
-
-
Lorenzo Verardo authored
This commit adds optional gate jitter to the input of MixtralSparseMoeBlock before it is passed through the MoE layer.
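A minimal sketch of what the jitter does, assuming the flag is exposed on the model config as `router_jitter_noise` (the exact name may differ): during training, the block's input is scaled element-wise by uniform noise.
```python
import torch

def apply_gate_jitter(hidden_states: torch.Tensor, jitter_noise: float, training: bool) -> torch.Tensor:
    # During training only: scale each input element by uniform noise in
    # [1 - jitter_noise, 1 + jitter_noise] before routing through the MoE layer.
    if training and jitter_noise > 0:
        hidden_states = hidden_states * torch.empty_like(hidden_states).uniform_(
            1.0 - jitter_noise, 1.0 + jitter_noise
        )
    return hidden_states
```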
-
huismiling authored
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
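Illustrative device selection only; this assumes Cambricon's out-of-tree `torch_mlu` extension registers an `mlu` backend with a `torch.mlu.is_available()` check, which is not part of stock PyTorch.
```python
import torch

try:
    import torch_mlu  # noqa: F401  # Cambricon extension; import name is an assumption
    device = torch.device("mlu" if torch.mlu.is_available() else "cpu")
except ImportError:
    device = torch.device("cpu")

# Per the commit, fp16 and bf16 are now supported on MLU devices.
x = torch.ones(2, 2, dtype=torch.bfloat16, device=device)
```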
-
Raushan Turganbay authored
* add eos stopping criteria
* minor fix
* Update tests/generation/test_stopping_criteria.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* check eos is not None and fix tests
* make style and fixup
* Update src/transformers/generation/stopping_criteria.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* Update tests/generation/test_utils.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* Update tests/generation/test_utils.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* Update src/transformers/generation/__init__.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/generation/stopping_criteria.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/generation/stopping_criteria.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/generation/stopping_criteria.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* camel case everywhere
* call stopping criteria list for candidate ids
* make style and fixup
* Empty commit
* Empty commit to pass flaky test
* set max length in PromptLookupCandidateGenerator
* Update src/transformers/generation/utils.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* lets fix this typo in docs
* Update src/transformers/generation/utils.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/generation/utils.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* update PR
* empty commit

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
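A hedged usage sketch, assuming the new criteria class is exported as `EosTokenCriteria` from `transformers.generation` (the commit touches that package's `__init__.py`):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import EosTokenCriteria, StoppingCriteriaList

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Stop generation as soon as the EOS token is produced.
criteria = StoppingCriteriaList([EosTokenCriteria(eos_token_id=tok.eos_token_id)])
out = model.generate(**tok("Hello", return_tensors="pt"), stopping_criteria=criteria, max_new_tokens=20)
```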
-
Marc Sun authored
fix forward
-
Lysandre Debut authored
* Automatic safetensors conversion when lacking these files (#29390)
* Automatic safetensors conversion when lacking these files
* Remove debug
* Thread name
* Typo
* Ensure that raises do not affect the main thread
* Catch all errors
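A generic sketch of the threading pattern described above ("Thread name", "Ensure that raises do not affect the main thread", "Catch all errors"); function and thread names are illustrative, not the actual hub internals.
```python
import threading

def _convert_to_safetensors(model_id: str) -> None:
    raise RuntimeError("conversion failed")  # stand-in for the real conversion

def start_background_conversion(model_id: str) -> None:
    def target() -> None:
        try:
            _convert_to_safetensors(model_id)
        except Exception:
            pass  # swallow everything so the main thread is never affected
    threading.Thread(target=target, name="safetensors-conversion", daemon=True).start()
```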
-
Hovnatan Karapetyan authored
* Check for requires_grad when initing weights
* Add unit test
* Move sinusoidal positional encoding generation after post_init()
* Add modules to skip init list
* Move create_sinusoidal_embeddings to _init_weights
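For context, a minimal sketch of the kind of fixed sinusoidal table being generated (close to the DistilBERT-style helper, simplified here); marking it `requires_grad = False` is what lets the new check in `_init_weights` skip re-initializing it.
```python
import math
import torch

def create_sinusoidal_embeddings(n_pos: int, dim: int, out: torch.Tensor) -> None:
    # Fixed (non-learned) positional table: sin on even dims, cos on odd dims.
    position_enc = torch.tensor(
        [[pos / math.pow(10000, 2 * (j // 2) / dim) for j in range(dim)] for pos in range(n_pos)]
    )
    with torch.no_grad():
        out[:, 0::2] = torch.sin(position_enc[:, 0::2])
        out[:, 1::2] = torch.cos(position_enc[:, 1::2])
    out.requires_grad = False  # signals _init_weights not to overwrite this table
```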
-
Anton Vlasjuk authored
* FIX: Cached slow forward in mamba
  - additionally added mamba cached test
  - added unused test (mamba causal lm forward and backward)
  - fixed typo: "causl" --> "causal"
* formatting
* fix: use real `slow_forward` call instead of torch module's
* add shape assertion for mixer block test
* adjust shape assertion
-
Bo Zheng authored
* add support for qwen2 MoE models
* update docs
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* fixup
* fixup
* add archive back
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fixup
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* add archive back
* fix integration test
* fixup

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
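An illustrative loading sketch (the checkpoint id is an assumption); note that, per the commit, Qwen2MoE reuses `Qwen2Tokenizer` rather than shipping a separate `Qwen2MoeTokenizer`.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)    # resolves to Qwen2Tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)
```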
-
Benjamin Minixhofer authored
* fix tinyllama flax modelling
* rename vars to minimize changes
* move
* formatting
* remove unused var
-
- 26 Mar, 2024 4 commits
-
-
Yanyi Liu authored
* Add cosine_with_min_lr scheduler
* Update error message for missing min_lr or min_lr_rate
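A sketch of selecting the new scheduler through `TrainingArguments`; per the updated error message, one of `min_lr` or `min_lr_rate` is expected via `lr_scheduler_kwargs`.
```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    learning_rate=5e-5,
    lr_scheduler_type="cosine_with_min_lr",
    # Either an absolute floor (min_lr) or a fraction of the initial LR (min_lr_rate).
    lr_scheduler_kwargs={"min_lr": 1e-6},
)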
-
Zhihao Lin authored
* update
* add ut
* update
-
yunxiangtang authored
* replace 'decord' with 'av' in VideoClassificationPipeline
* fix the check of backend in VideoClassificationPipeline
* adjust the order of imports
* format 'video_classification.py'
* format 'video_classification.py' with ruff

Co-authored-by: wanqiancheng <13541261013@163.com>
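Usage is unchanged; only the decoding backend moves from decord to PyAV (`pip install av`). The model id and file path below are illustrative.
```python
from transformers import pipeline

video_cls = pipeline("video-classification", model="MCG-NJU/videomae-base-finetuned-kinetics")
predictions = video_cls("archery.mp4")  # video is now decoded with av instead of decord
```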
-
Jonathan Flynn authored
* add warnings if training args differ from checkpoint args stored in trainer_state.json
* run formatting and styling
* add a test
* format and styling

Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>
-
- 25 Mar, 2024 5 commits
-
-
Arthur Zucker authored
-
Arthur Zucker authored
-
Yuki Watanabe authored
* Populate torch_dtype from model to pipeline
* use property
* lint
* Remove default handling

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
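A sketch of the behavior, assuming the dtype is surfaced through a `torch_dtype` property on the pipeline object:
```python
import torch
from transformers import AutoModelForCausalLM, pipeline

model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)
pipe = pipeline("text-generation", model=model, tokenizer="gpt2")
assert pipe.torch_dtype == torch.float16  # populated from the wrapped model
```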
-
yhuang authored
Fix the behavior of collecting 'num_input_tokens_seen'. See https://github.com/huggingface/transformers/issues/28791 for more details.
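A rough sketch of the corrected bookkeeping: count tokens on every process and reduce across devices rather than counting only locally. Names are illustrative, not the exact Trainer internals.
```python
import torch

def update_num_input_tokens_seen(state, accelerator, input_ids: torch.Tensor) -> None:
    local = torch.tensor([input_ids.numel()], device=input_ids.device, dtype=torch.int64)
    # Gather the per-device counts and sum them so the total covers all ranks.
    state.num_input_tokens_seen += accelerator.gather(local).sum().item()
```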
-
Lysandre Debut authored
* [test_all] Remove static pretrained maps from the library's internals
* Deprecate archive maps instead of removing them
* Revert init changes
* [test_all] Deprecate instead of removing
* [test_all] PVT v2 support
* [test_all] Tests should all pass
* [test_all] Style
* Address review comments
* Update src/transformers/models/deprecated/_archive_maps.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/deprecated/_archive_maps.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* [test_all] trigger tests
* [test_all] LLAVA
* [test_all] Bad rebase

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 22 Mar, 2024 7 commits
-
-
amyeroberts authored
[SuperPoint] Fix doc example
-
Arthur authored
nit
-
igeni authored
replaced string concatenation with f-strings to improve readability and to unify with the rest of the code
-
Joao Gante authored
remove unused attrs
-
jiqing-feng authored
* rm input dtype change in CPU
* add warning when use CPU low-precision
* rm useless logging
-
fxmarty authored
* correct llava mask
* fix vipllava as well
* mask out embedding for padding tokens
* add test
* fix style
* add setter
* fix test on suggestion
-
Steven Madere authored
Fix type hint for train_dataset param of Trainer.__init__() to allow IterableDataset. Issue 29678 (#29738)
* Fixed typehint for train_dataset param in Trainer.__init__(). Added IterableDataset option.
* make fixup
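The relevant slice of the signature, simplified to a self-contained sketch:
```python
from typing import Optional, Union

from torch.utils.data import Dataset, IterableDataset

class Trainer:
    # train_dataset now also admits IterableDataset, matching what the
    # training loop already supported at runtime.
    def __init__(self, train_dataset: Optional[Union[Dataset, IterableDataset]] = None):
        self.train_dataset = train_dataset
```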
-
- 21 Mar, 2024 12 commits
-
-
Raushan Turganbay authored
* change in-place -> out-of-place
* add tests
* add more tests
* naming consistency
* fix doctest
* forgot min-length processors
* empty
* Revert "fix doctest" (reverts commit 4772768457f9bc057f1d4d9d67ea94eb7224eb8d)
* revert change in docstring
* Update tests/generation/test_logits_process.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update tests/generation/test_logits_process.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
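A sketch of the out-of-place convention: a processor returns a fresh tensor instead of mutating `scores`. The processor below is illustrative, not one from the library.
```python
import torch
from transformers import LogitsProcessor

class BanTokenProcessor(LogitsProcessor):
    def __init__(self, token_id: int):
        self.token_id = token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        scores = scores.clone()  # out-of-place: leave the caller's tensor untouched
        scores[:, self.token_id] = -float("inf")
        return scores
```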
-
Raushan Turganbay authored
* prepend "bos" to blip generation * minor changes * Update src/transformers/models/blip_2/modeling_blip_2.py Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/instructblip/modeling_instructblip.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add generation tester mixin --------- Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
Joao Gante authored
* always convert the mask
* rebase and fix copies
-
Joao Gante authored
-
Zach Mueller authored
* Add deterministic config
* Add note on slowdown
* English fails me again
-
Zach Mueller authored
* Remove deprecations
* Clean
-
Matt authored
* Cast bfloat16 to float32 for Numpy conversions
* Add test
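The underlying constraint: NumPy has no bfloat16 dtype, so tensors are upcast first. A minimal illustration:
```python
import torch

t = torch.ones(2, 2, dtype=torch.bfloat16)
arr = t.float().numpy()  # a direct t.numpy() would fail: NumPy lacks bfloat16
```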
-
Arthur authored
* path llava-next
* styling
* styling
-
théo gigant authored
fix issue with logit processor in beam search in Flax
-
Matthias Dittrich authored
Fixes:
```
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 987, in <module>
    class AutoConfig:
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1011, in AutoConfig
    @replace_list_option_in_docstrings()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 966, in docstring_decorator
    lines = docstrings.split("\n")
            ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
```
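A simplified guess at the shape of the fix: under `python -OO`, docstrings are stripped and `__doc__` becomes `None`, so the decorator has to guard before splitting. Names below mirror the traceback, but the body is illustrative.
```python
def docstring_decorator(fn):
    docstrings = fn.__doc__
    if docstrings is None:  # e.g. running under `python -OO`
        return fn
    lines = docstrings.split("\n")
    fn.__doc__ = "\n".join(lines)  # placeholder for the real list substitution
    return fn
```
-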
Rahul Vinod Vishwakarma authored
* Calculating box_bias at the start once, then reusing it at inference
* Updating the compute_box_bias function for backwards compatibility
* Caching compute_box_bias function
* Bug fix
* Update owlv2 accordingly to ensure repo consistency
* Co-authored by: nvbinh15 <binh.pdc01@gmail.com>
* Fixup changes
* Made copied code consistent
* Co-authored by: nvbinh15 <binh.pdc01@gmail.com>

Co-authored-by: Nguyen Van Binh <>
Co-authored-by: Nguyen Van Binh <binh.pdc01@gmail.com>
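An illustrative sketch of the caching idea only (the real OWLv2 bias math differs): compute the bias once per grid size and serve it from a cache at inference.
```python
from functools import lru_cache

import torch

@lru_cache(maxsize=2)
def compute_box_bias(num_patches: int) -> torch.Tensor:
    # Inverse sigmoid of normalized box-center coordinates, computed once
    # per feature-map size and then reused across forward passes.
    coords = (torch.arange(num_patches, dtype=torch.float32) + 0.5) / num_patches
    return torch.log(coords) - torch.log1p(-coords)
```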
-
Ash Kuroki authored
Update quantization_config.py: fixed a typo for clarity and correctness ("input time" changed to "input type").
-
- 20 Mar, 2024 2 commits
-
-
Arthur authored
* attempt to fix
* the actual fix that works with compilation!
* this?
* temporary update
* nit?
* dispatch to memory efficient?
* update both models that have static cache support
* fix copies, fix compile
* make sure fix
* fix cohere and gemma
* fix beams?
* nit
* slipped through the cracks
* nit
* nits
* update
* fix-copies
* skip failing tests
* nits
-
Benjamin Ye authored
[`BitsAndBytesConfig`] Warning for unused `kwargs` & safety checkers for `load_in_4bit` and `load_in_8bit` (#29761)
* added safety checkers for load_in_4bit and load_in_8bit on init, as well as their setters
* Update src/transformers/utils/quantization_config.py: typo correction for load_in_8bit setter checks (Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>)

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
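A sketch of the new guard rails, assuming the setters raise `ValueError` when 4-bit and 8-bit loading would both be enabled:
```python
from transformers import BitsAndBytesConfig

config = BitsAndBytesConfig(load_in_4bit=True)
try:
    config.load_in_8bit = True  # mutually exclusive with load_in_4bit
except ValueError as err:
    print(err)
```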
-