Commits · 594c1277b2fcc1c1aed252d320359101409e0407 · chenpangpang / transformers

21 Feb, 2024 6 commits

[ `gemma`] Adds support for Gemma

(#29167) · 594c1277

Arthur authored Feb 21, 2024

* initial commit

* update

* update conversion checkpoint

* update conversion script

* nits

* some fixes

* nits

* merge

* fix permute

* nits

* fix

* nits

* nits

* nits

* fix rope

* fix both rope

* nits

* style

* make sure flax works

* fix flax init code

* fix forward

* nits

* print flax generation out

* current code

* nits

* SIIIIIIIIIIIIIIIIIII

* update

* add new tokenizer

* correct fast tokenizer

* fix conversion

* more comments

* fix modeling and conversion

* nits and nits

* nits testing

* add some tokenization tests

* add some edge cases

* add slow tests and fix them

* fixup

* fix copies for modeling

* fix copies

* add 7B slow tests

* fix

* fix

* fix tests

* make tokenizer cis go green

* styling

* last tokenizer nits

* update jax tests

* fix flax for 7b

* add jit testing 🤗



* cleanups

* isolated nit, inv_freq for rotary_emb.inv_freq

* propagate to jax

* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adjust test

* fix conversion script

* change name

* correct file names

* update conversion script

* Fix bos and eos token ids in the model configuration (#3)

* update modelling

* update conversion script

* add static cache for gemma

* fix sdpa generate

* fix batched

* multiple fixes

* fix FA2

* final fix

* Rename a few missing strings and filenames (#4)

* merge with upstream main

* fix copies

* fix copies

* fix fixup

* fix fixup

* fix

* fix

* final tests

* fix fx gemma tests

* fix fx bf16/fp16 tests

* update slow fx tests

* fx slow tests: one logits, one generation

* move jit test standalone

* Apply suggestions from code review

* nits

* tokenizer updates

* more tokenization updates: custom GemmaSentencepieceExtrator

* style

* Update src/transformers/cache_utils.py

* Update src/transformers/models/gemma/__init__.py

* Update tests/models/gemma/test_modeling_flax_gemma.py

* small nits

* style

* update tokenization test

* fix the rotary embedding

* with style

* fix slow tests

* WARNING this commit might be very important for precisions

* Update tests/models/gemma/test_modeling_flax_gemma.py

* Update src/transformers/models/gemma/configuration_gemma.py
Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update src/transformers/models/gemma/modeling_flax_gemma.py
Co-authored-by: Lysandre Debut <hi@lysand.re>

* small nits here and there!

* forgotten nit

* remove on the fly computation of inv_freq

* revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float

* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_flax_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* nit conversion script link

* fix some tests

* add not doctest and pr doctest

* repo consistency

* fix last CIs 🚀



* update all readmes

---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Lysandre Debut <hi@lysand.re>

594c1277

[`Maskformer`] safely get backbone config (#29166) · 58245ba6
amyeroberts authored Feb 21, 2024
```
Safe getattr
```
58245ba6

support SDPA Attention in stablelm (#29106) · 1d0ea7ab

Ekaterina Aidova authored Feb 21, 2024



* support SDPA Attention in stablelm

* add integration test

* add fallback for output_attentions

* Update src/transformers/models/stablelm/modeling_stablelm.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/stablelm/test_modeling_stablelm.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/stablelm/modeling_stablelm.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* handle non-contiguous states

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

1d0ea7ab

`torch.compile` compatibility with `generate` + static cache (#29114) · cc4a664b

fxmarty authored Feb 21, 2024



* fix compatibility

* working version

* cleanup

* sanity checks

* more sanity

* working version WITH refactor

* working without API change

* cleanup & tests pass

* more cleaning

* fix test

* fix tests

* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* smaller comment

* update comment

* update comment

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

cc4a664b

🚨 Llama: update rope scaling to match static cache changes (#29143) · 3994fa5b
Joao Gante authored Feb 21, 2024

3994fa5b
v4.39.dev.0 · 1a77f07f
Arthur Zucker authored Feb 21, 2024

1a77f07f

20 Feb, 2024 20 commits

[`pipeline`] Add pool option to image feature extraction pipeline (#28985) · e770f031
amyeroberts authored Feb 20, 2024
```
* Add pool option

* PR comments - error message and exact outputs check
```
e770f031
Fix drop path being ignored in DINOv2 (#29147) · c47576ca
Fernando Pérez-García authored Feb 20, 2024
```
Fix drop path not being used
```
c47576ca
Added image_captioning version in es and included in toctree file (#29104) · 3c00b885
Gustavo Isturiz authored Feb 20, 2024
```
added image_captioning version in es and included in toctree file
```
3c00b885
Generate: missing generation config eos token setting in encoder-decoder tests (#29146) · 857fd8ea
Joao Gante authored Feb 20, 2024

857fd8ea

Raise unused kwargs image processor (#29063) · 1c81132e

Pablo Montalvo authored Feb 20, 2024

* draft processor arg capture

* add missing vivit model

* add new common test for image preprocess signature

* fix quality

* fix up

* add back missing validations

* quality

* move info level to warning for unused kwargs

1c81132e

[Phi] Add support for sdpa (#29108) · b8b16475
JB (Don) authored Feb 20, 2024

b8b16475
Save (circleci) cache at the end of a job (#29141) · 7688d8df
Yih-Dar authored Feb 20, 2024
```
nice job
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
7688d8df
Add support for fine-tuning CLIP-like models using contrastive-image-text example (#29070) · ee3af60b
Taylor Jackle Spriggs authored Feb 20, 2024
```
* add support for siglip and chinese-clip model training with contrastive-image-text example

* codebase fixups
```
ee3af60b

Revert low cpu mem tie weights (#29135) · 0996a100

amyeroberts authored Feb 20, 2024

* Revert "Add tie_weights() to LM heads and set bias in set_output_embeddings() (#28948)"

This reverts commit 725f4ad1.

* Revert "Patch to skip failing `test_save_load_low_cpu_mem_usage` tests (#29043)"

This reverts commit 4156f517.

0996a100

[`Core tokenization`] `add_dummy_prefix_space` option to help with latest issues (#28010) · 15cfe389

Arthur authored Feb 20, 2024

* add add_dummy_prefix_space option to slow

* checking kwargs might be better. Should be there for all spm tokenizer IMO

* nits

* fix copies

* more copied

* nits

* add prefix space

* nit

* nits

* Update src/transformers/convert_slow_tokenizer.py

* fix init

* revert wrong styling

* fix

* nits

* style

* updates

* make sure we use slow tokenizer for conversion instead of looking for the decoder

* support llama as well

* update llama tokenizer fast

* nits

* nits nits nits

* update the doc

* update

* update to fix tests

* skip unrelated tailing test

* Update src/transformers/convert_slow_tokenizer.py

* add proper testing

* test decode as well

* more testing

* format

* fix llama test

* Apply suggestions from code review

15cfe389

FIX [`PEFT` / `Trainer` ] Handle better peft + quantized compiled models (#29055) · efdd4366
Younes Belkada authored Feb 20, 2024
```
* handle peft + compiled models

* add tests

* fixup

* adapt from suggestions

* clarify comment
```
efdd4366

[`cuda kernels`] only compile them when initializing (#29133) · 5e95dcab

Arthur authored Feb 20, 2024

* only compile when needed

* fix mra as well

* fix yoso as well

* update

* remove comment

* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py

* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py

* oops

* Update src/transformers/models/deta/modeling_deta.py

* nit

5e95dcab

Generate: unset GenerationConfig parameters do not raise warning (#29119) · a7755d24
Joao Gante authored Feb 20, 2024

a7755d24
Llama: fix batched generation (#29109) · 7d312ad2
Joao Gante authored Feb 20, 2024

7d312ad2
FIX [`bnb` / `tests`] Propagate the changes from #29092 to 4-bit tests (#29122) · ff76e7c2
Younes Belkada authored Feb 20, 2024
```
* forgot to push the changes for 4bit ..

* trigger CI
```
ff76e7c2

Abstract image processor arg checks. (#28843) · 1c9134f0

Pablo Montalvo authored Feb 20, 2024



* abstract image processor arg checks.

* fix signatures and quality

* add validate_ method to rescale-prone processors

* add more validations

* quality

* quality

* fix formatting
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix formatting
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix formatting
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix formatting mishap
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix crop_size compatibility

* fix default mutable arg

* fix segmentation map + image arg validity

* remove segmentation check from arg validation

* fix quality

* fix missing segmap

* protect PILImageResampling type

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add back segmentation maps check

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

1c9134f0

FEAT [`Trainer` / `bnb`]: Add RMSProp from `bitsandbytes` to HF `Trainer` (#29082) · f7ef7cec

Younes Belkada authored Feb 20, 2024



* add RMSProp to Trainer

* revert some change

* Update src/transformers/trainer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

f7ef7cec

Move misplaced line (#29117) · a7ff2f23
Erich Schubert authored Feb 20, 2024
```
Move misplaced line, improve code comment
```
a7ff2f23
[`gradient_checkpointing`] default to use it for torch 2.3 (#28538) · 9094abe8
Arthur authored Feb 20, 2024
```
* default to use it

* style
```
9094abe8

Fixed nll with label_smoothing to just nll (#28708) · 49c0b293

Nilesh authored Feb 20, 2024

* Fixed nll with label_smoothing to nll

* Resolved conflict by rebase

* Fixed nll with label_smoothing to nll

* Resolved conflict by rebase

* Added label_smoothing to config file

* Fixed nits

49c0b293

19 Feb, 2024 11 commits

storing & logging gradient norm in trainer (#27326) · 4f09d0fd
Shijie Wu authored Feb 19, 2024
```
* report grad_norm during training

* support getting grad_norm from deepspeed
```
4f09d0fd
Fix two tiny typos in `pipelines/base.py::Pipeline::_sanitize_parameters()`'s docstring (#29102) · a4851d94
Sadra Barikbin authored Feb 19, 2024
```
* Update base.py

* Fix a typo
```
a4851d94

Bnb test fix for different hardwares (#29066) · 5ce90f32

Titus authored Feb 19, 2024



* generated text on A10G

* generated text in CI

* Apply suggestions from code review

add explanatory comments
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

5ce90f32

ENH: added new output_logits option to generate function (#28667) · 08cd694e

Max Baak authored Feb 19, 2024

The output_logits option behaves like output_scores, but returns the raw, unprocessed prediction logit scores,
i.e. the values before they undergo logit processing and/or warping. The latter happens by default for the
regular output scores.

It's useful to have the unprocessed logit scores in certain circumstances. For example, unprocessed logit scores
are very useful with causal LM models when one wants to determine the probability of a certain answer, e.g.
when asking a question with a yes/no answer. In that case getting the next-token probabilities of both "yes" and
"no" (and/or their relative ratio) is of interest for classification. The reason for getting these _before_ logit
processing and/or warping is that processing can a) change the probabilities or b) reject the tokens of interest
or reduce the number of tokens to just 1.
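
As a rough illustration of the yes/no case described above, the following standalone sketch compares the "yes"/"no" probabilities computed from raw logits with those computed after a top-k style warper. It is plain Python and not the transformers implementation: the token names, the logit values, and the `softmax` and `top_k_warp` helpers are all made up for illustration.

```python
import math


def softmax(logits):
    """Turn a dict of token -> logit into a dict of token -> probability."""
    m = max(logits.values())
    exps = {tok: math.exp(val - m) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_k_warp(logits, k):
    """Hypothetical stand-in for a top-k warper: keep the k largest logits,
    set every other logit to -inf so it gets zero probability."""
    keep = set(sorted(logits, key=logits.get, reverse=True)[:k])
    return {tok: (val if tok in keep else float("-inf")) for tok, val in logits.items()}


# Illustrative next-token logits for a yes/no question (values are invented).
raw_logits = {"yes": 2.0, "no": 1.0, "maybe": 0.0}

# Probabilities from the raw logits keep the relative weight of "yes" vs "no".
raw_probs = softmax(raw_logits)

# After warping with k=1, "no" is rejected and its probability collapses to 0.
warped_probs = softmax(top_k_warp(raw_logits, k=1))

print("raw:   ", raw_probs)
print("warped:", warped_probs)
```

With the raw logits the ratio of the two probabilities can still be used for classification; after the warper only one token survives, so the ratio is no longer informative. In transformers itself, the raw values are requested by passing `output_logits=True` to `generate`, assuming the usual `return_dict_in_generate=True` setup that is also used for `output_scores`.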

For an example use-case see paper TabLLM: Few-shot Classification of Tabular Data with Large Language Models
by Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, and David Sontag.
https://arxiv.org/abs/2210.10723



In addition:
- added dedicated unit test: tests/generation/test_utils/test_return_unprocessed_logit_scores
  which tests return of logits with output_logits=True in generation.
- set output_logits=True in all other generation unit tests that also have output_scores=True.

Implemented @gante's and @amyeroberts' review feedback
Co-authored-by: kx79wq <max.baak@ing.com>

08cd694e

[Docs] Add resources (#28705) · 07e3454f

NielsRogge authored Feb 19, 2024



* Add resource

* Add more resources

* Add resources

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove mention

* Remove pipeline tags

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

07e3454f

change version (#29097) · b2724d7b

Arthur authored Feb 19, 2024



* change version

* nuke

* this doesn't make sense

* update some requirements.py

* revert + no main

* nits

* change cache number

* more pin

* revert

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

b2724d7b

Fix a typo in `examples/pytorch/text-classification/run_classification.py` (#29072) · 79132d4c
Jay Zhou authored Feb 19, 2024

79132d4c
Fix the `bert-base-cased` tokenizer configuration test (#29105) · 98308586
Lysandre Debut authored Feb 19, 2024
```
Fix test
```
98308586

fix the post-processing link (#29091) · 593230f0

Winton Davies authored Feb 19, 2024

The link in evaluation was missing a hyphen between post and processing. I fixed this, for English only. Someone with the ability to do a global search/replace should fix the other languages (if indeed they have this issue).

593230f0

FIX [`bnb` / `tests`]: Fix currently failing bnb tests (#29092) · a75a6c93
Younes Belkada authored Feb 19, 2024
```
Update test_mixed_int8.py
```
a75a6c93

[`Awq`] Add peft support for AWQ (#28987) · 864c8e6e

Younes Belkada authored Feb 19, 2024



* add peft support for AWQ

* Update src/transformers/quantizers/quantizer_awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

864c8e6e

16 Feb, 2024 3 commits

[Docs] Spanish translation of task_summary.md (#28844) · ce4fff0b

Aaron Jimenez authored Feb 16, 2024

* Add task_summary to es/_toctree.yml

* Add task_summary.md to docs/es

* Change title of task_summary.md

* Translate first paragraphs

* Translate middle paragraphs

* Translate the rest of the doc

* Edit first paragraph

ce4fff0b

Add chat support to text generation pipeline (#28945) · 2f1003be

Matt authored Feb 16, 2024

* Add chat support to text generation pipeline

* Better handling of single elements

* Deprecate ConversationalPipeline

* stash commit

* Add missing add_special_tokens kwarg

* Update chat templating docs to refer to TextGenerationPipeline instead of ConversationalPipeline

* Add ✨TF✨ tests

* @require_tf

* Add type hint

* Add specific deprecation version

* Remove unnecessary do_sample

* Remove todo - the discrepancy has been resolved

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/pipelines/text_generation.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

2f1003be

Fix trainer test wrt DeepSpeed + auto_find_bs (#29061) · 636b0324

Zach Mueller authored Feb 16, 2024



* Fix trainer test

* Update tests/trainer/test_trainer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

636b0324