- 22 Mar, 2023 16 commits
-
-
Stas Bekman authored
* [deepspeed zero3] need generate(synced_gpus=True, ...)
* fix
* rework per Sylvain's suggestion
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
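A minimal pure-Python sketch of why `synced_gpus=True` matters under ZeRO-3 (illustrative only, not the transformers implementation; the function name and step-count model are hypothetical): with sharded weights, every forward pass is a collective operation, so all ranks must execute the same number of generation steps, and ranks that finish early keep running dummy passes until the slowest rank is done.

```python
def synced_generate(step_counts):
    """step_counts: generation steps each rank needs for its own sequence.

    Returns the number of forward passes each rank actually executes when
    generation is kept in sync: everyone loops until the slowest rank ends.
    """
    max_steps = max(step_counts)  # the slowest rank sets the pace
    return [max_steps for _ in step_counts]


# Rank 1 would finish in 3 steps, but executes 7 (dummy) passes like rank 0,
# so the collective ops stay matched across all ranks.
print(synced_generate([7, 3, 5]))  # [7, 7, 7]
```

Without this synchronization, a rank that stops generating early would miss collective calls and the whole job would hang.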
-
Yih-Dar authored
* Check which tests fail
* Skip failing tests
* Clean up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Luc CAILLIAU authored
* Chunkable token classification pipeline: the TokenClassificationPipeline can now process sequences longer than 512 tokens, regardless of framework, model, or tokenizer, by passing an optional stride number. Behavior is unchanged when stride is not passed. For the overlapping parts produced when stride > 0, only the maximum score for each overlapped token across all chunks containing it is kept.
* Remove process_all and keep only stride; if stride is provided, the pipeline is applied to the whole text.
* Update the chunk aggregation strategy based on entity aggregation.
* Remove an unnecessary pop from the outputs dict.
* Add chunking tests (including a smaller one); use a tiny model for the chunking test; update scores with nested simplify.
* Formatting corrections (black).

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
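The aggregation rule described above can be sketched in pure Python (a toy illustration under assumptions, not the pipeline's actual code): when stride > 0 produces overlapping chunks, each token keeps the maximum score it received in any chunk that covers it.

```python
def aggregate_overlapping_scores(chunks):
    """chunks: list of (start_offset, scores) pairs, where scores[i] is the
    model's score for the token at absolute position start_offset + i.

    Returns one score per token position, taking the max across chunks.
    """
    merged = {}
    for start, scores in chunks:
        for i, score in enumerate(scores):
            pos = start + i
            # keep the best score seen for this token so far
            merged[pos] = max(merged.get(pos, float("-inf")), score)
    return [merged[p] for p in sorted(merged)]


# Positions 2-3 are covered by both chunks; the higher score wins each time.
print(aggregate_overlapping_scores([(0, [0.9, 0.8, 0.2, 0.4]),
                                    (2, [0.7, 0.3, 0.6])]))
# [0.9, 0.8, 0.7, 0.4, 0.6]
```

The real pipeline works on per-label score arrays rather than scalars, but the max-over-chunks idea is the same.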
-
Tom Aarsen authored
Fix an incorrect type hint in Trainer methods
-
Younes Belkada authored
* Add the Pix2Struct model: conversion script, image processor, processor, tokenizer, and modeling code, with logits matching the original implementation and generation working.
* Rename inputs to `flattened_patches`; make the patch size consistent; fix the max_patches issue; use `make_list_of_images`; remove `data_format`.
* Fix batched inference, training, masked fill, softmax, and conditional generation; add batched VQA support and large-model support.
* Add docstrings, conversion instructions, slow tests, a header text test, and checkpoints; fix doctests, copies, the toctree, and the doc build.

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
-
Joao Gante authored
* tmp commit * beef up llama tests
-
Joao Gante authored
* Export TF generate with a TF tokenizer * remove unused lines
-
Sylvain Gugger authored
Enforce for device_map strategies
-
silentghoul-spec authored
Fix a bug in MarkupLMTokenizer so that xpath_sub_list is calculated correctly; previously xpath_sub_list was identical to xpath_tags_list.

Co-authored-by: dusejat <dusejat@amazon.com>
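What the fix implies can be sketched as follows (a hypothetical standalone helper, not MarkupLMTokenizer's actual code): an xpath must be parsed into two *distinct* lists, the tag names and their numeric subscripts, whereas the bug left both lists holding the tags.

```python
def parse_xpath(xpath):
    """Split an xpath like /html/body/div[2]/span into tag and subscript lists."""
    tags, subs = [], []
    for node in xpath.strip("/").split("/"):
        if "[" in node:
            # node like "div[2]": separate the tag from its subscript
            tag, sub = node.rstrip("]").split("[")
            tags.append(tag)
            subs.append(int(sub))
        else:
            tags.append(node)
            subs.append(0)  # no explicit subscript: default index
    return tags, subs


print(parse_xpath("/html/body/div[2]/span"))
# (['html', 'body', 'div', 'span'], [0, 0, 2, 0])
```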
-
Nick Hill authored
* Revert "[GPT-J] add deprecation warning (#21869)"; this reverts commit fb76994c.
* Fix position embeddings for GPT-J and CodeGen
* Address review comments from @gante
* Fix a "Copied from" comment referencing the wrong function
* Fix a copy/paste mistake
* Fix the training path
* Move the position_ids long cast
* Revert "Hopefully make torch.fx happy" (commit e41a6f4cad3ff441124c7457b19cfb630d4ca025); instead, make changes to help with torch.fx tracing, work around a torch.fx tracing issue, and get the changes working with torch.fx
* Correct the position_ids tensor type hint
* Address a review comment from @michaelbenayoun
* Add an explanatory comment; small code tidy-up
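One common pattern for deriving position ids for left-padded batches, shown here as a pure-Python sketch, is a cumulative sum of the attention mask minus one, clamped so padding positions get a valid dummy index. This is an illustration of the general technique only; the function name is hypothetical and the actual GPT-J/CodeGen change may differ in detail.

```python
def position_ids_from_mask(attention_mask):
    """attention_mask: batch of 0/1 rows (0 = padding token).

    Returns position ids so real tokens count 0, 1, 2, ... from the first
    unmasked token, and padding positions are clamped to index 0.
    """
    position_ids = []
    for row in attention_mask:
        running, ids = 0, []
        for m in row:
            running += m
            ids.append(max(running - 1, 0))  # clamp padding to 0
        position_ids.append(ids)
    return position_ids


# Row 0 is left-padded by two tokens; its real tokens still start at 0.
print(position_ids_from_mask([[0, 0, 1, 1, 1],
                              [1, 1, 1, 1, 1]]))
# [[0, 0, 0, 1, 2], [0, 1, 2, 3, 4]]
```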
-
Connor Henderson authored
allow only test_file in pytorch and flax summarization
-
Wang, Yi authored
* Add a low_cpu_mem_usage option to the run_clm.py example, which benefits LLM loading
* Update all the examples and READMEs under language-modeling

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
-
jiqing-feng authored
-
Alara Dirik authored
* Add MaskedImageModelingOutput
-
Yih-Dar authored
* Update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Stas Bekman authored
* [deepspeed] offload + non-cpuadam optimizer exception doc * deps
-
- 21 Mar, 2023 8 commits
-
-
Ali Hassani authored
-
Yanming W authored
-
Yih-Dar authored
* Time to say goodbye, torch 1.7 and 1.8
* Clean up torch_int_div
* Clean up is_torch_less_than_1_8/1_9

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Davide Gazzè authored
Add translation
-
Yih-Dar authored
* Fix more doctests
* Fix style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* Run all doctests
* Skip failed tests

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Gerald Cuder authored
* Make sure CVT can be trained using mixed precision
* Add a test for keras-fit with mixed precision
* Update tests/models/cvt/test_modeling_tf_cvt.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: gcuder <Gerald.Cuder@iacapps.com>
-
Andrei Panferov authored
* Fix the modules_to_not_convert default value and its docstring in src/transformers/utils/bitsandbytes.py
* Default to ["lm_head"] when modules_to_not_convert is None

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
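The default described above can be sketched as a tiny standalone helper (the function name is hypothetical; the `["lm_head"]` fallback is taken from the commit message):

```python
def resolve_modules_to_not_convert(modules_to_not_convert=None):
    """Modules to keep in full precision during int8 quantization.

    If the caller passes nothing, fall back to ["lm_head"], since quantizing
    the output head typically hurts generation quality.
    """
    return ["lm_head"] if modules_to_not_convert is None else modules_to_not_convert


print(resolve_modules_to_not_convert())            # ['lm_head']
print(resolve_modules_to_not_convert(["head"]))    # ['head']
```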
-
- 20 Mar, 2023 12 commits
-
-
amyeroberts authored
* Add bool_masked_pos to forward docstrings * Add note about mask ratio - videomae * Fix up * Fix indenting
-
Maria Khalusova authored
* added an example of pad_to_multiple_of * make style * addressed feedback
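The rounding that `pad_to_multiple_of` performs can be sketched as follows (a hypothetical standalone helper showing the arithmetic, not the tokenizer's code): sequence lengths are rounded up to the nearest multiple, which helps hardware such as tensor cores that prefers fixed-size blocks.

```python
def pad_to_multiple_of(length, multiple):
    """Round a sequence length up to the nearest multiple of `multiple`."""
    return ((length + multiple - 1) // multiple) * multiple


print(pad_to_multiple_of(37, 8))  # 40
print(pad_to_multiple_of(40, 8))  # 40 (already a multiple, unchanged)
```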
-
Antoni Viros authored
Move torch.compile() wrapping after DDP/FSDP wrapping to ensure correct graph breaks during training (#22279)
-
amyeroberts authored
-
Sylvain Gugger authored
* Proper map location for optimizer load * What happened to my code?
-
Sylvain Gugger authored
* Update the LLaMA conversion script
* Doc
* Fix the weight size for the 13B checkpoint

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
-
Sylvain Gugger authored
-
yqy2001 authored
Fix a gradient checkpointing bug in LLaMA
-
heya5 authored
Update training_args.py
-
Nicola Procopio authored
* Added Italian translations of perf_train_cpu and perf_train_cpu_many
* Updated toctree
* Added Italian translation of perf_infer_cpu.mdx
-
yesinkim authored
[Docs] fix typos

Co-authored-by: yesinkim <yesinkim@yesinkimui-MacBookAir.local>
-
Pasquale Minervini authored
Update training_args.py: a nightly install is no longer required for `torch.compile`.
-
- 17 Mar, 2023 4 commits
-
-
Stas Bekman authored
[trainer] param count for zero3
-
Guangyuan Ma authored
push
-
Ali Hassani authored
* Add kernel size to NATTEN's QK arguments. The new NATTEN 0.14.5 supports PyTorch 2.0 but also adds an additional argument to the QK operation to allow optional RPBs, which ends up failing NATTEN tests. This commit adds NATTEN back to CircleCI and adds the arguments to get it working again.
* Force NATTEN >= 0.14.5
-