Commits · 5fd4e3c87c685fba2dd9615be62131748a8b5ee3 · chenpangpang / transformers

22 Mar, 2023 9 commits

Enforce `max_memory` for device_map strategies (#22311) · 5fd4e3c8
Sylvain Gugger authored Mar 22, 2023
```
Enforce `max_memory` for device_map strategies
```
5fd4e3c8

Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer (#22302) · 48bef3a7

silentghoul-spec authored Mar 22, 2023



Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer. Earlier xpath_sub_list was same as xpath_tags_list
Co-authored-by: dusejat <dusejat@amazon.com>

48bef3a7

Fix position embeddings for GPT-J and CodeGen (#22069) · 4e94c6c0

Nick Hill authored Mar 22, 2023

* Revert "[GPT-J] add deprecation warning (#21869)"

This reverts commit fb76994c.

* Fix position embeddings for GPT-J and CodeGen

* Address review comments from @gante

* Fix "Copied from" comment referencing wrong function

* Fix copy/paste mistake

* Fix training path

* Hopefully make torch.fx happy

* Move position_ids long cast

* Revert "Hopefully make torch.fx happy"

This reverts commit e41a6f4cad3ff441124c7457b19cfb630d4ca025.

* Changes to help with torch.fx tracing

* Linter fix

* Correct position_ids tensor type hint

* Work-around torch.fx tracing issue

* Get the changes to work with torch.fx

* Address review comment from @michaelbenayoun

* Another small adjustment

* Add explanatory comment; small code tidyup

4e94c6c0

fix: Allow only test_file in pytorch and flax summarization (#22293) · 8e6c34b3
Connor Henderson authored Mar 22, 2023
```
allow only test_file in pytorch and flax summarization
```
8e6c34b3

add low_cpu_mem_usage option in run_clm.py example which will benefit… (#22288) · 4ccaf268

Wang, Yi authored Mar 22, 2023



* add low_cpu_mem_usage option in run_clm.py example which will benefit LLM loading
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update all the example and README under language-modeling
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

4ccaf268

Enable traced model for text-generation task (#22265) · 8472a224
jiqing-feng authored Mar 22, 2023

8472a224

Add MaskedImageModelingOutput (#22212) · 0558914d
Alara Dirik authored Mar 22, 2023
```
* Add MaskedImageModelingOutput
```
0558914d

Final update of doctest (#22299) · 0dcb46e7

Yih-Dar authored Mar 22, 2023



* update

* update

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

0dcb46e7

[deepspeed] offload + non-cpuadam optimizer exception doc (#22044) · 89a0a9ea
Stas Bekman authored Mar 21, 2023
```
* [deepspeed] offload + non-cpuadam optimizer exception doc

* deps
```
89a0a9ea

21 Mar, 2023 8 commits

Correct NATTEN function signatures and force new version (#22298) · 5990743f
Ali Hassani authored Mar 21, 2023

5990743f

Restore fp16 support on xla gpu device (#22300) · d35f7296
Yanming W authored Mar 21, 2023

d35f7296

Time to Say Goodbye, torch 1.7 and 1.8 (#22291) · 67c2dbdb

Yih-Dar authored Mar 21, 2023



* time to say goodbye, torch 1.7 and 1.8

* clean up torch_int_div

* clean up is_torch_less_than_1_8-9

* update

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

67c2dbdb

Add translation perf_infer_gpu_one for it (#22296) · 86c7931a
Davide Gazzè authored Mar 21, 2023
```
Add translation
```
86c7931a

fix more doctests (#22292) · d0b942d1

Yih-Dar authored Mar 21, 2023



* fix more doctests

* fix style

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

d0b942d1

More doctests (#22268) · 48327c57

Yih-Dar authored Mar 21, 2023



* all doctests

* Skip failed tests

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

48327c57

Fix error in mixed precision training of `TFCvtModel` (#22267) · 5a2b77a6

Gerald Cuder authored Mar 21, 2023



* Make sure CVT can be trained using mixed precision

* Add test for keras-fit with mixed-precision

* Update tests/models/cvt/test_modeling_tf_cvt.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------
Co-authored-by: gcuder <Gerald.Cuder@iacapps.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

5a2b77a6

replace_8bit_linear modules_to_not_convert default value fix (#22238) · 330d8b99

Andrei Panferov authored Mar 21, 2023



* Fixed modules_to_not_convert default value

* Fixed modules_to_not_convert docstring

* Update src/transformers/utils/bitsandbytes.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/utils/bitsandbytes.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* ["lm_head"] if modules_to_not_convert is None

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

330d8b99
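
The last bullet of this commit, `["lm_head"] if modules_to_not_convert is None`, follows the common Python pattern of using `None` as a sentinel rather than a mutable list as a default argument. A minimal sketch of that pattern is below. `resolve_modules_to_not_convert` is a hypothetical helper written for illustration only; it is not the code in `src/transformers/utils/bitsandbytes.py`.

```python
from typing import List, Optional


def resolve_modules_to_not_convert(
    modules_to_not_convert: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: return the module names to leave unconverted."""
    # None is the sentinel; a fresh list is built on every call, so callers
    # never share (and accidentally mutate) one default list object.
    if modules_to_not_convert is None:
        return ["lm_head"]
    # Copy the caller's list so later mutation on either side stays local.
    return list(modules_to_not_convert)


default_a = resolve_modules_to_not_convert()
default_b = resolve_modules_to_not_convert()
default_a.append("extra")  # does not leak into default_b or later calls
custom = resolve_modules_to_not_convert(["classifier"])
```

With a literal `["lm_head"]` as the default argument value, the `append` above would have changed the default seen by every later call; the sentinel avoids that.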

20 Mar, 2023 12 commits
- Update vision docstring bool masked pos (#22237) · c07a02a4
  amyeroberts authored Mar 20, 2023
```
* Add bool_masked_pos to forward docstrings

* Add note about mask ratio - videomae

* Fix up

* Fix indenting
```
  c07a02a4
- Example of pad_to_multiple_of for padding and truncation guide & docstring update (#22278) · 7bd86505
  Maria Khalusova authored Mar 20, 2023
```
* added an example of pad_to_multiple_of

* make style

* addressed feedback
```
  7bd86505
- Move torch.compile() wrapping after DDP/FSDP wrapping to ensure correct graph... · fb0a38b4
  Antoni Viros authored Mar 20, 2023
```
Move torch.compile() wrapping after DDP/FSDP wrapping to ensure correct graph breaks during training (#22279)
```
  fb0a38b4
- Fix doc links (#22274) · 8ac29fe0
  amyeroberts authored Mar 20, 2023
  
  8ac29fe0
- Proper map location for optimizer load (#22273) · da005253
  Sylvain Gugger authored Mar 20, 2023
```
* Proper map location for optimizer load

* What happened to my code?
```
  da005253
- Rework a bit the LLaMA conversion script (#22236) · 786092a3
  Sylvain Gugger authored Mar 20, 2023
```
* Update LLaMA conversion script

* Doc

* Fix the weight size for the 13B checkpoint

* Update src/transformers/models/llama/convert_llama_weights_to_hf.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

---------
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
```
  786092a3
- Fix balanced and auto device_map (#22271) · 43efd7cb
  Sylvain Gugger authored Mar 20, 2023
  
  43efd7cb
- Fix the gradient checkpointing bug of the llama model (#22270) · 89f0fda5
  yqy2001 authored Mar 20, 2023
```
fix grad ckpt bug of llama
```
  89f0fda5
- [Trainer] Add optional communication backends for torch.distributed when using GPU (#22247) · cf0af9a3
  heya5 authored Mar 20, 2023
```
Update training_args.py
```
  cf0af9a3
- Italian translation perf_infer_cpu (#22243) · c4bf6f38
  Nicola Procopio authored Mar 20, 2023
```
* added translated files

added perf_train_cpu and perf_train_cpu_many

* updated toctree

* updated toctree

* added file

perf_infer_cpu.mdx

* italian translation perf_infer_cpu.mdx
```
  c4bf6f38
- [Docs] fix typos in some tokenizer docs (#22256) · 466144d4
  yesinkim authored Mar 20, 2023
```
[Docs] fix typos
Co-authored-by: yesinkim <yesinkim@yesinkimui-MacBookAir.local>
```
  466144d4
- Update training_args.py -- a nightly install is not required anymore for torch.compile (#22266) · a48310de
  Pasquale Minervini authored Mar 20, 2023
```
Update training_args.py

A nightly install is not required anymore for `torch.compile`.
```
  a48310de
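
For the `pad_to_multiple_of` entry in the list above (7bd86505): the option pads a sequence so that its length becomes a multiple of the given value. The sketch below shows only that length arithmetic in plain Python. `pad_to_multiple` and `pad_id` are hypothetical names, padding on the right is an assumption, and this is not the tokenizer implementation.

```python
from typing import List


def pad_to_multiple(ids: List[int], multiple: int, pad_id: int = 0) -> List[int]:
    """Hypothetical helper: right-pad ids up to the next multiple of `multiple`."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    remainder = len(ids) % multiple
    if remainder == 0:
        # Already aligned; return a copy without adding padding.
        return list(ids)
    return list(ids) + [pad_id] * (multiple - remainder)


# Five token ids padded to the next multiple of 8.
padded = pad_to_multiple([101, 7592, 2088, 102, 5], multiple=8)
```

A sequence whose length is already a multiple of the target is returned unchanged in length, so no extra block of padding is ever added.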

17 Mar, 2023 11 commits

[trainer] param count for deepspeed zero3 (#22193) · 60d51ef5
Stas Bekman authored Mar 17, 2023
```
[trainer] param count for zero3
```
60d51ef5

Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbedding (#22234) · cf601b90
Guangyuan Ma authored Mar 18, 2023
```
push
```
cf601b90

Revert "Use `dash==2.8.1` for now for daily CI" (#22233) · bec07561
Yih-Dar authored Mar 17, 2023
```
Revert "Use `dash==2.8.1` for now for daily CI (#22227)"

This reverts commit 53218671.
```
bec07561

Fix natten (#22229) · 3028b20a

Ali Hassani authored Mar 17, 2023

* Add kernel size to NATTEN's QK arguments.

The new NATTEN 0.14.5 supports PyTorch 2.0, but also adds an additional
argument to the QK operation to allow optional RPBs.

This ends up failing NATTEN tests.

This commit adds NATTEN back to circleci and adds the arguments to get
it working again.

* Force NATTEN >= 0.14.5

3028b20a

fix(docs): fix task guide links in model docs (#22226) · 074490b2
Seb0 authored Mar 17, 2023
```
fix(docs): task guide links in model docs
```
074490b2

Removed .mdx extension in two links (#22230) · 314cdf7c
Maria Khalusova authored Mar 17, 2023
```
removed .mdx extension
```
314cdf7c

Add LlamaForSequenceClassification (#22209) · f2514413

lewtun authored Mar 17, 2023



* Add LlamaForSequenceClassification

* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Add docstring

* Add test

* Add input embedding getter and setter

* Remove dead code

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

f2514413

fix AutoTP in deepspeed could not work for bloom (#22196) · 675d2a5a

Wang, Yi authored Mar 17, 2023



* fix AutoTP in deepspeed could not work for bloom
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add a method in BloomModel to build alibi
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

675d2a5a

LLaMA house-keeping (#22216) · 00934026
Sylvain Gugger authored Mar 17, 2023
```
* LLaMA house-keeping

* Doc links
```
00934026

Depth estimation task guide (#22205) · 42f8f764

Maria Khalusova authored Mar 17, 2023

* added doc to toc, auto tip with supported models, mention of task guide in model docs

* make style

* removed "see also"

* minor fix

42f8f764

Use `dash==2.8.1` for now for daily CI (#22227) · 53218671
Yih-Dar authored Mar 17, 2023
```
Use dash 2.8.1 for now
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
53218671