Commits · b99f7bd4fce73eeb3ce731ef49d18f0e1c311dd2 · chenpangpang / transformers

25 Jul, 2023 15 commits

[DOCS] add example NoBadWordsLogitsProcessor (#25046) · b99f7bd4
Gema Parreño authored Jul 25, 2023
```
* add example NoBadWordsLogitsProcessor

* fix L764 & L767

* make style
```
b99f7bd4

[`MPT`] Add MosaicML's `MPT` model to transformers (#24629) · dcb183f4

Arthur authored Jul 25, 2023



* draft add new model like

* some cleaning of the config

* nits

* add nested configs

* nits

* update

* update

* added layer norms + triton kernels

* consider only LPLayerNorm for now.

* update

* all keys match.

* Update

* fixing nits here and there

* working forward pass.

* removed einops dependency

* nits

* format

* add alibi

* byebye head mask

* refactor attention

* nits.

* format

* fix nits.

* nuke ande updates

* nuke tokenizer test

* don't reshape query with kv heads

* added a bit of documentation.

* remove unneeded things

* nuke more stuff

* nit

* logits match - same generations

* rm unneeded methods

* 1 remaining failing CI test

* nit

* fix nits

* fix docs

* fix docs

* rm tokenizer

* fixup

* fixup

* fixup and fix tests

* fixed configuration object.

* use correct activation

* few minor fixes

* clarify docs a bit

* logits match à 1e-12

* skip and unskip a test

* added some slow tests.

* fix readme

* add more details

* Update docs/source/en/model_doc/mpt.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix configuration issues

* more fixes in config

* added more models

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove unneeded position ids

* fix some  comments

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* revert suggestion

* mpt alibi + added batched generation

* Update src/transformers/models/mpt/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove init config

* Update src/transformers/models/mpt/configuration_mpt.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix nit

* add another slow test

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fits in one line

* some refactor because make fixup doesn't pass

* add ft notebook

* update md

* correct doc path

---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

dcb183f4

Fix: repeat per sample for SAM image embeddings (#25074) · 1dbc1440
Xiaoke Huang authored Jul 25, 2023
```
Repeat per sample for SAM image embeddings
```
1dbc1440
🌐 [i18n-KO] Translated `hpo_train.md` to Korean (#24968) · cb8abee5
Harheem Kim authored Jul 25, 2023
```
* dos: ko: hpo_train.mdx

* feat: chatgpt draft

* fix: manual edits

* fix: resolve suggestions
```
cb8abee5

[`generate`] Only warn users if the `generation_config`'s `max_length` is set... · f2c1df93

Arthur authored Jul 25, 2023

[`generate`]  Only warn users if the `generation_config`'s `max_length` is set to the default value (#25030)

* check max length is default

* nit

* update warning: no-longer deprecate

* comment in the configuration_utils in case max length's default gets changed in the futur

f2c1df93

replace `per_gpu_eval_batch_size` with `per_device_eval_batch_size` in readme... · c879318c

Alan Ji authored Jul 25, 2023

replace `per_gpu_eval_batch_size` with `per_device_eval_batch_size` in readme of multiple-choice task (#25078)

replace `per_gpu_eval_batch_size` with `per_device_eval_batch_size`
in readme of multiple-choice

c879318c

Fix broken link in README_hd.md (#25067) · 25e443c0
Susnato Dhar authored Jul 25, 2023
```
Update README_hd.md
```
25e443c0
Set `TF32` flag for PyTorch cuDNN backend (#25075) · 6bc61aa7
Xuehai Pan authored Jul 25, 2023

6bc61aa7
fix: add TOC anchor link (#25066) · 5dba88b2
Injin Paek authored Jul 25, 2023

5dba88b2
Fix last models for common tests that are too big. (#25058) · f295fc8a
Sylvain Gugger authored Jul 25, 2023
```
* Fix last models for common tests that are too big.

* Remove print statement
```
f295fc8a

🌐

[i18n-KO] Translated `perf_hardware.md` to Korean (#24966) · ee1eb3b3

Sangam Lee authored Jul 25, 2023



* docs: ko: perf_hardware.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions
Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions
Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions
Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions
Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions
Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions
Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions
Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions
Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

* Fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: fix rendering error of perf_hardware.md

---------
Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

ee1eb3b3

🌐

[i18n-KO] Translated `<tf_xla>.md` to Korean (#24904) · f6fe1d55

Haewon Kim authored Jul 25, 2023

* docs: ko: tf_xla.md

* feat: chatgpt draft

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

f6fe1d55

[Docs] fix rope_scaling doc string (#25072) · faf25c04
Kashif Rasul authored Jul 25, 2023
```
fix rope_scaling doc string
```
faf25c04
Generate - add beam indices output in contrained beam search (#25042) · c0742b15
Joao Gante authored Jul 25, 2023

c0742b15
[`RWKV`] Add note in doc on `RwkvStoppingCriteria` (#25055) · c53a6eae
Arthur authored Jul 25, 2023
```
* Add note in doc on `RwkvStoppingCriteria`

* give some breathing space to the code
```
c53a6eae

24 Jul, 2023 17 commits

Better error message when signal is not supported on OS (#25049) · d2295708
Sylvain Gugger authored Jul 24, 2023
```
* Better error message when signal is not supported on OS

* Address review comments
```
d2295708

🌐

[i18n-KO] Translated `perf_train_cpu.md` to Korean (#24911) · c0d1c330

seank021 authored Jul 25, 2023



* dos: ko: perf_train_cpu.md

* feat: chatgpt draft

* fix: manual edits

* fix: resolve suggestions

* fix: manual edits
Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

---------
Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

c0d1c330

[`8bit`] Fix 8bit corner case with Blip2 8bit (#25047) · b08f41e6
Younes Belkada authored Jul 24, 2023
```
fix 8bit corner case with Blip2 8bit
```
b08f41e6

compute_loss in trainer failing to label shift for PEFT model when label... · 3611fc90

Nate Brake authored Jul 24, 2023


compute_loss in trainer failing to label shift for PEFT model when label smoothing enabled. (#25044)

* added PeftModelForCausalLM to MODEL_FOR_CAUSAL_LM_MAPPING_NAMES dict

* check for PEFT model in compute_loss section

---------
Co-authored-by: Nathan Brake <nbrake3@mmm.com>

3611fc90

Pvt model (#24720) · a03d13c8

Rinat authored Jul 24, 2023

* pull and push updates

* add docs

* fix modeling

* Add and run test

* make copies

* add task

* fix tests and fix small issues

* Checks on a Pull Request

* fix docs

* add desc pvt.md

a03d13c8

Comment again print statement · afe8bfc0
Sylvain Gugger authored Jul 24, 2023

afe8bfc0
Make more test models smaller (#25005) · 42571f6e
Sylvain Gugger authored Jul 24, 2023
```
* Make more test models tiny

* Make more test models tiny

* More models

* More models
```
42571f6e
Fix typo in LlamaTokenizerFast docstring example (#25018) · 8f1f0bf5
Sören Brunk authored Jul 24, 2023

8f1f0bf5
Add dispatch_batches to training arguments (#25038) · 3b734f50
Zach Mueller authored Jul 24, 2023
```
* Dispatch batches

* Copy items
```
3b734f50

🌐

[i18n-KO] Translated `testing.md` to Korean (#24900) · 9d2b983e

Sunmin Cho authored Jul 24, 2023

* docs: ko: testing.md

* feat: draft

* fix: manual edits

* fix: edit ko/_toctree.yml

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

9d2b983e

🌐

[i18n-KO] Translated performance.md to Korean (#24883) · 383be1b7

Sangam Lee authored Jul 24, 2023



* dos: ko: performance.md

* feat: chatgpt draft

* fix: manual edits

* fix: manual edits

* Update docs/source/ko/performance.md
Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>

* Update docs/source/ko/performance.md

---------
Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>

383be1b7

Better handling missing SYS in llama conversation tokenizer (#24997) · efb2ba66

Iskren Ivov Chernev authored Jul 24, 2023

* Better handling missing SYS in llama conversation tokenizer

The existing code failed to add SYS if the conversation has history
without SYS, but did modify the passed conversation as it did.

Rearrange the code so modification to the conversation object are taken
into account for token id generation.

* Fix formatting with black

* Avoid one-liners

* Also fix fast tokenizer

* Drop List decl

efb2ba66

Support GatedRepoError + use raise from (#25034) · 67049231

Lucain authored Jul 24, 2023



* Support GatedRepoError + use raise from

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Use token instead of use_auth_token in error messages

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

67049231

[docs] Performance docs tidy up, part 1 (#23963) · 75317aef

Maria Khalusova authored Jul 24, 2023



* first pass at the single gpu doc

* overview: improved clarity and navigation

* WIP

* updated intro and deepspeed sections

* improved torch.compile section

* more improvements

* minor improvements

* make style

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* feedback addressed

* mdx -> md

* link fix

* feedback addressed

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

75317aef

fix(integrations): store serialized `TrainingArgs` to `wandb.config` without sanitization. (#25035) · 54ba8608

Bharat Ramanathan authored Jul 24, 2023

fix: store training args to wandb config without sanitization.

Allows resuming runs by reusing the wandb config.
Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>

54ba8608

[`logging.py`] set default `stderr` path if `None` (#25033) · 0906d212
Arthur authored Jul 24, 2023
```
set default logger
```
0906d212
[check_config_docstrings.py] improve diagnostics (#25012) · c9a82be5
Stas Bekman authored Jul 23, 2023
```
* [check_config_docstrings.py] improve diagnostics

* style

* rephrase

* fix
```
c9a82be5

21 Jul, 2023 8 commits
- 🌐 [i18n-KO] Updated Korean `serialization.md` (#24686) · b257c46a
  Wonhyeong Seo authored Jul 22, 2023
```
fix: update ko/serialization.md

* chatgpt draft
```
  b257c46a
- Move template doc file to md (#25004) · 87fba947
  Sylvain Gugger authored Jul 21, 2023
  
  87fba947
- improve from_pretrained for zero3 multi gpus mode (#24964) · ea41e18c
  Ivan Sorokin authored Jul 21, 2023
```
* improve from_pretrained for zero3 multi gpus mode

* Add check if torch.distributed.is_initialized

* Revert torch.distributed

---------
Co-authored-by: Stas Bekman <stas@stason.org>
```
  ea41e18c
- [`Llama`] remove persistent `inv_freq` tensor (#24998) · 95f96b45
  Arthur authored Jul 21, 2023
```
remove persistent tensor
```
  95f96b45
- [`bnb`] Add simple check for bnb import (#24995) · d3ce048c
  Younes Belkada authored Jul 21, 2023
```
add simple check for bnb
```
  d3ce048c
- Fix `llama` tokenization doctest (#24990) · f1a1eb4a
  Yih-Dar authored Jul 21, 2023
```
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
  f1a1eb4a
- Use main_input_name for include_inputs_for_metrics (#24993) · a7d21318
  Sylvain Gugger authored Jul 21, 2023
  
  a7d21318
- Fix type annotation for deepspeed training arg (#24988) · a6484c89
  Sylvain Gugger authored Jul 21, 2023
  
  a6484c89