- 12 Sep, 2023 6 commits
-
MinJae Kang authored
* docs: ko-llama2.md
* feat: chatGPT draft and manual edits
* feat: added inline TOC
* fix: inline TOC
* fix: resolve suggestions
* fix: resolve suggestion
* fix: resolve suggestion

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
-
pokjay authored
* Fix issues in test_exponential_decay_length_penalty

  Fix tests which were broken and add validation of negative scores. The current test didn't take into account that ExponentialDecayLengthPenalty updates the scores in place, resulting in updates to the base tested tensor. In addition, the `gt` assert compared empty tensors due to indexing along the batch dimension. The test is currently expected to fail, to expose the ExponentialDecayLengthPenalty issues with negative scores.
* Fix ExponentialDecayLengthPenalty negative logits issue

  In cases where the scores are negative, ExponentialDecayLengthPenalty decreases the score of eos_token_id instead of increasing it. To fix this issue we compute the penalty on the absolute value and add it to the original score.
* Add examples for ExponentialDecayLengthPenalty
* Fix styling issue in ExponentialDecayLengthPenalty doc
* Apply suggestions from code review
* Style and quality fix
* Fix example outputs

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
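The negative-scores fix above is easy to show in isolation. A minimal sketch of the idea (hypothetical helper, not the merged `transformers` code):

```python
import torch

def apply_eos_penalty(scores, eos_token_id, cur_len, start, decay_factor):
    # Computing the penalty on the absolute value and *adding* it to the
    # original score raises the eos score even when all logits are negative;
    # scaling a negative score directly would push eos further down instead.
    if cur_len > start:
        penalty = torch.abs(scores[:, eos_token_id]) * (decay_factor ** (cur_len - start) - 1)
        scores[:, eos_token_id] = scores[:, eos_token_id] + penalty
    return scores

# A negative eos logit (-4.0) moves up to 1.0 instead of down:
scores = torch.tensor([[-3.0, -1.0, -4.0]])
print(apply_eos_penalty(scores, eos_token_id=2, cur_len=12, start=10, decay_factor=1.5))
```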
-
larekrow authored
-
Joao Gante authored
-
Younes Belkada authored
import tensorflow inside relevant methods in trainer_utils
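This is the standard lazy-import pattern; a sketch with a hypothetical helper (not the actual `trainer_utils` function):

```python
def list_tensorboard_logs(log_dir: str):
    # A module-level `import tensorflow` makes every `import transformers`
    # pay TF's startup cost; a local import defers it until this rarely used
    # code path actually runs.
    import tensorflow as tf

    return tf.io.gfile.listdir(log_dir)
```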
-
Arthur authored
* initial commit
* updates
* nits
* update conversion script
* update conversion script
* use path to load
* add tips etc
* some modeling logic
* modeling update
* more nits
* nits
* normal layer norm
* update config and doc
* nits
* update doc, remove unused
* update
* fix inits and stuff
* fixup
* revert wrong changes
* updates
* more nits
* add default config values to the configuration file
* fixup happy
* update
* 2 tests left
* update readmes
* more nits
* slow test and more documentation
* update readme
* fix licences
* styling
* use fast if possible when saving tokenizer
* remove todo
* remove tokenization tests
* small last nits
* Apply suggestions from code review
* nits to skip the timeout doctest
* fix integration test
* fix test
* update eos token
* update to allow fast tokenization
* styling
* fix CodeLlama as well for the updated post processor
* Apply suggestions from code review
* add more copied from statements
* update
* doc passes doctest
* remove `# final layer norm?`
* change docstring prompt
* update
* Update README.md
* don't doctest the conversion script as it requires more packages
* don't init a model in the config
* oops
* fix doctest

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 11 Sep, 2023 4 commits
-
Phuc Van Phan authored
* docs: add space to docs
* docs: remove redundant space
-
Patrick von Platen authored
* improve import time
* Update src/transformers/integrations/__init__.py
* sort import
-
Phuc Van Phan authored
-
Hang authored
Only the main process should call `_save` under DeepSpeed ZeRO-3
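A sketch of the guard this describes (hypothetical helper built on Trainer internals, so treat the attribute names as assumptions): under ZeRO-3 every rank must take part in gathering the sharded weights, but only rank 0 should write the checkpoint.

```python
import torch.distributed as dist

def save_zero3_checkpoint(trainer, output_dir):
    # All ranks enter the gather so ZeRO-3 can reassemble the full state dict...
    state_dict = trainer.accelerator.get_state_dict(trainer.model_wrapped)
    # ...but only the main process calls _save and touches the disk.
    if not dist.is_initialized() or dist.get_rank() == 0:
        trainer._save(output_dir, state_dict=state_dict)
```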
-
- 09 Sep, 2023 1 commit
-
Arthur authored
* skip failing tests until #26054 is merged
* fixup
-
- 08 Sep, 2023 5 commits
-
Arthur authored
* fix `set_infilling_processor` to properly reset
* Add docstring!
* fixups
* more details in the documentation about the tokenization
* style
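`set_infilling_processor` backs the fill-in-the-middle prompts of the fast Code Llama tokenizer. A short usage sketch of the flow it supports (checkpoint name assumed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# codellama/CodeLlama-7b-hf is an assumed checkpoint for illustration.
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# "<FILL_ME>" marks the span to infill; the tokenizer splits the prompt into a
# prefix and suffix around it, which is where the infilling processor is set
# (and, after the fix above, properly reset afterwards).
prompt = 'def remove_non_ascii(s: str) -> str:\n    """<FILL_ME>"""\n    return result'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```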
-
Harheem Kim authored
* docs: ko-llama.md
* fix: chatgpt draft
* feat: manual edits
* fix: resolve suggestions
-
Angela Yi authored
* Ignore warning if tracing with dynamo
* fix import error
* separate into a function
* add test
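A sketch of the guard described in the first bullet (hypothetical helper; assumes `torch._dynamo.is_compiling()`, available in recent PyTorch releases):

```python
import logging

logger = logging.getLogger(__name__)

def _is_tracing() -> bool:
    # Warnings emitted inside a dynamo trace can cause graph breaks, so the
    # guard checks whether torch.compile is currently tracing.
    try:
        import torch._dynamo

        return torch._dynamo.is_compiling()
    except Exception:
        return False

def warn_unless_tracing(message: str) -> None:
    if not _is_tracing():
        logger.warning(message)
```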
-
Thien Tran authored
* add missing doc for activation dropout
* fix doc for SEW-D dropout
* deprecate hidden_dropout for SEW-D
-
Alexander Krauck authored
This commit corrects the dropout implementation in Graphormer, aligning it with the original implementation and improving performance. Specifically:

1. The `attention_dropout` variable, intended for use in GraphormerMultiheadAttention, was defined but not used. This has been corrected to use `attention_dropout` instead of the regular `dropout`.
2. The `activation_dropout` for the activations in the feed-forward layers was missing. Instead, the regular `dropout` was used. This commit adds `activation_dropout` to the feed-forward layers.

These changes ensure the dropout implementation matches the original Graphormer and delivers empirically better performance.
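A minimal sketch (not the Graphormer code itself) of the distinction the fix restores in the feed-forward layers:

```python
import torch
import torch.nn as nn

class FeedForwardBlock(nn.Module):
    def __init__(self, dim, hidden_dim, dropout, activation_dropout):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, dim)
        # Before the fix, the regular `dropout` rate was applied here too.
        self.activation_dropout = nn.Dropout(activation_dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        x = self.activation_dropout(torch.relu(self.fc1(x)))  # inner activations
        return self.dropout(self.fc2(x))                      # residual stream
```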
-
- 07 Sep, 2023 9 commits
-
dumpmemory authored
* fix loss inconsistent after resume #25340
* fix typo
* clean code
* reformatted code
* adjust code according to comments
* adjust check_dataloader_randomsampler location
* return sampler only
* handle sampler is None
* Update src/transformers/trainer_pt_utils.py (thanks @amyeroberts)

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
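The underlying problem fits in a few lines: a `RandomSampler` without a seeded generator yields a different permutation on every process start, so skipping already-consumed batches after a resume lands on different data. A minimal illustration (not the Trainer code):

```python
import torch
from torch.utils.data import DataLoader, RandomSampler

dataset = list(range(100))
# A fixed generator seed makes the permutation identical across restarts,
# so replaying the first N batches resumes on the same examples.
generator = torch.Generator().manual_seed(42)
sampler = RandomSampler(dataset, generator=generator)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

resume_step = 5  # batches already consumed before the interruption
for step, batch in enumerate(loader):
    if step < resume_step:
        continue  # replay and discard already-seen batches
    print(step, batch)
    break
```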
-
MyungHa Kwon authored
fix typo
-
raghavanone authored
* Fix vilt config init parameter to match the ones in documentation
* Fix the documentation
-
Muskan Kumar authored
* Added HerBERT to README.md
* Update README.md to contain HerBERT (#26016)
* Resolved #26016: Updated READMEs and index.md to contain HerBERT; updated READMEs and ran `make fix-copies`
-
Sanchit Gandhi authored
* fix tokenizer
* make bs even
* fix multi gpu test
* style
* model forward
* fix torch import
* revert tok pin
-
CokeDong authored
* Add tgs metrics
* bugfix and black formatting
* workaround for tokens counting
* formatting and bugfix
* Fix
* Add opt-in for tgs metrics
* make style and fix error
* Fix doc
* fix docbuild
* hf-doc-build
* fix
* test
* Update src/transformers/training_args.py (renaming)
* Update src/transformers/training_args.py (renaming)
* Fix some symbol
* test
* Update src/transformers/trainer_utils.py (match naming patterns)
* Update src/transformers/training_args.py
* Update src/transformers/trainer.py
* Fix reviews
* Fix
* Fix black

Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
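Usage sketch of the opt-in (the final flag name settled on in review is assumed to be `include_tokens_per_second`):

```python
from transformers import TrainingArguments

# Off by default: counting tokens requires an extra pass over the
# training dataloader, so the tgs (tokens/sec/GPU) metric is opt-in.
args = TrainingArguments(
    output_dir="out",
    include_tokens_per_second=True,
)
```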
-
Yih-Dar authored
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Kai authored
-
Zach Mueller authored
* Fix err
* Use version check
-
- 06 Sep, 2023 7 commits
-
Marc Sun authored
* add new arg for gptq
* add tests
* add min version autogptq
* fix order
* skip test
* fix
* Update src/transformers/modeling_utils.py
* fix style
* change model path

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
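The log doesn't name the new argument, so the sketch below only shows the established `GPTQConfig` entry point that it extends (model name illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
# Quantize on the fly with a calibration dataset; this requires auto-gptq,
# for which the commit above adds a minimum-version check.
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", device_map="auto", quantization_config=config
)
```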
-
Matt authored
Remove falcon from undocumented list
-
Harheem Kim authored
* docs: ko: llm_tutorial.md
* feat: chatgpt draft
* fix: manual edits
* fix: resolve suggestions
* fix: resolve suggestions
-
zspo authored
* fix some small bugs in readme
* Update docs/README.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
Matt authored
* stash commit
* More OPT updates
* Update src/transformers/models/opt/modeling_tf_opt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
Lysandre Debut authored
* Fix revision propagation
* Cleaner
-
Nino Risteski authored
fixed a typo
-
- 05 Sep, 2023 8 commits
-
tju_skywalker authored
* fix convert megatron model too large
* fix convert megatron model too large
-
Tanay Mehta authored
* add: potential fix to mega chunking in decoder only model bug
* add: decoder with chunking test
* add: input_mask passed with input_ids
-
Arthur authored
* revision did not exist
* correct revision
-
Arthur authored
* start with error too
* fix ?
* start with nit
* one more path
* use `job_name`
* mark pipeline test as slow
-
Injin Paek authored
* docs: feat: model resources for llama
* fix: resolve suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
-
Sanchit Gandhi authored
* [Wav2Vec2 Conformer] Fix inference float16
* fix test
* fix test more
* clean pipe test
-
Sourab Mangrulkar authored
deepspeed resume from ckpt fixes and adding support for deepspeed optimizer and HF scheduler (#25863)

* Add support for deepspeed optimizer and HF scheduler
* fix bug
* fix the import
* fix issue with deepspeed scheduler saving for hf optim + hf scheduler scenario
* fix loading of hf scheduler when loading deepspeed checkpoint
* fix import of `DeepSpeedSchedulerWrapper`
* add tests
* add the comment and skip the failing tests
* address comment
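A hedged sketch of the newly supported combination, a DeepSpeed-config optimizer paired with an HF scheduler (config values illustrative only):

```python
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {"stage": 2},
    # The optimizer comes from the DeepSpeed config...
    "optimizer": {"type": "AdamW", "params": {"lr": 5e-5}},
    # ...and with no "scheduler" block, the Trainer supplies its HF scheduler.
}
args = TrainingArguments(
    output_dir="out",
    deepspeed=ds_config,
    lr_scheduler_type="cosine",  # HF scheduler alongside the DS optimizer
)
```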
-
raghavanone authored
* Add TFDebertaV2ForMultipleChoice
* Import newer model in main init
* Fix import issues
* Fix copies
* Add doc
* Fix tests
* Fix copies
* Fix docstring
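A short usage sketch for the new head (checkpoint name assumed; multiple-choice heads expect inputs shaped `(batch, num_choices, seq_len)`):

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFDebertaV2ForMultipleChoice

# microsoft/deberta-v3-small is an assumed checkpoint for illustration.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
model = TFDebertaV2ForMultipleChoice.from_pretrained("microsoft/deberta-v3-small")

prompt = "The new head scores each candidate continuation:"
choices = ["a multiple-choice classifier.", "a kind of cheese."]
inputs = tokenizer([prompt, prompt], choices, return_tensors="tf", padding=True)
# Add the num_choices dimension: (batch=1, num_choices=2, seq_len).
inputs = {k: tf.expand_dims(v, 0) for k, v in inputs.items()}
logits = model(**inputs).logits  # shape (1, 2); higher = more plausible
```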
-