Commits · 75bbfd5b2237b7e35a9265731ecf63022579e7e2 · chenpangpang / transformers

30 Apr, 2024 1 commit
- Cache: Static cache as a standalone object (#30476) · 75bbfd5b
  Joao Gante authored Apr 30, 2024
  
  75bbfd5b
29 Apr, 2024 1 commit

Reenable SDPA's FA2 During Training with torch.compile (#30442) · 9df8b301

Benjamin Warner authored Apr 29, 2024

* Reenable SDPA's FA2 during training with torch.compile

* fix Olmo's SDPA FA2 dispatching too

* update formatting

* improved SDPA comment

* formatting and explanatory comment

* is_causal if statement to one-liner

9df8b301

22 Apr, 2024 1 commit
- `Llama` family, fix `use_cache=False` generation (#30380) · 2d92db84
  Arthur authored Apr 22, 2024
```
* nit to make sure cache positions are not sliced

* fix other models

* nit

* style
```
  2d92db84
18 Apr, 2024 2 commits

FIX: Fixes unexpected behaviour for Llava / LLama & AWQ Fused modules + revert... · 5728b5ad

Younes Belkada authored Apr 18, 2024

FIX: Fixes unexpected behaviour for Llava / LLama & AWQ Fused modules + revert #30070 at the same time (#30317)

* Update awq.py

* style

* revert felix PR

* fix

* add felix comments

5728b5ad

Revert "Re-enable SDPA's FA2 path (#30070)" (#30314) · acab997b

Arthur authored Apr 18, 2024

* Revert "Re-enable SDPA's FA2 path (#30070)"

This reverts commit 05bdef16.

* Revert "Fix quality Olmo + SDPA (#30302)"

This reverts commit ec92f983.

acab997b

17 Apr, 2024 2 commits

Fix quality Olmo + SDPA (#30302) · ec92f983
fxmarty authored Apr 17, 2024
```
fix olmo
```
ec92f983

Add OLMo model family (#29890) · e4ea19b9

Shane A authored Apr 17, 2024

* Add OLMo using add-new-model-like with Llama

* Fix incorrect tokenizer for OLMo

* Copy-paste relevant OLMo methods and their imports

* Add OLMo config

* Modify OLMo config to follow HF conventions

* Remove unneeded Llama code from OLMo model

* Add ability for OLMo model to output attentions

* Add OLMoPreTrainedModel and OLMoModel

* Add OLMoForCausalLM

* Minor fixes to OLMo model for style and missing functions

* Implement OLMo tokenizer

* Implement OLMo to HF conversion script

* Add tests for OLMo model

* Add tests for OLMo fast tokenizer

* Add auto-generated dummy objects

* Remove unimplemented OLMo classes from auto and init classes and re-format

* Add README and associated auto-generated files

* Use OLMo names for common properties

* Run make fixup

* Remove `|` from OLMo typing

* Remove unneeded tokenization_olmo.py

* Revert model, config and converter to add-new-model-like Llama

* Move logic for adding bos/eos token into GPTNeoxTokenizerFast

* Change OLMoConfig defaults to match OLMo-7B

* Use GPTNeoXToknizerFast in OLMo tokenizer tests

* Modify auto-generated OLMoModelTests to work for OLMo

* Add non-parametric layer norm OLMoLayerNorm

* Update weight conversion script for OLMo

* Fix __init__ and auto structure for OLMo

* Fix errors from make fixup

* Remove OLMoTokenizerFast from documentation

* Add missing 'Copied from' for OLMoModel._update_causal_mask

* Run make fix-copies

* Rearrange string replacements in OLMoForCausalLM Copied from

* Move OLMo and Llama CausalLM.forward example into global constants

* Fix OLMO_GENERATION_EXAMPLE doc string typo

* Add option for qkv clipping to OLMo

* Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf

* Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf

* Fix OLMo tokenization bug using conversion script

* Keep model in full precision after conversion

* Do not add eos token automatically

* Update references to OLMo model in HF Hub

* Do not add eos token during encoding by default

* Fix Llama generation example

* Run make fixup

* OLMo 7B integration test fix

* Remove unneeded special case for OLMoConfig

* OLMo 7B Twin 2T integration test fix

* Fix test_model_7b_greedy_generation

* Remove test_compile_static_cache

* Fix OLMo and Llama generation example

* Run make fixup

* Revert "OLMo 7B integration test fix"

This reverts commit 4df56a4b150681bfa559846f40e9b7b7f97d7908.

* Revert "OLMo 7B Twin 2T integration test fix"

This reverts commit 9ff65a4a294ace89ab047b793ca55e623a9ceefc.

* Ungate 7B integration tests and fix greedy generation test

* Add retries for flaky test_eager_matches_sdpa_generate

* Fix output of doc example for OLMoForCausalLM.forward

* Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model

* Try fix incorrect characters in OLMoForCausalLM.forward doct test

* Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes

* Remove pretraining_tp from OLMo config and model

* Add missing 'Copied from' instances

* Remove unneeded causal_mask from OLMoModel

* Revert Llama changes

* Ignore copy for OLMoForCausalLM.forward

* Change 'OLMo' to 'Olmo' in classes

* Move minimal OLMo tokenization tests to model tests

* Add missed 'Copied from' for repeat_kv

e4ea19b9

05 Apr, 2024 1 commit

Fix `torch.fx` symbolic tracing for LLama (#30047) · 17cd7a9d

Michael Benayoun authored Apr 05, 2024

* [WIP] fix fx

* [WIP] fix fx

* [WIP] fix fx

* [WIP] fix fx

* [WIP] fix fx

* Apply changes to other models

17cd7a9d

04 Apr, 2024 1 commit
- Refactor Cohere Model (#30027) · 517a3e67
  Saurabh Dash authored Apr 04, 2024
```
* changes

* addressing comments

* smol fix
```
  517a3e67
30 Mar, 2024 1 commit
- [`BC`] Fix BC for AWQ quant (#29965) · 6e584070
  TechxGenus authored Mar 31, 2024
```
fix awq quant
```
  6e584070
28 Mar, 2024 1 commit
- [`BC`] Fix BC for other libraries (#29934) · 2bbbf1be
  Arthur authored Mar 28, 2024
```
* fi xbc?

* nit
```
  2bbbf1be
22 Mar, 2024 1 commit
- [`cleanup`] vestiges of causal mask (#29806) · 2e7cb46f
  Arthur authored Mar 22, 2024
```
nit
```
  2e7cb46f
21 Mar, 2024 1 commit
- Llama: always convert the causal mask in the SDPA code path (#29663) · ee38fc31
  Joao Gante authored Mar 21, 2024
```
* always convert the mask

* rebase and fix copies
```
  ee38fc31
20 Mar, 2024 1 commit

[`BC 4.37 -> 4.38`] for Llama family, memory and speed (#29753) · ff841900

Arthur authored Mar 21, 2024

* attempt to fix

* the actual fix that works with compilation!

* this?

* temporary update

* nit?

* dispatcg to memory efficient?

* update both models that have static cache support

* fix copies fix compile

* make sure fix

* fix cohere and gemma

* fix beams?

* nit

* slipped through the cracks

* nit

* nits

* update

* fix-copies

* skip failing tests

* nits

ff841900

19 Mar, 2024 1 commit

Llama: partial 4d masks (#29731) · 4294f0c3

Joao Gante authored Mar 19, 2024



* partial 4d masks

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

4294f0c3

15 Mar, 2024 1 commit

Cohere Model Release (#29622) · 0e4a1c34

Saurabh Dash authored Mar 15, 2024



* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (#3)

* Make Fix (#5)

* Pr fixes (#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (#7)

* Add modeling tests (#9)

* Smol Fix (#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (#14)

* Update chat templates to use the new API (#15)

---------
Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

0e4a1c34

14 Mar, 2024 1 commit
- Generate: handle `cache_position` update in `generate` (#29467) · 23db187d
  Joao Gante authored Mar 14, 2024
  
  23db187d
13 Mar, 2024 1 commit
- Llama: allow custom 4d masks (#29618) · 1e21c4fb
  Joao Gante authored Mar 13, 2024
  
  1e21c4fb
08 Mar, 2024 1 commit

StableLM: Fix dropout argument type error (#29236) · f386c51a

liangjs authored Mar 08, 2024



* fix stablelm dropout argument type error

* fix docs of _flash_attention_forward

* fix all docs of _flash_attention_forward

* fix docs of _flash_attention_forward in starcoder2

---------
Co-authored-by: oliang <oliang@tencent.com>

f386c51a

06 Mar, 2024 2 commits

Fix: Disable torch.autocast in RotaryEmbedding of Gemma and LLaMa for MPS device (#29439) · d45f47ab

Park Jun authored Mar 07, 2024



* Fix: Disable torch.autocast in RotaryEmbedding of Gemma and LLaMa for MPS devices

* Update src/transformers/models/gemma/modeling_gemma.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update llama ang gemma rope use cpu in mps device

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

d45f47ab

Substantially reduce memory usage in _update_causal_mask for large batches by... · 2a939f20

Glen Taggart authored Mar 06, 2024


Substantially reduce memory usage in _update_causal_mask for large batches by using .expand instead of .repeat [needs tests+sanity check] (#29413)

* try to fix gemma mem use

* fix: handle attention mask dim==2 case

* remove logits=logits.float()

* clean up + add llama

* apply formatting

* readability edit: swap order of items being multiplied

* revert change unrelated to PR

* revert black autoformat

* switch to one .to

* Accept style edits
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

2a939f20

01 Mar, 2024 1 commit
- [`Llama + AWQ`] fix `prepare_inputs_for_generation` 🫠 (#29381) · e7b98370
  Arthur authored Mar 01, 2024
```
* use the generation config 🫠

* fixup
```
  e7b98370
28 Feb, 2024 3 commits

Better SDPA unmasking implementation (#29318) · 49204c1d

fxmarty authored Feb 28, 2024

* better unmask imple

* comment

* typo

* bug report pytorch

* cleanup

* fix import

* add back example

* retrigger ci

* come on

49204c1d

check if position_ids exists before using it (#29306) · 554e7ada
jiqing-feng authored Feb 28, 2024
```
Co-authored-by: Joao Gante <joao@huggingface.co>
```
554e7ada

RoPE loses precision for Llama / Gemma + Gemma logits.float() (#29285) · d3a4b475

Daniel Han authored Feb 29, 2024



* Update modeling_llama.py

Llama - Force float32 since bfloat16 loses precision on long contexts

* Update modeling_llama.py

* Update modeling_gemma.py

Fix RoPE and logits.float()

* @torch.no_grad()

* @torch.no_grad()

* Cos, Sin to float32

* cos, sin to float32

* Update src/transformers/models/gemma/modeling_gemma.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Resolve PR conflicts

* Fix RoPE for llama

* Revert "Fix RoPE for llama"

This reverts commit b860a22dab9bb01cd15cb9a3220abeaefad3e458.

* Fix RoPE for llama

* RoPE device

* Autocast device type

* RoPE

* RoPE isinstance

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

d3a4b475

26 Feb, 2024 1 commit
- Use `torch.bool` instead of `torch.int64` for non-persistant causal mask buffer (#29241) · 24d59c79
  fxmarty authored Feb 26, 2024
```
use torch.bool instead of torch.int64
```
  24d59c79
23 Feb, 2024 1 commit
- Improve _update_causal_mask performance (#29210) · 3f60d11a
  Alessandro Palla authored Feb 23, 2024
```
* Fix issue 29206

* Fix style
```
  3f60d11a
22 Feb, 2024 2 commits
- Fix `torch.compile` with `fullgraph=True` when `attention_mask` input is used (#29211) · 2cc8cf6c
  fxmarty authored Feb 22, 2024
```
* fix torch.export.export for llama

* do not change doc title

* make fix copies
```
  2cc8cf6c
- [Gemma] Fix eager attention (#29187) · 2a9b1f80
  Sanchit Gandhi authored Feb 22, 2024
```
* fix modelling code

* add tests

* fix tests

* add some logit tests

* style

* fix fix
```
  2a9b1f80
21 Feb, 2024 4 commits

FIX [`Gemma`] Fix bad rebase with transformers main (#29170) · ae49b218
Younes Belkada authored Feb 21, 2024
```
fix bad rebase
```
ae49b218

[ `gemma`] Adds support for Gemma

💎

(#29167) · 594c1277

Arthur authored Feb 21, 2024

* inital commit

* update

* update conversion checkpoint

* update conversion script

* nits

* some fixes

* nits

* merge

* fix permute

* nits

* fix

* nits

* nits

* nits

* fix rope

* fix both rope

* nites

* style

* make sure flax works

* fix flax init code

* fix foward

* nits

* print flax generation out

* current code

* nits

* SIIIIIIIIIIIIIIIIIII

* update

* add new tokenizer

* correct fast tokenizer

* fix conversion

* more comments

* fix modeling and conversion

* nits and nits

* nits testing

* add some tokenization tests

* add some edge cases

* add slow tests and fix them

* fixup

* fix copies for modeling

* fix copies

* add 7B slow tests

* fix

* fix

* fix tests

* make tokenizer cis go green

* styling

* last tokenizer nits

* update jax tests

* fix flax for 7b

* add jit testing 🤗



* cleanups

* isolated nit, inv_freq for rotary_emb.inv_freq

* propagate to jax

* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adjust test

* fix conversion script

* change name

* correct file names

* update conversion script

* Fix bos and eos token ids in the model configuration (#3)

* update modelling

* update conversion script

* add static cache for gemma

* fix sdpa generate

* fix batched

* multiple fixes

* fix FA2

* final fix

* Rename a few missing strings and filenames (#4)

* merge with upstream main

* fix copies

* fix copies

* fix fixup

* fix fixup

* fix

* fix

* final tests

* fix fx gemma tests

* fix fx bf16/fp16 tests

* update slow fx tests

* fx slow tests: one logits, one generation

* move jit test standalone

* Apply suggestions from code review

* nits

* tokenizer updates

* more tokenization updates: custom GemmaSentencepieceExtrator

* style

* Update src/transformers/cache_utils.py

* Update src/transformers/models/gemma/__init__.py

* Update tests/models/gemma/test_modeling_flax_gemma.py

* small nits

* style

* update tokenization test

* fix the rotary embedding

* with style

* fix slow tests

* WARNING this commit might be very important for precisions

* Update tests/models/gemma/test_modeling_flax_gemma.py

* Update src/transformers/models/gemma/configuration_gemma.py
Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update src/transformers/models/gemma/modeling_flax_gemma.py
Co-authored-by: Lysandre Debut <hi@lysand.re>

* small nits here and there!

* forgotten nit

* remove on the fly computation of inv_freq

* revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float

* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_flax_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* nit conversion script link

* fix some tests

* add not doctest and pr doctest

* repo consistency

* fix last CIs 🚀



* update all readmes

---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Lysandre Debut <hi@lysand.re>

594c1277

`torch.compile` compatibility with `generate` + static cache (#29114) · cc4a664b

fxmarty authored Feb 21, 2024



* fix compatibility

* working version

* cleanup

* sanity checks

* more sanity

* working version WITH refactor

* working without API change

* cleanup & tests pass

* more cleaning

* fix test

* fix tests

* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* smaller comment

* update comment

* update comment

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

cc4a664b

🚨 Llama: update rope scaling to match static cache changes (#29143) · 3994fa5b
Joao Gante authored Feb 21, 2024

3994fa5b

20 Feb, 2024 1 commit
- Llama: fix batched generation (#29109) · 7d312ad2
  Joao Gante authored Feb 20, 2024
  
  7d312ad2
15 Feb, 2024 1 commit

Fix static generation when compiling! (#28937) · f3788b09

Arthur authored Feb 15, 2024



* wow I was scared!

* fix everything

* nits

* make it BC?

* add todo

* nits

* is_tracing should still be used to pass tracing tests

* nits

* some nits to make sure genration works with static cache uncompiled

* fix sdpa

* fix FA2 for both static and dynamic in a better way?

* style

* fix-copies

* fix fix copies

* fix sequential beam searcg

* style

* use `keys_to_ignore`

* nit

* correct dtype inference when init

* :( the fix for FA2 is still not optimal to investigate!

* styling

* nits

* nit

* this might work better

* add comment

* Update src/transformers/models/llama/modeling_llama.py

* "position_ids" -> "cache_position"

* style

* nit

* Remove changes that should no be propagatted just yet

* Apply suggestions from code review

* Styling

* make sure we raise an errir for static cache with FA2 enabled

* move  to the bottom of the signature

* style

* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py

* nit in the name

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

f3788b09

08 Feb, 2024 2 commits

fix: torch.int32 instead of torch.torch.int32 (#28883) · 0b693e90
vodkaslime authored Feb 08, 2024

0b693e90

[`Core generation`] Adds support for static KV cache (#27931) · 115ac94d

Arthur authored Feb 08, 2024

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

115ac94d

06 Feb, 2024 1 commit

Adds LlamaForQuestionAnswering class in modeling_llama.py along with AutoModel Support (#28777) · 2e7c942c

nakranivaibhav authored Feb 06, 2024

* This is a test commit

* testing commit

* final commit with some changes

* Removed copy statement

* Fixed formatting issues

* Fixed error added past_key_values in the forward method

* Fixed a trailing whitespace. Damn the formatting rules are strict

* Added the copy statement

2e7c942c

31 Jan, 2024 1 commit
- DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect... · beb2a096
  Joao Gante authored Jan 31, 2024
```
DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760)
```
  beb2a096
15 Jan, 2024 1 commit
- [`chore`] Update warning text, a word was missing (#28017) · 881e966a
  Tom Aarsen authored Jan 15, 2024
```
Update warning, a word was missing
```
  881e966a