- 02 Feb, 2024 1 commit
Steven Liu authored
* tidy
* fix path
- 28 Nov, 2023 1 commit
Steven Liu authored
* first draft
* benchmarks
* feedback
- 06 Nov, 2023 1 commit
Maria Khalusova authored
* fixed links with 404
* make style
- 01 Nov, 2023 2 commits
Marc Sun authored
* add exllamav2 arg
* add test
* style
* add check
* add doc
* replace by use_exllama_v2
* fix tests
* fix doc
* style
* better condition
* fix logic
* add deprecate msg
* deprecate exllama
* remove disable_exllama from the linter
* remove
* fix warning
* Revert the commits deprecating exllama
* deprecate disable_exllama for use_exllama
* fix
* fix loading attribute
* better handling of args
* remove disable_exllama from init and linter
* Apply suggestions from code review
* better arg
* fix warning
* Apply suggestions from code review
* switch to dict
* Apply suggestions from code review
* style
* nits
* style
* better tests
* style

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
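As a reading aid for the argument change this commit describes, here is a minimal sketch, assuming a transformers version carrying the final API from this PR (the checkpoint id is a placeholder):

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# `disable_exllama` is deprecated in favor of `use_exllama`; the kernel
# version moved into the `exllama_config` dict (the "switch to dict" above).
gptq_config = GPTQConfig(
    bits=4,
    use_exllama=True,               # replaces the deprecated disable_exllama=False
    exllama_config={"version": 2},  # select the exllamav2 kernels
)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",     # placeholder GPTQ checkpoint
    quantization_config=gptq_config,
    device_map="auto",
)
```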
Younes Belkada authored
* working v1
* oops
* Update src/transformers/modeling_utils.py
* fixup
* oops
* push
* more changes
* add docs
* some fixes
* fix copies
* add v1 doc
* added installation guide
* relax constraints
* revert
* attempt llm-awq
* oops
* oops
* fixup
* raise error when incorrect cuda compute capability
* nit
* add instructions for llm-awq
* fixup
* fix copies
* fixup and docs
* change
* few changes + add demo
* add v1 tests
* add autoawq in dockerfile
* finalize
* Update tests/quantization/autoawq/test_awq.py
* fix test
* fix
* fix issue
* Update src/transformers/integrations/awq.py
* Update docs/source/en/main_classes/quantization.md
* Update docs/source/en/main_classes/quantization.md
* Update src/transformers/integrations/awq.py
* Update src/transformers/integrations/awq.py
* add link to example script
* Update docs/source/en/main_classes/quantization.md
* add more content
* add more details
* add link to quantization docs
* camel case + change backend class name
* change to string
* fixup
* raise errors if libs not installed
* change to `bits` and `group_size`
* nit
* nit
* Apply suggestions from code review
* disable training
* address some comments and fix nits
* fix
* final nits and fix tests
* adapt to our new runners
* make fix-copies
* Update src/transformers/utils/quantization_config.py
* Update src/transformers/utils/quantization_config.py
* Update src/transformers/integrations/awq.py
* Update src/transformers/integrations/awq.py
* move to top
* add conversion test
* final nit
* add more elaborated test

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
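A minimal sketch of the AWQ loading path this PR adds, assuming `autoawq` is installed and the GPU passes the CUDA compute capability check mentioned above (the checkpoint id is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mistral-7B-OpenOrca-AWQ"  # placeholder AWQ checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Quantization parameters (`bits`, `group_size`, backend) are read from the
# checkpoint's quantization config rather than passed by hand.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```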
- 31 Oct, 2023 1 commit
Vivek Khandelwal authored
* Add support for loading GPTQ models on CPU

  Until now, GPTQ-quantized models could only be loaded on a CUDA device. The attribute `gptq_supports_cpu` checks whether the installed auto_gptq version is one that supports CPU loading for the model. The larger variants of a model are hard to load/run/trace on a GPU, which is the rationale behind adding this attribute.

  Signed-off-by: Vivek Khandelwal <vivek@nod-labs.com>
* Update quantization.md
* Update quantization.md
* Update quantization.md
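A hedged sketch of what this enables, assuming an auto_gptq build for which `gptq_supports_cpu` is True and that a CPU device map is the intended entry point (the checkpoint id is a placeholder):

```python
from transformers import AutoModelForCausalLM

# Previously this required a CUDA device; with CPU support in auto_gptq,
# the quantized checkpoint can be placed on the CPU instead.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # placeholder GPTQ checkpoint
    device_map="cpu",
)
```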
- 30 Oct, 2023 1 commit
Rockerz authored
* add
* add
* add
* Add deepspeed.md
* Add
* add
* Update docs/source/ja/main_classes/callback.md
* Update docs/source/ja/main_classes/output.md
* Update docs/source/ja/main_classes/pipelines.md
* Update docs/source/ja/main_classes/processors.md
* Update docs/source/ja/main_classes/processors.md
* Update docs/source/ja/main_classes/text_generation.md
* Update docs/source/ja/main_classes/processors.md
* Update logging.md
* Update toctree.yml
* Update docs/source/ja/main_classes/deepspeed.md
* Add suggestions
* m
* Update docs/source/ja/main_classes/trainer.md
* Update toctree.yml
* Update Quantization.md
* Update docs/source/ja/_toctree.yml
* Update toctree.yml
* Update docs/source/en/main_classes/deepspeed.md
* Update docs/source/en/main_classes/deepspeed.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
- 27 Oct, 2023 1 commit
- 26 Oct, 2023 1 commit
Marc Sun authored
* add exllamav2 arg
* add test
* style
* add check
* add doc
* replace by use_exllama_v2
* fix tests
* fix doc
* style
* better condition
* fix logic
* add deprecate msg
- 12 Oct, 2023 1 commit
Heinz-Alexander Fuetterer authored
- 14 Aug, 2023 1 commit
Marc Sun authored
* fix nits
* fix docstring
* fix doc
* fix damp_percent
* fix doc
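For context on the `damp_percent` fix, a small sketch of where the parameter lives, assuming the GPTQConfig field as documented (values illustrative):

```python
from transformers import GPTQConfig

# `damp_percent` is the GPTQ dampening fraction; it must lie strictly
# between 0 and 1, with 0.1 as the usual default.
config = GPTQConfig(bits=4, dataset="c4", damp_percent=0.1)
```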
- 10 Aug, 2023 1 commit
Marc Sun authored
* GPTQ integration
* Add tests for gptq
* support for more quantization models
* fix style
* typo
* fix method
* Update src/transformers/modeling_utils.py
* add dataclass and fix quantization_method
* fix doc
* Update tests/quantization/gptq/test_gptq.py
* Apply suggestions from code review
* modify dataclass
* add gptqconfig import
* fix typo
* fix tests
* remove dataset as req arg
* remove tokenizer import
* add offload cpu quantization test
* fix check dataset
* modify dockerfile
* protect trainer
* style
* test for config
* add more log
* overwrite torch_dtype
* draft doc
* modify quantization_config docstring
* fix class name in docstring
* Apply suggestions from code review
* more warning
* fix 8bit kwargs tests
* peft compatibility
* remove var
* fix is_gptq_quantized
* remove is_gptq_quantized
* fix wrap
* Update src/transformers/modeling_utils.py
* add exllama
* skip test
* overwrite float16
* style
* fix skip test
* Apply suggestions from code review
* fix docstring formatting
* add doc
* better test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
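A minimal sketch of the quantize-on-load flow this integration introduces, assuming `auto-gptq` is installed and a GPU is available (the model id is a small placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The dataclass added here: bit width, a calibration dataset, and the
# tokenizer used to prepare it.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Weights are quantized during loading and can then be saved or pushed.
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=gptq_config, device_map="auto"
)
```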
- 01 Aug, 2023 1 commit
Younes Belkada authored
[`Docs`/`quantization`] Clearer explanation of how things work under the hood + remove outdated info (#25216)
* clearer explanation of how things work under the hood
* Update docs/source/en/main_classes/quantization.md
* Update docs/source/en/main_classes/quantization.md
* add `load_in_4bit` in `from_pretrained`

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
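The `load_in_4bit` shortcut the last bullet mentions, as a minimal sketch (the model id is a placeholder):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # placeholder checkpoint
    load_in_4bit=True,    # shorthand for passing a 4-bit BitsAndBytesConfig
    device_map="auto",
)
```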
- 31 Jul, 2023 1 commit
Stas Bekman authored
Update quantization.md
- 18 Jul, 2023 1 commit
Younes Belkada authored
* clarify 4bit docs
* Apply suggestions from code review

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
- 10 Jul, 2023 1 commit
Marc Sun authored
- 20 Jun, 2023 1 commit
Sylvain Gugger authored
* Rename index.mdx to index.md
* With saved modifs
* Address review comment
* Treat all files
* .mdx -> .md
* Remove special char
* Update utils/tests_fetcher.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
- 01 Jun, 2023 1 commit
Marc Sun authored
* Modify device map behavior for 4/8 bits model
* Remove device_map arg for training 4/8 bit model
* Remove index
* Add Exceptions
* Modify comment
* Fix formatting
* Get current device with accelerate
* Revert "Get current device with accelerate" (reverts commit 46f00799103bbe15bd58762ba029aab35363c4f7)
* Fix Exception
* Modify quantization doc
* Fix error

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
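A sketch of the behavior these commits settle on, assuming the exceptions work as the log suggests (the model id is a placeholder):

```python
from transformers import AutoModelForCausalLM

# "auto" dispatches the quantized model across available devices; per this
# PR, custom maps that offload 4/8-bit modules to CPU or disk raise an
# exception instead of failing silently.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", load_in_8bit=True, device_map="auto"
)
```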
- 24 May, 2023 1 commit
Tim Dettmers authored
* Added lion and paged optimizers and made original tests pass.
* Added tests for paged and lion optimizers.
* Added and fixed optimizer tests.
* Style and quality checks.
* Initial draft. Some tests fail.
* Fixed dtype bug.
* Fixed bug caused by torch_dtype='auto'.
* All tests green for 8-bit and 4-bit layers.
* Added fix for fp32 layer norms and bf16 compute in LLaMA.
* Initial draft. Some tests fail.
* Fixed dtype bug.
* Fixed bug caused by torch_dtype='auto'.
* All tests green for 8-bit and 4-bit layers.
* Added lion and paged optimizers and made original tests pass.
* Added tests for paged and lion optimizers.
* Added and fixed optimizer tests.
* Style and quality checks.
* Fixing issues for PR #23479.
* Added fix for fp32 layer norms and bf16 compute in LLaMA.
* Reverted variable name change.
* Initial draft. Some tests fail.
* Fixed dtype bug.
* Fixed bug caused by torch_dtype='auto'.
* All tests green for 8-bit and 4-bit layers.
* Added lion and paged optimizers and made original tests pass.
* Added tests for paged and lion optimizers.
* Added and fixed optimizer tests.
* Style and quality checks.
* Added missing tests.
* Fixup changes.
* Added fixup changes.
* Missed some variables to rename.
* revert trainer tests
* revert test trainer
* another revert
* fix tests and safety checkers
* protect import
* simplify a bit
* Update src/transformers/trainer.py
* few fixes
* add warning
* replace with `load_in_kbit = load_in_4bit or load_in_8bit`
* fix test
* fix tests
* this time fix tests
* safety checker
* add docs
* revert torch_dtype
* Apply suggestions from code review
* multiple fixes
* update docs
* version checks and multiple fixes
* replace `is_loaded_in_kbit`
* replace `load_in_kbit`
* change methods names
* better checks
* oops
* oops
* address final comments

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
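A minimal sketch of the 4-bit path this PR lands, with QLoRA-style settings (NF4, double quantization, bf16 compute); the model id is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 data type
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",  # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```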
- 12 Apr, 2023 1 commit
Younes Belkada authored
* make serialization of int8 models possible
* make fixup
* add docs
* add ability to push to hub and save pretrained
* fixes
* more addition
* more tests
* fix issues
* change variable
* clearer message
* adapt from suggestions
* few fixes
* remove unused function
* Update src/transformers/utils/quantization_config.py
* address last comments
* last warning
* clarify doc
* protect import
* Update src/transformers/modeling_utils.py
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
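A sketch of the serialization flow this PR makes possible (the hub repo id is hypothetical):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m", load_in_8bit=True, device_map="auto"
)
model.save_pretrained("./bloom-560m-8bit")    # saves the int8 weights
model.push_to_hub("my-user/bloom-560m-8bit")  # hypothetical repo id
```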
- 10 Apr, 2023 1 commit
Kirill authored
- 17 Feb, 2023 1 commit
Younes Belkada authored
* v1 `BitsandbytesConfig`: add v1, add tests, more user-friendly API, add docs
* change to `BitsAndBytesConfig`
* replace logic
* changes
* make fixup
* quality
* make fixup
* fix doc
* fix test
* update toctree
* fix slow test
* add tips
* add warning
* change title
* oops
* Update docs/source/en/main_classes/quantization.mdx
* Update src/transformers/utils/bitsandbytes.py
* remove unused file
* adapt suggestion: add also tests, change logic
* update docs
* adapt suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
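A minimal sketch of the v1 config introduced here, after the rename to `BitsAndBytesConfig` (values are illustrative, the model id is a placeholder):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit loading with an explicit outlier threshold for the int8 matmul.
bnb_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",  # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```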