Commits · e391706420934f6c87cebe9997fc85a757aa4353 · chenpangpang / transformers

16 Jul, 2024 1 commit

Speedup model init on CPU (by 10x+ for llama-3-8B as one example) (#31771) · e0dfd7bc

Zach Mueller authored Jul 16, 2024



* 1,100%!

* Clean

* Don't touch DS

* Experiment with dtype allocation

* skip test_load_save_without_tied_weights test

* A little faster

* Include proper upscaling?

* Fixup tests

* Potentially skip?

* Let's see if this fixes git history

* Maintain new dtype

* Fin

* Rm hook idea for now

* New approach, see what breaks

* stage

* Clean

* Stash

* Should be fin now, just need to mark failing models

* Clean up

* Simplify

* Deal with weird models

* Enc/Dec

* Skip w/ reason

* Adjust test

* Fix test

* one more test

* Keep experimenting

* Fix ref

* TO REMOVE: testing feedback CI

* Right push

* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* disable

* Add new func

* Test nits from Amy

* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Adjust comment

* Adjust comment on skip

* make private

* Fin

* Should be a not flag

* Clarify and rename test

---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

e0dfd7bc

11 Jul, 2024 1 commit

Add warning message for beta and gamma parameters (#31654) · 1499a550

Omar Salman authored Jul 11, 2024

* Add warning message for `beta` and `gamma` parameters

* Fix when the warning is raised

* Formatting changes

* Improve testing and remove duplicated warning from _fix_key

1499a550

09 Jul, 2024 3 commits

Add return type annotation to PreTrainedModel.from_pretrained (#31869) · c5bc2d5f
Mauricio Villegas authored Jul 09, 2024
```
Update modeling_utils.py

Add return type annotation to PreTrainedModel.from_pretrained
```
c5bc2d5f
save_pretrained: use tqdm when saving checkpoint shards from offloaded params (#31856) · cffa2b9c
kallewoof authored Jul 09, 2024

cffa2b9c

Deprecate `vocab_size` in other two VLMs (#31681) · 952dfd48

Raushan Turganbay authored Jul 09, 2024



* deprecate `vocab_size` in other two VLMs

* Update src/transformers/models/fuyu/configuration_fuyu.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* deprecate until 4.44

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

952dfd48

05 Jul, 2024 1 commit
- Fix serialization for offloaded model (#31727) · 8c5c180d
  Marc Sun authored Jul 05, 2024
```
* Fix serialization

* style

* add test
```
  8c5c180d
27 Jun, 2024 1 commit
- [QoL] Allow dtype str for torch_dtype arg of from_pretrained (#31590) · 3a028101
  Billy Cao authored Jun 27, 2024
```
* Allow dtype str for torch_dtype in from_pretrained

* Update docstring

* Add tests for str torch_dtype
```
  3a028101
19 Jun, 2024 1 commit
- Mamba: add generative tests (#31478) · 83259e40
  Joao Gante authored Jun 19, 2024
  
  83259e40
12 Jun, 2024 2 commits
- Use huggingface_hub helper function to split state dict (#31091) · 254b25ab
  Marc Sun authored Jun 12, 2024
```
* shard saving from hf hub

* index = None

* fix tests

* indent
```
  254b25ab
- Update comment in modeling_utils.py (#31299) · 1c73d85b
  Aaron V authored Jun 12, 2024
  
  1c73d85b
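As a rough illustration of what "splitting a state dict" into checkpoint shards means for #31091 above, here is a minimal greedy sketch. It is not the `huggingface_hub` helper itself: the function name `split_into_shards`, the input shape (tensor name mapped to size in bytes), and the greedy policy are all assumptions made for illustration.

```python
def split_into_shards(tensor_sizes, max_shard_size):
    """Greedily group tensors into shards of at most `max_shard_size` bytes.

    Hypothetical helper for illustration only. `tensor_sizes` maps a tensor
    name to its size in bytes. A tensor larger than the limit still gets a
    shard of its own, since a single tensor cannot be split here.
    """
    shards = []
    current, current_size = {}, 0
    for name, size in tensor_sizes.items():
        # Close the current shard when adding this tensor would overflow it.
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = size
        current_size += size
    if current:
        shards.append(current)
    return shards


example = split_into_shards({"a": 4, "b": 4, "c": 6, "d": 20}, max_shard_size=10)
```

With these inputs, `a` and `b` share one shard, `c` gets its own, and the oversized `d` is placed alone in a third shard.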
07 Jun, 2024 1 commit

Extend save_pretrained to offloaded models (#27412) · ff689f57

Benjamin Badger authored Jun 07, 2024



* added hidden subset

* debugged hidden subset contrastive search

* added contrastive search compression

* debugged compressed contrastive search

* memory reduction for contrastive search

* debugged mem red

* added low memory option feature

* debugged mem optimization output stack

* debugged mem optimization output stack

* debugged low mem

* added low mem cache

* fixed 2047 tensor view

* debugged 2042 past key val inputs

* reformatted tensors

* changed low mem output

* final clean

* removed subset hidden csearch

* fixed hidden device

* fixed hidden device

* changed compressor dtype

* removed hstate compression

* integrated csearch in generate

* test csearch integration into generation

exit()

* fixed csearch kwarg integration with generation

* final wrap and added doc

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* added debug print

* direct hstate cat

* direct hstate cat

* direct hstate cat debug

* direct hstate cat debug

* expanded full hidden state stack

* expanded full hidden state stack

* matched dims for hstates

* matched dims for hstates

* logits fix

* equality test

* equality hidden debug

* debug

* added prints for debug

* added prints for debug

* equality check

* switched squeeze dim

* input format debug

* tracing top_k_ids

* removed trace

* added test context

* added jitter

* added jitter

* added jitter

* returned state

* rebuilt past key value reconstruction

* debugged

* cleaned traces

* added selection for pkv

* changed output to dict

* cleaned

* cleaned

* cleaned up contrastive search test

* moved low_memory kwarg

* debugged

* changed low mem test batch size to 1

* removed output

* debugged test input shape

* reformatted csearch test

* added trace

* removed unsqueeze on final forward pass

* replaced unsqueeze with view

* removed traces

* cleaned

* debugged model kwargs

* removed special models from test

* ran make quality

* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* refactored

* refactored

* refactored

* make fixup

* renamed flag sequential

* renamed flag sequential

* iterative onloading

* black style and test utils

* added traces for integrated test

* debugged

* added traces

* make style

* removed traces, make style

* included suggestions and added test

* debugged test

* added offload module check and make style

* is_accelerate_available and make style

* added test decorator

* changed test model and config spec

* added offload condition

* added lazy loading for each shard

* debugged

* modified sharding

* debugged

* added traces

* removed safe serialization

* no index overload;

* trace on safe save ptrs

* added ptr condition

* debugged

* debugged ptr

* moved module map init

* remake shard only for offloaded modules

* refactored

* debugged

* refactored

* debugged

* cleaned and make style

* cleaned and make style

* added trace

* sparse module map

* debugged

* removed module map conditional

* refactored

* debug

* debugged

* added traces

* added shard mem trace

* added shard mem trace

* removed underlying storage check

* refactored

* memory leak removal and make style

* cleaned

* swapped test decs and make style

* added mem checks and make style

* added free mem warning

* implemented some suggestions

* moved onloading to accelerate

* refactored for accelerate integration

* cleaned test

* make style

* debugged offload map name

* cleaned and make style

* replaced meta device check for sharding

* cleaned and make style

* implemented some suggestions

* more suggestions

* update warning
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* more suggestions

* make style

* new make style

* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

ff689f57

29 May, 2024 1 commit

Use `HF_HUB_OFFLINE` + fix has_file in offline mode (#31016) · c3044ec2

Lucain authored May 29, 2024

* Fix has_file in offline mode

* harmonize env variable for offline mode

* Switch to HF_HUB_OFFLINE

* fix test

* revert test_offline to test TRANSFORMERS_OFFLINE

* Add new offline test

* merge conflicts

* docs

c3044ec2
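The offline-mode harmonization in #31016 above can be pictured with a small sketch: prefer `HF_HUB_OFFLINE`, and fall back to the older `TRANSFORMERS_OFFLINE` when it is unset. The helper name `is_offline_mode`, the precedence rule, and the set of accepted truthy values are assumptions for illustration, not code copied from either library.

```python
import os

# Assumed set of values treated as "true"; the real libraries may differ.
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def is_offline_mode(env=None):
    """Return True when the environment asks for offline mode.

    Hypothetical helper: `env` defaults to `os.environ` and can be replaced
    by a plain dict, which makes the behavior easy to check.
    """
    env = os.environ if env is None else env
    # HF_HUB_OFFLINE wins when set to a non-empty string; otherwise fall back.
    raw = env.get("HF_HUB_OFFLINE") or env.get("TRANSFORMERS_OFFLINE") or ""
    return raw.strip().upper() in _TRUE_VALUES
```

Passing a dict instead of reading `os.environ` directly keeps the sketch free of global state.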

28 May, 2024 2 commits

FIX: Add `accelerate` as a hard requirement (#31090) · 94d416f0
Younes Belkada authored May 28, 2024
```
add accelerate
```
94d416f0

fix from_pretrained in offline mode when model is preloaded in cache (#31010) · 936ab7ba

oOraph authored May 28, 2024



* Unit test to verify fix
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

* fix from_pretrained in offline mode when model is preloaded in cache
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

* minor: fmt
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

---------
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
Co-authored-by: Raphael Glon <oOraph@users.noreply.github.com>

936ab7ba

24 May, 2024 1 commit
- Do not trigger autoconversion if local_files_only (#31004) · 03935d30
  Lucain authored May 24, 2024
  
  03935d30
23 May, 2024 1 commit

Quantized KV Cache (#30483) · d583f131

Raushan Turganbay authored May 23, 2024



* clean-up

* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* more suggestions

* mapping if torch available

* run tests & add 'support_quantized' flag

* fix jamba test

* revert, will be fixed by another PR

* codestyle

* HQQ and versatile cache classes

* final update

* typo

* make tests happy

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

d583f131

16 May, 2024 2 commits
- Disable the FA backend for SDPA on AMD GPUs (#30850) · 0753134f
  Mohit Sharma authored May 16, 2024
```
* disable fa

* disable fa

* update warning

* update warning
```
  0753134f
- Cache: add new flag to distinguish models that `Cache` but not static cache (#30800) · 9d889f87
  Joao Gante authored May 16, 2024
```
* jamba cache

* new flag

* generate exception
```
  9d889f87
15 May, 2024 2 commits

FEAT / Bitsandbytes: Add `dequantize` API for bitsandbytes quantized models (#30806) · 3f435823

Younes Belkada authored May 15, 2024



* add  method

* change method name

* more comments

* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fixup

* add docstrings and fix comment

* warn users on the de-quantized dtype

* Update src/transformers/quantizers/base.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/integrations/bitsandbytes.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* final suggestion - use private method

---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

3f435823

Loading GGUF files support (#30391) · a4284495

Lysandre Debut authored May 15, 2024



* Adds support for loading GGUF files
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: 99991 <99991@users.noreply.github.com>

* add q2_k q3_k q5_k support from @99991

* fix tests

* Update doc

* Style

* Docs

* fix CI

* Update docs/source/en/gguf.md

* Update docs/source/en/gguf.md

* Compute merges

* change logic

* add comment for clarity

* add comment for clarity

* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change logic

* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/modeling_gguf_pytorch_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* put back comment

* add comment about mistral

* comments and added tests

* fix inconsistent type

* more

* fix tokenizer

* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* address comments about tests and tokenizer + add added_tokens

* from_gguf -> gguf_file

* replace on docs too

---------
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: 99991 <99991@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

a4284495

07 May, 2024 1 commit
- Add safetensors to model not found error msg for default use_safetensors value (#30602) · cf7bed98
  David Xue authored May 07, 2024
```
* add safetensors to model not found error for default use_safetensors=None case

* format code w/ ruff

* fix assert true typo
```
  cf7bed98
06 May, 2024 1 commit

Respect `resume_download` deprecation (#30620) · 835de4c8

Lucain authored May 06, 2024



* Deprecate resume_download

* remove default resume_download value

---------
Co-authored-by: Lysandre Debut <hi@lysand.re>

835de4c8

02 May, 2024 2 commits

Add HQQ quantization support (#29637) · 59952994

mobicham authored May 02, 2024



* update HQQ transformers integration

* push import_utils.py

* add force_hooks check in modeling_utils.py

* fix | with Optional

* force bias as param

* check bias is Tensor

* force forward for multi-gpu

* review fixes pass

* remove torch grad()

* if any key in linear_tags fix

* add cpu/disk check

* isinstance return

* add multigpu test + refactor tests

* clean hqq_utils imports in hqq.py

* clean hqq_utils imports in quantizer_hqq.py

* delete hqq_utils.py

* Delete src/transformers/utils/hqq_utils.py

* ruff init

* remove torch.float16 from __init__ in test

* refactor test

* isinstance -> type in quantizer_hqq.py

* cpu/disk device_map check in quantizer_hqq.py

* remove type(module) nn.linear check in quantizer_hqq.py

* add BaseQuantizeConfig import inside HqqConfig init

* remove hqq import in hqq.py

* remove accelerate import from test_hqq.py

* quant config.py doc update

* add hqqconfig to main_classes doc

* make style

* __init__ fix

* ruff __init__

* skip_modules list

* hqqconfig format fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* test_hqq.py remove mistral comment

* remove self.using_multi_gpu is False

* torch_dtype default val set and logger.info

* hqq.py isinstance fix

* remove torch=None

* torch_device test_hqq

* rename test_hqq

* MODEL_ID in test_hqq

* quantizer_hqq setattr fix

* quantizer_hqq typo fix

* imports quantizer_hqq.py

* isinstance quantizer_hqq

* hqq_layer.bias reformat quantizer_hqq

* Step 2 as comment in quantizer_hqq

* prepare_for_hqq_linear() comment

* keep_in_fp32_modules fix

* HqqHfQuantizer reformat

* quantization.md hqqconfig

* quantization.md model example reformat

* quantization.md # space

* quantization.md space   })

* quantization.md space   })

* quantization_config fix doc
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* axis value check in quantization_config

* format

* dynamic config explanation

* quant config method in quantization.md

* remove shard-level progress

* .cuda fix modeling_utils

* test_hqq fixes

* make fix-copies

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

59952994

Fix: failing CI after #30568 (#30599) · 5cf3e6bf
Raushan Turganbay authored May 02, 2024
```
* failing CI

* no, let's keep it until full deprecation in v4.42
```
5cf3e6bf

26 Apr, 2024 1 commit
- Update `dtype_byte_size` to handle torch.float8_e4m3fn/float8_e5m2 types (#30488) · 20081c74
  Michael Goin authored Apr 26, 2024
```
* Update modeling_utils/dtype_byte_size to handle float8 types

* Add a test for dtype_byte_size

* Format

* Fix bool
```
  20081c74
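To show why float8 names such as `torch.float8_e4m3fn` need special care in #30488 above, here is a sketch that derives a byte size from a dtype's string name. It works on plain strings so it runs without PyTorch. The function name `dtype_name_byte_size` and the regular expression are assumptions for illustration and should not be read as the library's implementation.

```python
import re


def dtype_name_byte_size(dtype_name):
    """Return the size in bytes implied by a dtype name such as 'torch.float16'.

    Hypothetical helper. The bit width is the run of digits that follows the
    type name; an optional suffix such as '_e4m3fn' is skipped so that float8
    variants resolve to 8 bits rather than to the trailing digit.
    """
    if dtype_name == "torch.bool":
        return 1 / 8  # one bit, expressed in bytes
    match = re.search(r"[^\d](\d+)(_.*)?$", dtype_name)
    if match is None:
        raise ValueError(f"not a recognized dtype name: {dtype_name!r}")
    return int(match.group(1)) // 8
```

Without the optional `(_.*)?` group, a name like `torch.float8_e5m2` would be read from its last digits instead of from the `8`.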
25 Apr, 2024 1 commit
- Quantization: `HfQuantizer` quant method update (#30484) · 26ddc580
  Younes Belkada authored Apr 25, 2024
```
ensure popular quant methods are supported
```
  26ddc580
23 Apr, 2024 1 commit

fix for itemsize => element_size() for torch backwards compat (#30133) · 57fc00f3

Wing Lian authored Apr 23, 2024



* fix for itemsize => element_size() for torch backwards compat

* improve handling of element counting

* Update src/transformers/modeling_utils.py

* fixup

* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

57fc00f3

19 Apr, 2024 2 commits

Fix config + attn_implementation in AutoModelForCausalLM.from_pretrained (#30299) · 21c912e7
hoshi-hiyouga authored Apr 20, 2024
```
* Update modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py
```
21c912e7

Update unwrap from accelerate (#29933) · b4fd49b6

Marc Sun authored Apr 19, 2024



* Use unwrap with the one in accelerate

* oups

* update unwrap

* fix

* wording

* raise error instead

* comment

* doc

* Update src/transformers/modeling_utils.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* style

* put else

---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

b4fd49b6

12 Apr, 2024 1 commit
- fix: Replaced deprecated `logger.warn` with `logger.warning` (#30197) · caa5c65d
  Sai-Suraj-27 authored Apr 12, 2024
```
* Fixed deprecated logger.warn by using logger.warning

* Reformatted using ruff.
```
  caa5c65d
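For context on #30197 above: in Python's standard `logging` module, `Logger.warn` is a deprecated alias and `Logger.warning` is the supported spelling. A minimal sketch of the preferred call follows; the logger name and the message are made up for illustration.

```python
import io
import logging


def demo_warning():
    """Emit one warning through `Logger.warning` and return the captured text."""
    stream = io.StringIO()
    logger = logging.getLogger("demo.modeling_utils")  # hypothetical logger name
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
    logger.addHandler(handler)
    try:
        # Preferred spelling; `logger.warn(...)` is the deprecated alias.
        logger.warning("weights of %s were not initialized", "lm_head")
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```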
10 Apr, 2024 1 commit
- FIX / bnb: fix torch compatiblity issue with `itemize` (#30162) · f569172f
  Younes Belkada authored Apr 10, 2024
```
* fix torch compatibility issues

* fix

* Update src/transformers/modeling_utils.py
```
  f569172f
09 Apr, 2024 1 commit

Fix failing DeepSpeed model zoo tests (#30112) · 4e3490f7

Sourab Mangrulkar authored Apr 09, 2024

* fix sequence length errors

* fix label column name error for vit

* fix the lm_head embedding!=linear layer mismatches for Seq2Seq models

4e3490f7

02 Apr, 2024 1 commit

Hard error when ignoring tensors. (#27484) (#29906) · 9b0a8ea7

Nicolas Patry authored Apr 02, 2024



* Hard error when ignoring tensors. (#27484)

* [WIP] Hard error when ignoring tensors.

* Better selection/error when saving a checkpoint.

- Find all names we should normally drop (those are in the transformers
  config)
- Find all disjoint tensors (for those we can safely trigger a copy to
  get rid of the sharing before saving)
- Clone those disjoint tensors getting rid of the issue
- Find all identical names (those should be declared in the config
  but we try to find them all anyway.)
- For all identical names:
  - If they are in the config, just ignore them everything is fine
  - If they are not, warn about them.
- For all remainder tensors which are shared yet neither identical NOR
  disjoint. raise a hard error.

* Adding a failing test on `main` that passes here.

* We don't need to keep the subfolder logic in this test.

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add small tests.

* Dead variable.

* Fixup.

* Fixing tied_Weights_keys on generic models.

* Fixup + T5 encoder/decoder tying (with different layers)

* Code quality.

* Dynamic member.

* trigger

* Fixing encoder name for other types of encoder/decoder combos.

* Fix scoping.

* Update .github/workflows/self-scheduled.yml
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fixing the tied_weights after the call.

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

9b0a8ea7
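The selection logic listed in #29906 above (identical names, disjoint tensors, and a hard error for everything else) can be sketched in plain Python. This is a simplified model, not the code from the commit: each tensor is described by a hypothetical `(storage_id, start, stop)` byte range, the function name `classify_shared` is invented, and a storage that mixes identical and disjoint views is treated as an error for brevity.

```python
from collections import defaultdict


def classify_shared(tensors, tied_keys=frozenset()):
    """Decide what to do with tensors that share storage before saving.

    Hypothetical helper. `tensors` maps name -> (storage_id, start, stop).
    Returns the names to clone (disjoint views), the names to drop
    (identical duplicates), and the dropped names that deserve a warning
    because they are not declared in `tied_keys`.
    """
    by_storage = defaultdict(list)
    for name, (storage_id, start, stop) in tensors.items():
        by_storage[storage_id].append((start, stop, name))

    to_clone, to_drop, to_warn = [], [], []
    for group in by_storage.values():
        if len(group) < 2:
            continue  # storage is not shared, nothing to do
        group.sort()
        names = [name for _, _, name in group]
        spans = {(start, stop) for start, stop, _ in group}
        if len(spans) == 1:
            # Identical views: keep one name, drop the rest.
            kept = next((n for n in names if n not in tied_keys), names[0])
            for n in names:
                if n == kept:
                    continue
                to_drop.append(n)
                if n not in tied_keys:
                    to_warn.append(n)  # undeclared duplicate
        elif all(a[1] <= b[0] for a, b in zip(group, group[1:])):
            # Disjoint views: copying them removes the sharing safely.
            to_clone.extend(names)
        else:
            raise RuntimeError(f"tensors overlap without being identical: {names}")
    return {"clone": to_clone, "drop": to_drop, "warn": to_warn}
```

Declared ties are dropped silently, undeclared identical names are dropped with a warning, and partially overlapping views stop the save instead of being written incorrectly.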

27 Mar, 2024 1 commit

Reimplement "Automatic safetensors conversion when lacking these files" (#29846) · 4d8427f7

Lysandre Debut authored Mar 27, 2024

* Automatic safetensors conversion when lacking these files (#29390)

* Automatic safetensors conversion when lacking these files

* Remove debug

* Thread name

* Typo

* Ensure that raises do not affect the main thread

* Catch all errors

4d8427f7

25 Mar, 2024 2 commits
- [`revert commit`] revert 00a09ed4 · e3e16ddc
  Arthur Zucker authored Mar 25, 2024
  
  e3e16ddc
- fix 😭 · 00a09ed4
  Arthur Zucker authored Mar 25, 2024
  
  00a09ed4
18 Mar, 2024 1 commit

FIX [`bnb`] Make `unexpected_keys` optional (#29420) · c852d4fb

Younes Belkada authored Mar 18, 2024



* make `unexpected_keys` optional

* push

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

c852d4fb

15 Mar, 2024 1 commit

[Quantization] Quanto quantizer (#29023) · 28de2f4d

Marc Sun authored Mar 15, 2024



* start integration

* fix

* add and debug tests

* update tests

* make pytorch serialization work

* compatible with device_map and offload

* fix tests

* make style

* add ref

* guard against safetensors

* add float8 and style

* fix is_serializable

* Fix shard_checkpoint compatibility with quanto

* more tests

* docs

* adjust memory

* better

* style

* pass tests

* Update src/transformers/modeling_utils.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add is_safe_serialization instead

* Update src/transformers/quantizers/quantizer_quanto.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add QbitsTensor tests

* fix tests

* simplify activation list

* Update docs/source/en/quantization.md
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* better comment

* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* find and fix edge case

* Update docs/source/en/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* pass weights_only_kwarg instead

* fix shard_checkpoint loading

* simplify update_missing_keys

* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* recursion to get all tensors

* block serialization

* skip serialization tests

* fix

* change by cuda:0 for now

* fix regression

* update device_map

* fix doc

* add notebook

* update torch_dtype

* update doc

* typo

* typo

* remove comm

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>

28de2f4d

13 Mar, 2024 2 commits

Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA (#29587) · 350c5d15

Sourab Mangrulkar authored Mar 13, 2024



* fsdp+qlora related changes

* fixes

* Update quantization_config.py

* support fsdp+qlora and dsz3+qlora

* Update quantization_config.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* handle fsdp+qlora and dsz3+qlora correctly while model loading

* fix param count

* quality

* fsdp related changes

* fsdp changes only when using LoRA/QLoRA

* add accelerate version check

* refactor, update min accelerate version and add tests

1. Update minimum accelerate version to 0.26.0
2. Clean the trainer wrt accelerate version checks
3. FSDP refactor and test for fsdp config
4. use `itemsize` instead of `dtype2bytes` dict

* fix test

* Address comments
Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* fix the conditional flag

* fix conditional flag

* address comments
Co-Authored-By: Zach Mueller <7831895+muellerzr@users.noreply.github.com>

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Zach Mueller <7831895+muellerzr@users.noreply.github.com>

350c5d15

[PyTorch/XLA] Fix extra TPU compilations introduced by recent changes (#29158) · b340d907
Jiewen Tan authored Mar 13, 2024
```
* tmp

* Remove debug step

* Fix a typo

* Move to is_torch_xla_available
```
b340d907