- 05 Feb, 2024 6 commits
-
Arthur authored
update guidelines
-
Nicolas Patry authored
* [WIP] Hard error when ignoring tensors.
* Better selection/error when saving a checkpoint.
  - Find all names we should normally drop (those are in the transformers config).
  - Find all disjoint tensors (for those we can safely trigger a copy to get rid of the sharing before saving).
  - Clone those disjoint tensors, getting rid of the issue.
  - Find all identical names (those should be declared in the config, but we try to find them all anyway).
  - For all identical names:
    - If they are in the config, just ignore them; everything is fine.
    - If they are not, warn about them.
  - For all remaining tensors which are shared yet neither identical nor disjoint, raise a hard error.
* Adding a failing test on `main` that passes here.
* We don't need to keep the subfolder logic in this test.
* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
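A minimal sketch of the classification step described above, assuming contiguous tensors and plain PyTorch; this is illustrative, not the transformers implementation:

```python
import torch
from collections import defaultdict

def group_shared_tensors(state_dict):
    """Group parameter names that alias the same underlying storage."""
    by_storage = defaultdict(list)
    for name, tensor in state_dict.items():
        by_storage[tensor.untyped_storage().data_ptr()].append(name)
    return [names for names in by_storage.values() if len(names) > 1]

def are_disjoint(a, b):
    """Two contiguous views of one storage are disjoint if their byte ranges never overlap."""
    start_a = a.storage_offset() * a.element_size()
    start_b = b.storage_offset() * b.element_size()
    end_a = start_a + a.nelement() * a.element_size()
    end_b = start_b + b.nelement() * b.element_size()
    return end_a <= start_b or end_b <= start_a
```

Disjoint views can safely be `.clone()`d to break the sharing before saving; fully identical aliases are the ones that should be declared in the config, and anything in between is what now triggers the hard error.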
-
w4ffl35 authored
* Add clean_code_for_run function
* Call clean_code_for_run from agent method
-
Zizhao Chen authored
Fix bad doc: replace save with logging
-
Ziyang authored
Reuse trainer.create_scheduler to create scheduler for deepspeed
-
dependabot[bot] authored
Bump dash in /examples/research_projects/decision_transformer

Bumps [dash](https://github.com/plotly/dash) from 2.3.0 to 2.15.0.
- [Release notes](https://github.com/plotly/dash/releases)
- [Changelog](https://github.com/plotly/dash/blob/dev/CHANGELOG.md)
- [Commits](https://github.com/plotly/dash/compare/v2.3.0...v2.15.0)

---
updated-dependencies:
- dependency-name: dash
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 02 Feb, 2024 9 commits
-
amyeroberts authored
Mark test as flaky
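In transformers, flaky tests are marked with the `is_flaky` decorator from `transformers.testing_utils`, which reruns the test a few times before reporting a failure; a brief usage sketch (the test class and method names here are hypothetical):

```python
import unittest

from transformers.testing_utils import is_flaky

class ExampleModelTest(unittest.TestCase):
    @is_flaky(max_attempts=5)  # rerun up to 5 times before counting as failed
    def test_batched_generation(self):
        ...
```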
-
Sourab Mangrulkar authored
support FSDP+PEFT
-
Yih-Dar authored
use -v in pytest

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* fix / skip some tests before we can switch to torch 2.2
* style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
try

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Juri Ganitkevitch authored
* Add missing None check for hf_quantizer
* Add test, fix logic
* make style
* Switch test model to Mistral
* Comment
* Update tests/test_modeling_utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
-
skumar951 authored
Add an explicit none-check, since token ids can be 0
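The fix guards against a classic Python pitfall: id 0 is falsy, so a bare truthiness check treats a perfectly valid token id as unset. A small illustration (function and variable names hypothetical):

```python
def describe(token_id):
    # Buggy check: 0 is falsy, so a valid id of 0 is treated as missing.
    buggy = "set" if token_id else "missing"
    # Correct check: only None means "unset".
    correct = "set" if token_id is not None else "missing"
    return buggy, correct

print(describe(0))     # ('missing', 'set') -- the truthiness check gets it wrong
print(describe(None))  # ('missing', 'missing')
```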
-
Klaus Hipp authored
* Fix typos and grammar mistakes in docs and examples
* Fix typos in docstrings and comments
* Fix spelling of `tokenizer` in model tests
* Remove erroneous spaces in decorators
* Remove extra spaces in Markdown link texts
-
Steven Liu authored
* tidy
* fix path
-
- 01 Feb, 2024 8 commits
-
Steven Liu authored
* backbones
* fix path
* fix paths
* fix code snippet
* fix links
-
Rockerz authored
* Add models
* Add 2 more models
* Add models to toctree
* Update docs/source/ja/model_doc/detr.md
* Update docs/source/ja/model_doc/deit.md
* Update docs/source/ja/model_doc/deplot.md
* fix bugs

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
-
zspo authored
Co-authored-by: p_spozzhang <p_spozzhang@tencent.com>
-
Sangbum Daniel Choi authored
* enable gradient checkpointing in DetaObjectDetection
* fix missing part in original DETA
* make style
* make fix-copies
* Revert "make fix-copies" (reverts commit 4041c86c29248f1673e8173b677c20b5a4511358)
* remove fix-copies of DetaDecoder
* enable swin gradient checkpointing
* fix gradient checkpointing in donut_swin
* add tests for deta/swin/donut
* Revert "fix gradient checkpointing in donut_swin" (reverts commit 1cf345e34d3cc0e09eb800d9895805b1dd9b474d)
* change supports_gradient_checkpointing pipeline to PreTrainedModel
* Revert "add tests for deta/swin/donut" (reverts commit 6056ffbb1eddc3cb3a99e4ebb231ae3edf295f5b)
* Revert "Revert "fix gradient checkpointing in donut_swin"" (reverts commit 24e25d0a14891241de58a0d86f817d0b5d2a341f)
* Simple revert
* enable deformable detr gradient checkpointing
* add gradient checkpointing in encoder
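For reference, gradient checkpointing in transformers is switched on with a single call on any model whose `supports_gradient_checkpointing` flag is true; a hedged sketch (the checkpoint name is only an example):

```python
from transformers import AutoModelForObjectDetection

# Example checkpoint; any model that supports gradient checkpointing works the same way.
model = AutoModelForObjectDetection.from_pretrained("jozhang97/deta-swin-large")
# Recompute activations during the backward pass to trade compute for memory.
model.gradient_checkpointing_enable()
model.train()
```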
-
Matt authored
* Add tip on setting tokenizer attributes
* Grammar
* Remove the bit that was causing doc builds to fail
-
fxmarty authored
* fix symbolic_trace with kv cache
* comment & better test
-
Yih-Dar authored
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
JB (Don) authored
* Adding [T5/MT5/UMT5]ForTokenClassification
* Add auto mappings for T5ForTokenClassification and variants
* Adding ForTokenClassification to the list of models
* Adding attention_mask param to the T5ForTokenClassification test
* Remove outdated comment in test
* Adding EncoderOnly and Token Classification tests for MT5 and UMT5
* Fix typo in umt5 string
* Add tests for all the existing MT5 models
* Fix wrong comment in dependency_versions_table
* Reverting change to common test for _keys_to_ignore_on_load_missing (the test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing)
* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model
* Add fix-copies to MT5ModelTest
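A short usage sketch of the new head (checkpoint name and label count are illustrative):

```python
from transformers import AutoTokenizer, T5ForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForTokenClassification.from_pretrained("t5-small", num_labels=5)

inputs = tokenizer("Hugging Face is based in NYC", return_tensors="pt")
outputs = model(**inputs)     # encoder-only forward pass with a token-classification head
print(outputs.logits.shape)   # (batch_size, sequence_length, num_labels)
```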
-
- 31 Jan, 2024 13 commits
-
Shichao Song authored
[docs] Correct the statement in the docstring of compute_transition_scores in generation/utils.py (#28786)
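For context, `compute_transition_scores` recovers the per-token scores of a generated sequence; a minimal usage sketch (model name illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Today is", return_tensors="pt")
outputs = model.generate(
    **inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True
)
# One score per generated token; normalize_logits applies log_softmax first.
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True
)
print(transition_scores.shape)  # (batch_size, num_generated_tokens)
```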
-
Yih-Dar authored
* update / add new workflow files
* Add comment
* Use env.NUM_SLICES
* use scripts
* Fix
* using one script
* remove unused file
* update
* fail-fast: false
* fix
* use matrix
* inputs
* style
* no model name
* add doc
* allow args
* pass argument

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* avoid using job name
* apply to other files

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Joao Gante authored
DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760)
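The pattern behind this fix: under DeepSpeed ZeRO-3's `zero.Init`, tensor-creation ops can be patched to the configured default dtype, so an untyped `arange` later used as float may silently be created in low precision. Pinning the dtype at creation avoids it; a hedged sketch of the rotary-embedding-style pattern:

```python
import torch

dim, base = 64, 10000.0

# Fragile: the dtype of this arange can be overridden at model-init time
# (e.g. to fp16 under DeepSpeed ZeRO-3), degrading the float math below.
inv_freq_fragile = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

# Robust: hardcode an integer dtype, then cast to float explicitly.
inv_freq_robust = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.int64).float() / dim))
```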
-
Kian Sierra McGettigan authored
* direct copy from llama work
* mistral modules forward pass working
* flax mistral forward pass with sliding window
* added tests
* added layer collection approach
* Revert "added layer collection approach" (reverts commit 0e2905bf2236ec323163fc1a9f0c016b21aa8b8f)
* Revert "Revert "added layer collection approach"" (reverts commit fb17b6187ac5d16da7c461e1130514dc3d137a43)
* fixed attention outputs
* added mistral to init and auto
* fixed import name
* fixed layernorm weight dtype
* freeze initialized weights
* make sure conversion considers bfloat16
* added backend
* added docstrings
* added cache
* fixed sliding window causal mask
* passes cache tests
* passed all tests
* applied make style
* removed commented out code
* applied fix-copies, ignored other model changes
* applied make fix-copies
* removed unused functions
* passed generation integration test
* slow tests pass
* fixed slow tests
* changed default dtype from jax.numpy.float32 to float32 for docstring check
* skip cache test for FlaxMistralForSequenceClassification since if pad_token_id in input_ids it doesn't score previous input_ids
* updated checkpoint since from_pt not included
* applied black style
* removed unused args
* Applied styling and fixup
* changed checkpoint for doc back
* fixed rf after adding it to hf hub
* Add dummy ckpt
* applied styling
* added tokenizer to new ckpt
* fixed slice format
* fix init and slice
* changed ref for placeholder TODO
* added copies from Llama
* applied styling
* applied fix-copies
* fixed docs
* update weight dtype reconversion for sharded weights
* removed Nullable input ids
* Removed unnecessary output attentions in Module
* added embedding weight initialization
* removed unused past_key_values
* fixed deterministic
* Fixed RMS Norm and added copied from
* removed input_embeds
* applied make style
* removed nullable input ids from sequence classification model
* added copied from GPTJ
* added copied from Llama on FlaxMistralDecoderLayer
* added copied from to FlaxMistralPreTrainedModel methods
* fix test deprecation warning
* freeze gpt neox random_params and fix copies
* applied make style
* fixed doc issue
* skipped docstring test to align # copied from
* applied make style
* removed FlaxMistralForSequenceClassification
* removed unused padding_idx
* removed more sequence classification
* removed sequence classification
* applied styling and consistency
* added copied from in tests
* removed sequence classification test logic
* applied styling
* applied make style
* removed freeze and fixed copies
* undo test change
* changed repeat_kv to tile
* fixed to key value groups
* updated copyright year
* split causal_mask
* empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest
* went back to 2023 for tests_pr_documentation_tests
* went back to 2024
* changed tile to repeat
* applied make style
* empty for retry on Wav2Vec2
-
Matt authored
* Shim the Keras methods to support BatchEncoding
* Extract everything to a convert_batch_encoding function
* Convert BatchFeature too (thanks Amy)
* tf.keras -> keras
-
Julien Chaumond authored
* canonical repos moves
* Style

Co-authored-by: Lysandre <lysandre@huggingface.co>
-
Hieu Lam authored
* fix: resolve deepspeed resume peft model issues
* chore: update something
* chore: update model instance pass into is peft model checks
* chore: remove hard code value to tests
* fix: format code
-
Patrick von Platen authored
* up
* Fix more
* Correct more
* Fix more tests
* fix fast tests
* push all files
* finish all
* make style
* Fix timestamp wrap
* Fix lang detection behavior
* Add lang detection test
* Update src/transformers/models/whisper/generation_whisper.py
* better error message
* make style tests
* add warning

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
-
Younes Belkada authored
remove `check_packages_compatibility` logic
-
tom-p-reichel authored
* test that tied output embeddings aren't initialized on load
* don't initialize the output embeddings if we're going to tie them to the input embeddings
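The optimization rests on how weight tying works: once the LM head is tied to the input embedding, any separate initialization of the head is discarded work. A minimal sketch of the mechanism (plain PyTorch, not the transformers code):

```python
import torch.nn as nn

vocab_size, hidden_size = 32000, 768
embed_in = nn.Embedding(vocab_size, hidden_size)
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

# Tie the weights: both modules now share one Parameter, so whatever
# lm_head.weight held before this line is simply thrown away.
lm_head.weight = embed_in.weight
assert lm_head.weight.data_ptr() == embed_in.weight.data_ptr()
```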
-
Alessio Serra authored
Modified MLflow logging metrics from synchronous to asynchronous

Co-authored-by: codiceSpaghetti <alessio.ser@hotmail.it>
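The change relies on the asynchronous logging MLflow added in 2.8: passing `synchronous=False` queues metrics in a background thread instead of blocking the training loop on the tracking server. A brief sketch (metric names and values illustrative):

```python
import mlflow

with mlflow.start_run():
    # Returns immediately; the client batches and ships metrics in the background.
    mlflow.log_metrics({"loss": 0.42, "learning_rate": 3e-5}, step=100, synchronous=False)
```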
-
Younes Belkada authored
fix bnb slow tests
-
- 30 Jan, 2024 4 commits
-
Matt authored
* Pin torch to <2.2.0
* Pin torchvision and torchaudio as well
* Playing around with versions to see if this helps
* twiddle something to restart the CI
* twiddle it back
* Try changing the natten version
* make fixup
* Revert "Try changing the natten version" (reverts commit de0d6592c35dc39ae8b5a616c27285db28262d06)
* make fixup
* fix fix fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Matt authored
* Port core files + ESM (because ESM code is odd)
* Search-replace in modelling code
* Fix up transfo_xl as well
* Fix other core files + tests (still need to add correct import to tests)
* Fix cookiecutter
* make fixup, fix imports in some more core files
* Auto-add imports to tests
* Cleanup, add imports to sagemaker tests
* Use correct exception for importing tf_keras
* Fixes in modeling_tf_utils
* make fixup
* Correct version parsing code
* Ensure the pipeline tests correctly revert to float32 after each test
* More tf.keras -> keras
* Add dtype cast
* Better imports of tf_keras
* Add a cast for tf.assign, just in case
* Fix callback imports
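The port hinges on an import shim: prefer the standalone `tf_keras` package, which preserves the Keras 2 API now that Keras 3 ships as TensorFlow's default, and fall back to the Keras bundled with TensorFlow. A hedged sketch of the pattern, not the exact transformers code:

```python
# Keras 3 breaks the Keras 2 API that the TF modeling code depends on,
# so the `tf_keras` backport is preferred whenever it is installed.
try:
    import tf_keras as keras
except (ModuleNotFoundError, ImportError):
    from tensorflow import keras
```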
-
amyeroberts authored
* Abstract out pipeline init args
* Address PR comments
* Reword
* BC PIPELINE_INIT_ARGS
* Remove old arguments
* Small fix
-
amyeroberts authored
* Enable instantiating model with pretrained backbone weights
* Remove doc updates until changes made in modeling code
* Use load_backbone instead
* Add use_timm_backbone to the model configs
* Add missing imports and arguments
* Update docstrings
* Make sure test is properly configured
* Include recent DPT updates
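A short sketch of the configuration surface this exposes (values illustrative; `DetrConfig` is one of the configs that take these backbone arguments):

```python
from transformers import DetrConfig, DetrForObjectDetection

# Build a DETR whose vision backbone comes from timm, loading pretrained weights.
config = DetrConfig(
    use_timm_backbone=True,
    backbone="resnet50",
    use_pretrained_backbone=True,
)
model = DetrForObjectDetection(config)
```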
-