- 11 Jul, 2022 8 commits
-
-
Younes Belkada authored
* fix tolerance for a bloom slow test
* enhance alibi padding
  - get rid of for loops
  - deal better with padded batched input
  - avoid useless cpu/gpu communication when creating alibi
  Co-authored-by: justheuristic <justheuristic@gmail.com>
* optimize attention mask
* fix scaled softmax limit values
* optimize building alibi tensor
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix attention_mask shape when it's None
* minor fixes
  - fix docstring + arg names
* remove colons in docstring
* Apply suggestions from code review
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* apply suggestion
* remove unused arg
* refactor a bit
  - use [:, None] for consistency
* refactor attention block
  Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
* quick fixes
* first attempt
* refactor attention block and fix all tests except "test_simple_generation"
  - added comments to better explain the attention block
* remove debug lines and add TODO comment
* change `torch.bmm` to `torch.baddbmm`
  - fixes `test_simple_generation` but breaks `test_batch_generation_padd`
* styling
* all tests are passing now
  - use `bmm`
  - add explanation for `allow_fp16_reduced_precision_reduction`
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* styling
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix support for accelerate
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove attn softmax in fp32
* refactor comments
* refactor a bit
  - remove warning message
  - remove print in test
* refer to pytorch t5
* change the slow tests
  - do the tests in fp32
  - remove some comments
  - keep large comments
* update expected output for `test_simple_generation`
  - we now test using fp32
* make style + change comments a bit
* fix dtype pad test

Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
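As context for the alibi bullets above, a minimal, hedged sketch of building an ALiBi bias tensor without Python loops, directly from a padded attention mask. It assumes a power-of-two number of heads and is an illustration, not the exact code merged in this commit.

```python
import math
import torch

def build_alibi_tensor(attention_mask: torch.Tensor, num_heads: int, dtype: torch.dtype) -> torch.Tensor:
    """Build a per-head linear bias, vectorized over the batch (illustrative sketch)."""
    # One slope per head, following the geometric progression from the ALiBi paper
    # (power-of-two head counts only, for brevity).
    base = 2 ** (-(2 ** -(math.log2(num_heads) - 3)))
    slopes = torch.pow(base, torch.arange(1, num_heads + 1, dtype=torch.float32))  # (num_heads,)

    # Relative positions derived from the padding mask: padded tokens contribute 0,
    # real tokens get increasing positions, so padded batched inputs need no loops.
    positions = (attention_mask.cumsum(dim=-1) - 1) * attention_mask  # (batch, seq_len)

    # (batch, num_heads, 1, seq_len): broadcast each head's slope over the key positions.
    alibi = slopes[None, :, None, None] * positions[:, None, None, :]
    return alibi.to(dtype)

mask = torch.tensor([[0, 1, 1, 1], [1, 1, 1, 1]])  # left-padded batch of 2
bias = build_alibi_tensor(mask, num_heads=8, dtype=torch.float16)
print(bias.shape)  # torch.Size([2, 8, 1, 4])
```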
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Duong A. Nguyen authored
* Fix RESOURCE_EXHAUSTED error for large datasets on Flax example scripts
* use np.random.permutation for creating batch_idx
* train_samples_idx -> training_samples_idx
* fix type hints
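For illustration, a small sketch of the host-side shuffling idea referenced above: build shuffled batch indices with NumPy rather than materializing a permutation on device, then split them into fixed-size batches. The helper and variable names are hypothetical.

```python
import numpy as np

def generate_batch_splits(num_samples: int, batch_size: int, seed: int = 0) -> list:
    """Shuffle sample indices on the host and split them into batches (sketch)."""
    rng = np.random.default_rng(seed)
    training_samples_idx = rng.permutation(num_samples)  # host-side shuffle, no device memory
    # Drop the incomplete tail batch so every batch has a fixed shape (friendly for jit/pmap).
    num_full_batches = num_samples // batch_size
    training_samples_idx = training_samples_idx[: num_full_batches * batch_size]
    return np.split(training_samples_idx, num_full_batches)

batches = generate_batch_splits(num_samples=10, batch_size=4, seed=42)
print([b.tolist() for b in batches])  # two batches of 4 indices each
```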
-
Yih-Dar authored
* fix dtype issue in _attn
* fix RotaryEmbedding
* fix RotaryEmbedding 2
* clean up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yulv-git authored
* Fix some typos.
  Signed-off-by: Yulv-git <yulvchi@qq.com>
* Fix typo.
  Signed-off-by: Yulv-git <yulvchi@qq.com>
* make fixup.
-
- 10 Jul, 2022 1 commit
-
-
Stas Bekman authored
-
- 08 Jul, 2022 4 commits
-
-
neverix authored
* Make Trainer.predict call on_evaluate (#17952)
* Add on_predict
* Small fix
* Small and different fix
* Add tests
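As a usage illustration, a minimal callback that reacts to the `on_predict` event this change adds; treat the exact signature as an assumption.

```python
from transformers import TrainerCallback

class LogPredictMetrics(TrainerCallback):
    """Sketch of a callback reacting to the `on_predict` event."""

    def on_predict(self, args, state, control, metrics=None, **kwargs):
        # Called once Trainer.predict has finished; `metrics` holds the prediction metrics.
        if metrics is not None:
            print(dict(metrics))

# Hypothetical usage: pass the callback when building the Trainer.
# trainer = Trainer(model=model, args=training_args, callbacks=[LogPredictMetrics()])
# trainer.predict(test_dataset)
```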
-
Sylvain Gugger authored
-
BOSEOP KIM authored
* Fix type issue in using bucketing with Trainer
  - Fix type issues in LengthGroupedSampler, DistributedLengthGroupedSampler
  refs: #18003
* Change logging type in LengthGroupedSampler
  - Change `logger.warning` to `logger.info`
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Change logging type in DistributedLengthGroupedSampler
  - Change `logger.warning` to `logger.info`
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove redundant clause in LengthGroupedSampler
  - Use `elif`
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove redundant clause in DistributedLengthGroupedSampler
  - Use `elif`
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply black, isort to the modified code in the script

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Sylvain Gugger authored
* Fix slow CI by pinning resampy
* Actually put it in the speech dependencies
-
- 07 Jul, 2022 5 commits
-
-
Matt authored
* Drop columns after loading samples, rather than before, to avoid breaking transforms
* make fixup
* Add workaround so this PR can work with the current datasets version
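A hedged sketch of why column dropping has to happen after samples are loaded when a transform is attached: the lazy transform below still needs the raw column at access time. The dataset contents and column names are made up for illustration.

```python
from datasets import Dataset

ds = Dataset.from_dict({"text": ["hello world", "hi there"], "meta": [0, 1]})

def to_length_features(batch):
    # The transform relies on the raw "text" column being present when samples are loaded.
    return {"length": [len(t.split()) for t in batch["text"]]}

ds.set_transform(to_length_features)

# Works: the transform runs lazily on access and still sees "text",
# so unused columns are only dropped from its *output*, after loading.
print(ds[:1]["length"])  # [2]

# Would break the transform: physically removing "text" *before* access
# leaves nothing for the transform to read.
# ds = ds.remove_columns(["text"])
```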
-
Patrick von Platen authored
-
varshith authored
* Add command for Windows venv activation
* Change Linux and macOS specification
-
Sylvain Gugger authored
* Add script to sort doc ToC
* Style and fixes
* Add check to quality job
-
Sylvain Gugger authored
-
- 06 Jul, 2022 6 commits
-
-
Sylvain Gugger authored
-
Sylvain Gugger authored
* Link to the Datasets doc
* Remove unwanted file
-
Matt authored
-
Joao Gante authored
-
ADAning authored
* Add ALL_LAYERNORM_LAYERS for LayerNorm
* Fix bug of appending layer norm
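For background, a plain-PyTorch sketch of the pattern this constant exists to support: keeping every LayerNorm parameter (and biases) out of weight decay when building optimizer parameter groups. This is illustrative, not the library's helper.

```python
import torch
from torch import nn

def weight_decay_param_groups(model: nn.Module, weight_decay: float):
    """Split parameters into decay / no-decay groups, keeping LayerNorm and biases decay-free (sketch)."""
    norm_param_ids = set()
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            norm_param_ids.update(id(p) for p in module.parameters())

    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if id(param) in norm_param_ids or name.endswith("bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
optimizer = torch.optim.AdamW(weight_decay_param_groups(model, weight_decay=0.01), lr=1e-3)
```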
-
NielsRogge authored
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
-
- 05 Jul, 2022 4 commits
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Matt authored
-
Sanchit Gandhi authored
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 04 Jul, 2022 10 commits
-
-
Joao Gante authored
* get the right slicing index for position_bias
-
Sreyan Ghosh authored
Co-authored-by: Sreyan-G@NVIDIA <sreyang@nvidia.com>
-
Matt authored
* Return scalar losses instead of per-sample means
* Make loss shape (1,) instead of scalar
* Allow scalar losses in test_loss_computation
* Allow scalar losses in test_loss_computation
* Allow scalar losses in test_loss_computation
* Remove XLA loss function for RAG
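A short sketch of the loss-shape idea above: reduce per-sample losses to a single value, returned with shape (1,) rather than as a bare scalar. Illustrative only, not the library's loss code.

```python
import tensorflow as tf

def scalar_loss(labels: tf.Tensor, logits: tf.Tensor) -> tf.Tensor:
    """Reduce per-sample losses to a single value with shape (1,) (illustrative sketch)."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE
    )
    per_sample = loss_fn(labels, logits)  # one loss value per sample
    # Mean over everything, then reshape so downstream code always sees shape (1,)
    # instead of a rank-0 scalar.
    return tf.reshape(tf.reduce_mean(per_sample), (1,))

labels = tf.constant([1, 0, 2])
logits = tf.random.normal((3, 5))
print(scalar_loss(labels, logits).shape)  # (1,)
```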
-
Matthijs Hollemans authored
-
regisss authored
-
regisss authored
-
amyeroberts authored
* Refactor to inherit from nn.Module instead of nn.ModuleList
* Fix typo
* Empty commit to trigger CI re-run
  The Blender Bot tests are failing (they should be unrelated to this PR and pass locally). I don't have sufficient permissions to re-run the CI workflow (fully or from failed).
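As a generic illustration of the refactor named in the first bullet (not this model's actual code): a container that subclasses nn.Module and stores its layers in an nn.ModuleList, rather than subclassing nn.ModuleList directly.

```python
import torch
from torch import nn

class Encoder(nn.Module):
    """Hypothetical stack of layers: an nn.Module that owns an nn.ModuleList."""

    def __init__(self, hidden_size: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden_size, hidden_size) for _ in range(num_layers))

    def forward(self, hidden_states):
        # Subclassing nn.Module gives the container its own forward() and hooks,
        # which a bare nn.ModuleList subclass does not provide.
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states

out = Encoder(hidden_size=8, num_layers=3)(torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 8])
```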
-
amyeroberts authored
* Rough TF conversion outline
* Tidy up
* Fix padding differences between layers
* Add back embedder - whoops
* Match test file to main
* Match upstream test file
* Correctly pass and assign image_size parameter
  Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add in MainLayer
* Correctly name layer
* Tidy up AdaptivePooler
* Small tidy-up: more accurate type hints and remove whitespace
* Change AdaptiveAvgPool
  Use the AdaptiveAvgPool implementation by @Rocketknight1, which correctly pools if the output shape does not evenly divide the input shape, c.f. https://github.com/huggingface/transformers/pull/17554/files/9e26607e22aa8d069c86b50196656012ff0ce62a#r900109509
  Co-authored-by: matt <rocketknight1@gmail.com>
  Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Use updated AdaptiveAvgPool
  Co-authored-by: matt <rocketknight1@gmail.com>
* Make AdaptiveAvgPool compatible with CPU
* Remove image_size from configuration
* Fixup
* Tensorflow -> TensorFlow
* Fix pt references in tests
* Apply suggestions from code review - grammar and wording
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add TFResNet to doc tests
* PR comments - GlobalAveragePooling and clearer comments
* Remove unused import
* Add in keepdims argument
* Add num_channels check
* grammar fix: by -> of
  Co-authored-by: matt <rocketknight1@gmail.com>
  Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Remove transposes - keep NHWC throughout forward pass
* Fixup look sharp
* Add missing layer names
* Final tidy up - remove from_pt now that weights are on the hub

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
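For the simple case where the target size evenly divides the input, adaptive average pooling over NHWC inputs can be sketched with a reshape and a mean; the implementation referenced above also handles the non-divisible case, which this sketch deliberately does not.

```python
import tensorflow as tf

def adaptive_avg_pool_2d(x: tf.Tensor, output_size: tuple) -> tf.Tensor:
    """Adaptive average pooling over NHWC inputs, divisible case only (sketch)."""
    batch, height, width, channels = x.shape
    out_h, out_w = output_size
    if height % out_h or width % out_w:
        raise ValueError("This sketch only covers output sizes that evenly divide the input.")
    # Group the spatial dims into (out_h, kernel_h) and (out_w, kernel_w), then average each group.
    x = tf.reshape(x, (batch, out_h, height // out_h, out_w, width // out_w, channels))
    return tf.reduce_mean(x, axis=[2, 4])

pooled = adaptive_avg_pool_2d(tf.random.normal((2, 8, 8, 3)), output_size=(2, 2))
print(pooled.shape)  # (2, 2, 2, 3)
```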
-
Lysandre Debut authored
-
Dobatymo authored
-
- 01 Jul, 2022 2 commits
-
-
David Heryanto authored
* Exclude Databricks from notebook env only if the runtime is below 11.0
* Dummy commit to trigger CI
* Empty commit to trigger CI (×8)
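A hedged sketch of the kind of runtime check described in the first bullet, assuming the DATABRICKS_RUNTIME_VERSION environment variable identifies the Databricks runtime; the helper name is made up.

```python
import os

def is_databricks_below_11() -> bool:
    """Return True when running on a Databricks runtime older than 11.0 (illustrative sketch)."""
    runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    if runtime is None:
        return False  # not on Databricks at all
    try:
        major = int(runtime.split(".")[0])
    except ValueError:
        return False  # unparsable version string, assume a recent runtime
    return major < 11

# Hypothetical usage: only exclude the notebook environment on old runtimes.
# if is_databricks_below_11(): treat_as_plain_shell()
```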
-
seungeunrho authored
* Shift labels for causal LM when using the label smoother
  When training a causal LM, the loss is computed within the model's forward() function and the labels are shifted internally. However, if label smoothing is applied, the loss is computed in the Trainer's compute_loss function and the labels are not shifted, which misaligns the labels with the corresponding inputs. This commit resolves that misalignment.
  Resolves #17960
  On branch shift_labels_for_causalLM
  Changes to be committed:
    modified: src/transformers/trainer.py
    modified: src/transformers/trainer_pt_utils.py
* Update trainer.py
* Update src/transformers/trainer.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
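For clarity, a minimal sketch of the shift described above: align labels one step ahead of the logits before computing a smoothed loss, so each token is predicted from the positions before it. The plain cross-entropy with label_smoothing stands in for the Trainer's label smoother.

```python
import torch
import torch.nn.functional as F

def causal_lm_smoothed_loss(logits: torch.Tensor, labels: torch.Tensor, epsilon: float = 0.1) -> torch.Tensor:
    """Shift labels one step left relative to logits, then apply label-smoothed cross-entropy (sketch)."""
    # The model predicts token t+1 from tokens <= t, so drop the last logit and the first label.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        label_smoothing=epsilon,
        ignore_index=-100,  # padding positions are masked out of the loss
    )

logits = torch.randn(2, 6, 100)
labels = torch.randint(0, 100, (2, 6))
print(causal_lm_smoothed_loss(logits, labels))
```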
-