- 13 Jul, 2022 2 commits
-
-
Sebastian Sosa authored
* supported python versions reference
* Update CONTRIBUTING.md removing commit hash from link

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Joao Gante authored
-
- 12 Jul, 2022 6 commits
-
-
Joao Gante authored
-
Niklas Muennighoff authored
* Add fp16 option
* Fix BLOOM dtype
* Formatting
* Remove torch_dtype arg
* Revert formatting
* Apply formatting
* Add n_embed backward compat
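For context, a minimal sketch of loading a BLOOM checkpoint in half precision through the standard `torch_dtype` argument (the checkpoint name is illustrative, not taken from the PR):

```python
# Sketch, not the PR code: load BLOOM weights directly in fp16.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",      # any BLOOM checkpoint
    torch_dtype=torch.float16,    # weights are loaded in half precision
)
print(model.dtype)  # torch.float16
```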
-
Joao Gante authored
-
wei zhao authored
* Report value for a step instead of epoch.

  Report an objective function value for a step instead of an epoch to optuna. If "eval_steps" is less than the number of steps per epoch, there may be warnings like: "optuna/trial/_trial.py:592: UserWarning: The reported value is ignored because this `step` 0 is already reported.", so "step" is more appropriate than "epoch" here.
* MOD: make style.

Co-authored-by: zhaowei01 <zhaowei01@yuanfudao.com>
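A minimal sketch of the per-step reporting pattern described above; `evaluate_model` is a hypothetical stand-in for the Trainer's evaluation loop:

```python
import optuna

def evaluate_model(step: int) -> float:
    # Hypothetical helper standing in for an actual evaluation pass.
    return 1.0 / (1.0 + step)

def objective(trial: optuna.Trial) -> float:
    eval_steps, max_steps = 100, 1000
    metric = 0.0
    for step in range(eval_steps, max_steps + 1, eval_steps):
        metric = evaluate_model(step)
        # Reporting per global step keeps every reported step unique,
        # so optuna never warns that a step "is already reported".
        trial.report(metric, step=step)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return metric

optuna.create_study(direction="minimize").optimize(objective, n_trials=1)
```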
-
Sijun He authored
-
jianan-gu authored
* enhance ipex import
* refine codes
* refine style
* add link
* style

Co-authored-by: Stas Bekman <stas@stason.org>
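A sketch of the kind of defensive IPEX import guard this describes (names are illustrative, not the exact code from the PR):

```python
import importlib.util

def is_ipex_available() -> bool:
    # Detect intel_extension_for_pytorch without importing it eagerly.
    return importlib.util.find_spec("intel_extension_for_pytorch") is not None

if is_ipex_available():
    import intel_extension_for_pytorch as ipex  # noqa: F401
else:
    ipex = None  # fall back to plain PyTorch code paths
```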
-
- 11 Jul, 2022 8 commits
-
-
Younes Belkada authored
* fix tolerance for a bloom slow test
* enhance alibi padding: get rid of for loops, deal better with padded batched input, avoid useless cpu/gpu communication when creating alibi
* optimize attention mask
* fix scaled softmax limit values
* optimize building alibi tensor
* fix attention_mask shape when it's None
* minor fixes: fix docstring + arg names
* remove colons in docstring
* Apply suggestions from code review
* apply suggestion
* remove unused arg
* refactor a bit: use [:, None] for consistency
* refactor attention block
* quick fixes
* first attempt
* refactor attention block and fix all tests except "test_simple_generation"; added comments to better explain the attention block
* remove debug lines and add TODO comment
* change `torch.bmm` to `torch.baddbmm`: fixes `test_simple_generation` but breaks `test_batch_generation_padd`
* styling
* all tests are passing now: use `bmm`, add explanation for `allow_fp16_reduced_precision_reduction`
* styling
* fix support for accelerate
* Apply suggestions from code review
* remove attn softmax in fp32
* refactor comments
* refactor a bit: remove warning message, remove print in test
* refer to pytorch t5
* change the slow tests: do the tests in fp32, remove some comments, keep large comments
* update expected output for `test_simple_generation`: we now test using fp32
* make style + change comments a bit
* fix dtype in the padd test

Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
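A simplified, loop-free alibi construction in the spirit of this refactor (a sketch, not the exact PR code; assumes `num_heads` is a power of two):

```python
import math
import torch

def build_alibi(attention_mask: torch.Tensor, num_heads: int) -> torch.Tensor:
    # One slope per head: a geometric sequence derived from num_heads.
    base = 2 ** (-(2 ** -(math.log2(num_heads) - 3)))
    slopes = torch.pow(torch.tensor(base), torch.arange(1, num_heads + 1))
    # Token positions from the attention mask, so padding does not shift them.
    positions = (attention_mask.cumsum(dim=-1) - 1) * attention_mask
    # (batch, num_heads, 1, seq_len), ready to add to attention scores.
    return slopes[None, :, None, None] * positions[:, None, None, :]

mask = torch.tensor([[0, 1, 1, 1], [1, 1, 1, 1]])  # left-padded batch
alibi = build_alibi(mask, num_heads=8)
print(alibi.shape)  # torch.Size([2, 8, 1, 4])
```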
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Duong A. Nguyen authored
* Fix RESOURCE_EXHAUSTED error for large datasets on Flax example scripts
* using np.random.permutation for creating batch_idx
* train_samples_idx -> training_samples_idx
* fix type hints
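A minimal sketch of permutation-based batch indexing of the kind described above (function and variable names are illustrative, not the exact example-script code):

```python
import numpy as np

def generate_batch_splits(num_samples: int, batch_size: int, rng: np.random.Generator) -> list:
    # Shuffle indices only; never materialise a shuffled copy of the data.
    training_samples_idx = rng.permutation(num_samples)
    num_full = num_samples // batch_size
    # Drop the trailing incomplete batch, then split into equal batches.
    return np.split(training_samples_idx[: num_full * batch_size], num_full)

batches = generate_batch_splits(10, 3, np.random.default_rng(0))
print(len(batches), batches[0].shape)  # 3 (3,)
```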
-
Yih-Dar authored
* fix dtype issue in _attn
* fix RotaryEmbedding
* fix RotaryEmbedding 2
* clean up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
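For illustration only, a simplified rotary-embedding sketch that computes the angles in float32 and casts back at the end, the kind of dtype handling this fix concerns (GPT-J-style interleaved pairing; not the exact model code):

```python
import torch

def apply_rotary(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (..., seq_len, dim) with even dim; angles computed in float32.
    seq_len, dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    freqs = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    cos, sin = freqs.cos(), freqs.sin()
    x1, x2 = x[..., 0::2].float(), x[..., 1::2].float()
    out = torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return out.to(x.dtype)  # cast back only after the float32 math

q = torch.randn(2, 8, 16, dtype=torch.float16)
print(apply_rotary(q).dtype)  # torch.float16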
-
Yulv-git authored
* Fix some typos.
* Fix typo.
* make fixup.

Signed-off-by: Yulv-git <yulvchi@qq.com>
-
- 10 Jul, 2022 1 commit
-
-
Stas Bekman authored
-
- 08 Jul, 2022 4 commits
-
-
neverix authored
* Make Trainer.predict call on_evaluate (#17952)
* Add on_predict
* Small fix
* Small and different fix
* Add tests
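A minimal sketch of a callback consuming the new `on_predict` hook (the logging body is illustrative):

```python
from transformers import TrainerCallback

class PredictLogger(TrainerCallback):
    # Fires once when Trainer.predict() finishes, mirroring on_evaluate.
    def on_predict(self, args, state, control, metrics=None, **kwargs):
        print(f"predict metrics at step {state.global_step}: {metrics}")

# Usage sketch: trainer = Trainer(..., callbacks=[PredictLogger()])
```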
-
Sylvain Gugger authored
-
BOSEOP KIM authored
* Fix type issue in using bucketing with Trainer: fix type issues in LengthGroupedSampler and DistributedLengthGroupedSampler (refs: #18003)
* Change logging type in LengthGroupedSampler: change `logger.warning` to `logger.info`
* Change logging type in DistributedLengthGroupedSampler: change `logger.warning` to `logger.info`
* Remove redundant clause in LengthGroupedSampler: use `elif`
* Remove redundant clause in DistributedLengthGroupedSampler: use `elif`
* Apply black, isort to modified code in the script

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Sylvain Gugger authored
* Fix slow CI by pinning resampy
* Actually put it in the speech dependencies
-
- 07 Jul, 2022 5 commits
-
-
Matt authored
* Drop columns after loading samples, rather than before, to avoid breaking transforms
* make fixup
* Add workaround so this PR can work with current datasets version
-
Patrick von Platen authored
-
varshith authored
* Added command for Windows venv activation
* changed Linux and macOS specification
-
Sylvain Gugger authored
* Add script to sort doc ToC
* Style and fixes
* Add check to quality job
-
Sylvain Gugger authored
-
- 06 Jul, 2022 6 commits
-
-
Sylvain Gugger authored
-
Sylvain Gugger authored
* Link to the Datasets doc
* Remove unwanted file
-
Matt authored
-
Joao Gante authored
-
ADAning authored
* Add ALL_LAYERNORM_LAYERS for LayerNorm
* fix bug when appending layer norms
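A simplified sketch of how a registry like `ALL_LAYERNORM_LAYERS` is typically consulted to exclude layer-norm parameters from weight decay (not the exact Trainer code):

```python
import torch.nn as nn

ALL_LAYERNORM_LAYERS = [nn.LayerNorm]  # models may append custom variants

def no_decay_parameter_names(model: nn.Module) -> list:
    # Collect full parameter names belonging to any registered layer-norm class.
    names = []
    for mod_name, module in model.named_modules():
        if isinstance(module, tuple(ALL_LAYERNORM_LAYERS)):
            names += [f"{mod_name}.{p}" for p, _ in module.named_parameters()]
    return names

model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
print(no_decay_parameter_names(model))  # ['1.weight', '1.bias']
```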
-
NielsRogge authored
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
-
- 05 Jul, 2022 4 commits
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Matt authored
-
Sanchit Gandhi authored
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 04 Jul, 2022 4 commits
-
-
Joao Gante authored
* get the right slicing index for position_bias
-
Sreyan Ghosh authored
Co-authored-by: Sreyan-G@NVIDIA <sreyang@nvidia.com>
-
Matt authored
* Return scalar losses instead of per-sample means
* Make loss shape (1,) instead of scalar
* Allow scalar losses in test_loss_computation
* Remove XLA loss function for RAG
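A minimal sketch of the shape-(1,) loss convention mentioned above (values are illustrative):

```python
import tensorflow as tf

per_sample = tf.constant([0.3, 0.7, 0.5])
# Reduce to a single value but keep one dimension: shape (1,), not a 0-d scalar.
loss = tf.reshape(tf.reduce_mean(per_sample), (1,))
print(loss.shape)  # (1,)
```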
-
Matthijs Hollemans authored
-