- 18 Jul, 2022 7 commits
-
-
Yih-Dar authored
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* Fix expected loss values
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Wang, Yi authored
* [HPO] update to the new sigopt experiment API
* follow https://docs.sigopt.com/experiments
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* [HPO] use the new API if sigopt version >= 8.0.0
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
-
gcheron authored
Co-authored-by: Guilhem Chéron <guilhemc@authentifier.com>
-
Lysandre Debut authored
* NLLB tokenizer
* Apply suggestions from code review - thanks Stefan!
Co-authored-by: Stefan Schweter <stefan@schweter.it>
* Final touches
* Style :)
* Update docs/source/en/model_doc/nllb.mdx
Co-authored-by: Stefan Schweter <stefan@schweter.it>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* PR reviews
* Auto models
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
John Giorgi authored
-
John Giorgi authored
-
- 15 Jul, 2022 2 commits
-
-
Nicolas Patry authored
* Adding support for `device_map` directly in the `pipeline(..)` function
* Updating the docstring
* Adding a better docstring
* Put back type hints
* Blacked (`make fixup` didn't work ??!!)
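A `device_map` of this kind is, conceptually, a mapping from submodule names to devices. As a rough, dependency-free sketch (hypothetical helper name; the real resolution logic in Accelerate/Transformers is more involved), lookup falls back to the longest matching module-name prefix:

```python
# Sketch (hypothetical names) of how a `device_map` assigns model submodules
# to devices: each entry maps a module-name prefix to a device, and lookup
# picks the longest prefix that matches the queried submodule.
def resolve_device(device_map, module_name):
    """Return the device for `module_name` using longest-prefix matching."""
    best = None
    for prefix, device in device_map.items():
        if module_name == prefix or module_name.startswith(prefix + "."):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, device)
    if best is None:
        raise KeyError(f"no device found for module {module_name!r}")
    return best[1]

device_map = {"transformer.h.0": 0, "transformer.h.1": 1, "lm_head": "cpu"}
print(resolve_device(device_map, "transformer.h.0.attn"))  # -> 0
print(resolve_device(device_map, "lm_head"))               # -> cpu
```

This is only meant to illustrate the shape of the mapping, not the library's actual dispatch code.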
-
Nicolas Patry authored
* Fixing a bug where the attention mask was not passed to generate
* Fixing zero-size prompts
* Comment on top
-
- 13 Jul, 2022 9 commits
-
-
amyeroberts authored
* Initial TF DeiT implementation
* Fix copies naming issues
* Fix up + docs
* Properly same main layer
* Name layers properly
* Fixup
* Fix import
* Fix weight loading for tests whilst not on hub
* Add doc tests and remove to_2tuple
* Add back to_2tuple - removing to_2tuple results in many downstream changes needed because of the copies checks
* Incorporate updates from the Improve vision models PR #17731
* Don't hard-code num_channels
* Copy PyTorch DeiT embeddings and remove PyTorch operations with mask
* Fix patch embeddings & tidy up
* Update PixelShuffle to move logic into class layer
* Update doc strings - remove PT references
* Use NHWC format in internal layers
* Fix up
* Use linear activation layer
* Remove unused import
* Apply suggestions from code review
* Move dataclass to top of file
* Remove from_pt now that weights are on hub
* Fixup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
-
Wei authored
* enable fx2trt
* Update perf_train_gpu_one.mdx
* add lib check
* fix import check
* improve doc
* refactor ctx manager
* formatting fixes (black, isort)
* update args
* cleanups
* code refactor to init
* remove redundancy
* replace self.args with args
Co-authored-by: Stas Bekman <stas@stason.org>
-
Sylvain Gugger authored
* Make sharded checkpoints work in offline mode
* Add test
-
Sylvain Gugger authored
This reverts commit 3564c657.
-
Sylvain Gugger authored
-
lmagne authored
* Added metadata to training summary
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
John Giorgi authored
* Add summarization name mapping for MultiNews
-
Sebastian Sosa authored
* Supported Python versions reference
* Update CONTRIBUTING.md, removing commit hash from link
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Joao Gante authored
-
- 12 Jul, 2022 6 commits
-
-
Joao Gante authored
-
Niklas Muennighoff authored
* Add fp16 option
* Fix BLOOM dtype
* Formatting
* Remove torch_dtype arg
* Revert formatting
* Apply formatting
* Add n_embed backward compat
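"n_embed backward compat" here means a legacy config key keeps working after a rename. A minimal, dependency-free sketch of that pattern (hypothetical class name; the real Transformers config machinery uses a similar `attribute_map` idea but is more general):

```python
# Sketch of keeping a legacy config key (`n_embed`) working as an alias for
# the new name (`hidden_size`) so old configs and old attribute accesses
# don't break.
class BloomConfigSketch:
    attribute_map = {"n_embed": "hidden_size"}  # legacy name -> new name

    def __init__(self, hidden_size=64, **kwargs):
        # Accept the legacy key on construction and route it to the new one.
        if "n_embed" in kwargs:
            hidden_size = kwargs.pop("n_embed")
        self.hidden_size = hidden_size

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for legacy names.
        mapped = type(self).attribute_map.get(name)
        if mapped is not None:
            return getattr(self, mapped)
        raise AttributeError(name)

cfg = BloomConfigSketch(n_embed=1024)
print(cfg.hidden_size, cfg.n_embed)  # -> 1024 1024
```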
-
Joao Gante authored
-
wei zhao authored
* Report the objective value for a step instead of an epoch.
Report an objective function value to optuna per step instead of per epoch, for the following reason: if "eval_steps" is less than the number of steps per epoch, there may be warnings like "optuna/trial/_trial.py:592: UserWarning: The reported value is ignored because this `step` 0 is already reported.". So "step" is more appropriate than "epoch" here.
* MOD: make style.
Co-authored-by: zhaowei01 <zhaowei01@yuanfudao.com>
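The collision the commit describes can be shown without optuna at all. A pure-Python sketch (hypothetical class; optuna's actual `Trial.report` behaves this way but carries much more state): a trial ignores a second report for the same step, which is exactly what happens when several evaluations land inside one epoch and the epoch number is used as the step.

```python
# Dependency-free sketch of step-keyed reporting: a value reported twice for
# the same step is dropped, mimicking optuna's "already reported" warning.
class TrialSketch:
    def __init__(self):
        self.reports = {}
        self.ignored = 0

    def report(self, value, step):
        if step in self.reports:
            self.ignored += 1  # second report for this step is discarded
            return
        self.reports[step] = value

# Two evaluations during epoch 0: reporting by *epoch* collides...
trial = TrialSketch()
for loss in (0.9, 0.7):
    trial.report(loss, step=0)
assert trial.ignored == 1

# ...while reporting by *global step* does not.
trial2 = TrialSketch()
for global_step, loss in ((50, 0.9), (100, 0.7)):
    trial2.report(loss, step=global_step)
assert trial2.ignored == 0
```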
-
Sijun He authored
-
jianan-gu authored
* Enhance ipex import
* Refine codes
* Refine style
* Add link
* Style
Co-authored-by: Stas Bekman <stas@stason.org>
-
- 11 Jul, 2022 8 commits
-
-
Younes Belkada authored
* Fix tolerance for a BLOOM slow test
* Enhance alibi padding: get rid of for loops, deal better with padded batched input, avoid useless cpu/gpu communication when creating alibi
Co-authored-by: justheuristic <justheuristic@gmail.com>
* Optimize attention mask
* Fix scaled softmax limit values
* Optimize building the alibi tensor
* Fix attention_mask shape when it's None
* Minor fixes: fix docstring + arg names, remove colons in docstring
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Apply suggestion, remove unused arg
* Refactor a bit: use [:, None] for consistency
* Refactor attention block
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
* Quick fixes, first attempt
* Refactor attention block and fix all tests except "test_simple_generation" - added comments to better explain the attention block
* Remove debug lines and add TODO comment
* Change `torch.bmm` to `torch.baddbmm` - fixes `test_simple_generation` but breaks `test_batch_generation_padd`
* Styling
* All tests are passing now - use `bmm`, add explanation for `allow_fp16_reduced_precision_reduction`
* Fix support for accelerate
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove attn softmax in fp32
* Refactor comments; remove warning message and print on test
* Refer to pytorch t5
* Change the slow tests: do the tests in fp32, remove some comments, keep large comments
* Update expected output for `test_simple_generation` - we now test using fp32
* Make style + change comments a bit
* Fix dtype padd test
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
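For context on "building the alibi tensor": ALiBi biases attention scores with per-head slopes. The commonly published closed form (from the ALiBi paper; this sketch is not necessarily line-for-line what the commit implements) gives head i the slope 2^(-8(i+1)/n) when the head count n is a power of two:

```python
# Sketch of the standard ALiBi per-head slope computation, dependency-free.
# The full implementation also tiles these slopes against token distances
# and handles padded batches, which is what the commit above optimizes.
def alibi_slopes(num_heads):
    if num_heads & (num_heads - 1) != 0:
        raise ValueError("sketch assumes a power-of-two head count")
    start = 2 ** (-8.0 / num_heads)   # slope of the first head
    return [start ** (i + 1) for i in range(num_heads)]

print(alibi_slopes(4))  # -> [0.25, 0.0625, 0.015625, 0.00390625]
```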
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Duong A. Nguyen authored
* Fix RESOURCE_EXHAUSTED error for large datasets on Flax example scripts
* Use np.random.permutation for creating batch_idx
* train_samples_idx -> training_samples_idx
* Fix type hints
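The batch-index construction the commit refers to can be sketched without numpy (the Flax examples use `np.random.permutation`; plain `random` is used here to keep the sketch dependency-free). Dropping the last partial batch keeps every batch the same shape, which matters on accelerators because a new shape triggers recompilation:

```python
import random

# Sketch of building per-epoch batch indices from a shuffled permutation.
def batch_indices(num_samples, batch_size, seed=0):
    rng = random.Random(seed)
    perm = list(range(num_samples))
    rng.shuffle(perm)                    # fresh permutation each epoch
    steps = num_samples // batch_size    # drop the incomplete final batch
    return [perm[i * batch_size:(i + 1) * batch_size] for i in range(steps)]

batches = batch_indices(num_samples=10, batch_size=4)
print(len(batches), [len(b) for b in batches])  # -> 2 [4, 4]
```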
-
Yih-Dar authored
* Fix dtype issue in _attn
* Fix RotaryEmbedding
* Fix RotaryEmbedding 2
* Clean up
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yulv-git authored
* Fix some typos.
Signed-off-by: Yulv-git <yulvchi@qq.com>
* Fix typo.
Signed-off-by: Yulv-git <yulvchi@qq.com>
* make fixup.
-
- 10 Jul, 2022 1 commit
-
-
Stas Bekman authored
-
- 08 Jul, 2022 4 commits
-
-
neverix authored
* Make Trainer.predict call on_evaluate (#17952)
* Add on_predict
* Small fix
* Small and different fix
* Add tests
-
Sylvain Gugger authored
-
BOSEOP KIM authored
* Fix type issue in using bucketing with Trainer - fix type issues in LengthGroupedSampler, DistributedLengthGroupedSampler (refs: #18003)
* Change logging type in LengthGroupedSampler - change `logger.warning` to `logger.info`
* Change logging type in DistributedLengthGroupedSampler - change `logger.warning` to `logger.info`
* Remove redundant clause in LengthGroupedSampler - use `elif`
* Remove redundant clause in DistributedLengthGroupedSampler - use `elif`
* Apply black, isort to modified codes in the script
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
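For readers unfamiliar with the sampler being fixed: the idea behind length-grouped sampling is to shuffle, then sort indices by sample length inside large "megabatches", so each batch holds similar-length samples (less padding waste) while the overall order stays random. A dependency-free sketch of that idea (hypothetical function name; the real `LengthGroupedSampler` differs in details such as megabatch size and tie-breaking):

```python
import random

# Sketch: shuffle all indices, then sort each megabatch by length
# (longest first) so batches drawn in order contain similar lengths.
def length_grouped_indices(lengths, batch_size, seed=0, mega_factor=50):
    rng = random.Random(seed)
    indices = list(range(len(lengths)))
    rng.shuffle(indices)
    mega = batch_size * mega_factor
    grouped = []
    for i in range(0, len(indices), mega):
        chunk = indices[i:i + mega]
        grouped.extend(sorted(chunk, key=lambda j: lengths[j], reverse=True))
    return grouped

lengths = [5, 128, 7, 64, 6, 256, 8, 32]
order = length_grouped_indices(lengths, batch_size=2, mega_factor=2)
# Within each megabatch of 4 indices, longer samples come first.
```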
-
Sylvain Gugger authored
* Fix slow CI by pinning resampy
* Actually put it in the speech dependencies
-
- 07 Jul, 2022 3 commits
-
-
Matt authored
* Drop columns after loading samples, rather than before, to avoid breaking transforms
* make fixup
* Add workaround so this PR can work with the current datasets version
-
Patrick von Platen authored
-
varshith authored
* Added command for Windows venv activation
* Changed Linux and macOS specification
-