- 14 Jun, 2021 2 commits
-
-
Patrick von Platen authored
* fix_torch_device_generate_test * remove @ * upload * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * Update examples/flax/language-modeling/README.md * add more info * finish * fix Co-authored-by:Patrick von Platen <patrick@huggingface.co>
-
Guido Novati authored
* Fix megatron_gpt2 attention block's causal mask. * compatibility with checkpoints created with recent versions of Megatron-LM * added integration test for the released Megatron-GPT2 model * code style changes * added option to megatron conversion script to read from config file Co-authored-by:Guido Novati <gnovati@nvidia.com>
-
- 13 Jun, 2021 1 commit
-
-
Jonathan Chang authored
* Fix t5 error message * Fix again
-
- 11 Jun, 2021 3 commits
-
-
Lysandre Debut authored
* Add from_pretrained to dummy timm * Fix at the source * Update utils/check_dummies.py Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Missing pretrained dummies * Style Co-authored-by:
Sylvain Gugger <sylvain.gugger@gmail.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Suraj Patil authored
* first draft * max_seq_length => block_size * fix arg names * fix typos * fix loss calculation * add max examples, fix train eval steps, metrics * optimizer mask * fix perpelexity, metric logging * fix logging * data_collator = > data_loader * refactor loss_fn * support single GPU * pass distributed to write_metric * fix jitting * fix single device training * fix single device metrics * close inner progress bars once finished * add overwrite_cache arg * ifx dataset caching issue * add more logs * few small fixes, * address nicholas suggestions * fix docstr * address patricks suggestions * make flake happy * pass new new_dropout_rng to apply_gradients * reset train metrics after every epoc * remove distributed logis, small fixes
-
Patrick von Platen authored
* fix_torch_device_generate_test * remove @ * fix tests
-
- 10 Jun, 2021 10 commits
-
-
Bhavitvya Malik authored
* add relevant `desc` in examples * require_version datasets>=1.8.0
-
Jayendra authored
* adding vit for flax * added test for Flax-vit and some bug-fixes * overrided methods where variable changes were necessary for flax_vit test * added FlaxViTForImageClassification for test * Update src/transformers/models/vit/modeling_flax_vit.py Co-authored-by:
Suraj Patil <surajp815@gmail.com> * made changes suggested in PR * Adding jax-vit models for autoimport * swapping num_channels and height,width dimension * fixing the docstring for torch-like inputs for VIT * add model to main init * add docs * doc, fix-copies * docstrings * small test fixes * fix docs * fix docstr * Apply suggestions from code review Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> * style Co-authored-by:
jayendra <jayendra@infocusp.in> Co-authored-by:
Suraj Patil <surajp815@gmail.com> Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com>
-
Daniel Stancl authored
* Fix a condition in test_generate_with_head_masking * Fix usage of head_mask in bigbirg_pegasus * Fix head masking for speech2text * Resolve copy mismatch + drop unwanted print statement * Fix the condition
-
Matt authored
-
Matt authored
-
Matt authored
-
Sylvain Gugger authored
-
Matt authored
* Pushing partially-complete new GLUE example * First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready. * Fix to the fit() call * Bugfixes, making sure TPU and multi-GPU support is ready * Remove logger line that depends on Pytorch * Style pass * Deleting old TF GLUE example * Include label2id and id2label in the saved model config * Don't clobber the existing model.config.label2id * Style fixes * Update examples/tensorflow/text-classification/run_glue.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Tobias Norlund authored
* Resize with kept aspect ratio * Fixed failed test * Overload center_crop and resize methods instead * resize should handle non-PIL images * update slow test * Tensor => tensor Co-authored-by:patil-suraj <surajp815@gmail.com>
-
kumapo authored
* Add text_column_name and label_column_name to run_ner args * Minor fix: grouping for text and label column name
-
- 09 Jun, 2021 8 commits
-
-
Patrick von Platen authored
* fix_torch_device_generate_test * remove @ * fix tests
-
Stas Bekman authored
-
Suraj Patil authored
-
Anton Lozhkov authored
* Working quantizer forward * Working quantizer forward * Clean up unused model parts, test reproducibility * Working quantizer forward * Clean up unused model parts, test reproducibility * Remove custom outputs from the shared ones * correct conversion * correct bug * add first pretrain script * save intermediate * static shapes * save intermediate * finish first pretrain script version * more refactor * remove wanddb * refactor more * improve test * correct perplexity compute bug * finish model implementation * add to docs * finish docs * finish pretraining script * finish pretraining script * remove wandb * finish PR for merge * finish config * finish * make deepspeed work * Apply suggestions from code review Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * apply suggestions * fix flaky test Co-authored-by:
patrickvonplaten <patrick.v.platen@gmail.com> Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Stas Bekman authored
* support more than 2 gpus * style
-
NielsRogge authored
* Squash all commits of modeling_detr_v7 branch into one * Improve docs * Fix tests * Style * Improve docs some more and fix most tests * Fix slow tests of ViT, DeiT and DETR * Improve replacement of batch norm * Restructure timm backbone forward * Make DetrForSegmentation support any timm backbone * Fix name of output * Address most comments by @LysandreJik * Give better names for variables * Conditional imports + timm in setup.py * Address additional comments by @sgugger * Make style, add require_timm and require_vision to tests茅 * Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone * Add png files to fixtures * Fix type hint * Add timm to workflows * Add `BatchNorm2d` to the weight initialization * Fix retain_grad test * Replace model checkpoints by Facebook namespace * Fix name of checkpoint in test * Add user-friendly message when scipy is not available * Address most comments by @patrickvonplaten * Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner * Better initialization * Scipy is necessary to get sklearn metrics * Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel * Make style * Improve docs and add 2 community notebooks Co-authored-by:Lysandre <lysandre.debut@reseau.eseo.fr>
-
Stas Bekman authored
-
Koichi Yasuoka authored
-
- 08 Jun, 2021 11 commits
-
-
Stas Bekman authored
-
Stas Bekman authored
* wip * wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044 * cleanup * workaround * working 5/8 modes * solve fp32 distributed zero3 * style * sync * sync * rework * deprecation * cleanup * https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged * clean up * add a guide * more prose * more prose * fix * more prose * sub_group_size was too big * Apply suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * refactor * bug fix * make the true check explicit * new deepspeed release Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Stas Bekman authored
* replace deprecated config * sub_group_size was too big * complete deprecation removal
-
Sylvain Gugger authored
-
cdleong authored
* Add torch to requirements.txt in language-modeling * Update examples/pytorch/language-modeling/requirements.txt Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Mario 艩a拧ko authored
* Replace legacy torch.Tensor constructor with torch.{tensor, empty} * Remove torch.Tensor in examples -
Shamane Siri authored
* updated the original RAG implementation to be compatible with the latest PL version * updated the requirements.txt file * execute make style * code quality test * code quality * conflix resolved in requirement.txt * code quality * changed the MyDDP class name to CustomDDP
-
NielsRogge authored
* Fix scatter function to be compatible with torch-scatter 2.7.0 * Allow test again
-
NielsRogge authored
-
Stas Bekman authored
-
Russell Klopfer authored
* adds metric prefix. * update tests to include prefix
-
- 07 Jun, 2021 5 commits
-
-
Peter Izsak authored
* Adding optional argument group to HfArgumentParser * Minor * remove whitespace * Minor styling
-
Nicolas Patry authored
* fix_torch_device_generate_test * remove @ * finish * refactor * add test * fix test * Attempt at simplification. * Small fix. * Fixing non existing AutoModel for TF. * Naming. * Remove extra condition. Co-authored-by:patrickvonplaten <patrick.v.platen@gmail.com>
-
Fran莽ois Lagunas authored
* Fixing bug that appears when using distilation (and potentially other uses). During backward pass Pytorch complains with: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation This happens because the QA model code modifies the start_positions and end_positions input tensors, using clamp_ function: as a consequence the teacher and the student both modifies the inputs, and backward pass fails. * Fixing all models QA clamp_ bug.
-
Patrick von Platen authored
* fix_torch_device_generate_test * remove @ * bump up jax lib
-
Suraj Patil authored
-