- 27 Mar, 2023 13 commits
-
-
Joao Gante authored
missing None check
-
Joao Gante authored
-
NielsRogge authored
* First draft * Fix integration test * Remove script * Fix test and typos * Fix one more test * Skip tied embeddings test * Remove line * Address comments
-
Sylvain Gugger authored
* Report safetensors version in transformers-cli env * Styling * Trigger CI maybe
-
Younes Belkada authored
for rg to be `False`
-
Joao Gante authored
-
Nathan Fradet authored
* seq2seq trainer and training arguments accepting GenerationConfig arg
* seq2seq Trainer and training arguments docstring fixes
* Update training_args_seq2seq.py docstring Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fixing trainer_seq2seq.py docstring Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* seq2seq trainer: legacy gen args back & GenerationConfig created at init
* Seq2seq trainer: fix in case gen_config.max_new_tokens is None Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* seq2seq trainer: adding legacy arg retrocompatibility
* seq2seq trainer: evaluate and predict untouched
* Apply suggestions from code review Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* seq2seq trainer: adding init args, keeping IDEs hints
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
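A minimal usage sketch of the change above (editor's illustration; the `generation_config` argument name on `Seq2SeqTrainingArguments` is assumed from the commit bullets, which only say a GenerationConfig arg is accepted):

    from transformers import GenerationConfig, Seq2SeqTrainingArguments

    # Build generation settings once instead of using the legacy
    # generation_max_length / generation_num_beams training arguments.
    gen_config = GenerationConfig(max_new_tokens=128, num_beams=4)

    training_args = Seq2SeqTrainingArguments(
        output_dir="out",
        predict_with_generate=True,
        generation_config=gen_config,  # assumed argument name, per the commit message
    )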
-
Vladislav Sokolovskii authored
* Wav2Vec2ProcessorWithLM can return N best hypotheses now Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>
* Wav2Vec2ProcessorWithLM n_best cannot be None Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Batch decoding can return N best hypotheses now: batch_decode was extended with the same functionality as the decode function, so the N best hypotheses per sample can be returned Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>
---------
Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
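A hedged sketch of the new behaviour (editor's illustration; the checkpoint and logits are placeholders, and `n_best` is assumed to be the parameter name referenced in the commit):

    import numpy as np
    from transformers import Wav2Vec2ProcessorWithLM

    # Requires pyctcdecode and kenlm; the checkpoint is an example LM-boosted processor.
    processor = Wav2Vec2ProcessorWithLM.from_pretrained("patrickvonplaten/wav2vec2-base-100h-with-lm")

    # Placeholder CTC logits of shape (batch, time, vocab) just to show the call;
    # real logits would come from a Wav2Vec2ForCTC forward pass.
    logits = np.random.randn(1, 200, 32).astype(np.float32)

    outputs = processor.batch_decode(logits, n_best=3)
    print(outputs.text)  # with n_best > 1, the top-3 hypotheses per sample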
-
кѳѳsнī authored
balanced 8bit memory
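Presumably this refers to balancing memory across GPUs when loading in 8-bit; a hedged sketch (checkpoint name is illustrative, requires `bitsandbytes`, `accelerate`, and at least two GPUs for balancing to matter):

    from transformers import AutoModelForCausalLM

    # Spread the 8-bit quantized weights evenly across the available GPUs.
    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-3b",   # example checkpoint, not from the commit
        device_map="balanced",
        load_in_8bit=True,
    )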
-
Sylvain Gugger authored
-
Nicola Procopio authored
* updated toctree * added and translated mdx documents
-
Charlie-Bell authored
Edited one line in src/transformers/generation/utils.py. Changed dist.world_size() to dist.get_world_size(), since world_size() doesn't exist in torch.distributed.
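For reference, the corrected call (with a guard for the non-distributed case; an editor's sketch, not the exact patched line):

    import torch.distributed as dist

    # torch.distributed exposes get_world_size(); there is no world_size() function.
    world_size = dist.get_world_size() if dist.is_available() and dist.is_initialized() else 1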
-
Joao Gante authored
* missing cmake * more cmake
-
- 24 Mar, 2023 12 commits
-
-
Stas Bekman authored
* [safetensors] don't use in pt<1.10 * better fix
-
Sylvain Gugger authored
-
Stas Bekman authored
-
Shubhamai authored
* [WIP] flax resnet
* added pretrained flax models, results reproducible
* Added pretrained flax models, results reproducible
* working on tests
* no real code change, just some comments
* [flax] adding support for batch norm layers
* fixing bugs related to pt+flax integration
* removing loss from modeling flax output class
* fixing classifier tests
* fixing comments, model output
* cleaning comments
* review changes
* review changes
* Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* renaming Flax to PyTorch
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
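A hedged usage sketch for the new Flax port (editor's illustration; requires `jax`/`flax`, and the checkpoint choice is an assumption):

    import numpy as np
    from transformers import FlaxResNetModel

    # Add from_pt=True if the repo only hosts PyTorch weights.
    model = FlaxResNetModel.from_pretrained("microsoft/resnet-50")

    pixel_values = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder image batch
    outputs = model(pixel_values=pixel_values)
    print(outputs.last_hidden_state.shape)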
-
Joao Gante authored
-
Samuel Bubán authored
* Improve error message * Fix consistency
-
Sylvain Gugger authored
* Pin tensorflow-text to go with tensorflow * Make it more convenient to pin TensorFlow * setup doesn't like f-strings
-
Yih-Dar authored
* update docker files to use official torch 2.0.0 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Mitch Naylor authored
* add mega file structure and plain pytorch version of mega source code
* added config class with old naming conventions
* filled in mega documentation
* added config class and embeddings with optional token types
* updated notes
* starting the conversion process, deleted intermediate and added use_cache back to config
* renamed config attributes in modeling_mega.py
* checkpointing before refactoring incremental decoding functions
* removed stateful incremental key/values for EMA and self-attention
* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask
* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement
* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention
* bug fix in attention mask handling in MovingAverageGatedAttention
* removed incremental state from GatedCrossAttention and removed IncrementalState class
* finished gated cross attention and got MegaLayer working
* fixed causal masking in mega decoder
* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching
* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids
* added optional dense hidden layer for masked and causal LM classes
* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention
* removed before_attn_fn in Mega class and updated docstrings and comments up to there
* bug fix in MovingAverageGatedAttention masking
* working conversion of MLM checkpoint in scratchpad script -- perfect matches
* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters
* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint
* finished checkpoint conversion script
* cleanup old class in mega config script
* removed 'copied from' statements and passing integration tests
* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing
* fixed tuple output of megamodel
* all common tests passing after fixing issues in decoder, gradient retention, and initialization
* added mega-specific tests, ready for more documentation and style checks
* updated docstrings; checkpoint before style fixes
* style and quality checks, fixed initialization problem in float_tensor, ready for PR
* added mega to toctree
* removed unnecessary arg in megaconfig
* removed unused arg and fixed code samples with leftover roberta models
* Apply suggestions from code review. Applied all suggestions except the one renaming a class, as I'll need to update that throughout Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA
* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms
* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention
* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()
* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files
* variable names in NFFN
* manual Mega->MEGA changes in docs
* Mega->MEGA in config auto
* style and quality fixes
* Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments
* commit before dealing with merge conflicts
* made new attention activation functions available in ACT2FN and added generation test from OPT
* style and quality in activations and tests
* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings
* style and quality fixes after latest updates, before rotary position ids
* causal mask in MegaBlock docstring + added missing device passing
* Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update README.md Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR
* style and quality fixes + readme updates pointing to main
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
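A brief, hedged usage sketch for the newly added MEGA model (editor's illustration; the checkpoint name is an assumption and not taken from the commit):

    from transformers import AutoTokenizer, MegaModel

    # Hypothetical checkpoint; any MEGA checkpoint on the Hub would be used the same way.
    tokenizer = AutoTokenizer.from_pretrained("mnaylor/mega-base-wikitext")
    model = MegaModel.from_pretrained("mnaylor/mega-base-wikitext")

    inputs = tokenizer("Moving average equipped gated attention", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)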
-
Joao Gante authored
-
Ashwin Mathur authored
Fix typo in greedy search docs
-
James Reed authored
* [HFTracer] Make embeddings ops take on the dtype of the weight * fix bug
-
- 23 Mar, 2023 13 commits
-
-
Yih-Dar authored
* Automatically create or update tiny models * Skip failed tests * update workflow file * use revision --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
кѳѳsнī authored
* Llama - Move target tokens to final pipeline device if needed
* Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Joao Gante authored
-
jeffhataws authored
This PR fixes the "RuntimeError: No CUDA GPUs are available" when running with the --bf16 option on Neuron. Related PRs: https://github.com/huggingface/transformers/pull/20684 https://github.com/huggingface/transformers/pull/22300
-
Batese2001 authored
* Added type hints to TFDeiTModel * make style --------- Co-authored-by: Matt <rocketknight1@gmail.com>
-
Samuel Larkin authored
-
Sylvain Gugger authored
* Fix various imports * Fix copies * Fix import
-
Quentin Lhoest authored
* Mention why one needs to specify max_steps in Trainer * dummy change to trigger CI
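A minimal sketch of the documented requirement (editor's illustration; the usual reason is that a streaming/iterable dataset has no length, so the Trainer cannot derive the number of steps itself):

    from transformers import TrainingArguments

    # With an iterable dataset, the total training length must be given explicitly.
    args = TrainingArguments(output_dir="out", max_steps=10_000, per_device_train_batch_size=8)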
-
mollerup23 authored
* Fixed gradient checkpoint bug for this model * Updating PR indentation (maintainer feedback) * make fixup --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com>
-
Younes Belkada authored
add `accelerate` support for MBart
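A hedged sketch of what this enables (editor's illustration; requires `accelerate`, and the checkpoint name is illustrative, not from the commit):

    from transformers import AutoModelForSeq2SeqLM

    # Dispatch MBart across the available devices automatically.
    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50", device_map="auto")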
-
Stas Bekman authored
* [gptj] support older pytorch version
* contributor
* contributor
* make copies
---------
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
Sylvain Gugger authored
-
Sylvain authored
-
- 22 Mar, 2023 2 commits
-
-
Stas Bekman authored
* [deepspeed zero3] need generate(synced_gpus=True, ...)
* fix
* rework per Sylvain's suggestion
* Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
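A minimal sketch of the call being documented (editor's illustration; the model and checkpoint are placeholders, and in practice the model would be wrapped by DeepSpeed ZeRO-3):

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Under ZeRO-3, every rank must keep running forward passes until all ranks
    # have finished generating, hence synced_gpus=True.
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    inputs = tokenizer("translate English to German: Hello, world!", return_tensors="pt")
    outputs = model.generate(**inputs, synced_gpus=True, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))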
-
Yih-Dar authored
* check what tests fail * Skip failing tests * Skip failing tests * Skip failing tests * Skip failing tests * clean up * clean up --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-