"start.sh" did not exist on "57a7272c675bf72ee8175b65c4e7ffa2e4d2cb2c"
    Add gpt-sw3 model to transformers (#20209) · 5f94855d
    Ariel Ekgren authored
    
    
    * Add templates for gpt-sw3
    
    * Add templates for gpt-sw3
    
    * Added sentencepiece tokenizer
    
    * intermediate commit with many changes
    
    * fixed conflicts
    
    * Init commit for tokenization port
    
    * Tokenization progress
    
    * Remove fast tokenizer
    
    * Clean up and rename spm.model -> spiece.model
    
    * Remove TF -> PT conversion script template, Clean up Megatron -> PT script
    
    * Optimize encode & decode performance
    
    * added new attention
    
    * added new attention
    
    * attention for gpt-sw3 working
    
    * attention good
    
    * Cache is now working
    
    * fixed attention mask so that it works with causal attention
    
    * fixed baddbmm bug for CPU and caching
    
    * updated config with correct parameters
    
    * Refactor and leave optimizations as separate functions to avoid breaking expected functionality
    
    * Fix special tokens mapping for both tokenizers
    
    * cleaning up of code and comments
    
    * HF compatible attention outputs
    
    * Tokenizer now passing tests, add documentation
    
    * Update documentation
    
    * reverted back to base implementation after checking that it is identical to pretrained model
    
    * updated gpt-sw3 config
    
    * updated conversion script
    
    * aligned parameters with gpt-sw3 config
    
    * changed default scale_attn_by_inverse_layer_idx to true
    
    * removed flag from conversion script
    
    * added temporary model path
    
    * reverted back to functioning convert script
    
    * small changes to default config
    
    * updated tests for gpt-sw3
    
    * make style, make quality, minor cleanup
    
    * Change local paths to testing online repository
    
    * Change name: GptSw3 -> GPTSw3
    
    * Remove GPTSw3TokenizerFast references
    
    * Use official model repository and add more model sizes
    
    * Added reference to 6.7b model
    
    * Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2DoubleHeadsModel
    
    * Remove pointers to non-existing TFGPTSw3
    
    * Add GPTSw3 to docs/_toctree.yml
    
    * Remove TF artifacts from GPTSw3 in __init__ files
    
    * Update READMEs with 'make fix-copies'
    
    * Add 20b model to archive list
    
    * Add documentation for GPT-Sw3
    
    * Fix typo in documentation for GPT-Sw3
    
    * Do 'make fix-copies' again after having updated docs
    
    * Fix some typos in docs
    
    * Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/models/gpt_sw3/__init__.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/models/gpt_sw3/__init__.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Resolve comments from PR feedback
    
    * Resolve more comments from PR feedback, also set use_cache=True in convert script
    
    * Add '# Copied from' comments for GPTSw3 modeling
    
    * Set 'is_parallelizable = False'
    
    * Remove '# Copied from' where code was modified and add 'with x->y' when appropriate
    
    * Remove parallelize in mdx
    
    * make style, make quality
    
    * Update GPTSw3Config default values and corresponding documentation
    
    * Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Update src/transformers/models/gpt_sw3/__init__.py
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_available
    
    * Make style, make quality
    
    * Add dummy object for GPTSw3Tokenizer via 'make fix-copies'
    
    * make fix-copies
    
    * Remove GPTSw3 modeling classes
    
    * make style, make quality
    
    * Add GPTSw3 auto-mappings for other GPT2 heads
    
    * Update docs/source/en/model_doc/gpt-sw3.mdx
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Remove old TODO-comment
    
    * Add example usage to GPTSw3Tokenizer docstring
    
    * make style, make quality
    
    * Add implementation details and example usage to gpt-sw3.mdx
    Co-authored-by: JoeyOhman <joeyoh@kth.se>
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
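
    Several of the commits above concern guarding GPTSw3Tokenizer imports behind is_sentencepiece_available and adding a dummy object via 'make fix-copies'. A minimal, self-contained sketch of that optional-dependency pattern — illustrative only, not the actual transformers source; the guard implementation and the dummy's error message are assumptions:

    ```python
    import importlib.util


    def is_sentencepiece_available() -> bool:
        # True if the sentencepiece package can be imported in this environment.
        return importlib.util.find_spec("sentencepiece") is not None


    if is_sentencepiece_available():
        # In the real library, the sentencepiece-backed GPTSw3Tokenizer
        # would be imported here.
        pass
    else:
        class GPTSw3Tokenizer:
            # Dummy placeholder in the spirit of the objects generated by
            # 'make fix-copies': instantiating it raises a helpful error
            # instead of failing with an obscure ImportError at import time.
            def __init__(self, *args, **kwargs):
                raise ImportError(
                    "GPTSw3Tokenizer requires the sentencepiece library: "
                    "pip install sentencepiece"
                )
    ```

    This keeps `import transformers` working on installs without sentencepiece, while still pointing users at the missing dependency the moment they try to use the tokenizer.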