    Rebase ESM PR and update all file formats (#19055) · 368b649a
    Matt authored
    
    
    * Rebase ESM PR and update all file formats
    
    * Fix test relative imports
    
    * Add __init__.py to the test dir
    
    * Disable gradient checkpointing
    
    * Remove references to TFESM... FOR NOW >:|
    
    * Remove completed TODOs from tests
    
    * Convert docstrings to mdx, fix-copies from BERT
    
    * fix-copies for the README and index
    
    * Update ESM's __init__.py to the modern format
    
    * Add to _toctree.yml
    
    * Ensure we correctly copy the pad_token_id from the original ESM model
    
    * Tiny grammar nitpicks
    
    * Make the layer norm after embeddings an optional flag
    
    * Update the conversion script to handle other model classes
    
    * Remove token_type_ids entirely, fix attention_masking and add checks to convert_esm.py
    
    * Break the copied from link from BertModel.forward to remove token_type_ids
    
    * Remove debug array saves
    
    * Begin ESM-2 porting
    
    * Add a hacky workaround for the precision issue in original repo
    
    * Code cleanup
    
    * Remove unused checkpoint conversion code
    
    * Fix copyright notices
    
    * Get rid of all references to the TF weights conversion
    
    * Remove token_type_ids from the tests
    
    * Fix test code
    
    * Update src/transformers/__init__.py
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Update README.md
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Add credit
    
    * Remove *args and **kwargs in rotary embedding
    
    * Assertively remove asserts
    
    * Replace einsum with torch.outer()
    
    * Fix docstring formatting
    
    * Remove assertions in tokenization
    
    * Add paper citation to ESMModel docstring
    
    * Move vocab list to single line
    
    * Remove ESMLayer from init
    
    * Add Facebook copyrights
    
    * Clean up RotaryEmbedding docstring
    
    * Fix docstring formatting
    
    * Fix docstring for config object
    
    * Add explanation for new config methods
    
    * make fix-copies
    
    * Rename all the ESM- classes to Esm-
    
    * Update conversion script to allow pushing to hub
    
    * Update tests to point at my repo for now
    
    * Set config properly for tests
    
    * Remove the gross hack that forced loss of precision in inv_freq and instead copy the data from the model being converted
    
    * make fixup
    
    * Update expected values for slow tests
    
    * make fixup
    
    * Remove EsmForCausalLM for now
    
    * Fix padding idx test
    
    * Updated README and docs with ESM-1b and ESM-2 separately (#19221)
    
    * Updated README and docs with ESM-1b and ESM-2 separately
    
    * Update READMEs, longer entry with 3 citations
    
    * make fix-copies
    Co-authored-by: Your Name <you@example.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Tom Sercu <tsercu@fb.com>