    [WIP]`NLLB-MoE` Adds the moe model (#22024) · 19ade242
    Arthur authored
    * Initial commit
    
    * update modeling code
    
    * update doc
    
    * add necessary functions
    
    * fix imports
    
    * revert changes
    
    * fixup
    
    * more styling to get going
    
    * remove standalone encoder
    
    * update code
    
    * styling
    
    * fix config and model
    
    * update code and some refactoring
    
    * make more tests pass
    
    * Adding NLLB-200 - MoE - 54.5B for No Language Left Behind
    Fixes #21300
    
    * fix more common tests
    
    * style
    
    * update testing file
    
    * update
    
    * update
    
    * Router2 doc
    
    * update config check with sparse layers
    
    * add dummy router
    
    * update current conversion script
    
    * create on-the-fly conversion script
    
    * Fixup
    
    * style
    
    * style 2
    
    * fix empty return
    
    * fix return
    
    * Update default config sparse layers
    
    * easier to create sparse layers (see the config sketch after the log)
    
    * update
    
    * update conversion script
    
    * update modeling
    
    * add to toctree
    
    * styling
    
    * make ruff happy
    
    * update docstring
    
    * update conversion script
    
    * update, will break tests but implementing top-2 routing (see the routing sketch after the log)
    
    * update
    
    * local groups are supported here
    
    * ⚠️ Support for local groups is now removed ⚠️
    
    This is because it would have to work with model parallelism, which we do not support
    
    * finish simplification
    
    * Fix forward
    
    * style
    
    * fixup
    
    * Update modeling and tests, refactoring
    
    * update tests
    
    * remove final layer norm as it is done in the FF
    
    * routing works! Logits test added
    
    * nit in test
    
    * remove top1router
    
    * style
    
    * make sure sparse layers are tested. Had to change route_tokens a little bit
    
    * add support for unsplit models when converting
    
    * fixup
    
    * style
    
    * update tests
    
    * update test
    
    * REFACTOR
    
    * encoder outputs match!
    
    * style
    
    * update testing
    
    * 🎉 encoder and decoder logits match 🎉
    
    * styling
    
    * update tests
    
    * cleanup tests
    
    * fix router test and CIs
    
    * cleanup
    
    * cleanup test styling
    
    * fix tests
    
    * Finally the generation tests match!
    
    * cleanup
    
    * update test
    
    * style testing file
    
    * remove script
    
    * cleanup
    
    * more cleanup
    
    * nits
    
    * update
    
    * NLLB tokenizer is wrong and will be fixed soon
    
    * use LongTensors
    
    * update tests
    
    * revert some small changes
    
    * fix second expert sampling and batch prioritized routing (see the capacity sketch after the log)
    
    * update tests
    
    * finish last tests
    
    * make ruff happy
    
    * update
    
    * ruff again
    
    * style
    
    * Update docs/source/en/model_doc/nllb-moe.mdx
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Updates based on review
    
    * style and fix import issue
    
    * nit
    
    * more nits
    
    * cleanup
    
    * styling
    
    * update test_seconde_expert_policy
    
    * fix name
    
    * last nit on the markdown examples
    
    ---------
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
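    
    A minimal sketch of the "sparse layers" idea the log refers to, assuming experts are placed every `sparse_step`-th layer; the function name and the `sparse_step` parameter are illustrative, not the exact `transformers` config API:
    
    ```python
    def sparse_layer_indices(num_layers: int, sparse_step: int) -> list:
        """Return the indices of layers that host MoE experts; a step of 0 means fully dense."""
        if sparse_step == 0:
            return []
        # with a step of 4 and 12 layers, layers 3, 7 and 11 are sparse
        return [i for i in range(num_layers) if (i + 1) % sparse_step == 0]
    
    assert sparse_layer_indices(12, 4) == [3, 7, 11]
    assert sparse_layer_indices(12, 0) == []
    ```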
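    
    A minimal sketch of top-2 routing as used in MoE models like this one: each token is sent to its two highest-scoring experts and the two gate values are renormalized. `top2_route` and its shapes are illustrative assumptions, not the model's actual router implementation:
    
    ```python
    import torch
    import torch.nn.functional as F
    
    def top2_route(hidden_states: torch.Tensor, router: torch.nn.Linear):
        """Send each token to its two highest-scoring experts with renormalized gates."""
        logits = router(hidden_states)            # (num_tokens, num_experts)
        probs = F.softmax(logits, dim=-1)
        top2_probs, top2_experts = torch.topk(probs, k=2, dim=-1)
        # renormalize so the two gate values sum to 1 per token
        top2_probs = top2_probs / top2_probs.sum(dim=-1, keepdim=True)
        return top2_experts, top2_probs
    
    # toy usage: 4 tokens, hidden size 8, 4 experts
    router = torch.nn.Linear(8, 4, bias=False)
    experts, gates = top2_route(torch.randn(4, 8), router)
    ```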
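    
    A minimal sketch of batch prioritized routing under a fixed expert capacity, the behavior the "fix second expert sampling and batch prioritized routing" commit touches: tokens are processed in order of decreasing router confidence, so when an expert fills up the least confident tokens are dropped first. The `capacity` parameter and helper name are illustrative assumptions:
    
    ```python
    import torch
    
    def batch_prioritized_keep_mask(top1_probs, top1_experts, num_experts, capacity):
        """Keep at most `capacity` tokens per expert, dropping low-confidence tokens first."""
        order = torch.argsort(top1_probs, descending=True)  # most confident tokens first
        kept = torch.zeros_like(top1_experts, dtype=torch.bool)
        load = torch.zeros(num_experts, dtype=torch.long)
        for idx in order.tolist():
            expert = int(top1_experts[idx])
            if load[expert] < capacity:                     # expert still has a free slot
                kept[idx] = True
                load[expert] += 1
        return kept  # True where the token is routed, False where it is dropped
    ```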