1. 03 Apr, 2023 4 commits
    • Fix llama tokenizer (#22402) · c0f99b4d
      Arthur authored
      * draft
      
      * update tokenization llama and conversion script
      
      * more updates
      
      * initial commit
      
      * style
      
      * default pad to None
      
      * draft tokenization tests
      
      * update test
      
      * update tokenization tests
      
      * nits
      
      * update
      
      * versioning test
      
      * major fix
      
      * fix more tests
      
      * finish fixing special masks
      
      * last nit
      
      * more nits
      
      * add encode decode tests
      
      * add more
      
      * fix token type ids
      
      * style
    • [Time-Series] fix past_observed_mask type (#22076) · 9eae4aa5
      Eli Simhayev authored
      added > 0.5 to `past_observed_mask`
    • Backbone add out indices (#22493) · 559a45d1
      amyeroberts authored
      * Add out_indices to backbones, deprecate out_features
      
      * Update - can specify either out_features or out_indices but not both
      
      * Can specify both
      
      * Fix copies
      
      * Add out_indices to convnextv2 configuration
    • Update convert_llama_weights_to_hf.py (#22525) · db803b69
      kevinpro authored
  2. 31 Mar, 2023 6 commits
  3. 30 Mar, 2023 10 commits
  4. 29 Mar, 2023 14 commits
  5. 28 Mar, 2023 4 commits
  6. 27 Mar, 2023 2 commits
    • [neptune] fix checkpoint bug with relative out_dir (#22102) · 3ec7a476
      Kshiteej K authored
      * [neptune] fix checkpoint bug with relative out_dir
      
      * update imports
      
      * reformat with black
      
      * check neptune without imports
      
      * fix typing-related issue
      
      * run black on code
      
      * use os.path.sep instead of raw \
      
      * simplify imports and remove type annotation
      
      * make ruff happy
      
      * apply review suggestions
      
      ---------
      Co-authored-by: Aleksander Wojnarowicz <alwojnarowicz@gmail.com>
    • [WIP]`NLLB-MoE` Adds the moe model (#22024) · 19ade242
      Arthur authored
      * Initial commit
      
      * update modeling code
      
      * update doc
      
      * add functions necessary
      
      * fix imports
      
      * revert changes
      
      * fixup
      
      * more styling to get going
      
      * remove standalone encoder
      
      * update code
      
      * styling
      
      * fix config and model
      
      * update code and some refactoring
      
      * make more tests pass
      
      * Adding NLLB-200 - MoE - 54.5B for no language left behind
      Fixes #21300
      
      * fix more common tests
      
      * style
      
      * update testing file
      
      * update
      
      * update
      
      * Router2 doc
      
      * update check config with sparse layer
      
      * add dummy router
      
      * update current conversion script
      
      * create on the fly conversion script
      
      * Fixup
      
      * style
      
      * style 2
      
      * fix empty return
      
      * fix return
      
      * Update default config sparse layers
      
      * easier to create sparse layers
      
      * update
      
      * update conversion script
      
      * update modeling
      
      * add to toctree
      
      * styling
      
      * make ruff happy
      
      * update docstring
      
      * update conversion script
      
      * update, will break tests but implementing top2
      
      * update
      
      * local groups are supported here
      
      * Support for local groups is now removed
      
      This is because it has to work with model parallelism that we do not support
      
      * finish simplification
      
      * Fix forward
      
      * style
      
      * fixup
      
      * Update modelling and test, refactoring
      
      * update tests
      
      * remove final layer_norm as it is done in the FF
      
      * routing works! Logits test added
      
      * nit in test
      
      * remove top1router
      
      * style
      
      * make sure sparse are tested. Had to change route_tokens a little bit
      
      * add support for unslip models when converting
      
      * fixup
      
      * style
      
      * update tests
      
      * update test
      
      * REFACTOR
      
      * encoder outputs match!
      
      * style
      
      * update testing
      
      * 🎉encoder and decoder logits match 🎉
      
      * styling
      
      * update tests
      
      * cleanup tests
      
      * fix router test and CIs
      
      * cleanup
      
      * cleanup test styling
      
      * fix tests
      
      * Finally the generation tests match!
      
      * cleanup
      
      * update test
      
      * style testing file
      
      * remove script
      
      * cleanup
      
      * more cleanup
      
      * nits
      
      * update
      
      * NLLB tokenizer is wrong and will be fixed soon
      
      * use LongTensors
      
      * update tests
      
      * revert some small changes
      
      * fix second expert sampling and batch prioritized routing
      
      * update tests
      
      * finish last tests
      
      * make ruff happy
      
      * update
      
      * ruff again
      
      * style
      
      * Update docs/source/en/model_doc/nllb-moe.mdx
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Updates based on review
      
      * style and fix import issue
      
      * nit
      
      * more nits
      
      * cleanup
      
      * styling
      
      * update test_seconde_expert_policy
      
      * fix name
      
      * last nit on the markdown examples
      
      ---------
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>