1. 06 Aug, 2024 7 commits
    • Add Nemotron HF Support (#31699) · 6a03942d
      Ao Tang authored
      * Add nemotron support
      
      * fix inference
      
      * add unit test
      
      * add layernorm1p as a class to avoid meta device mismatch
      
      * test fixed
      
      * Add copied_from statements
      
      * remove pretraining_tp args
      
      * remove nemotronlayernorm
      
      * force LN computation done in FP32
      
      * remove nemotrontokenizer and use llamatokenizer
      
      * license update
      
      * add option for kv_channels for minitron8b
      
      * remove assert
      
      * o_proj fixed
      
      * o_proj reshape
      
      * add gated_proj option
      
      * typo
      
      * remove todos
      
      * fix broken test after merging latest main
      
      * remove nezha/nat after merging main
      
      * change default config to 15b model
      
      * add nemo conversion script
      
      * rename conversion script
      
      * remove gate_proj option
      
      * pr comment resolved
      
      * fix unit test
      
      * rename kv_channels to head_dim
      
      * resolve PR issue
      
      * add nemotron md
      
      * fix broken tests
      
      * refactor rope for nemotron
      
      * test fix
      
      * remove linearscaling
      
      * whitespace and import
      
      * fix some copied-from
      
      * code style fix
      
      * reformatted
      
      * add position_embedding to nemotronattention
      
      * rope refactor to only use config, copied-from fix
      
      * format
      
      * Run make fix-copies
      
      * nemotron md with autodoc
      
      * doc fix
      
      * fix order
      
      * pass check_config_docstrings.py
      
      * fix config_attributes
      
      * remove all llama BC related code
      
      * Use PreTrainedTokenizerFast
      
      * ruff check examples
      
      * conversion script update
      
      * add nemotron to toctree
      6a03942d
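      The layernorm1p bullet above refers to the NeMo-style LayerNorm in which the learned scale is shifted by one and the normalization runs in FP32. Below is a minimal sketch of that idea; the class name and exact casts are illustrative, not the code merged in #31699.
      
        import torch
        import torch.nn.functional as F
        from torch import nn
        
        class LayerNorm1P(nn.LayerNorm):
            """LayerNorm variant that uses (weight + 1) as the scale.
        
            Implementing it as an nn.Module subclass (rather than patching the
            forward of a stock LayerNorm) keeps the parameters registered in the
            usual way, which avoids meta-device mismatches during loading.
            """
        
            def forward(self, x: torch.Tensor) -> torch.Tensor:
                # Force the computation in FP32 for stability, then cast back.
                out = F.layer_norm(
                    x.float(),
                    self.normalized_shape,
                    self.weight.float() + 1.0,
                    self.bias.float() if self.bias is not None else None,
                    self.eps,
                )
                return out.to(x.dtype)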
    • Dependencies: fix typo (#32389) · 36fd35e1
      Joao Gante authored
      deps_2
      36fd35e1
    • Update kwargs validation for `preprocess` with decorator (#32024) · fb66ef81
      Pavel Iakubovskii authored
      * BLIP preprocess
      
      * BIT preprocess
      
      * BRIDGETOWER preprocess
      
      * CHAMELEON preprocess
      
      * CHINESE_CLIP preprocess
      
      * CONVNEXT preprocess
      
      * DEIT preprocess
      
      * DONUT preprocess
      
      * DPT preprocess
      
      * FLAVA preprocess
      
      * EFFICIENTNET preprocess
      
      * FUYU preprocess
      
      * GLPN preprocess
      
      * IMAGEGPT preprocess
      
      * INSTRUCTBLIPVIDEO preprocess
      
      * VIVIT preprocess
      
      * ZOEDEPTH preprocess
      
      * VITMATTE preprocess
      
      * VIT preprocess
      
      * VILT preprocess
      
      * VIDEOMAE preprocess
      
      * VIDEOLLAVA
      
      * TVP processing
      
      * TVP fixup
      
      * SWIN2SR preprocess
      
      * SIGLIP preprocess
      
      * SAM preprocess
      
      * RT-DETR preprocess
      
      * PVT preprocess
      
      * POOLFORMER preprocess
      
      * PERCEIVER preprocess
      
      * OWLVIT preprocess
      
      * OWLV2 preprocess
      
      * NOUGAT preprocess
      
      * MOBILEVIT preprocess
      
      * MOBILENETV2 preprocess
      
      * MOBILENETV1 preprocess
      
      * LEVIT preprocess
      
      * LAYOUTLMV2 preprocess
      
      * LAYOUTLMV3 preprocess
      
      * Add test
      
      * Update tests
      fb66ef81
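      Each bullet above attaches a kwargs-validation decorator to one image processor's preprocess method. The helper below is a hedged sketch of what such a decorator can look like; the decorator name, warning text, and the toy processor are assumptions, not the helper added in #32024.
      
        import inspect
        import warnings
        from functools import wraps
        
        def validate_preprocess_kwargs(func):
            """Warn about and drop keyword arguments the wrapped preprocess does not accept."""
            accepted = set(inspect.signature(func).parameters)
        
            @wraps(func)
            def wrapper(self, images, **kwargs):
                unknown = sorted(set(kwargs) - accepted)
                if unknown:
                    warnings.warn(f"Unused or unrecognized kwargs: {', '.join(unknown)}.")
                    kwargs = {k: v for k, v in kwargs.items() if k in accepted}
                return func(self, images, **kwargs)
        
            return wrapper
        
        class ToyImageProcessor:
            @validate_preprocess_kwargs
            def preprocess(self, images, do_resize=True, size=224):
                return {"images": images, "do_resize": do_resize, "size": size}
        
        # ToyImageProcessor().preprocess([...], size=384, colour="red") warns about "colour".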
    • add the missing flash attention test marker (#32419) · e85d8639
      Fanli Lin authored
      * add flash attention check
      
      * fix
      
      * fix
      
      * add the missing marker
      
      * bug fix
      
      * add one more
      
      * remove order
      
      * add one more
      e85d8639
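      The marker being added is the require_flash_attn decorator from transformers.testing_utils, which skips a test when flash-attention is not installed. A rough sketch of how such a test is typically marked (the test class and body are placeholders):
      
        import pytest
        from transformers.testing_utils import require_flash_attn, require_torch_gpu, slow
        
        class DummyModelIntegrationTest:
            @require_flash_attn
            @require_torch_gpu
            @pytest.mark.flash_attn_test
            @slow
            def test_flash_attn_2_generate(self):
                # Load the model with attn_implementation="flash_attention_2"
                # and compare its generations against the eager implementation.
                ...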
    • Llava: fix checkpoint_doc (#32458) · 0aa83282
      Prakarsh Kaushik authored
      fix: checkpoint doc bug when adding a new Llava-like model
      0aa83282
    • Cache: create docs (#32150) · 37c5ca5e
      Raushan Turganbay authored
      
      * draft
      
      * updates
      
      * works?
      
      * try adding python example in hidden section
      
      * another try
      
      * how do I render python
      
      * format as html code?
      
      * Update docs/source/en/kv_cache.md
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update docs/source/en/kv_cache.md
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update docs/source/en/kv_cache.md
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update docs/source/en/kv_cache.md
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update docs/source/en/kv_cache.md
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * one more small update
      
      * should render hidden section now
      
      * add outputs
      
      * fix links
      
      * check links
      
      * update all links
      
      * update with offloaded cache
      
      * all cache is importable, so they appear in docs
      
      * fix copies
      
      * docstring...
      
      ---------
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      37c5ca5e
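      Since every cache class is now importable from the top-level package, the new kv_cache.md page can autodoc them and show usage directly. A small sketch in the spirit of those docs; the model checkpoint is arbitrary and the exact examples on the page may differ.
      
        from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache
        
        tokenizer = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2")
        inputs = tokenizer("The key/value cache stores", return_tensors="pt")
        
        # Pass an explicit cache object instead of relying on the default;
        # other cache classes (e.g. an offloaded cache) can be swapped in the same way.
        past_key_values = DynamicCache()
        out = model.generate(**inputs, max_new_tokens=20, past_key_values=past_key_values)
        print(tokenizer.decode(out[0], skip_special_tokens=True))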
  2. 05 Aug, 2024 10 commits
  3. 03 Aug, 2024 2 commits
    • MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. (#31500) · 621fb3c0
      Xueshen Liu authored
      
      * Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (no auto position_ids generation could be unsafe)
      
      * fix typo [:-1] to [:, -1]
      
      * to meet formatting requirement
      
      * to meet formatting requirement
      
      * remove white space
      
      * MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue.
      
      * propagate to starcoder2, phi3, mixtral and qwen2
      
      * update qwen2_moe
      621fb3c0
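      The change described above amounts to moving the "+ 1" inside the max(...) over position_ids and falling back to kv_seq_len when position_ids is None. A hedged approximation of the resulting expression, wrapped in a standalone function rather than copied from the modeling code:
      
        from typing import Optional
        
        import torch
        
        def compute_rotary_seq_len(kv_seq_len: int, position_ids: Optional[torch.Tensor]) -> int:
            # "+ 1" sits inside max(...), and position_ids=None falls back to kv_seq_len.
            return (
                max(kv_seq_len, position_ids[:, -1].max().item() + 1)
                if position_ids is not None
                else kv_seq_len
            )
        
        print(compute_rotary_seq_len(8, None))                       # 8
        print(compute_rotary_seq_len(8, torch.tensor([[0, 1, 9]])))  # 10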
    • fix: (issue #32124) Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`. (#32157) · 7c31d05b
      Shaopeng Fu authored
      
      fix: Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`.
      7c31d05b
  4. 02 Aug, 2024 3 commits
  5. 01 Aug, 2024 13 commits
  6. 31 Jul, 2024 5 commits