1. 30 Jul, 2024 1 commit
  2. 29 Jul, 2024 6 commits
  3. 26 Jul, 2024 2 commits
    • feat: add ruff and resolve issue (#2262) · bab02ff2
      drbh authored
      * feat: add ruff and resolve issue
      
      * fix: update client exports and adjust after rebase
      
      * fix: adjust syntax to avoid circular import
      
      * fix: adjust client ruff settings
      
      * fix: lint and refactor import check and avoid model enum as global names
      
      * fix: improve fbgemm_gpu check and lints
      
      * fix: update lints
      
      * fix: prefer comparing model enum over str
      
      * fix: adjust lints and ignore specific rules
      
      * fix: avoid unneeded quantize check
  4. 25 Jul, 2024 4 commits
  5. 24 Jul, 2024 4 commits
  6. 23 Jul, 2024 9 commits
  7. 22 Jul, 2024 6 commits
  8. 21 Jul, 2024 1 commit
  9. 20 Jul, 2024 3 commits
  10. 19 Jul, 2024 4 commits
    • Add support for Deepseek V2 (#2224) · e52be9bb
      Daniël de Kok authored
      Deepseek V2 is a MoE model from Deepseek. Relevant variations
      compared to other models:
      
      - Grouped top-K in expert selection.
      - mscale in yarn is calculated using the `mscale` and `mscale_all_dim`
        configuration options.
      - `mscale_all_dim` is also used in scaling attention softmax.
      - Permuting of the query/key representations before applying rotary
        embeddings.
      - Some projections cannot be sharded (`q_a_proj`, `kv_a_proj_with_mqa`),
        so we need weight loading that supports quantized weights. To this
        end, `{Weights,WeightLoader}.get_weight` was added.
      - The query/key head dimensionality differs from that of the value,
        so we need to pad during attention.
      - Heads of size 192 need an extension to our paged attention
        fork, and we need to ensure that the KV cache is allocated with the
        correct size.
      - Shared experts.
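
The "grouped top-K" expert selection mentioned in the list above can be illustrated with a minimal pure-Python sketch for a single token. This is not the code from the commit: the function name `grouped_top_k`, its parameter names, and the choice to score each group by its best expert are assumptions made for illustration only.

```python
from typing import List, Tuple


def grouped_top_k(
    scores: List[float], n_groups: int, top_k_groups: int, top_k: int
) -> List[Tuple[int, float]]:
    """Pick `top_k` experts for one token, restricted to the best groups.

    `scores` holds one router score per expert. Experts are split into
    `n_groups` contiguous groups of equal size (hypothetical layout).
    """
    if len(scores) % n_groups != 0:
        raise ValueError("number of experts must be divisible by n_groups")
    group_size = len(scores) // n_groups

    # Assumption: a group's score is the score of its best expert.
    group_scores = [
        max(scores[g * group_size : (g + 1) * group_size])
        for g in range(n_groups)
    ]

    # Keep only the `top_k_groups` highest-scoring groups.
    kept = set(
        sorted(range(n_groups), key=lambda g: group_scores[g], reverse=True)[
            :top_k_groups
        ]
    )

    # Rank the experts that live in a kept group and take the best `top_k`.
    candidates = [
        (i, s) for i, s in enumerate(scores) if i // group_size in kept
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]


# Eight experts in four groups of two; only the two best groups are eligible.
example_scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.6, 0.05, 0.3]
selected = grouped_top_k(example_scores, n_groups=4, top_k_groups=2, top_k=3)
print(selected)
```

In this example expert 4 (score 0.7) is skipped in favour of expert 3 (score 0.2), because expert 4's group was not among the two best groups. That restriction is what distinguishes grouped top-K from a plain top-K over all experts.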
    • fix: adjust default tool choice (#2244) · 68a9685f
      drbh authored
      * fix: adjust default tool choice
      
      * feat: improve tool choice syntax and response parsing/errors
      
      * fix: remove dev tests
      
      * feat: add ToolChoice to docs
    • add usage stats to toctree (#2260) · 40f5dc3e
      Erik Kaunismäki authored
      quick fix
    • usage stats and crash reports (#2220) · 4c19593a
      Erik Kaunismäki authored
      * draft of usage stats
      
      * fix wrong link
      
      * launcher doesn't need sysinfo dep
      
      * only tokenizer class instead of whole struct
      
      * unused import
      
      * fix clippy errors
      
      * update openAPI doc
      
      * cargo fmt
      
      * fix error in passing flags to router
      
      * try again to update docs
      
      * run pre-commit locally
      
      * Update router/src/main.rs
      Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>
      
      * Update router/src/main.rs
      Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>
      
      * on crash use anonymous error event
      
      * delete json_output and ngrok
      
      * more robust way of checking if is in container
      
      * more robust nvidia smi
      
      * parse xpu more robustly
      
      * fix errors
      
      * add nvidia-smi details in docs
      
      * cargo fmt
      
      * fix clippy
      
      * should make docs check pass
      
      * Update router/src/usage_stats.rs
      Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>
      
      * error reason can't be in nested json
      
      * cargo fmt
      
      ---------
      Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>
      Co-authored-by: Erik Kaunismäki <erikkaum@Eriks-MacBook-Pro.local>