- 19 Jul, 2024 1 commit
Daniël de Kok authored
Deepseek V2 is a MoE model from Deepseek. Relevant variations compared to other models (see the sketches after this list):
- Grouped top-K in expert selection.
- The `mscale` used in yarn is calculated from the `mscale` and `mscale_all_dim` configuration options.
- `mscale_all_dim` is also used in scaling the attention softmax.
- The query/key representations are permuted before applying rotary embeddings.
- Some projections cannot be sharded (`q_a_proj`, `kv_a_proj_with_mqa`), so we need weight loading that supports quantized weights. To this end, `{Weights,WeightLoader}.get_weight` was added.
- The query/key head dimensionality differs from that of the value, so we need to pad during attention.
- Heads of size 192 need an extension to our paged attention fork, and we need to ensure that the KV cache is allocated with the correct size.
- Shared experts.
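For the grouped top-K routing, the idea is that the experts are partitioned into groups, each group is scored by its best expert, and only experts from the top-scoring groups stay eligible for selection. A minimal PyTorch sketch under those assumptions (`grouped_topk` and its parameter names are illustrative, not the actual API added by this commit):

```python
import torch

def grouped_topk(scores: torch.Tensor, n_groups: int, topk_groups: int, topk: int):
    """Grouped top-K sketch. `scores` is (n_tokens, n_experts) and is
    assumed to hold non-negative post-softmax router probabilities."""
    n_tokens, n_experts = scores.shape
    # Score each group by its best expert: (n_tokens, n_groups).
    group_scores = scores.view(n_tokens, n_groups, -1).max(dim=-1).values
    # Keep only the `topk_groups` best groups per token...
    group_idx = torch.topk(group_scores, k=topk_groups, dim=-1).indices
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(1, group_idx, 1.0)
    # ...and zero out experts belonging to every other group.
    expert_mask = (
        group_mask.unsqueeze(-1)
        .expand(n_tokens, n_groups, n_experts // n_groups)
        .reshape(n_tokens, n_experts)
    )
    masked_scores = scores.masked_fill(expert_mask == 0.0, 0.0)
    # Ordinary top-K over the surviving experts.
    return torch.topk(masked_scores, k=topk, dim=-1)
```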
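The yarn `mscale` handling follows the usual YaRN magnitude correction, `0.1 * mscale * ln(factor) + 1`. A hedged sketch of how the two configuration options might combine (helper names are assumptions; the ratio form and the squared softmax factor follow the public DeepSeek V2 reference code):

```python
import math

def yarn_get_mscale(scale: float = 1.0, mscale: float = 1.0) -> float:
    # Standard YaRN magnitude correction: 0.1 * mscale * ln(scale) + 1.
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

def rope_mscale(factor: float, mscale: float, mscale_all_dim: float) -> float:
    # Effective rotary mscale combines both config options as a ratio.
    return yarn_get_mscale(factor, mscale) / yarn_get_mscale(factor, mscale_all_dim)

def attention_softmax_scale(qk_head_dim: int, factor: float, mscale_all_dim: float) -> float:
    # `mscale_all_dim` also enters the attention softmax scale (squared).
    m = yarn_get_mscale(factor, mscale_all_dim)
    return qk_head_dim**-0.5 * m * m
```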
- 25 Jun, 2024 1 commit
Daniël de Kok authored
* Add pytest release marker. Annotate a test with `@pytest.mark.release` and it only gets run with `pytest integration-tests --release` (see the sketch below).
* Mark many models as `release` to speed up CI.
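One conventional way to gate a pytest marker behind a command-line flag is a pair of hooks in `conftest.py`. The sketch below shows the standard pattern, not necessarily the commit's exact implementation:

```python
# conftest.py (sketch)
import pytest

def pytest_configure(config):
    # Register the marker so pytest does not warn about it.
    config.addinivalue_line("markers", "release: tests that only run with --release")

def pytest_addoption(parser):
    parser.addoption(
        "--release",
        action="store_true",
        default=False,
        help="also run tests marked with @pytest.mark.release",
    )

def pytest_collection_modifyitems(config, items):
    # Without --release, skip every test carrying the `release` marker.
    if config.getoption("--release"):
        return
    skip_release = pytest.mark.skip(reason="needs --release")
    for item in items:
        if "release" in item.keywords:
            item.add_marker(skip_release)
```

A test then opts in with `@pytest.mark.release` above its definition.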
- 27 May, 2024 1 commit
Daniël de Kok authored
- 21 Feb, 2024 1 commit
OlivierDehaene authored
- 02 Jun, 2023 1 commit
OlivierDehaene authored
Close #288
- 31 May, 2023 1 commit
OlivierDehaene authored
- 26 May, 2023 1 commit
OlivierDehaene authored
Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
- 16 May, 2023 1 commit
OlivierDehaene authored
- 15 May, 2023 1 commit
OlivierDehaene authored