Commits · 8024ded58f94a6ca0a1dc187ecb3e4a963ebb8fc · OpenDAS / text-generation-inference

24 Sep, 2024 8 commits
- Simplify crossterm imports (#2545) · 8024ded5
  Orhun Parmaksız authored Sep 24, 2024
- Update the link to the Ratatui organization (#2546) · 03263f5e
  Orhun Parmaksız authored Sep 24, 2024
- Add `DenseMoELayer` and wire it up in Mixtral/Deepseek V2 (#2537) · 3f14cd14
  Daniël de Kok authored Sep 24, 2024
```
This replaces the custom layers in both models.
```
- Add support for scalar FP8 weight scales (#2550) · c29dc89c
  Daniël de Kok authored Sep 24, 2024
```
* Add support for scalar FP8 weight scales

* Support LLM compressor FP8 checkpoints on H100

On H100, we use fbgemm-gpu, which requires bfloat16 as the input dtype.
However, we wouldn't pick up fp8 quantization for models quantized with
LLM compressor. This change adds enough parsing to detect if models have
FP8-quantized weights.

* Remove stray debug print
```
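A scalar (per-tensor) FP8 weight scale is a single number shared by every output channel, whereas a per-channel scale has one value per weight row. The sketch below shows one way a loader could normalise both forms before dequantising. The helpers `expand_weight_scale` and `dequantize` are hypothetical, the weights are plain lists of already-decoded floats rather than FP8 tensors, and nothing here describes the actual TGI or fbgemm-gpu code.

```python
from typing import List, Sequence, Union

# Hypothetical sketch, not TGI code. A checkpoint may store one scale for the
# whole tensor (scalar) or one scale per output channel (one per weight row).
Scale = Union[float, Sequence[float]]


def expand_weight_scale(scale: Scale, out_features: int) -> List[float]:
    """Return one scale per output channel, broadcasting a scalar scale."""
    if isinstance(scale, (int, float)):
        return [float(scale)] * out_features
    scales = [float(s) for s in scale]
    if len(scales) == 1:
        # A one-element scale is treated as per-tensor as well.
        return scales * out_features
    if len(scales) != out_features:
        raise ValueError(f"expected {out_features} scales, got {len(scales)}")
    return scales


def dequantize(weight: List[List[float]], scale: Scale) -> List[List[float]]:
    """Multiply each row of an already-decoded FP8 weight by its scale."""
    scales = expand_weight_scale(scale, len(weight))
    return [[w * s for w in row] for row, s in zip(weight, scales)]


print(dequantize([[1.0, 2.0], [3.0, 4.0]], 0.5))         # [[0.5, 1.0], [1.5, 2.0]]
print(dequantize([[1.0, 2.0], [3.0, 4.0]], [0.5, 2.0]))  # [[0.5, 1.0], [6.0, 8.0]]
```

Broadcasting the scalar up front lets the rest of such a loader assume per-channel scales only.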
- Hotfixing main (#2556) · 0ff6ff60
  Nicolas Patry authored Sep 24, 2024
- Micro cleanup. (#2555) · 74d3ce10
  Nicolas Patry authored Sep 24, 2024
- Remove duplicated `RUN` in `Dockerfile` (#2547) · d31a6f75
  Alvaro Bartolome authored Sep 24, 2024
- chore: Add old V2 backend (#2551) · 10e6f292
  OlivierDehaene authored Sep 24, 2024
```
* wip

* added v2
```
23 Sep, 2024 1 commit
- nix: remove unused `_server.nix` file (#2538) · 9263817c
  Daniël de Kok authored Sep 23, 2024
20 Sep, 2024 3 commits
- Preparing for release. (#2540) · 169178b9
  Nicolas Patry authored Sep 20, 2024
```
* Preparing for release.

* Upgrade version in docs.
```
- fix: wrap python basic logs in debug assertion in launcher (#2539) · 7e2d1887
  OlivierDehaene authored Sep 20, 2024
```
* fix: wrap python basic logs in debug assertion in launcher

* use level filters instead
```
- hotfix: ipex fails since cuda moe kernel is not supported (#2532) · f478aa77
  Wang, Yi authored Sep 20, 2024
```
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
```
19 Sep, 2024 3 commits
- doc: clarify that `--quantize` is not needed for pre-quantized models (#2536) · abd24dd3
  Daniël de Kok authored Sep 19, 2024
- Update to moe-kernels 0.3.1 (#2535) · c1037601
  Daniël de Kok authored Sep 19, 2024
```
* Update to moe-kernels 0.3.1

* Attempt to fix apt failure
```
- Stream options. (#2533) · f512021e
  Nicolas Patry authored Sep 19, 2024

  * Stream options.

  * Fetch stuff from nix integration test for easier testing.

  * Adding the assert.

  * Only send the usage when asked for.

  * Update the docs.

  * Impure test because we need network.

  * develop.

  * Optional usage.

  * Fixes.

  * Workflow
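The "Only send the usage when asked for" item matches the OpenAI-style convention in which a client sets `"stream_options": {"include_usage": true}` and receives one extra final chunk carrying token counts. A minimal sketch of that server-side decision, assuming TGI follows this convention; `stream_chunks` and the chunk layout are illustrative, not TGI's router code.

```python
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical sketch, not TGI code; the chunk schema is an assumption
# modelled on OpenAI-style chat completion streaming.
Chunk = Dict[str, Any]


def stream_chunks(tokens: List[str], prompt_tokens: int,
                  stream_options: Optional[Dict[str, bool]] = None) -> Iterator[Chunk]:
    """Yield chat-completion-style chunks; append a usage chunk only on request."""
    for token in tokens:
        yield {"choices": [{"delta": {"content": token}}]}
    if stream_options and stream_options.get("include_usage"):
        # Extra final chunk: no choices, only token counts for the whole request.
        yield {
            "choices": [],
            "usage": {
                "prompt_tokens": prompt_tokens,
                "completion_tokens": len(tokens),
                "total_tokens": prompt_tokens + len(tokens),
            },
        }


plain = list(stream_chunks(["Hel", "lo"], prompt_tokens=3))
with_usage = list(stream_chunks(["Hel", "lo"], 3, {"include_usage": True}))
print(len(plain), len(with_usage))  # 2 3
print(with_usage[-1]["usage"])      # {'prompt_tokens': 3, 'completion_tokens': 2, 'total_tokens': 5}
```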

17 Sep, 2024 3 commits
- Move to moe-kernels package and switch to common MoE layer (#2511) · ce85efa9
  Daniël de Kok authored Sep 17, 2024

  * Move to moe-kernels package and switch to common MoE layer

  This change introduces the new `moe-kernels` package:

  - Add `moe-kernels` as a dependency.
  - Introduce a `SparseMoELayer` module that can be used by MoE
    models.
  - Port over Mixtral and Deepseek.

  * Make `cargo check` pass

  * Update runner
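A sparse MoE layer sends each token to only a few experts. The sketch below shows generic top-k routing as used by Mixtral-style models: take the k largest router logits, softmax over just those, and sum the chosen experts' outputs with the resulting weights. It is a plain-Python illustration under that assumption; `top_k_route` and `sparse_moe` are hypothetical and say nothing about how `SparseMoELayer` or the `moe-kernels` package implement routing (fused kernels batch tokens per expert rather than looping).

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch of top-k MoE routing, not the moe-kernels API.
Expert = Callable[[List[float]], List[float]]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and softmax over just those logits."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]


def sparse_moe(x: List[float], router_logits: Sequence[float],
               experts: Sequence[Expert], k: int = 2) -> List[float]:
    """Run only the routed experts and sum their outputs, weighted by the router."""
    out = [0.0] * len(x)
    for expert_id, weight in top_k_route(router_logits, k):
        y = experts[expert_id](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Four toy "experts" that scale their input by 1, 2, 3 and 4.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
print(top_k_route([0.1, 2.0, -1.0, 2.0], k=2))                 # [(1, 0.5), (3, 0.5)]
print(sparse_moe([1.0, 1.0], [0.1, 2.0, -1.0, 2.0], experts))  # [3.0, 3.0]
```

A softmax over only the selected logits gives the same weights as a full softmax followed by renormalising the top k entries.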

- fix: metrics unbounded memory (#2528) · 86984e32
  OlivierDehaene authored Sep 17, 2024
- nix: pure Rust check/fmt/clippy/test (#2525) · 71e42686
  Daniël de Kok authored Sep 17, 2024
```
Runs the tests in a Nix build sandbox.
```
16 Sep, 2024 2 commits
- Adding a test for FD. (#2516) · 38fcafcf
  Nicolas Patry authored Sep 16, 2024

  * Adding a test for FD.

  * Fixing flashdecoding (empty batch doesn't work).

  * Fixing the invalid popping.

  * Fixing radix with block_size > 1

  * Last reference.

  * Use an actual hash.

  * Update hash for slice.len() == 1

  * Update the locks.

  * Increasing docker timeout.
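Items such as "Fixing radix with block_size > 1" and "Use an actual hash" concern prefix caching at block granularity. One common scheme, sketched here as an assumption rather than a description of TGI's `radix.rs`, hashes each full block of `block_size` token ids together with the previous block's digest, so matching hashes indicate matching prefixes up to that block (barring collisions) and a trailing partial block is never cached. `block_hashes` is a hypothetical helper.

```python
import hashlib
from typing import List, Sequence


def block_hashes(token_ids: Sequence[int], block_size: int) -> List[str]:
    """Hash each full block of token ids, chaining in the previous block's digest.

    Hypothetical sketch of block-level prefix hashing, not TGI code.
    """
    hashes: List[str] = []
    prev = b""
    n_full = len(token_ids) // block_size  # a trailing partial block is skipped
    for b in range(n_full):
        block = token_ids[b * block_size:(b + 1) * block_size]
        h = hashlib.sha256(prev)
        h.update(",".join(str(t) for t in block).encode())
        prev = h.digest()
        hashes.append(h.hexdigest())
    return hashes


a = block_hashes([1, 2, 3, 4, 5], block_size=2)  # token 5 is a partial block
b = block_hashes([1, 2, 9, 9], block_size=2)
print(len(a), a[0] == b[0], a[1] == b[1])  # 2 True False
```

Chaining makes the same block content hash differently at different positions, which is what a prefix lookup needs.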

- Add tests for Mixtral (#2520) · 77746552
  Daniël de Kok authored Sep 16, 2024
```
Disable by default because CI runners do not have enough GPUs.
```
13 Sep, 2024 1 commit
- Use `ratatui` not (deprecated) `tui` (#2521) · 9cca3e0b
  Alex Strick van Linschoten authored Sep 13, 2024
```
* use ratatui not archived tui

* bump ratatui all the way with options
```
12 Sep, 2024 4 commits
- hotfix : enable intel ipex cpu and xpu in python3.11 (#2517) · 3ac7df2b
  Wang, Yi authored Sep 12, 2024
```
enable intel ipex cpu and xpu in python3.11
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
```
- fix: pass missing revision arg for lora adapter when loading multiple… (#2510) · 628334d3
  drbh authored Sep 12, 2024
```
fix: pass missing revision arg for lora adapter when loading multiple adapters
```
- Add nix test. (#2513) · d95c670a
  Nicolas Patry authored Sep 12, 2024

  * Add nix test.

  * Modifying yourself means you need to rerun.

  * Fixing the test + adding click (needed for pre-commit hooks).

  * Try this.

  * Our runner + pure test (not written)

  * Remove server.

  * Root user.

  * Different user ?

  * Add the actual test target.

  * Forgot this modification.

  * Add a formatter.

  * Add the secrets.

  * Fixed the auth token ?

  * Adding the other tests.

  * Missing pre-commit.

  * Test requires cargo for cargo fmt.

  * Update it a bit.

  * Up.

  * Attempting to use a cache location for the models.

  * Ignore the cache for now.
- nix: support Python tokenizer conversion in the router (#2515) · 94304649
  Daniël de Kok authored Sep 12, 2024

  Ideally we wouldn't have the router wrapper that this change adds,
  but when I give PyO3 a Python interpreter with packages, it ends
  up linking libpython from the Python interpreter rather than the
  constructed environment and cannot pick up the Python modules as
  a result.
11 Sep, 2024 3 commits
- Fix truffle (#2514) · 69e3be20
  Nicolas Patry authored Sep 11, 2024

  * Attempting to discard the trufflehog warning.

  * Attempt to fix trufflehog.
- Fix tokenization yi (#2507) · dae3bf1d
  Nicolas Patry authored Sep 11, 2024

  * Fixing odd tokenization self modifications on the Rust side (load and
    resave in Python).

  * Fixing the builds ?

  * Fix the gh action?

  * Fixing the location ?

  * Validation is odd.

  * Try a faster runner

  * Upgrade python version.

  * Remove sccache

  * No sccache.

  * Getting libpython maybe ?

  * List stuff.

  * Monkey it up.

  * have no idea at this point

  * Tmp.

  * Shot in the dark.

  * Tmate the hell out of this.

  * Desperation.

  * WTF.

  * -y.

  * Apparently 3.10 is not available anymore.

  * Updating the dockerfile to make libpython discoverable at runtime too.

  * Put back rust tests.

  * Why do we want mkl on AMD ?

  * Forcing 3.11 ?
- Prefix test - Different kind of load test to trigger prefix test bugs. (#2490) · a4e3e8c6
  Nicolas Patry authored Sep 11, 2024

  * Adding prefix test.

  * [WIP] tmp dump of integration load tests.

  * Remove other tensor creation.

  * Fixed the radix tree.

    Used a slice everywhere in radix.rs to keep the cheap Arc cloning
    instead of recomputing the input_ids.

  * Fix parsing

  * Is it really flashinfer version ?

  * Remove some comments.

  * Revert the max prefix hit.

  * Adding numpy to diff.

  * Upgraded flashinfer.

  * Upgrading some stuff.

  * Are we done yet ?

  * Minor fixup

  * Remove 1 log and put back the other.

  * Add comment for why slot 0 is OK.

  * Mounting on the job.

  * Get me a debug branch

  * Debugging CIs is fun.

  * Attempt #28

  * wip

  * Tmate.

  * Praying.

  * Updating VLM causal model with updated context.

  * Important line got squashed.

  * Tmate again.

  * Fingers crossed.

  * We want only 1 run of integration tests.....

  ---------
  Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
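The radix-tree fix in this commit concerns finding the longest cached prefix of a request. A simplified sketch, assuming block-aligned matching: a plain per-block trie (not a compressed radix tree) maps each full block of token ids to a KV-cache block id, and a lookup walks it until the first miss. `PrefixTrie` is hypothetical, and the commit's point about slices and cheap `Arc` clones is specific to the Rust code and has no counterpart here.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical sketch of block-aligned prefix lookup, not TGI's radix.rs.
Block = Tuple[int, ...]


class _Node:
    def __init__(self) -> None:
        self.children: Dict[Block, "_Node"] = {}
        self.block_id: int = -1  # KV-cache block for this token block (-1 = root)


class PrefixTrie:
    """Uncompressed trie keyed by full token blocks."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.root = _Node()

    def _blocks(self, tokens: Sequence[int]) -> Iterator[Block]:
        # Only full blocks can be cached or matched.
        for i in range(len(tokens) // self.block_size):
            yield tuple(tokens[i * self.block_size:(i + 1) * self.block_size])

    def insert(self, tokens: Sequence[int], block_ids: Sequence[int]) -> None:
        node = self.root
        for key, block_id in zip(self._blocks(tokens), block_ids):
            node = node.children.setdefault(key, _Node())
            node.block_id = block_id

    def longest_prefix(self, tokens: Sequence[int]) -> Tuple[int, List[int]]:
        """Return (number of matched tokens, cached block ids for that prefix)."""
        node = self.root
        matched: List[int] = []
        for key in self._blocks(tokens):
            child = node.children.get(key)
            if child is None:
                break
            matched.append(child.block_id)
            node = child
        return len(matched) * self.block_size, matched


trie = PrefixTrie(block_size=2)
trie.insert([1, 2, 3, 4, 5], block_ids=[10, 11])
print(trie.longest_prefix([1, 2, 3, 4, 9, 9]))  # (4, [10, 11])
print(trie.longest_prefix([1, 2, 7, 7]))        # (2, [10])
```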

07 Sep, 2024 1 commit
- Add Directory Check to Prevent Redundant Cloning in Build Process (#2486) · eabbbbda
  Vallepu Vamsi Krishna authored Sep 07, 2024
```
Update Makefile-fbgemm

Added Directory check for FBGEMM repository cloning.
```
06 Sep, 2024 6 commits
- Fixing more correctly the invalid drop of the batch. (#2498) · c1fe28d6
  Nicolas Patry authored Sep 06, 2024
- Add links to Adyen blogpost (#2500) · aaea212d
  Martin Iglesias Goyanes authored Sep 06, 2024
```
* Add links to Adyen blogpost

* Adding to toctree.

* Update external.md

* Update _toctree.yml

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
```
- hotfix: add syrupy to the right subproject (#2499) · a3c9c62d
  Daniël de Kok authored Sep 06, 2024
- radix trie: add assertions (#2491) · 379472c4
  Daniël de Kok authored Sep 06, 2024
```
These should all be cheap assertions.

Also:

* Fixup some comments.
* Delete a `remove` that was done unnecessarily twice.
```
- Fix incompatibility with latest `syrupy` and update in Poetry (#2497) · 2eb57a15
  Daniël de Kok authored Sep 06, 2024
- nix: add pyright/ruff for proper LSP in the impure devshell (#2496) · 0424e27f
  Daniël de Kok authored Sep 06, 2024
```
We need this to ensure that pyright/ruff are part of the same
interpreter/venv.
```
05 Sep, 2024 4 commits
- hotfix: fix regression of attention api change in intel platform (#2439) · 5cd8025f
  Wang, Yi authored Sep 05, 2024
```
fix regression caused by attention api change. ipex.varlen_attention does not support paged-cache
format kv input now.
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
```
- Add two handy gitignores for Nix environments (#2484) · e279b38a
  Daniël de Kok authored Sep 05, 2024
- Adding links to Adyen blogpost. (#2492) · 8b96a182
  Nicolas Patry authored Sep 05, 2024
- hotfix: avoid non-prefilled block use when using prefix caching (#2489) · deec30f8
  Daniël de Kok authored Sep 05, 2024
```
The minimum batch size logic could cause prefix blocks to be
deallocated without prefill. The next allocation of the same
prefix would then use garbage blocks.
```
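The hazard described in this hotfix can be avoided by registering a block as a reusable prefix block only after its prefill has run. The sketch below illustrates that invariant with a hypothetical `PrefixBlockCache`; it ignores reference counting and eviction and is not TGI's allocator.

```python
from typing import Dict, Tuple

# Hypothetical sketch, not TGI's block allocator.
Prefix = Tuple[int, ...]


class PrefixBlockCache:
    """A block becomes a reusable prefix block only after prefill."""

    def __init__(self) -> None:
        self.cached: Dict[Prefix, int] = {}   # prefix -> prefilled block id
        self.pending: Dict[int, Prefix] = {}  # block id -> prefix awaiting prefill
        self.next_id = 0

    def allocate(self, prefix: Prefix) -> Tuple[int, bool]:
        """Return (block id, True if the block already holds this prefix's KV data)."""
        if prefix in self.cached:
            return self.cached[prefix], True
        block_id = self.next_id
        self.next_id += 1
        self.pending[block_id] = prefix
        return block_id, False

    def mark_prefilled(self, block_id: int) -> None:
        # Only now does the block contain valid keys/values for its prefix.
        self.cached[self.pending.pop(block_id)] = block_id

    def free(self, block_id: int) -> None:
        # Freed before prefill: forget it so nobody reuses its garbage contents.
        # Prefilled blocks stay in `cached` (no eviction in this sketch).
        self.pending.pop(block_id, None)


cache = PrefixBlockCache()
first, hit = cache.allocate((1, 2))
cache.free(first)                     # dropped before prefill
second, hit = cache.allocate((1, 2))
print(hit)                            # False
cache.mark_prefilled(second)
third, hit = cache.allocate((1, 2))
print(hit, third == second)           # True True
```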
02 Sep, 2024 1 commit
- feat: support lora revisions and qkv_proj weights (#2482) · 6cb42f49
  drbh authored Sep 02, 2024
```
* feat: support lora revisions and qkv_proj weights

* fix: add qkv_proj weights to weight test
```
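Some checkpoints fuse the query, key and value projections into a single `qkv_proj` weight, so supporting LoRA weights for `qkv_proj` plausibly involves knowing where q, k and v sit in that fused output. A minimal sketch of splitting such a weight, assuming rows are ordered q, then k, then v, with grouped-query attention sizes; `split_qkv` is hypothetical and the layout is an assumption, not a description of TGI's adapter loader.

```python
from typing import List, Tuple

# Hypothetical sketch; the fused q/k/v row layout is an assumption.
Rows = List[List[float]]  # one inner list per output feature


def split_qkv(qkv_weight: Rows, num_heads: int, num_kv_heads: int,
              head_dim: int) -> Tuple[Rows, Rows, Rows]:
    """Split a fused qkv_proj weight into q, k and v row blocks."""
    q_size = num_heads * head_dim
    kv_size = num_kv_heads * head_dim  # grouped-query attention: k and v may be smaller
    expected = q_size + 2 * kv_size
    if len(qkv_weight) != expected:
        raise ValueError(f"expected {expected} rows, got {len(qkv_weight)}")
    q = qkv_weight[:q_size]
    k = qkv_weight[q_size:q_size + kv_size]
    v = qkv_weight[q_size + kv_size:]
    return q, k, v


w = [[float(i)] for i in range(8)]  # 2 query heads, 1 kv head, head_dim 2
q, k, v = split_qkv(w, num_heads=2, num_kv_heads=1, head_dim=2)
print(len(q), len(k), len(v))  # 4 2 2
```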