Commits · 0ff6ff60ada291840beed63d8bf458d6f9606f7f · OpenDAS / text-generation-inference

24 Sep, 2024 4 commits
- Hotfixing main (#2556) · 0ff6ff60
  Nicolas Patry authored Sep 24, 2024
  
  0ff6ff60
- Micro cleanup. (#2555) · 74d3ce10
  Nicolas Patry authored Sep 24, 2024
  
  74d3ce10
- Remove duplicated `RUN` in `Dockerfile` (#2547) · d31a6f75
  Alvaro Bartolome authored Sep 24, 2024
  
  d31a6f75
- chore: Add old V2 backend (#2551) · 10e6f292
  OlivierDehaene authored Sep 24, 2024
```
* wip

* added v2
```
  10e6f292
23 Sep, 2024 1 commit
- nix: remove unused `_server.nix` file (#2538) · 9263817c
  Daniël de Kok authored Sep 23, 2024
  
  9263817c
20 Sep, 2024 3 commits
- Preparing for release. (#2540) · 169178b9
  Nicolas Patry authored Sep 20, 2024
```
* Preparing for release.

* Upgrade version in docs.
```
  169178b9
- fix: wrap python basic logs in debug assertion in launcher (#2539) · 7e2d1887
  OlivierDehaene authored Sep 20, 2024
```
* fix: wrap python basic logs in debug assertion in launcher

* use level filters instead
```
  7e2d1887
- hotfix: ipex fails since cuda moe kernel is not supported (#2532) · f478aa77
  Wang, Yi authored Sep 20, 2024
```
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
```
  f478aa77
19 Sep, 2024 3 commits

doc: clarify that `--quantize` is not needed for pre-quantized models (#2536) · abd24dd3
Daniël de Kok authored Sep 19, 2024

abd24dd3
Update to moe-kernels 0.3.1 (#2535) · c1037601
Daniël de Kok authored Sep 19, 2024
```
* Update to moe-kernels 0.3.1

* Attempt to fix apt failure
```
c1037601

Stream options. (#2533) · f512021e

Nicolas Patry authored Sep 19, 2024

* Stream options.

* Fetch stuff from nix integration test for easier testing.

* Adding the assert.

* Only send the usage when asked for.

* Update the docs.

* Impure test because we need network.

* develop.

* Optional usage.

* Fixes.

* Workflow

f512021e

17 Sep, 2024 3 commits

Move to moe-kernels package and switch to common MoE layer (#2511) · ce85efa9

Daniël de Kok authored Sep 17, 2024

* Move to moe-kernels package and switch to common MoE layer

This change introduces the new `moe-kernels` package:

- Add `moe-kernels` as a dependency.
- Introduce a `SparseMoELayer` module that can be used by MoE
  models.
- Port over Mixtral and Deepseek.

* Make `cargo check` pass

* Update runner

ce85efa9

fix: metrics unbounded memory (#2528) · 86984e32
OlivierDehaene authored Sep 17, 2024

86984e32
nix: pure Rust check/fmt/clippy/test (#2525) · 71e42686
Daniël de Kok authored Sep 17, 2024
```
Runs the tests in a Nix build sandbox.
```
71e42686

16 Sep, 2024 2 commits

Adding a test for FD. (#2516) · 38fcafcf

Nicolas Patry authored Sep 16, 2024

* Adding a test for FD.

* Fixing flashdecoding (empty batch doesn't work).

* Fixing the invalid popping.

* Fixing radix with block_size > 1

* Last reference.

* Use an actual hash.

* Update hash for slice.len() == 1

* Update the locks.

* Increasing docker timeout.

38fcafcf

Add tests for Mixtral (#2520) · 77746552
Daniël de Kok authored Sep 16, 2024
```
Disable by default because CI runners do not have enough GPUs.
```
77746552

13 Sep, 2024 1 commit
- Use `ratatui` not (deprecated) `tui` (#2521) · 9cca3e0b
  Alex Strick van Linschoten authored Sep 13, 2024
```
* use ratatui not archived tui

* bump ratatui all the way with options
```
  9cca3e0b
12 Sep, 2024 4 commits

hotfix: enable intel ipex cpu and xpu in python3.11 (#2517) · 3ac7df2b
Wang, Yi authored Sep 12, 2024
```
enable intel ipex cpu and xpu in python3.11
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
```
3ac7df2b
fix: pass missing revision arg for lora adapter when loading multiple… (#2510) · 628334d3
drbh authored Sep 12, 2024
```
fix: pass missing revision arg for lora adapter when loading multiple adapters
```
628334d3

Add nix test. (#2513) · d95c670a

Nicolas Patry authored Sep 12, 2024

* Add nix test.

* Modifying yourself means you need to rerun.

* Fixing the test + adding click (needed for pre-commit hooks).

* Try this.

* Our runner + pure test (not written)

* Remove server.

* Root user.

* Different user ?

* Add the actual test target.

* Forgot this modification.

* Add a formatter.

* Add the secrets.

* Fixed the auth token ?

* Adding the other tests.

* Missing pre-commit.

* Test requires cargo for cargo fmt.

* Update it a bit.

* Up.

* Attempting to use a cache location for the models.

* Ignore the cache for now.

d95c670a

nix: support Python tokenizer conversion in the router (#2515) · 94304649

Daniël de Kok authored Sep 12, 2024

Ideally we wouldn't have the router wrapper that this change adds,
but when I give PyO3 a Python interpreter with packages, it ends
up linking libpython from the Python interpreter rather than the
constructed environment and cannot pick up the Python modules as
a result.

94304649

11 Sep, 2024 3 commits

Fix truffle (#2514) · 69e3be20

Nicolas Patry authored Sep 11, 2024

* Attempting to discard the trufflehog warning.

* Attempt to fix trufflehog.

69e3be20

Fix tokenization yi (#2507) · dae3bf1d

Nicolas Patry authored Sep 11, 2024

* Fixing odd tokenization self-modifications on the Rust side (load and resave in Python).

* Fixing the builds ?

* Fix the gh action?

* Fixing the location ?

* Validation is odd.

* Try a faster runner

* Upgrade python version.

* Remove sccache

* No sccache.

* Getting libpython maybe ?

* List stuff.

* Monkey it up.

* have no idea at this point

* Tmp.

* Shot in the dark.

* Tmate the hell out of this.

* Desperation.

* WTF.

* -y.

* Apparently 3.10 is not available anymore.

* Updating the dockerfile to make libpython discoverable at runtime too.

* Put back rust tests.

* Why do we want mkl on AMD ?

* Forcing 3.11 ?

dae3bf1d

Prefix test - Different kind of load test to trigger prefix test bugs. (#2490) · a4e3e8c6

Nicolas Patry authored Sep 11, 2024

* Adding prefix test.

* [WIP] tmp dump of integration load tests.

* Remove other tensor creation.

* Fixed the radix tree.

Used a slice everywhere in radix.rs to keep the cheap Arc cloning
instead of recomputing the input_ids.

* Fix parsing

* Is it really flashinfer version ?

* Remove some comments.

* Revert the max prefix hit.

* Adding numpy to diff.

* Upgraded flashinfer.

* Upgrading some stuff.

* Are we done yet ?

* Minor fixup

* Remove 1 log and put back the other.

* Add comment for why slot 0 is OK.

* Mounting on the job.

* Get me a debug branch

* Debugging CIs is fun.

* Attempt #28

* wip

* Tmate.

* Praying.

* Updating VLM causal model with updated context.

* Important line got squashed.

* Tmate again.

* Fingers crossed.

* We want only 1 run of integration tests.....

---------
Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>

a4e3e8c6

07 Sep, 2024 1 commit
- Add Directory Check to Prevent Redundant Cloning in Build Process (#2486) · eabbbbda
  Vallepu Vamsi Krishna authored Sep 07, 2024
```
Update Makefile-fbgemm

Added Directory check for FBGEMM repository cloning.
```
  eabbbbda
06 Sep, 2024 6 commits
- Fixing more correctly the invalid drop of the batch. (#2498) · c1fe28d6
  Nicolas Patry authored Sep 06, 2024
  
  c1fe28d6
- Add links to Adyen blogpost (#2500) · aaea212d
  Martin Iglesias Goyanes authored Sep 06, 2024
```
* Add links to Adyen blogpost

* Adding to toctree.

* Update external.md

* Update _toctree.yml

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
```
  aaea212d
- hotfix: add syrupy to the right subproject (#2499) · a3c9c62d
  Daniël de Kok authored Sep 06, 2024
  
  a3c9c62d
- radix trie: add assertions (#2491) · 379472c4
  Daniël de Kok authored Sep 06, 2024
```
These should all be cheap assertions.

Also:

* Fixup some comments.
* Delete a `remove` that was done unnecessarily twice.
```
  379472c4
- Fix incompatibility with latest `syrupy` and update in Poetry (#2497) · 2eb57a15
  Daniël de Kok authored Sep 06, 2024
  
  2eb57a15
- nix: add pyright/ruff for proper LSP in the impure devshell (#2496) · 0424e27f
  Daniël de Kok authored Sep 06, 2024
```
We need this to ensure that pyright/ruff are part of the same
interpreter/venv.
```
  0424e27f
05 Sep, 2024 4 commits
- hotfix: fix regression of attention api change in intel platform (#2439) · 5cd8025f
  Wang, Yi authored Sep 05, 2024
```
Fix regression caused by the attention API change: ipex.varlen_attention does not currently support paged-cache format KV input.
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
```
  5cd8025f
- Add two handy gitignores for Nix environments (#2484) · e279b38a
  Daniël de Kok authored Sep 05, 2024
  
  e279b38a
- Adding links to Adyen blogpost. (#2492) · 8b96a182
  Nicolas Patry authored Sep 05, 2024
  
  8b96a182
- hotfix: avoid non-prefilled block use when using prefix caching (#2489) · deec30f8
  Daniël de Kok authored Sep 05, 2024
```
The minimum batch size logic could cause prefix blocks to be
deallocated without prefill. The next allocation of the same
prefix would then use garbage blocks.
```
  deec30f8
02 Sep, 2024 4 commits
- feat: support lora revisions and qkv_proj weights (#2482) · 6cb42f49
  drbh authored Sep 02, 2024
```
* feat: support lora revisions and qkv_proj weights

* fix: add qkv_proj weights to weight test
```
  6cb42f49
- fix: enable chat requests in vertex endpoint (#2481) · 47d7e344
  drbh authored Sep 02, 2024
```
* fix: enable chat requests in vertex endpoint

* feat: avoid unwrap and pre allocate future vec
```
  47d7e344
- nix: add punica-kernels (#2477) · de2cdeca
  Daniël de Kok authored Sep 02, 2024
```
Enables LoRA support.
```
  de2cdeca
- nix: improve impure devshell (#2478) · e4ab8554
  Daniël de Kok authored Sep 02, 2024
```
- Add some test dependencies.
- Install server in venv.
- Install Python client in venv.
```
  e4ab8554
29 Aug, 2024 1 commit

Tied embeddings in MLP speculator. (#2473) · d9fbbaaf

Nicolas Patry authored Aug 29, 2024

* Tied embeddings in MLP speculator.

* Fixing the scale_weight when users decide to not use the speculation as
much as defined in the config.

* Adding scaling support + optimize some ops.

d9fbbaaf