- 24 Sep, 2024 4 commits
-
-
Nicolas Patry authored
-
Nicolas Patry authored
-
Alvaro Bartolome authored
-
OlivierDehaene authored
* wip
* added v2
-
- 23 Sep, 2024 1 commit
-
-
Daniël de Kok authored
-
- 20 Sep, 2024 3 commits
-
-
Nicolas Patry authored
* Preparing for release.
* Upgrade version in docs.
-
OlivierDehaene authored
* fix: wrap python basic logs in debug assertion in launcher
* use level filters instead
-
Wang, Yi authored
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
-
- 19 Sep, 2024 3 commits
-
-
Daniël de Kok authored
-
Daniël de Kok authored
* Update to moe-kernels 0.3.1
* Attempt to fix apt failure
-
Nicolas Patry authored
* Stream options.
* Fetch stuff from nix integration test for easier testing.
* Adding the assert.
* Only send the usage when asked for.
* Update the docs.
* Impure test because we need network.
* develop.
* Optional usage.
* Fixes.
* Workflow
-
- 17 Sep, 2024 3 commits
-
-
Daniël de Kok authored
* Move to moe-kernels package and switch to common MoE layer
  This change introduces the new `moe-kernels` package:
  - Add `moe-kernels` as a dependency.
  - Introduce a `SparseMoELayer` module that can be used by MoE models.
  - Port over Mixtral and Deepseek.
* Make `cargo check` pass
* Update runner
-
OlivierDehaene authored
-
Daniël de Kok authored
Runs the tests in a Nix build sandbox.
-
- 16 Sep, 2024 2 commits
-
-
Nicolas Patry authored
* Adding a test for FD.
* Fixing flashdecoding (empty batch doesn't work).
* Fixing the invalid popping.
* Fixing radix with block_size > 1
* Last reference.
* Use an actual hash.
* Update hash for slice.len() == 1
* Update the locks.
* Increasing docker timeout.
-
Daniël de Kok authored
Disable by default because CI runners do not have enough GPUs.
-
- 13 Sep, 2024 1 commit
-
-
Alex Strick van Linschoten authored
* use ratatui not archived tui
* bump ratatui all the way with options
-
- 12 Sep, 2024 4 commits
-
-
Wang, Yi authored
enable intel ipex cpu and xpu in python3.11
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
-
drbh authored
fix: pass missing revision arg for lora adapter when loading multiple adapters
-
Nicolas Patry authored
* Add nix test.
* Modifying yourself means you need to rerun.
* Fixing the test + adding click (needed for pre-commit hooks).
* Try this.
* Our runner + pure test (not written)
* Remove server.
* Root user.
* Different user ?
* Add the actual test target.
* Forgot this modification.
* Add a formatter.
* Add the secrets.
* Fixed the auth token ?
* Adding the other tests.
* Missing pre-commit.
* Test requires cargo for cargo fmt.
* Update it a bit.
* Up.
* Attempting to use a cache location for the models.
* Ignore the cache for now.
-
Daniël de Kok authored
Ideally we wouldn't have the router wrapper that this change adds, but when I give PyO3 a Python interpreter with packages, it ends up linking libpython from the Python interpreter rather than the constructed environment and cannot pick up the Python modules as a result.
-
- 11 Sep, 2024 3 commits
-
-
Nicolas Patry authored
* Attempting to discard the trufflehog warning.
* Attempt to fix trufflehog.
-
Nicolas Patry authored
* Fixing odd tokenization self modifications on the Rust side (load and resave in Python).
* Fixing the builds ?
* Fix the gh action?
* Fixing the location ?
* Validation is odd.
* Try a faster runner
* Upgrade python version.
* Remove sccache
* No sccache.
* Getting libpython maybe ?
* List stuff.
* Monkey it up.
* have no idea at this point
* Tmp.
* Shot in the dark.
* Tmate the hell out of this.
* Desperation.
* WTF.
* -y.
* Apparently 3.10 is not available anymore.
* Updating the dockerfile to make libpython discoverable at runtime too.
* Put back rust tests.
* Why do we want mkl on AMD ?
* Forcing 3.11 ?
-
Nicolas Patry authored
* Adding prefix test.
* [WIP] tmp dump of integration load tests.
* Remove other tensor creation.
* Fixed the radix tree. Used a slice everywhere in radix.rs to keep the cheap Arc cloning instead of recomputing the input_ids.
* Fix parsing
* Is it really flashinfer version ?
* Remove some comments.
* Revert the max prefix hit.
* Adding numpy to diff.
* Upgraded flashinfer.
* Upgrading some stuff.
* Are we done yet ?
* Minor fixup
* Remove 1 log and put back the other.
* Add comment for why slot 0 is OK.
* Mounting on the job.
* Get me a debug branch
* Debugging CIs is fun.
* Attempt #28
* wip
* Tmate.
* Praying.
* Updating VLM causal model with updated context.
* Important line got squashed.
* Tmate again.
* Fingers crossed.
* We want only 1 run of integration tests.....
---------
Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
-
- 07 Sep, 2024 1 commit
-
-
Vallepu Vamsi Krishna authored
Update Makefile-fbgemm
Added a directory check for FBGEMM repository cloning.
-
- 06 Sep, 2024 6 commits
-
-
Nicolas Patry authored
-
Martin Iglesias Goyanes authored
* Add links to Adyen blogpost
* Adding to toctree.
* Update external.md
* Update _toctree.yml
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
-
Daniël de Kok authored
-
Daniël de Kok authored
These should all be cheap assertions. Also:
* Fixup some comments.
* Delete a `remove` that was done unnecessarily twice.
-
Daniël de Kok authored
-
Daniël de Kok authored
We need this to ensure that pyright/ruff are part of the same interpreter/venv.
-
- 05 Sep, 2024 4 commits
-
-
Wang, Yi authored
fix regression caused by attention api change. ipex.varlen_attention does not support paged-cache format kv input now.
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
-
Daniël de Kok authored
-
Nicolas Patry authored
-
Daniël de Kok authored
The minimum batch size logic could cause prefix blocks to be deallocated without prefill. The next allocation of the same prefix would then use garbage blocks.
-
- 02 Sep, 2024 4 commits
-
-
drbh authored
* feat: support lora revisions and qkv_proj weights
* fix: add qkv_proj weights to weight test
-
drbh authored
* fix: enable chat requests in vertex endpoint
* feat: avoid unwrap and pre allocate future vec
-
Daniël de Kok authored
Enables LoRA support.
-
Daniël de Kok authored
* Add some test dependencies.
* Install server in venv.
* Install Python client in venv.
-
- 29 Aug, 2024 1 commit
-
-
Nicolas Patry authored
* Tied embeddings in MLP speculator.
* Fixing the scale_weight when users decide to not use the speculation as much as defined in the config.
* Adding scaling support + optimize some ops.
-