".github/vscode:/vscode.git/clone" did not exist on "d07f73003d4d077854869b8f73275657f280334c"
- 24 Sep, 2024 8 commits
-
-
Orhun Parmaksız authored
-
Orhun Parmaksız authored
-
Daniël de Kok authored
This replaces the custom layers in both models.
-
Daniël de Kok authored
* Add support for scalar FP8 weight scales
* Support LLM compressor FP8 checkpoints on H100
  On H100, we use fbgemm-gpu, which requires bfloat16 as the input dtype. However, we wouldn't pick up fp8 quantization for models quantized with LLM compressor. This change adds enough parsing to detect if models have FP8-quantized weights.
* Remove stray debug print
-
Nicolas Patry authored
-
Nicolas Patry authored
-
Alvaro Bartolome authored
-
OlivierDehaene authored
* wip
* added v2
-
- 23 Sep, 2024 1 commit
-
-
Daniël de Kok authored
-
- 20 Sep, 2024 3 commits
-
-
Nicolas Patry authored
* Preparing for release.
* Upgrade version in docs.
-
OlivierDehaene authored
* fix: wrap python basic logs in debug assertion in launcher
* use level filters instead
-
Wang, Yi authored
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
-
- 19 Sep, 2024 3 commits
-
-
Daniël de Kok authored
-
Daniël de Kok authored
* Update to moe-kernels 0.3.1
* Attempt to fix apt failure
-
Nicolas Patry authored
* Stream options.
* Fetch stuff from nix integration test for easier testing.
* Adding the assert.
* Only send the usage when asked for.
* Update the docs.
* Impure test because we need network.
* develop.
* Optional usage.
* Fixes.
* Workflow
-
- 17 Sep, 2024 3 commits
-
-
Daniël de Kok authored
* Move to moe-kernels package and switch to common MoE layer
  This change introduces the new `moe-kernels` package:
  - Add `moe-kernels` as a dependency.
  - Introduce a `SparseMoELayer` module that can be used by MoE models.
  - Port over Mixtral and Deepseek.
* Make `cargo check` pass
* Update runner
-
OlivierDehaene authored
-
Daniël de Kok authored
Runs the tests in a Nix build sandbox.
-
- 16 Sep, 2024 2 commits
-
-
Nicolas Patry authored
* Adding a test for FD.
* Fixing flashdecoding (empty batch doesn't work).
* Fixing the invalid popping.
* Fixing radix with block_size > 1
* Last reference.
* Use an actual hash.
* Update hash for slice.len() == 1
* Update the locks.
* Increasing docker timeout.
-
Daniël de Kok authored
Disable by default because CI runners do not have enough GPUs.
-
- 13 Sep, 2024 1 commit
-
-
Alex Strick van Linschoten authored
* use ratatui not archived tui
* bump ratatui all the way with options
-
- 12 Sep, 2024 4 commits
-
-
Wang, Yi authored
enable intel ipex cpu and xpu in python3.11
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
-
drbh authored
fix: pass missing revision arg for lora adapter when loading multiple adapters
-
Nicolas Patry authored
* Add nix test.
* Modifying yourself means you need to rerun.
* Fixing the test + adding click (needed for pre-commit hooks).
* Try this.
* Our runner + pure test (not written)
* Remove server.
* Root user.
* Different user ?
* Add the actual test target.
* Forgot this modification.
* Add a formatter.
* Add the secrets.
* Fixed the auth token ?
* Adding the other tests.
* Missing pre-commit.
* Test requires cargo for cargo fmt.
* Update it a bit.
* Up.
* Attempting to use a cache location for the models.
* Ignore the cache for now.
-
Daniël de Kok authored
Ideally we wouldn't have the router wrapper that this change adds, but when I give PyO3 a Python interpreter with packages, it ends up linking libpython from the Python interpreter rather than the constructed environment and cannot pick up the Python modules as a result.
-
- 11 Sep, 2024 3 commits
-
-
Nicolas Patry authored
* Attempting to discard the trufflehog warning.
* Attempt to fix trufflehog.
-
Nicolas Patry authored
* Fixing odd tokenization self modifications on the Rust side (load and resave in Python).
* Fixing the builds ?
* Fix the gh action?
* Fixing the location ?
* Validation is odd.
* Try a faster runner
* Upgrade python version.
* Remove sccache
* No sccache.
* Getting libpython maybe ?
* List stuff.
* Monkey it up.
* have no idea at this point
* Tmp.
* Shot in the dark.
* Tmate the hell out of this.
* Desperation.
* WTF.
* -y.
* Apparently 3.10 is not available anymore.
* Updating the dockerfile to make libpython discoverable at runtime too.
* Put back rust tests.
* Why do we want mkl on AMD ?
* Forcing 3.11 ?
-
Nicolas Patry authored
* Adding prefix test.
* [WIP] tmp dump of integration load tests.
* Remove other tensor creation.
* Fixed the radix tree.
  Used a slice everywhere in radix.rs to keep the cheap Arc cloning instead of recomputing the input_ids.
* Fix parsing
* Is it really flashinfer version ?
* Remove some comments.
* Revert the max prefix hit.
* Adding numpy to diff.
* Upgraded flashinfer.
* Upgrading some stuff.
* Are we done yet ?
* Minor fixup
* Remove 1 log and put back the other.
* Add comment for why slot 0 is OK.
* Mounting on the job.
* Get me a debug branch
* Debugging CIs is fun.
* Attempt #28
* wip
* Tmate.
* Praying.
* Updating VLM causal model with updated context.
* Important line got squashed.
* Tmate again.
* Fingers crossed.
* We want only 1 run of integration tests.....
---------
Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
-
- 07 Sep, 2024 1 commit
-
-
Vallepu Vamsi Krishna authored
Update Makefile-fbgemm
Added directory check for FBGEMM repository cloning.
-
- 06 Sep, 2024 6 commits
-
-
Nicolas Patry authored
-
Martin Iglesias Goyanes authored
* Add links to Adyen blogpost
* Adding to toctree.
* Update external.md
* Update _toctree.yml
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
-
Daniël de Kok authored
-
Daniël de Kok authored
These should all be cheap assertions. Also:
* Fixup some comments.
* Delete a `remove` that was done unnecessarily twice.
-
Daniël de Kok authored
-
Daniël de Kok authored
We need this to ensure that pyright/ruff are part of the same interpreter/venv.
-
- 05 Sep, 2024 4 commits
-
-
Wang, Yi authored
fix regression caused by attention api change. ipex.varlen_attention does not support paged-cache format kv input now.
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
-
Daniël de Kok authored
-
Nicolas Patry authored
-
Daniël de Kok authored
The minimum batch size logic could cause prefix blocks to be deallocated without prefill. The next allocation of the same prefix would then use garbage blocks.
-
- 02 Sep, 2024 1 commit
-
-
drbh authored
* feat: support lora revisions and qkv_proj weights
* fix: add qkv_proj weights to weight test
-