- 02 Oct, 2024 1 commit
-
-
Nicolas Patry authored
* Working loading state. * Preprocessing. * Working state ? (Broke idefics1 temporarily). * Cleaner condition. * Fix idefics. * Updating config, removing TODO * Mllama * Ugrade transformers 4.45 * Flashing mllama. * Starting to get there. * Working state. * Integrations tests for mllama (cutting to 10 tokens because there seems' to be instability after (meaning size of the batch matters. * Updating model link. * Earlier assert. * Fix vlm ? * remove log. * Force ignore all images but last. * Default dtype bfloat16. * Update integration test after switch to bf16. * Remove dead code. * Removed dead code. * Upgrade the flake to latest transformers/tokenizers * Move to hf tgi-nix * Upgrade to 0.5.0
-
- 01 Oct, 2024 1 commit
-
-
Daniël de Kok authored
* nix: experimental support for building a Docker image Run using something like: ``` docker run \ --device nvidia.com/gpu=all \ -it --rm -p 8080:80 \ -v $PWD/data:/data \ -v $PWD/tmp:/tmp \ tgi-docker:latest \ --model-id <model_id> ``` * Example of building the Docker image using Nix inside Docker * Stream to make the builder image smaller This avoids storing a Docker image tarball in the image. Instead, stream the layers while doing `docker run`. * Don't spam journalctl on Linux * Other dockerfile. --------- Co-authored-by:Nicolas Patry <patry.nicolas@protonmail.com>
-
- 30 Sep, 2024 3 commits
-
-
Daniël de Kok authored
This change uses the updated Marlin MoE kernel from vLLM to support MoE with activation sorting and groups.
-
Daniël de Kok authored
-
Daniël de Kok authored
This change add support for MoE models that use GPTQ quantization. Currently only models with the following properties are supported: - No `desc_act` with tensor parallelism, unless `group_size=-1`. - No asymmetric quantization. - No AWQ.
-
- 27 Sep, 2024 1 commit
-
-
Daniël de Kok authored
* Improve support for GPUs with capability < 8 - For models that cannot use flashinfer, use flash-attn v1 + paged attention for models with a compute capability older than 8. - Disable prefix caching when using paged attention. - When using flash-attn v1, pass the key/value, rather than the cache, since v1 cannot use block tables. * nix: add flash-attn-v1 to the server environment * Move disabling prefix caching into the block of exceptions * Capability as `usize`s
-
- 19 Sep, 2024 2 commits
-
-
Daniël de Kok authored
-
Nicolas Patry authored
* Stream options. * Fetch stuff from nix integration test for easier testing. * Adding the assert. * Only send the usage when asked for. * Update the docs. * Impure test because we need network. * develop. * Optional usage. * Fixes. * Workflow
-
- 17 Sep, 2024 1 commit
-
-
Daniël de Kok authored
Runs the tests in a Nix build sandbox.
-
- 12 Sep, 2024 2 commits
-
-
Nicolas Patry authored
* Add nix test. * Modifying yourself means you need to rerun. * Fixing the test + adding click (needed for pre-commit hooks). * Try thuis. * Our runner + pure test (not written) * Reemove server. * Root user. * Different user ? * Add the actual test target. * Forgot this modification. * Add a formatter. * Add the secrets. * Fixed the auth token ? * Adding the other tests. * Missing pre-commit. * Test requires cargo for cargo fmt. * Update it a bit. * Up. * Attempting to use a cache location for the models. * Ignore the cache for now.
-
Daniël de Kok authored
Ideally we wouldn't have the router wrapper that this change adds, but when I give PyO3 a Python interpreter with packages, it ends up linking libpython from the Python interpreter rather than the constructed environment and cannot pick up the Python modules as a result.
-
- 06 Sep, 2024 1 commit
-
-
Daniël de Kok authored
We need this to ensure that pyright/ruff are part of the same interpreter/venv.
-
- 02 Sep, 2024 1 commit
-
-
Daniël de Kok authored
- Add some test dependencies. - Install server in venv. - Install Python client in venv.
-
- 29 Aug, 2024 1 commit
-
-
Daniël de Kok authored
Updates tgi-nix input: - Move Torch closer to upstream by building against MKL. - Remove compute capability 8.7 from Torch (Jetson). - Sync nixpkgs cumpute capabilities with Torch (avoids compiling too mana capabilities for MAGMA). - Use nixpkgs configuration passed through by `tgi-nix`.
-
- 23 Aug, 2024 1 commit
-
-
Daniël de Kok authored
The default package wraps the launcher and puts the server/router in the path. As a result, TGI can be started using something like: ``` nix run .# -- \ --model-id hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \ --port 8080 ```
-
- 21 Aug, 2024 1 commit
-
-
Daniël de Kok authored
nix: add text-generation-benchmark to pure devshell
-
- 20 Aug, 2024 2 commits
-
-
Daniël de Kok authored
* nix: pure server and support both pure and impure devShells * nix: remove unused poetry2nix input It is not wired up and we now have a pure server. * nix: add ipdb to impure devshell
-
Nicolas Patry authored
* Prefix caching WIP * Fixing prefix attention. * Fixing flashinfer import. * Fixing black. * Fixing medusa (still wrong outputs, but functional). * Just medusa values now. * Fixing medusa without prefix caching. * Fixing prefix caching. * Medusa requires reshaping. * Removing the logs. * Remove router.nix * Fixup: - Remove logs - Disable VLMs (they do not work) - Disable prefix caching when user wants prefill logprobs. * Update flake.lock --------- Co-authored-by:Daniël de Kok <me@danieldk.eu>
-
- 19 Aug, 2024 1 commit
-
-
Daniël de Kok authored
* Update to CUDA 12.4 * poetry2nix: follow tgi-nix nixpkgs
-
- 16 Aug, 2024 1 commit
-
-
Daniël de Kok authored
Try to reduce the number of router/launcher rebuilds by filtering sources. In this way, recompiles should only be triggered by changes in Cargo or Rust files.
-
- 15 Aug, 2024 1 commit
-
-
Daniël de Kok authored
-
- 14 Aug, 2024 2 commits
-
-
Nicolas Patry authored
* Upgrading exl2. * Fixing the other pathways. * Fix idefics.
-
Daniël de Kok authored
This is less incremental than crate2nix, but does build all dependencies separately, so avoids full rebuilds.
-
- 13 Aug, 2024 2 commits
-
-
Nicolas Patry authored
-
Daniël de Kok authored
-
- 12 Aug, 2024 3 commits
-
-
Nicolas Patry authored
-
Nicolas Patry authored
-
Daniël de Kok authored
-
- 09 Aug, 2024 3 commits
-
-
Daniël de Kok authored
-
Daniël de Kok authored
-
Daniël de Kok authored
Add flake.nix
-