  1. 02 Oct, 2024 1 commit
    • Mllama flash version (#2585) · d18ed5cf
      Nicolas Patry authored
      * Working loading state.
      
      * Preprocessing.
      
      * Working state ? (Broke idefics1 temporarily).
      
      * Cleaner condition.
      
      * Fix idefics.
      
      * Updating config, removing TODO
      
      * Mllama
      
      * Upgrade transformers to 4.45
      
      * Flashing mllama.
      
      * Starting to get there.
      
      * Working state.
      
      * Integration tests for mllama (cutting to 10 tokens because there seems
      to be instability after that, meaning the size of the batch matters).
      
      * Updating model link.
      
      * Earlier assert.
      
      * Fix vlm ?
      
      * remove log.
      
      * Force ignore all images but last.
      
      * Default dtype bfloat16.
      
      * Update integration test after switch to bf16.
      
      * Remove dead code.
      
      * Removed dead code.
      
      * Upgrade the flake to latest transformers/tokenizers
      
      * Move to hf tgi-nix
      
      * Upgrade to 0.5.0
  2. 01 Oct, 2024 1 commit
    • nix: experimental support for building a Docker container (#2470) · 584b4d7a
      Daniël de Kok authored
      
      
      * nix: experimental support for building a Docker image
      
      Run using something like:
      
      ```
      docker run \
        --device nvidia.com/gpu=all \
        -it --rm -p 8080:80 \
        -v $PWD/data:/data \
        -v $PWD/tmp:/tmp \
        tgi-docker:latest \
        --model-id <model_id>
      ```
      
      * Example of building the Docker image using Nix inside Docker
      
      * Stream to make the builder image smaller
      
      This avoids storing a Docker image tarball in the image. Instead,
      stream the layers while doing `docker run`.
      
      * Don't spam journalctl on Linux
      
      * Other dockerfile.
      
      ---------
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
  3. 30 Sep, 2024 3 commits
  4. 27 Sep, 2024 1 commit
    • Improve support for GPUs with capability < 8 (#2575) · 5b6b74e2
      Daniël de Kok authored
      * Improve support for GPUs with capability < 8
      
      - For models that cannot use flashinfer, use flash-attn v1 + paged
        attention on GPUs with a compute capability lower than 8.
      - Disable prefix caching when using paged attention.
      - When using flash-attn v1, pass the key/value, rather than the
        cache, since v1 cannot use block tables.
      
      * nix: add flash-attn-v1 to the server environment
      
      * Move disabling prefix caching into the block of exceptions
      
      * Capability as `usize`s
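
The fallback rule described in the commit message above can be sketched roughly as follows. This is only an illustration of the stated rule, not the code from the commit: `AttentionPlan`, `plan_attention`, the field names, and the string labels are all hypothetical, and the branch for every case the message does not cover is a labeled placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionPlan:
    """Hypothetical summary of an attention setup (not a TGI type)."""
    kernel: str            # illustrative label, not a real TGI identifier
    paged_attention: bool  # whether paged attention is used
    prefix_caching: bool   # whether prefix caching stays enabled
    prefill_input: str     # "key_value" or "kv_cache"


def plan_attention(capability_major: int, can_use_flashinfer: bool) -> AttentionPlan:
    """Encode only the rule stated in the commit message."""
    if not can_use_flashinfer and capability_major < 8:
        return AttentionPlan(
            kernel="flash-attn-v1",
            paged_attention=True,
            # Prefix caching is disabled when paged attention is used.
            prefix_caching=False,
            # flash-attn v1 cannot use block tables, so the key/value
            # tensors are passed rather than the cache.
            prefill_input="key_value",
        )
    # Assumption: the commit message does not describe the remaining
    # cases, so this return value is a placeholder, not TGI behaviour.
    return AttentionPlan(
        kernel="unspecified",
        paged_attention=False,
        prefix_caching=True,
        prefill_input="kv_cache",
    )


if __name__ == "__main__":
    # A GPU with compute capability 7.x and a model without flashinfer support.
    print(plan_attention(7, can_use_flashinfer=False))
```

Keeping the decision in one pure function makes the three coupled consequences (kernel choice, prefix caching, prefill input) easy to check together.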
  5. 19 Sep, 2024 2 commits
  6. 17 Sep, 2024 1 commit
  7. 12 Sep, 2024 2 commits
    • Add nix test. (#2513) · d95c670a
      Nicolas Patry authored
      * Add nix test.
      
      * Modifying yourself means you need to rerun.
      
      * Fixing the test + adding click (needed for pre-commit hooks).
      
      * Try this.
      
      * Our runner + pure test (not written)
      
      * Remove server.
      
      * Root user.
      
      * Different user ?
      
      * Add the actual test target.
      
      * Forgot this modification.
      
      * Add a formatter.
      
      * Add the secrets.
      
      * Fixed the auth token ?
      
      * Adding the other tests.
      
      * Missing pre-commit.
      
      * Test requires cargo for cargo fmt.
      
      * Update it a bit.
      
      * Up.
      
      * Attempting to use a cache location for the models.
      
      * Ignore the cache for now.
    • nix: support Python tokenizer conversion in the router (#2515) · 94304649
      Daniël de Kok authored
      Ideally we wouldn't have the router wrapper that this change adds,
      but when I give PyO3 a Python interpreter with packages, it ends
      up linking libpython from the Python interpreter rather than the
      constructed environment and cannot pick up the Python modules as
      a result.
  8. 06 Sep, 2024 1 commit
  9. 02 Sep, 2024 1 commit
  10. 29 Aug, 2024 1 commit
    • nix: build Torch against MKL and various other improvements (#2469) · 4e821c00
      Daniël de Kok authored
      Updates tgi-nix input:
      
      - Move Torch closer to upstream by building against MKL.
      - Remove compute capability 8.7 from Torch (Jetson).
      - Sync nixpkgs compute capabilities with Torch (avoids
        compiling too many capabilities for MAGMA).
      - Use nixpkgs configuration passed through by `tgi-nix`.
  11. 23 Aug, 2024 1 commit
    • nix: add default package (#2453) · f3c5d7d9
      Daniël de Kok authored
      The default package wraps the launcher and puts the server/router in the
      path.
      
      As a result, TGI can be started using something like:
      
      ```
      nix run .# -- \
        --model-id hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \
        --port 8080
      ```
  12. 21 Aug, 2024 1 commit
  13. 20 Aug, 2024 2 commits
    • nix: add pure server to flake, add both pure and impure devshells (#2430) · f5f11b79
      Daniël de Kok authored
      * nix: pure server and support both pure and impure devShells
      
      * nix: remove unused poetry2nix input
      
      It is not wired up and we now have a pure server.
      
      * nix: add ipdb to impure devshell
    • Prefix caching (#2402) · b70ae096
      Nicolas Patry authored
      
      
      * Prefix caching WIP
      
      * Fixing prefix attention.
      
      * Fixing flashinfer import.
      
      * Fixing black.
      
      * Fixing medusa (still wrong outputs, but functional).
      
      * Just medusa values now.
      
      * Fixing medusa without prefix caching.
      
      * Fixing prefix caching.
      
      * Medusa requires reshaping.
      
      * Removing the logs.
      
      * Remove router.nix
      
      * Fixup:
      
      - Remove logs
      - Disable VLMs (they do not work)
      - Disable prefix caching when user wants prefill logprobs.
      
      * Update flake.lock
      
      ---------
      Co-authored-by: Daniël de Kok <me@danieldk.eu>
  14. 19 Aug, 2024 1 commit
  15. 16 Aug, 2024 1 commit
  16. 15 Aug, 2024 1 commit
  17. 14 Aug, 2024 2 commits
  18. 13 Aug, 2024 2 commits
  19. 12 Aug, 2024 3 commits
  20. 09 Aug, 2024 3 commits