1. 20 Nov, 2024 1 commit
    • Daniël de Kok's avatar
      nix: build and cache impure devshells (#2765) · 07bed530
      Daniël de Kok authored
      * nix: build and cache all devshells
      
      * nix: add poetry to the impure shell
      
      This shouldn't be used to manage dependencies in a Nix devshell, but can
      be handy to update `poetry.lock`.
      
      * Fix Nix build, disable pure shell (covered by Nix tests)
      07bed530
  2. 23 Oct, 2024 1 commit
  3. 22 Oct, 2024 1 commit
    • Daniël de Kok's avatar
      Add `impureWithCuda` dev shell (#2677) · 9c9ef37c
      Daniël de Kok authored
      * Add `impureWithCuda` dev shell
      
      This shell is handy when developing some kernels jointly with TGI - it
      adds nvcc and a bunch of commonly-used CUDA libraries to the environment.
      
      We don't add this to the normal impure shell to keep the development
      environment as clean as possible (avoid accidental dependencies, etc.).
      
      * Add cuDNN
      9c9ef37c
  4. 09 Oct, 2024 1 commit
    • Daniël de Kok's avatar
      nix: add black and isort to the closure (#2619) · 9ed0c85f
      Daniël de Kok authored
      To make sure that everything is formatted with the same black version
      as CI.
      
      I sometimes use isort for new files to get nicely ordered imports,
      so add it as well. Also set the isort configuration to format in a
      way that is compatible with black.
      9ed0c85f
  5. 27 Sep, 2024 1 commit
    • Daniël de Kok's avatar
      Improve support for GPUs with capability < 8 (#2575) · 5b6b74e2
      Daniël de Kok authored
      * Improve support for GPUs with capability < 8
      
      - For models that cannot use flashinfer, use flash-attn v1 + paged
        attention for models with a compute capability older than 8.
      - Disable prefix caching when using paged attention.
      - When using flash-attn v1, pass the key/value, rather than the
        cache, since v1 cannot use block tables.
      
      * nix: add flash-attn-v1 to the server environment
      
      * Move disabling prefix caching into the block of exceptions
      
      * Capability as `usize`s
      5b6b74e2