1. 23 Oct, 2024 1 commit
  2. 22 Oct, 2024 1 commit
      Add `impureWithCuda` dev shell (#2677) · 9c9ef37c
      Daniël de Kok authored
      * Add `impureWithCuda` dev shell
      
      This shell is handy when developing some kernels jointly with TGI - it
      adds nvcc and a bunch of commonly-used CUDA libraries to the environment.
      
      We don't add this to the normal impure shell to keep the development
      environment as clean as possible (avoid accidental dependencies, etc.).
      
      * Add cuDNN
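
A quick way to confirm that a shell actually exposes the CUDA toolchain is to look the tools up on `PATH`. The following is only a minimal sketch: the helper name `find_cuda_tools` is hypothetical and not part of TGI, and only `nvcc` is checked because it is the one tool the commit message names.

```python
import shutil


def find_cuda_tools(tools=("nvcc",)):
    """Map each tool name to its resolved path, or None if it is not on PATH.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_cuda_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

Run inside the `impureWithCuda` shell, this should report a path for `nvcc`; in the regular impure shell it would typically report that the tool is not found.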
  3. 09 Oct, 2024 1 commit
      nix: add black and isort to the closure (#2619) · 9ed0c85f
      Daniël de Kok authored
      This makes sure that everything is formatted with the same black
      version as CI.
      
      I sometimes use isort for new files to get nicely ordered imports,
      so add it as well. Also set the isort configuration to format in a
      way that is compatible with black.
  4. 27 Sep, 2024 1 commit
      Improve support for GPUs with capability < 8 (#2575) · 5b6b74e2
      Daniël de Kok authored
      * Improve support for GPUs with capability < 8
      
      - For models that cannot use flashinfer, use flash-attn v1 + paged
        attention on GPUs with a compute capability lower than 8.
      - Disable prefix caching when using paged attention.
      - When using flash-attn v1, pass the key/value, rather than the
        cache, since v1 cannot use block tables.
      
      * nix: add flash-attn-v1 to the server environment
      
      * Move disabling prefix caching into the block of exceptions
      
      * Capability as `usize`s
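
The selection rules listed in this commit message can be sketched as a small decision function. This is an illustration under stated assumptions, not the code from the commit: `AttentionPlan` and `plan_attention` are hypothetical names, and the behaviour of the flashinfer path (prefix caching stays enabled, the cache is passed) is assumed because the message does not describe it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionPlan:
    attention: str        # "flashinfer" or "paged"
    flash_attn_v1: bool   # True when the flash-attn v1 kernels are used
    prefix_caching: bool  # whether prefix caching stays enabled
    prefill_input: str    # "key/value" tensors or the paged "cache"


def plan_attention(capability_major: int, model_supports_flashinfer: bool) -> AttentionPlan:
    """Hypothetical sketch of the rules in the commit message."""
    if model_supports_flashinfer:
        # Assumption: the flashinfer path is unchanged by this commit.
        return AttentionPlan("flashinfer", False, True, "cache")

    # Models that cannot use flashinfer fall back to paged attention;
    # GPUs with compute capability below 8 use flash-attn v1.
    use_v1 = capability_major < 8
    return AttentionPlan(
        attention="paged",
        flash_attn_v1=use_v1,
        # Prefix caching is disabled whenever paged attention is used.
        prefix_caching=False,
        # v1 cannot use block tables, so it gets the key/value directly.
        prefill_input="key/value" if use_v1 else "cache",
    )
```

For example, `plan_attention(7, False)` selects paged attention with flash-attn v1, disables prefix caching, and passes the key/value tensors rather than the cache.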