1. 02 Dec, 2024 3 commits
  2. 28 Nov, 2024 1 commit
    • Support continue final message (#2733) · d4718051
      drbh authored
      * feat: support continue_final_message param in chat request
      
      * feat: add test for continue final message
      
      * fix: bump openapi docs
      
      * fix: remove continue_final_message chat request param
      
      * fix: remove unneeded launcher args in continue test
      
      * fix: bump test output
      
      * fix: remove accidentally included guideline from rebase
      
      * fix: remove guideline tests
      
      * fix: adjust continuation tests expected text
      
      * fix: replace expected output for continue test
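      For context: the sub-commits above add and then drop an explicit `continue_final_message` request param, so the merged behavior presumably continues generation when the conversation ends with a partial assistant message. Below is a minimal, hypothetical Python sketch of such a request against a locally running server; the endpoint URL, port, model name, and field values are assumptions, not taken from this commit.

      ```python
      # Hypothetical sketch: ask the server to keep writing the trailing
      # assistant message instead of opening a new assistant turn.
      # The endpoint URL, port, and model name are assumed placeholders.
      import requests

      payload = {
          "model": "tgi",  # placeholder model identifier
          "messages": [
              {"role": "user", "content": "List three prime numbers."},
              # The conversation ends with a partial assistant message;
              # continuation should pick up from this prefix.
              {"role": "assistant", "content": "The first three primes are 2, 3, and"},
          ],
          "max_tokens": 32,
          "stream": False,
      }

      resp = requests.post(
          "http://localhost:8080/v1/chat/completions",  # assumed local endpoint
          json=payload,
          timeout=60,
      )
      resp.raise_for_status()
      print(resp.json()["choices"][0]["message"]["content"])
      ```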
  3. 26 Nov, 2024 3 commits
  4. 25 Nov, 2024 2 commits
  5. 22 Nov, 2024 2 commits
  6. 21 Nov, 2024 7 commits
  7. 20 Nov, 2024 5 commits
  8. 19 Nov, 2024 4 commits
  9. 18 Nov, 2024 4 commits
  10. 17 Nov, 2024 1 commit
    • Remove vLLM dependency for CUDA (#2751) · 52e48739
      Daniël de Kok authored
      * Remove vLLM dependency for CUDA
      
      This change adds `attention-kernels` as a dependency for paged
      attention and cache reshaping. With that, we no longer use vLLM
      anywhere for CUDA.
      
      Test run (since we don't have paged attention in CI):
      
      ```
      ❯ ATTENTION=paged python -m pytest integration-tests -k "llama and awq" --release
      [...]
      5 snapshots passed.
      ```
      
      * Fix clippy warning
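      As a rough illustration of what the cache-reshaping step provided by `attention-kernels` does, here is a conceptual pure-PyTorch sketch (not the actual kernel or its API; tensor shapes, names, and the slot-mapping layout are assumptions) that scatters freshly computed key/value vectors into a block-paged KV cache:

      ```python
      # Conceptual sketch of "reshape and cache" for paged attention.
      # Not the attention-kernels API; shapes and names are assumptions.
      import torch

      def reshape_and_cache_reference(
          key: torch.Tensor,          # [num_tokens, num_heads, head_size]
          value: torch.Tensor,        # [num_tokens, num_heads, head_size]
          key_cache: torch.Tensor,    # [num_blocks, block_size, num_heads, head_size]
          value_cache: torch.Tensor,  # [num_blocks, block_size, num_heads, head_size]
          slot_mapping: torch.Tensor, # [num_tokens], flat slot index per token
      ) -> None:
          """Scatter each token's key/value vectors into its assigned cache slot."""
          block_size = key_cache.shape[1]
          block_idx = slot_mapping // block_size  # which cache block the token lands in
          block_off = slot_mapping % block_size   # position inside that block
          key_cache[block_idx, block_off] = key
          value_cache[block_idx, block_off] = value

      # Tiny usage example with made-up sizes.
      num_blocks, block_size, num_heads, head_size = 4, 16, 2, 8
      kc = torch.zeros(num_blocks, block_size, num_heads, head_size)
      vc = torch.zeros_like(kc)
      k = torch.randn(3, num_heads, head_size)
      v = torch.randn(3, num_heads, head_size)
      slots = torch.tensor([0, 1, 17])  # tokens 0 and 1 go to block 0, token 2 to block 1
      reshape_and_cache_reference(k, v, kc, vc, slots)
      ```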
  11. 15 Nov, 2024 7 commits
  12. 14 Nov, 2024 1 commit