  1. 09 Jan, 2026 1 commit
      Add experimental MLX backend and engine with imagegen support (#13648) · 33ee7168
      Daniel Hiltgen authored
      
      
      * WIP - MLX backend with gemma3
      
      * MLX: add cmake and go tag build toggles
      
      To build the new MLX backend code:
        cmake --preset MLX
        cmake --build --preset MLX --parallel
        cmake --install build --component MLX
        go build -tags mlx .
      
      Note: the main.go entrypoint for the MLX engine will change in a follow-up commit.
      
      * add experimental image generation runtime
      
      * MLX: wire up cuda build for linux
      
      * MLX: get dependencies correct and dedup
      
      This is still too large for a unified GitHub artifact, but is now "correct" for the mlx_cuda_v13
      directory.
      
      * fix relative link bug in dedup
      
      * Add darwin build and readme
      
      * add go build tag for mlx dependent code and wire up build_darwin.sh
      
      * lint cleanup
      
      * macos: build mlx for x86
      
      This will be CPU only.
      
      * cuda build instructions and fix drift from mlx bump
      
      * stale comment
      
      * Delete agent helper doc
      
      * Clean up readme.md
      
      * Revise README for tokenizer clarity and details
      
      Updated README to clarify tokenizer functionality and removed correctness section.
      
      ---------
      Co-authored-by: jmorganca <jmorganca@gmail.com>
  2. 08 Jan, 2026 2 commits
  3. 07 Jan, 2026 3 commits
  4. 06 Jan, 2026 3 commits
  5. 03 Jan, 2026 5 commits
  6. 23 Dec, 2025 2 commits
  7. 19 Dec, 2025 1 commit
      llm: Avoid integer underflow on llama engine memory layout · 172b5924
      Jesse Gross authored
      On the llama engine, when we compute the memory layout, we reserve
      a buffer to allow for some flexibility for incorrect estimates.
      This is subtracted from GPU free memory and on GPUs with limited
      memory, it may underflow.
      
      Fixes #13494
  8. 18 Dec, 2025 4 commits
  9. 17 Dec, 2025 3 commits
  10. 16 Dec, 2025 8 commits
  11. 15 Dec, 2025 6 commits
  12. 13 Dec, 2025 2 commits
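
The underflow described in commit 172b5924 (19 Dec, 2025) can be illustrated with a minimal Go sketch. It assumes free GPU memory and the reserve buffer are both held as uint64 byte counts. The function name usableMemory and the sizes used are hypothetical and are not taken from the actual change, which may guard the subtraction differently.

```go
package main

import "fmt"

// usableMemory is a hypothetical helper (not the project's actual code).
// It subtracts a safety reserve from free GPU memory and clamps the
// result at zero, so the unsigned subtraction cannot wrap around.
func usableMemory(free, reserve uint64) uint64 {
	if reserve >= free {
		return 0
	}
	return free - reserve
}

func main() {
	// Illustrative values only: a GPU with less free memory than the reserve.
	var free uint64 = 256 << 20    // 256 MiB
	var reserve uint64 = 512 << 20 // 512 MiB

	// Unguarded unsigned subtraction wraps to a huge value instead of
	// going negative, which would make the GPU look nearly unlimited.
	naive := free - reserve
	fmt.Println("naive result exceeds free memory:", naive > free)

	// The guarded version reports that no memory is usable.
	fmt.Println("guarded result:", usableMemory(free, reserve))
}
```

Clamping at zero is one common way to handle this case; checking before subtracting avoids relying on a signed type large enough to hold every intermediate value.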