1. 17 Oct, 2025 1 commit
    • test: harden scheduler tests (#12662) · 68e04c7f
      Daniel Hiltgen authored
      * test: harden scheduler tests
      
      This removes reschedDelay, which was stale code, and adds a
      configurable timeout for waitForVRAMRecovery. Tests can now set
      that timeout to a very short value, so the scheduler does not
      get stuck and hit the test timeout.
      
      * test: tune tests for partial loads
      
      Give stress tests more time when the model is split between CPU and GPU.
  2. 08 Oct, 2025 1 commit
  3. 22 Sep, 2025 1 commit
  4. 05 Jul, 2025 1 commit
    • int: add performance integration tests (#11173) · 4f473e22
      Daniel Hiltgen authored
      usage example:
        go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
        cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
        cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
  5. 19 Jun, 2025 1 commit
  6. 06 May, 2025 1 commit
    • Move quantization to new backend (#10363) · 42481045
      Daniel Hiltgen authored
      * Move quantization logic to GGML via new backend
      
      This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
      
      * Remove "add model quantizations"
      
      This is no longer needed now that quantization is implemented in Go+GGML code directly.
  7. 16 Apr, 2025 1 commit