1. 20 Oct, 2025 1 commit
  2. 26 Aug, 2025 1 commit
    • convert: fix tensor sorting (#12015) · 86834a27
      Michael Yang authored
      There are two bugs here:
      
      1. The check for a layer id is incorrect and should be >= 0, since layer
         0 is valid.
      2. If both tensors have a layer identifier, the comparison only looks at
         the layer id, which returns 0 when the tensors are in the same layer.
         Instead, it should fall back to comparing the full tensor name.
  3. 26 Jun, 2025 1 commit
  4. 16 Jun, 2025 1 commit
  5. 07 May, 2025 1 commit
  6. 06 May, 2025 1 commit
    • Move quantization to new backend (#10363) · 42481045
      Daniel Hiltgen authored
      * Move quantization logic to GGML via new backend
      
      This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
      
      * Remove "add model quantizations"
      
      This is no longer needed now that quantization is implemented in Go+GGML code directly.
  7. 01 May, 2025 1 commit
  8. 25 Apr, 2025 2 commits
  9. 16 Apr, 2025 1 commit
  10. 14 Feb, 2025 1 commit
    • next ollama runner (#7913) · 58245413
      Michael Yang authored
      
      
      feat: add new Ollama engine using ggml through cgo
      
      This change introduces a new way to run pretrained models. It adds three high-level interfaces and several smaller helper interfaces to support them.
      
      - `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`.
      - `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc.) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`.
      - `ml.Tensor` defines the interface for a tensor and tensor operations.
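To illustrate how the three roles described above relate, here is a heavily reduced sketch. The interfaces below are hypothetical stand-ins, not the real definitions in `model/model.go` or `ml/backend.go`; every method name and signature other than the name `Forward` is an assumption.

```go
package main

import "fmt"

// Tensor is a hypothetical, reduced stand-in for ml.Tensor.
type Tensor interface {
	Add(other Tensor) Tensor
	Floats() []float32
}

// Backend is a hypothetical stand-in for ml.Backend: it owns the loaded
// tensors and hands them out by name.
type Backend interface {
	Get(name string) Tensor
}

// Model is a hypothetical stand-in for model.Model: an architecture
// implements its forward pass in Forward.
type Model interface {
	Forward(b Backend, input Tensor) (Tensor, error)
}

// sliceTensor is a toy CPU tensor backed by a Go slice.
type sliceTensor []float32

// Add returns the element-wise sum; it assumes both tensors have equal length.
func (t sliceTensor) Add(other Tensor) Tensor {
	o := other.Floats()
	out := make(sliceTensor, len(t))
	for i := range t {
		out[i] = t[i] + o[i]
	}
	return out
}

func (t sliceTensor) Floats() []float32 { return t }

// mapBackend is a toy backend that keeps "loaded" tensors in a map.
type mapBackend map[string]Tensor

func (b mapBackend) Get(name string) Tensor { return b[name] }

// biasModel is a toy architecture whose forward pass adds one bias tensor.
type biasModel struct{}

func (biasModel) Forward(b Backend, input Tensor) (Tensor, error) {
	bias := b.Get("bias")
	if bias == nil {
		return nil, fmt.Errorf("tensor %q not loaded", "bias")
	}
	return input.Add(bias), nil
}

func main() {
	var m Model = biasModel{}
	backend := mapBackend{"bias": sliceTensor{1, 1}}
	out, err := m.Forward(backend, sliceTensor{2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Floats())
}
```

The point of the split is that the model code only talks to `Backend` and `Tensor`, so the same `Forward` could in principle run against a different tensor library without changes.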
      
      This is the first implementation of the new engine. Follow-up PRs will implement more features:
      
      - non-greedy sampling (#8410)
      - integration with Ollama and KV caching (#8301)
      - more model support (#9080) with more coming soon
      Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
  11. 18 Oct, 2024 1 commit
  12. 12 Aug, 2024 1 commit
  13. 31 Jul, 2024 3 commits
  14. 16 Jul, 2024 1 commit
  15. 25 Jun, 2024 1 commit
    • llm: speed up gguf decoding by a lot (#5246) · cb42e607
      Blake Mizerany authored
      Previously, some costly things were causing the loading of GGUF files
      and their metadata and tensor information to be VERY slow:
      
        * Too many allocations when decoding strings
        * Hitting disk for each read of each key and value, resulting in a
          not-okay amount of syscalls/disk I/O.
      
      The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro
      M3.
      
      This commit also prevents collecting large arrays of values when
      decoding GGUFs (if desired). When such keys are encountered, their
      values are null, and are encoded as such in JSON.
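The three ideas in this message (buffered reads, fewer allocations per string, and skipping large arrays) can be sketched roughly as follows. This is not the decoder from the commit: the `decoder` type, its methods, `maxStringLen`, and the simplified stream layout (a little-endian uint64 length or count followed by the payload) are assumptions for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxStringLen is a hypothetical sanity bound for untrusted input.
const maxStringLen = 1 << 20

// decoder wraps the source in a bufio.Reader so small reads are served from
// memory, and reuses one scratch buffer instead of allocating per read.
type decoder struct {
	r       *bufio.Reader
	scratch []byte
}

func newDecoder(r io.Reader) *decoder {
	return &decoder{r: bufio.NewReaderSize(r, 64<<10)}
}

// read returns the next n bytes; the slice is only valid until the next call.
func (d *decoder) read(n int) ([]byte, error) {
	if cap(d.scratch) < n {
		d.scratch = make([]byte, n)
	}
	buf := d.scratch[:n]
	_, err := io.ReadFull(d.r, buf)
	return buf, err
}

func (d *decoder) readUint64() (uint64, error) {
	b, err := d.read(8)
	if err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint64(b), nil
}

// readString decodes a length-prefixed string; only the final conversion
// to string allocates.
func (d *decoder) readString() (string, error) {
	n, err := d.readUint64()
	if err != nil {
		return "", err
	}
	if n > maxStringLen {
		return "", fmt.Errorf("string length %d exceeds limit", n)
	}
	b, err := d.read(int(n))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// readUint32Array reads a count followed by that many uint32 values. When the
// count exceeds maxLen the values are skipped and nil is returned, which
// encoding/json would render as null.
func (d *decoder) readUint32Array(maxLen int) ([]uint32, error) {
	n, err := d.readUint64()
	if err != nil {
		return nil, err
	}
	if n > uint64(maxLen) {
		// A real decoder should also bound n before converting it to int.
		_, err := d.r.Discard(int(n) * 4)
		return nil, err
	}
	out := make([]uint32, n)
	for i := range out {
		b, err := d.read(4)
		if err != nil {
			return nil, err
		}
		out[i] = binary.LittleEndian.Uint32(b)
	}
	return out, nil
}

// sampleStream builds a tiny stream: string, uint32 array, string.
func sampleStream() *bytes.Buffer {
	var buf bytes.Buffer
	binary.Write(&buf, binary.LittleEndian, uint64(5))
	buf.WriteString("llama")
	binary.Write(&buf, binary.LittleEndian, uint64(3))
	binary.Write(&buf, binary.LittleEndian, []uint32{7, 8, 9})
	binary.Write(&buf, binary.LittleEndian, uint64(2))
	buf.WriteString("ok")
	return &buf
}

func main() {
	d := newDecoder(sampleStream())
	name, _ := d.readString()
	arr, _ := d.readUint32Array(2) // 3 elements > limit of 2, so skipped
	next, _ := d.readString()
	fmt.Println(name, arr == nil, next)
}
```

Skipping with `Discard` keeps the stream position correct, so the keys that follow a large array still decode normally.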
      
      Also, this fixes a broken test that was not encoding valid GGUF.
  16. 11 Jun, 2024 1 commit
  17. 08 Jun, 2024 1 commit
  18. 07 Jun, 2024 1 commit
  19. 04 Jun, 2024 1 commit
  20. 21 May, 2024 1 commit
  21. 20 May, 2024 3 commits
  22. 24 Apr, 2024 1 commit
  23. 16 Apr, 2024 2 commits
  24. 15 Apr, 2024 1 commit
  25. 10 Apr, 2024 1 commit
  26. 01 Apr, 2024 1 commit
  27. 29 Mar, 2024 1 commit
  28. 26 Mar, 2024 1 commit
  29. 15 Mar, 2024 1 commit
  30. 12 Mar, 2024 1 commit
  31. 08 Mar, 2024 1 commit
  32. 07 Mar, 2024 1 commit
  33. 21 Feb, 2024 1 commit
  34. 24 Jan, 2024 1 commit