"mmdet/vscode:/vscode.git/clone" did not exist on "97d08556373c0bc98badf4f2b17a114d97a70124"
- 11 Mar, 2025 1 commit
Patrick Devine authored
- 04 Mar, 2025 1 commit
Daniel Hiltgen authored
* Include unified vision layers in memory prediction: for newer vision models with a single GGUF, include the projection estimates.
* Adjust CLI to handle both styles of vision model metadata.
* Wire up new tokenizers for the new engine: if we're loading the new engine, use the new model text processor instead of calling into cgo wrappers for llama.cpp. This also cleans up some tech debt from the older tokenization flow for the C++ server, which was no longer used, and adjusts the grammar handling logic to pass through to the new engine instead of using the cgo schema-to-grammar call.
* Lay the foundation for auto-selection of the new engine.
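A minimal sketch of the memory-prediction idea in the first bullet. The names here (`layerSize`, `estimateVRAM`) are hypothetical, not Ollama's actual types; the point is only that for a unified single-GGUF vision model, the projection/vision layers get folded into the same estimate as the text layers:

```go
package main

import "fmt"

// layerSize describes one weight group in this hypothetical estimate.
type layerSize struct {
	name  string
	bytes uint64
}

// estimateVRAM sums the text layers plus, for single-file (unified) vision
// models, the projection/vision layers, mirroring the idea of including
// projection estimates in the overall prediction. All names are assumptions.
func estimateVRAM(textLayers, visionLayers []layerSize, unified bool) uint64 {
	var total uint64
	for _, l := range textLayers {
		total += l.bytes
	}
	if unified {
		for _, l := range visionLayers {
			total += l.bytes
		}
	}
	return total
}

func main() {
	text := []layerSize{{"blk.0", 512 << 20}, {"blk.1", 512 << 20}}
	vision := []layerSize{{"v.blk.0", 128 << 20}, {"mm.proj", 64 << 20}}
	fmt.Printf("unified estimate: %d MiB\n", estimateVRAM(text, vision, true)>>20)
}
```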
- 27 Feb, 2025 1 commit
Michael Yang authored
- 25 Feb, 2025 1 commit
Michael Yang authored
This was accidentally removed when moving fs/ggml from its previous location.
- 14 Feb, 2025 1 commit
Michael Yang authored
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces three high-level interfaces and a number of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, implement the model's forward propagation in the `Forward` method, which is called to generate completions. This interface can be found in `model/model.go`.
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc.) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`.
- `ml.Tensor` defines the interface for a tensor and tensor operations.

This is the first implementation of the new engine. Follow-up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080), with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
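To make the division of labor concrete, here is a minimal Go sketch of how the three interfaces might fit together. The method sets are illustrative assumptions, not the actual signatures in `model/model.go` and `ml/backend.go`:

```go
package main

import "fmt"

// Tensor stands in for the ml.Tensor interface: an opaque handle to
// backend-owned data. The real interface also exposes tensor operations;
// this one-method subset is an assumption for the sketch.
type Tensor interface {
	Shape() []int
}

// Backend stands in for ml.Backend: it loads a pretrained model into
// hardware and lets a Model look up loaded tensors by name.
type Backend interface {
	Get(name string) Tensor
}

// Model stands in for model.Model: an architecture implements Forward,
// which the engine calls to generate completions.
type Model interface {
	Forward(b Backend, input []int32) (Tensor, error)
}

// cpuTensor is a toy Tensor backed by a plain slice.
type cpuTensor struct{ data []float32 }

func (t *cpuTensor) Shape() []int { return []int{len(t.data)} }

// cpuBackend is a toy Backend holding named tensors in a map.
type cpuBackend struct{ weights map[string]Tensor }

func (b *cpuBackend) Get(name string) Tensor { return b.weights[name] }

// toyModel's Forward just returns a weight tensor; a real architecture
// would run its layers over the input tokens here.
type toyModel struct{}

func (toyModel) Forward(b Backend, input []int32) (Tensor, error) {
	w := b.Get("output.weight")
	if w == nil {
		return nil, fmt.Errorf("missing tensor %q", "output.weight")
	}
	return w, nil
}

func main() {
	backend := &cpuBackend{weights: map[string]Tensor{
		"output.weight": &cpuTensor{data: []float32{0.1, 0.9}},
	}}
	logits, err := toyModel{}.Forward(backend, []int32{1, 2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println("logits shape:", logits.Shape())
}
```

The design keeps the architecture code (Model) free of hardware concerns: swapping the ggml backend for another tensor library only requires a new Backend implementation.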
- 03 Dec, 2024 1 commit
Sam authored
- 01 Nov, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 18 Oct, 2024 1 commit
Patrick Devine authored
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
- 15 Oct, 2024 1 commit
frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
- 23 Aug, 2024 1 commit
Patrick Devine authored
- 12 Aug, 2024 1 commit
Michael Yang authored
- 08 Aug, 2024 1 commit
Michael Yang authored
- 31 Jul, 2024 1 commit
Michael Yang authored
- 10 Jul, 2024 1 commit
Michael Yang authored
- 27 Jun, 2024 1 commit
Michael Yang authored
- 25 Jun, 2024 1 commit
Blake Mizerany authored
Previously, some costly operations were making the loading of GGUF files and their metadata and tensor information VERY slow:

* Too many allocations when decoding strings
* Hitting disk for each read of each key and value, resulting in a not-okay amount of syscalls/disk I/O

The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro M3.

This commit also lets callers skip collecting large arrays of values when decoding GGUFs, if desired. When such keys are encountered, their values are null, and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
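Both costs have standard fixes in Go, sketched below under assumed names (`readString` is illustrative, not the commit's actual code): route all reads through one `bufio.Reader` so small key/value reads hit a buffer instead of issuing syscalls, and reuse a scratch buffer across string decodes to avoid a fresh allocation per string. The wire format shown (uint64 length prefix, then bytes) matches GGUF's string encoding.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readString decodes one GGUF-style length-prefixed string. Reading through
// a single bufio.Reader means each small read hits the buffer rather than
// the disk, and reusing buf across calls cuts per-string allocations --
// the two costs named in the commit message.
func readString(r io.Reader, buf []byte) (string, []byte, error) {
	var n uint64
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return "", buf, err
	}
	if uint64(cap(buf)) < n {
		buf = make([]byte, n) // grow once, then reuse
	}
	b := buf[:n]
	if _, err := io.ReadFull(r, b); err != nil {
		return "", buf, err
	}
	return string(b), buf, nil
}

func main() {
	// Build a fake stream of two length-prefixed strings.
	var raw bytes.Buffer
	for _, s := range []string{"general.architecture", "llama"} {
		binary.Write(&raw, binary.LittleEndian, uint64(len(s)))
		raw.WriteString(s)
	}
	r := bufio.NewReader(&raw) // one buffered reader for the whole decode
	var buf []byte
	for {
		s, b, err := readString(r, buf)
		if err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		buf = b
		fmt.Println(s)
	}
}
```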
- 20 Jun, 2024 1 commit
Michael Yang authored
- 18 Jun, 2024 1 commit
Michael Yang authored
- 14 Jun, 2024 1 commit
Daniel Hiltgen authored
Still not complete; the prediction needs refinement to understand each discrete GPU's available space, so we can see how many layers fit in each one. Since we can't split a single layer across multiple GPUs, we can't treat free space as one logical block.
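A sketch of that constraint, assuming uniform layer sizes for simplicity; this illustrates the scheduling idea, not Ollama's actual algorithm:

```go
package main

import "fmt"

// fitLayers greedily assigns whole layers to each GPU in turn. A single
// layer can't be split across devices, so two partially free GPUs are not
// one logical block of free space: each device independently fits
// floor(free / layerBytes) layers.
func fitLayers(free []uint64, layerBytes uint64, layers int) []int {
	placed := make([]int, len(free))
	remaining := layers
	for i, f := range free {
		n := int(f / layerBytes)
		if n > remaining {
			n = remaining
		}
		placed[i] = n
		remaining -= n
		if remaining == 0 {
			break
		}
	}
	return placed
}

func main() {
	free := []uint64{8 << 30, 4 << 30}        // two discrete GPUs
	fmt.Println(fitLayers(free, 512<<20, 40)) // [16 8]: only 24 of 40 layers fit
}
```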
- 11 Jun, 2024 1 commit
Michael Yang authored
This reverts commit f5f245cc, reversing changes made to 94d37fdc. The reverted change broke GGUF v2, which was incorrectly detected as big endian.
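As a hedged illustration of why byte-order detection is delicate (this is a common heuristic, not the reverted code): reading a fixed-width version field in one byte order and checking plausibility only works if every valid version, v2 included, is classified correctly in both orders.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// detectByteOrder guesses a file's byte order from a 4-byte version field:
// read it little-endian, and if the low bytes are empty while high bytes
// are set, the value was likely written big-endian. A heuristic like this
// must accept every valid version (v2 included) in both orders; the bug
// behind the revert above was v2 files being misclassified.
func detectByteOrder(versionBytes [4]byte) binary.ByteOrder {
	v := binary.LittleEndian.Uint32(versionBytes[:])
	if v&0xFFFF == 0 && v != 0 {
		return binary.BigEndian
	}
	return binary.LittleEndian
}

func main() {
	le := [4]byte{2, 0, 0, 0}        // version 2, little-endian
	be := [4]byte{0, 0, 0, 2}        // version 2, big-endian
	fmt.Println(detectByteOrder(le)) // LittleEndian
	fmt.Println(detectByteOrder(be)) // BigEndian
}
```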
- 08 Jun, 2024 1 commit
Michael Yang authored
- 06 Jun, 2024 1 commit
Michael Yang authored
- 24 May, 2024 2 commits
Michael Yang authored
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
Michael Yang authored
- 23 May, 2024 1 commit
Bruce MacDonald authored
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
- 21 May, 2024 1 commit
Michael Yang authored
- 10 May, 2024 1 commit
Michael Yang authored
- 08 May, 2024 1 commit
Michael Yang authored
- 06 May, 2024 2 commits
Michael Yang authored
Michael Yang authored
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
- 23 Apr, 2024 1 commit
Michael Yang authored
- 17 Apr, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 11 Apr, 2024 1 commit
Michael Yang authored
- 10 Apr, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 04 Apr, 2024 1 commit
Michael Yang authored
- 03 Apr, 2024 1 commit
Michael Yang authored
- 02 Apr, 2024 1 commit
Michael Yang authored