- 19 Mar, 2025 1 commit
-
Jeffrey Morgan authored
-
- 14 Mar, 2025 2 commits
-
Jesse Gross authored
Currently there is a single context per sequence, shared by all multimodal inputs. Since we build a vision encoder graph per image, a large number of inputs can eventually hit the maximum number of graph nodes per context. This changes to using a separate context for each image, ensuring that the available resource limits are applied consistently.
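A rough sketch of the shape of this change, using made-up names (Backend, Context, Image, Tensor, and encodeImages are placeholders, not the repository's actual ml interfaces): each image gets its own short-lived context, so the per-context graph-node limit never accumulates across images.

```go
package main

// Hedged sketch: Backend, Context, Image, and Tensor are hypothetical stand-ins,
// not the actual interfaces from the ml package.
type (
	Tensor  struct{}
	Image   []byte
	Context interface{ Close() }
	Backend interface{ NewContext() Context }
)

// encodeImages builds one vision encoder graph per image, each in its own context,
// so the per-context limit on graph nodes is never shared across images.
func encodeImages(b Backend, encode func(Context, Image) (Tensor, error), images []Image) ([]Tensor, error) {
	embeddings := make([]Tensor, 0, len(images))
	for _, img := range images {
		ctx := b.NewContext() // a fresh context per image, instead of one shared per sequence
		emb, err := encode(ctx, img)
		ctx.Close() // release graph resources before the next image
		if err != nil {
			return nil, err
		}
		embeddings = append(embeddings, emb)
	}
	return embeddings, nil
}

func main() {}
```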
-
Jesse Gross authored
Models may require that a set of inputs all be processed as part of the same batch. For example, if an image has multiple patches with fully connected attention between them, we should not split the batch in the middle of an image. Fixes #9697
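A hedged sketch of this constraint with made-up types (Input, groupSize, and splitBatches are illustrative, not the runner's real structures): a group of inputs that must stay together either fits in the current batch or starts a new one, so an image is never split mid-batch.

```go
package main

import "fmt"

// Hedged sketch: Input and the batching logic here are illustrative, not the runner's
// real types. groupSize is the number of inputs (starting at this one) that must stay
// in the same batch, e.g. every patch of an image with fully connected attention.
type Input struct {
	token     int32
	groupSize int
}

// splitBatches packs inputs into batches of at most maxBatch entries without splitting
// a group across two batches. A group larger than maxBatch still becomes one batch.
func splitBatches(inputs []Input, maxBatch int) [][]Input {
	var batches [][]Input
	var current []Input
	for i := 0; i < len(inputs); {
		n := inputs[i].groupSize
		if n < 1 {
			n = 1
		}
		if i+n > len(inputs) {
			n = len(inputs) - i
		}
		if len(current) > 0 && len(current)+n > maxBatch {
			batches = append(batches, current) // close the batch rather than split the group
			current = nil
		}
		current = append(current, inputs[i:i+n]...)
		i += n
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	// Three image patches (a group of 3) followed by two ordinary tokens, maxBatch = 3:
	// the patches fill the first batch, the two tokens form the second. Prints 3 then 2.
	inputs := []Input{{0, 3}, {1, 0}, {2, 0}, {3, 1}, {4, 1}}
	for _, b := range splitBatches(inputs, 3) {
		fmt.Println(len(b))
	}
}
```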
-
- 13 Mar, 2025 1 commit
-
Michael Yang authored
-
- 12 Mar, 2025 1 commit
-
Bruce MacDonald authored
Softcap isn't in the whitepaper/implementation for the language model, so we should remove it. There is no discernible difference in output with it removed.
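For context, softcap here presumably refers to tanh-based logit soft-capping as described for Gemma 2; a minimal sketch of that operation (illustrative, not code from this repository):

```go
package main

import (
	"fmt"
	"math"
)

// softcap is a sketch of tanh-based logit soft-capping: values are squashed into
// the open interval (-limit, limit) via limit * tanh(x / limit).
func softcap(x, limit float64) float64 {
	return limit * math.Tanh(x/limit)
}

func main() {
	// With a limit of 30, a logit of 50 is squashed to roughly 27.9.
	fmt.Printf("%.1f\n", softcap(50, 30))
}
```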
-
- 11 Mar, 2025 20 commits
-
jmorganca authored
-
Michael Yang authored
-
jmorganca authored
-
jmorganca authored
-
jmorganca authored
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
-
Jesse Gross authored
This is useful for a few things:
- Work around bugs, such as having 2 images in one batch
- Keep the image in a single batch for fully connected attention
- Improve performance by not evaluating embeddings multiple times (see the sketch below)
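A minimal sketch of the last point, assuming image embeddings can be keyed by a hash of the image bytes (the actual keying and types in the runner may differ):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Tensor stands in for a computed image embedding; illustrative only.
type Tensor struct{}

// embeddingCache memoizes embeddings by a hash of the image bytes so the vision
// encoder does not run again for an image it has already evaluated.
type embeddingCache struct {
	entries map[[sha256.Size]byte]Tensor
	encode  func([]byte) Tensor
}

func newEmbeddingCache(encode func([]byte) Tensor) *embeddingCache {
	return &embeddingCache{entries: map[[sha256.Size]byte]Tensor{}, encode: encode}
}

func (c *embeddingCache) get(image []byte) Tensor {
	key := sha256.Sum256(image)
	if t, ok := c.entries[key]; ok {
		return t // already evaluated: reuse the embedding
	}
	t := c.encode(image)
	c.entries[key] = t
	return t
}

func main() {
	cache := newEmbeddingCache(func(b []byte) Tensor {
		fmt.Println("encoding image of", len(b), "bytes")
		return Tensor{}
	})
	img := []byte("same image bytes")
	cache.get(img)
	cache.get(img) // cache hit: "encoding" prints only once
}
```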
-
Jesse Gross authored
Currently we are using positions, which are relative to a sequence and may not be unique.
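A small illustration of the problem, using a made-up counter-based ID scheme: positions restart at 0 for every sequence and so collide across sequences, while a single monotonically increasing counter stays unique.

```go
package main

import "fmt"

func main() {
	// Positions are relative to a sequence, so two sequences both produce 0, 1, ...
	// and the values cannot serve as a unique key on their own.
	positions := []int{0, 1, 0, 1} // two sequences, two inputs each

	// A single monotonically increasing counter (an illustrative scheme, not the
	// repository's actual one) yields identifiers unique across all sequences.
	var next int64
	ids := make([]int64, len(positions))
	for i := range positions {
		next++
		ids[i] = next
	}
	fmt.Println(positions) // [0 1 0 1] - duplicates
	fmt.Println(ids)       // [1 2 3 4] - unique
}
```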
-
Jesse Gross authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Patrick Devine authored
-
Michael Yang authored
-
Michael Yang authored
-
Patrick Devine authored
-
Michael Yang authored
-
Jesse Gross authored
-
Michael Yang authored
-
Patrick Devine authored
-