- 03 Apr, 2025 1 commit
Michael Yang authored
-
- 02 Apr, 2025 1 commit
Jeffrey Morgan authored
-
- 20 Mar, 2025 1 commit
Jesse Gross authored
Options is no longer very descriptive of this struct.
-
- 14 Mar, 2025 1 commit
Jesse Gross authored
Models may require that a set of inputs all be processed as part of the same batch. For example, if an image has multiple patches with fully connected attention between them, we should not split the batch in the middle of an image. Fixes #9697
-
- 12 Mar, 2025 1 commit
Bruce MacDonald authored
Softcap isn't in the whitepaper or the implementation for the language model, so we should remove it. There is no discernible difference in output with it removed.
-
- 11 Mar, 2025 13 commits
Jesse Gross authored
Currently we are using positions, which are relative to a sequence and may not be unique.
-
Jesse Gross authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Patrick Devine authored
-
Michael Yang authored
-
Patrick Devine authored
-
Michael Yang authored
-
Jesse Gross authored
-
Michael Yang authored
-
Patrick Devine authored