- 31 Jan, 2025 1 commit
-
-
Ryan Nguyen authored
**[Guided decoding performance optimization]** Sending the xgrammar guided-decoding bitmask to the GPU (`self.token_bitmask.to(scores.device)`) is a blocking operation that prevents the CPU from pre-launching the sampler kernels: the CPU waits until decode is complete, then copies the bitmask over. This PR makes the copy asynchronous by setting `non_blocking=True`.

Currently, the CPU is blocked on a `cudaStreamSynchronize` and only launches the sampling kernels after the bitmask has been applied, as seen in the Nsys profile of one decode phase from Llama 3.1 8B. With the optimization, this is no longer the case.

Signed-off-by: Ryan N <ryan.nguyen@centml.ai>
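
The change described in this commit message can be illustrated with a minimal PyTorch sketch. It is not the actual xgrammar or vLLM code: `apply_token_bitmask` is a hypothetical helper, and a plain boolean mask stands in for xgrammar's packed bitmask. It assumes the mask lives in pinned host memory when a GPU is present, since `non_blocking=True` only yields an asynchronous host-to-device copy from pinned memory.

```python
import torch


def apply_token_bitmask(scores: torch.Tensor, token_bitmask: torch.Tensor) -> torch.Tensor:
    """Mask disallowed tokens (hypothetical helper, not xgrammar's API)."""
    # non_blocking=True queues the host-to-device copy on the current stream
    # instead of stalling the CPU until the copy has finished.
    mask = token_bitmask.to(scores.device, non_blocking=True)
    # The masking kernel runs on the same stream, so it is ordered after the copy.
    return scores.masked_fill(~mask, float("-inf"))


device = "cuda" if torch.cuda.is_available() else "cpu"
scores = torch.zeros(2, 8, device=device)

# Boolean stand-in for the bitmask: only the first three tokens are allowed.
allowed = torch.zeros(2, 8, dtype=torch.bool)
allowed[:, :3] = True
if torch.cuda.is_available():
    allowed = allowed.pin_memory()  # pinned memory is needed for a truly async copy

masked = apply_token_bitmask(scores, allowed)
```

On a CPU-only machine the `.to(...)` call is effectively a no-op, so the sketch still runs. On a GPU, the host buffer should not be modified until the queued copy has completed.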
-
- 21 Jan, 2025 1 commit
-
-
Cheng Kuan Yong Jason authored
Signed-off-by: Jason Cheng <jasoncky96@gmail.com>
-
- 19 Jan, 2025 1 commit
-
-
Michal Adamczyk authored
Signed-off-by: Michal Adamczyk <madamczyk@habana.ai>
-
- 31 Dec, 2024 1 commit
-
-
Michael Goin authored
Signed-off-by: mgoin <michael@neuralmagic.com>
-
- 30 Dec, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 23 Dec, 2024 1 commit
-
-
Michael Goin authored
Signed-off-by: mgoin <michael@neuralmagic.com>
-
- 19 Dec, 2024 1 commit
-
-
Michael Goin authored
Signed-off-by: mgoin <michael@neuralmagic.com>
-
- 18 Dec, 2024 1 commit
-
-
Wallas Henrique authored
-
- 14 Dec, 2024 1 commit
-
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
- 12 Dec, 2024 1 commit
-
-
Cody Yu authored
-
- 11 Dec, 2024 1 commit
-
-
Kevin H. Luu authored
-
- 10 Dec, 2024 2 commits
-
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
Jeff Cook authored
-
- 06 Dec, 2024 1 commit
-
-
Michael Goin authored
Signed-off-by: mgoin <michael@neuralmagic.com>
-
- 05 Dec, 2024 1 commit
-
-
Michael Goin authored
Signed-off-by: mgoin <michael@neuralmagic.com>
-
- 03 Dec, 2024 2 commits
-
-
Michael Goin authored
Signed-off-by: mgoin <michael@neuralmagic.com>
-
Aaron Pham authored
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
-
- 26 Oct, 2024 1 commit
-
-
Vasiliy Alekseev authored
Signed-off-by: Vasily Alexeev <alvasian@yandex.ru>
-
- 25 Oct, 2024 1 commit
-
-
Travis Johnson authored
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: pavlo-ruban <pavlo.ruban@servicenow.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
- 01 Oct, 2024 1 commit
-
-
Joe Runde authored
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
- 19 Sep, 2024 1 commit
-
-
Roger Wang authored
-
- 18 Sep, 2024 1 commit
-
-
Aaron Pham authored
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
- 17 Sep, 2024 1 commit
-
-
Roger Wang authored
-
- 04 Sep, 2024 1 commit
-
-
Kyle Mistele authored
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
-
- 24 Aug, 2024 2 commits
-
-
youkaichao authored
-
Tyler Rockwood authored
-
- 04 Aug, 2024 1 commit
-
-
Yihuan Bu authored
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
- 03 Aug, 2024 1 commit
-
-
Robert Shaw authored
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
-
- 08 Jul, 2024 1 commit
-
-
Eric authored
-
- 05 Jun, 2024 1 commit
-
-
Breno Faria authored
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
-
- 03 Jun, 2024 1 commit
-
-
Breno Faria authored
-
- 01 May, 2024 1 commit
-
-
Robert Caulk authored
-
- 29 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 20 Apr, 2024 1 commit
-
-
Ayush Rautwar authored
Co-authored-by: Ubuntu <ubuntu@ip-172-31-13-147.ec2.internal>
-
- 18 Apr, 2024 1 commit
-
-
SangBin Cho authored
Co-authored-by: SangBin Cho <sangcho@sangcho-LT93GQWG9C.local>
-
- 16 Apr, 2024 1 commit
-
-
Noam Gat authored
Co-authored-by: Simon Mo <simon.mo@hey.com>
-