Commits · e97f802b2d74861af77997691a7d1c36498f6dca · OpenDAS / vllm_cscc

23 Jan, 2025 1 commit
- [FP8][Kernel] Dynamic kv cache scaling factors computation (#11906) · e97f802b
  Gregory Shtrasberg authored Jan 23, 2025
  Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
  Co-authored-by: Micah Williamson <micah.williamson@amd.com>

06 Jan, 2025 1 commit
- [Kernel] Move attn_type to Attention.__init__() (#11690) · e20c92bb
  Chen Zhang authored Jan 07, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
```
02 Nov, 2024 1 commit
- [Encoder Decoder] Add flash_attn kernel support for encoder-decoder models (#9559) · a78dd330
  sroy745 authored Nov 01, 2024
  
01 Nov, 2024 1 commit
- [Core][VLM] Add precise multi-modal placeholder tracking (#8346) · 6c0b7f54
  Peter Salas authored Nov 01, 2024
```
Signed-off-by: Peter Salas <peter@fixie.ai>
```
04 Oct, 2024 1 commit
- [Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973) · 05d68643
  ElizaWszola authored Oct 04, 2024
```
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
```
25 Sep, 2024 1 commit
- [Kernel] Fullgraph and opcheck tests (#8479) · 300da091
  bnellnm authored Sep 25, 2024
  
13 Sep, 2024 1 commit
- [CI/Build] Reorganize models tests (#7820) · a84e598e
  Cyrus Leung authored Sep 14, 2024
  
11 Sep, 2024 1 commit
- [Kernel][Misc] register ops to prevent graph breaks (#6917) · 73202dbe
  bnellnm authored Sep 11, 2024
```
Co-authored-by: Sage Moore <sage@neuralmagic.com>
```
16 Aug, 2024 1 commit
- [Misc/Testing] Use `torch.testing.assert_close` (#7324) · 50b8d08d
  jon-chuang authored Aug 15, 2024
  
06 Aug, 2024 1 commit
- [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) · fd95e026
  afeldman-nm authored Aug 06, 2024
  Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
  Co-authored-by: Nick Hill <nickhill@us.ibm.com>

08 Jul, 2024 1 commit
- [Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888) · 543aa485
  afeldman-nm authored Jul 08, 2024
  Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

04 Jun, 2024 1 commit
- [Bugfix]: During testing, use pytest monkeypatch for safely overriding the env var that indicates the vLLM backend (#5210) · f42a006b
  afeldman-nm authored Jun 03, 2024