    ROCm and sliding windows fixes (#2033) · 9b3674d9
    fxmarty authored
    * update vllm commit & fix models using sliding window
    
    * update
    
    * update commit
    
    * fix bug where tunableop is bound to cuda graphs even when cuda graphs are disabled
    
    * enable tunableop by default
    
    * fix sliding window
    
    * address review
    
    * dead code
    
    * precise comment
    
    * is it flaky?
cli.py 10.8 KB