- 29 Jul, 2024 1 commit
-
-
Joao Gante authored
* mvp
* added test (a few models need fixes)
* fix a few test cases
* test nits
* harder test 😈
* revert changes in stablelm
* test with improved condition
* add todo
* tmp commit
* merged with main
* nits
* add todo
* final corrections
* add docs for generation compilation
* docs nits
* add tip
* PR suggestions
* add more details to the compilation docs
* fix cache positions
* cache is now init in generate; update docs
* tag test as flaky
* docs
* post rebase make fixup and other nits
* remove unintended changes
* whisper (encoder-decoder) not supported
* move token default updates to ; add tests for token defaults
* push changes
* manual rebase
* chameleon doesn't support this
* fix test_static_cache_mha_mqa_gqa (broken in another PR)
* docs: dynamic is better with end-to-end compilation
-
- 20 May, 2024 1 commit
-
-
Longjie Zheng authored
* first version
* fix sliding window
* fix style
* add sliding window cache
* fix style
* address comments
* fix test
* fix style
* move sliding window check inside cache init
* revert changes on irrelevant files & add comment on SlidingWindowCache
* address comments & fix style fix style
* update causal mask
* [run-slow] mistral
* [run-slow] mistral
* [run-slow] mistral
* [run-slow] mistral
* [run-slow] mistral
* [run-slow] llama
* [run-slow] mistral
* [run-slow] mistral
* [run-slow] mistral
* revert CI from a10 to t4
* wrap up
-
- 30 Apr, 2024 1 commit
-
-
Joao Gante authored
-
- 22 Apr, 2024 1 commit
-
-
Steven Liu authored
* first draft
* feedback
* static cache snippet
* feedback
* feedback
-