"...git@developer.sourcefind.cn:jerrrrry/infinicore.git" did not exist on "4cd1f6881bf67dd14183c18307bacbca3612c2c7"
-
Xiaowei Ren authored
* add flash implementation with context parallelism
  Signed-off-by: xren <xren@nvidia.com>

* next more comments
  Signed-off-by: xren <xren@nvidia.com>

* code comment fix
  Signed-off-by: xren <xren@nvidia.com>

* comment fix
  Signed-off-by: xren <xren@nvidia.com>

* add missing space
  Signed-off-by: xren <xren@nvidia.com>

* fix docstrings
  Signed-off-by: xren <xren@nvidia.com>

* try to add fa v2 api
  Signed-off-by: xren <xren@nvidia.com>

* fix a comment
  Signed-off-by: xren <xren@nvidia.com>

* fix padded kv return
  Signed-off-by: xren <xren@nvidia.com>

* add docstrings of context parallelism
  Signed-off-by: xren <xren@nvidia.com>

* minor fix
  Signed-off-by: xren <xren@nvidia.com>

* minor docstring fix
  Signed-off-by: xren <xren@nvidia.com>

* fix positional arguments
  Signed-off-by: xren <xren@nvidia.com>

* make docstring line shorter
  Signed-off-by: xren <xren@nvidia.com>

* add fa v2 backward api for flash_attn_with_cp
  Signed-off-by: xren <xren@nvidia.com>

* remove redundant code
  Signed-off-by: xren <xren@nvidia.com>

* make sure hidden size per attn head is multiple of 8 for FA2
  Signed-off-by: xren <xren@nvidia.com>

* remove an unnecessary assert check for FA2
  Signed-off-by: xren <xren@nvidia.com>

* indention fix
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* Update FA version
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Lint
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------

Signed-off-by: xren <xren@nvidia.com>
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
479dbb73