[Fix] Remove unused code to reduce binary size (#181)

* clean-up
* fix lint
* fix lint

981a4610 · Li Zhang · GitHub · 83697422 · 83697422
Unverified Commit 981a4610 authored Jul 31, 2023 by Li Zhang Committed by GitHub Jul 31, 2023

Showing 1 changed file with 0 additions and 9 deletions

src/turbomind/models/llama/prefix_cache.h +0 -9

--- a/src/turbomind/models/llama/prefix_cache.h
+++ b/src/turbomind/models/llama/prefix_cache.h
-// Copyright (c) OpenMMLab. All rights reserved.
-#include <cuda_fp16.h>
-template<typename T>
-void invokeInsertKeyCache(T* key_cache, const T* src, int L, int H, int Dx, int s, int X, int S, cudaStream_t st);
-template<typename T>
-void invokeInsertValueCache(T* value_cache, const T* src, int L, int H, int s, int D, int S, cudaStream_t st);