OpenDAS / text-generation-inference
Commit d4bccff3
Authored Jan 24, 2025 by xuxzh1
Optimize the performance of GPTQ
Parent: ee3d6944
Showing 1 changed file with 1 addition and 1 deletion (+1 -1)
server/exllamav2_kernels/exllamav2_kernels/cuda/q_gemm.cu
@@ -10,7 +10,7 @@
 #include "quant/qdq_6.cuh"
 #include "quant/qdq_8.cuh"
-#define GPTQ_BLOCK_KN_SIZE 128
+#define GPTQ_BLOCK_KN_SIZE 256
 #define GPTQ_BLOCK_M_SIZE_MAX 8
 #define GPTQ_MAX_GROUPS_IN_BLOCK (GPTQ_BLOCK_KN_SIZE / 32)
...