OpenDAS
AutoAWQ
Commit d76125bf
Authored Sep 13, 2023 by Casper. Committed by GitHub, Sep 13, 2023.
Update README.md
Parent: 1b0af2d3
Showing 1 changed file with 1 addition and 1 deletion

README.md (+1, -1)
@@ -74,7 +74,7 @@ Under examples, you can find examples of how to quantize, run inference, and ben
 ### INT4 GEMM vs INT4 GEMV vs FP16
 
-There are two versions of AWQ: GEMM and GEMV. Both names to how matrix multiplication runs under the hood. We suggest the following:
+There are two versions of AWQ: GEMM and GEMV. Both names relate to how matrix multiplication runs under the hood. We suggest the following:
 
 - GEMV (quantized): Best for small context, batch size 1, highest number of tokens/s.
 - GEMM (quantized): Best for larger context, up to batch size 8, faster than GEMV on batch size > 1, slower than GEMV on batch size = 1.
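
The guidance in the two bullets above can be read as a small decision rule. The following is a minimal sketch of that rule, not part of AutoAWQ: the function name `pick_awq_version`, the `long_context` flag, and the `MAX_GEMM_BATCH` constant are hypothetical names introduced here for illustration. The README excerpt does not say where "small" context ends, so the sketch leaves that judgment to the caller, and it returns `None` for batch sizes above 8 because the visible text gives no recommendation there.

```python
from typing import Optional

# Upper batch size for which the README excerpt recommends GEMM.
MAX_GEMM_BATCH = 8


def pick_awq_version(batch_size: int, long_context: bool = False) -> Optional[str]:
    """Return "GEMV" or "GEMM" following the README's suggestions.

    Hypothetical helper: `long_context` is an assumption, since the
    excerpt does not define a threshold for "small" vs "larger" context.
    Returns None when the excerpt offers no guidance (batch size > 8).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # GEMV: small context, batch size 1, highest tokens/s.
    if batch_size == 1 and not long_context:
        return "GEMV"
    # GEMM: larger context, or batch sizes 2..8 where it beats GEMV.
    if batch_size <= MAX_GEMM_BATCH:
        return "GEMM"
    # Beyond batch size 8 the visible excerpt makes no suggestion.
    return None


if __name__ == "__main__":
    for bs in (1, 4, 8, 16):
        print(bs, pick_awq_version(bs))
```

Returning `None` rather than guessing keeps the sketch within what the excerpt states; a caller could map that case to whatever the full README recommends.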