Unverified commit d76125bf, authored by Casper, committed by GitHub

Update README.md

parent 1b0af2d3
@@ -74,7 +74,7 @@ Under examples, you can find examples of how to quantize, run inference, and ben
### INT4 GEMM vs INT4 GEMV vs FP16
-There are two versions of AWQ: GEMM and GEMV. Both names to how matrix multiplication runs under the hood. We suggest the following:
+There are two versions of AWQ: GEMM and GEMV. Both names relate to how matrix multiplication runs under the hood. We suggest the following:
- GEMV (quantized): Best for small context, batch size 1, highest number of tokens/s.
- GEMM (quantized): Best for larger context, up to batch size 8, faster than GEMV on batch size > 1, slower than GEMV on batch size = 1.
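
The guidance in the two bullets above can be sketched as a small selection helper. This is a minimal illustration, not part of the library: the function name `suggest_awq_version` is hypothetical, and the thresholds are taken only from the bullets visible in this hunk (batch size 1 favours GEMV, batch sizes 2 to 8 favour GEMM). The hunk is cut off after the GEMM bullet, so larger batch sizes are left undecided and the context-length criterion is not modelled.

```python
from typing import Optional


def suggest_awq_version(batch_size: int) -> Optional[str]:
    """Return the AWQ kernel version suggested by the README bullets.

    Hypothetical helper: the name and signature are assumptions made
    for illustration and do not come from the library itself.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if batch_size == 1:
        # GEMV: highest tokens/s at batch size 1, best for small context.
        return "GEMV"
    if batch_size <= 8:
        # GEMM: faster than GEMV for batch size > 1, suggested up to 8.
        return "GEMM"
    # The visible bullets give no suggestion beyond batch size 8.
    return None


if __name__ == "__main__":
    for bs in (1, 4, 8, 16):
        print(bs, suggest_awq_version(bs))
```

Returning `None` rather than guessing for batch sizes above 8 keeps the sketch within what the excerpt actually states.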