Commit c88b2e25 authored by Casper Hansen

Add explanation of from_quantized variables

parent 196119b4
@@ -136,6 +136,18 @@ generation_output = model.generate(
</details>
<details>
<summary>AutoAWQForCausalLM.from_quantized</summary>
- `quant_path`: Path to folder containing model files.
- `quant_filename`: The filename of the model weights or the `index.json` file.
- `max_new_tokens`: The max sequence length, used to allocate the kv-cache for fused models.
- `fuse_layers`: Whether or not to use fused layers.
- `batch_size`: The batch size to initialize the AWQ model with.
</details>
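A minimal sketch of how the `from_quantized` arguments above fit together. The model path is a placeholder, and the `awq` import is guarded so the snippet degrades gracefully when the package is not installed; the parameter values shown are illustrative defaults, not recommendations.

```python
# Keyword arguments documented above for AutoAWQForCausalLM.from_quantized.
load_kwargs = {
    "max_new_tokens": 512,  # max sequence length, used to allocate kv-cache for fused models
    "fuse_layers": True,    # whether or not to use fused layers
    "batch_size": 1,        # batch size to initialize the AWQ model with
}

try:
    from awq import AutoAWQForCausalLM

    # "path/to/quant-folder" is a placeholder for a folder containing model files.
    model = AutoAWQForCausalLM.from_quantized("path/to/quant-folder", **load_kwargs)
except ImportError:
    model = None  # autoawq is not installed in this environment
```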
## Benchmarks
| Model | GPU | FP16 latency (ms) | INT4 latency (ms) | Speedup |