Commit c88b2e25 authored by Casper Hansen

Add explanation of from_quantized variables

parent 196119b4
@@ -136,6 +136,18 @@ generation_output = model.generate(
</details>
<details>
<summary>AutoAWQForCausalLM.from_quantized</summary>
- `quant_path`: Path to folder containing model files.
- `quant_filename`: The filename of the model weights or the `index.json` file.
- `max_new_tokens`: The max sequence length, used to allocate the kv-cache for fused models.
- `fuse_layers`: Whether or not to use fused layers.
- `batch_size`: The batch size to initialize the AWQ model with.
</details>
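A minimal sketch of how the `from_quantized` arguments above fit together. The model path is a placeholder, and the `awq` import is guarded so the snippet degrades gracefully when the package is not installed; the parameter values shown are illustrative defaults, not recommendations.

```python
# Keyword arguments documented above for AutoAWQForCausalLM.from_quantized.
load_kwargs = {
    "max_new_tokens": 512,  # max sequence length, used to allocate kv-cache for fused models
    "fuse_layers": True,    # whether or not to use fused layers
    "batch_size": 1,        # batch size to initialize the AWQ model with
}

try:
    from awq import AutoAWQForCausalLM

    # "path/to/quant-folder" is a placeholder for a folder containing model files.
    model = AutoAWQForCausalLM.from_quantized("path/to/quant-folder", **load_kwargs)
except ImportError:
    model = None  # autoawq is not installed in this environment
```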
## Benchmarks
| Model | GPU | FP16 latency (ms) | INT4 latency (ms) | Speedup |