OpenDAS / AutoAWQ

Commit c88b2e25, authored Sep 13, 2023 by Casper Hansen

Add explanation of from_quantized variables

parent 196119b4
Showing 1 changed file with 12 additions and 0 deletions

README.md
@@ -136,6 +136,18 @@ generation_output = model.generate(
</details>
<details>

<summary>AutoAWQForCausalLM.from_quantized</summary>

- `quant_path`: Path to folder containing model files.
- `quant_filename`: The filename to model weights or `index.json` file.
- `max_new_tokens`: The max sequence length, used to allocate kv-cache for fused models.
- `fuse_layers`: Whether or not to use fused layers.
- `batch_size`: The batch size to initialize the AWQ model with.

</details>
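
The five arguments listed above could be passed together roughly as in the sketch below. This is an illustration, not the README's own example: the helper names `build_from_quantized_kwargs` and `load_quantized`, the path and filename, and the default values are all hypothetical, and it assumes that `from_quantized` accepts the documented names as keyword arguments. The actual load is left commented out because it needs AutoAWQ installed, a supported GPU, and real quantized model files.

```python
from typing import Any, Dict


def build_from_quantized_kwargs(
    quant_path: str,
    quant_filename: str,
    max_new_tokens: int = 512,   # assumed default, chosen for illustration only
    fuse_layers: bool = True,    # assumed default, chosen for illustration only
    batch_size: int = 1,         # assumed default, chosen for illustration only
) -> Dict[str, Any]:
    """Collect the five documented arguments into one keyword dict."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return {
        "quant_path": quant_path,          # folder containing the model files
        "quant_filename": quant_filename,  # weights file or index.json
        "max_new_tokens": max_new_tokens,  # sizes the kv-cache for fused models
        "fuse_layers": fuse_layers,        # whether to use fused layers
        "batch_size": batch_size,          # batch size the model is initialized with
    }


def load_quantized(**kwargs: Any):
    """Hypothetical wrapper: forwards the kwargs to AutoAWQForCausalLM.from_quantized."""
    # Imported lazily so the rest of this sketch runs without AutoAWQ installed.
    from awq import AutoAWQForCausalLM

    return AutoAWQForCausalLM.from_quantized(**kwargs)


kwargs = build_from_quantized_kwargs(
    quant_path="path/to/quantized-model",  # hypothetical folder
    quant_filename="model.pt",             # hypothetical weights filename
)
print(kwargs)

# Uncomment on a machine with AutoAWQ, a supported GPU, and real model files:
# model = load_quantized(**kwargs)
```

Keeping the arguments in a plain dict makes it easy to check that `max_new_tokens` and `batch_size` match the generation settings used later, since both are described above as affecting how the model is initialized.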
## Benchmarks
| Model | GPU | FP16 latency (ms) | INT4 latency (ms) | Speedup |
...