Commit dc99d2fc authored by Casper Hansen

Fix Falcon benchmark format

parent 7cf3c790
@@ -228,10 +228,11 @@ generation_output = model.generate(
 ### Falcon 7B
-Note: Fast generation, fast context processing
+- Note: Fast generation, fast context processing
-GPU: NVIDIA GeForce RTX 3090
+- GPU: NVIDIA GeForce RTX 3090
-Command: `python examples/benchmark.py --model_path casperhansen/falcon-7b-awq --quant_file awq_model_w4_g64.pt`
+- Command: `python examples/benchmark.py --model_path casperhansen/falcon-7b-awq --quant_file awq_model_w4_g64.pt`
-Version: GEMM
+- Version: GEMM
 | Batch Size | Prefill Length | Decode Length | Prefill tokens/s | Decode tokens/s | Memory (VRAM) |
 |-------------:|-----------------:|----------------:|-------------------:|------------------:|:-----------------|
 | 1 | 32 | 32 | 466.826 | 95.1413 | 4.47 GB (18.88%) |
......