Commit dc99d2fc authored by Casper Hansen

Fix Falcon benchmark format

parent 7cf3c790
@@ -228,10 +228,11 @@ generation_output = model.generate(
### Falcon 7B
-Note: Fast generation, fast context processing
-GPU: NVIDIA GeForce RTX 3090
-Command: `python examples/benchmark.py --model_path casperhansen/falcon-7b-awq --quant_file awq_model_w4_g64.pt`
-Version: GEMM
+- Note: Fast generation, fast context processing
+- GPU: NVIDIA GeForce RTX 3090
+- Command: `python examples/benchmark.py --model_path casperhansen/falcon-7b-awq --quant_file awq_model_w4_g64.pt`
+- Version: GEMM
| Batch Size | Prefill Length | Decode Length | Prefill tokens/s | Decode tokens/s | Memory (VRAM) |
|-------------:|-----------------:|----------------:|-------------------:|------------------:|:-----------------|
| 1 | 32 | 32 | 466.826 | 95.1413 | 4.47 GB (18.88%) |
......