docs: add cpu benchmark (#1366)

* cpu benchmark
* try to fix formatting
* cleanup
* cleanup

---------

Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>

e7c6fc61 · jiqing-feng · GitHub · aa57bd89 · e7c6fc61
Unverified Commit e7c6fc61 authored Sep 21, 2024 by jiqing-feng Committed by GitHub Sep 20, 2024

Showing with 15 additions and 1 deletion

docs/source/non_cuda_backends.mdx +15 -1

--- a/docs/source/non_cuda_backends.mdx
+++ b/docs/source/non_cuda_backends.mdx
@@ -24,4 +24,18 @@ Thank you for your support!

 ### Intel

-### AMD
+The following performance data was collected on an Intel 4th Gen Xeon (SPR) platform. The tables compare speed-up and memory usage across data types for [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
+
+#### Inference (CPU)
+
+| Data Type | BF16 | INT8 | NF4 | FP4 |
+|---|---|---|---|---|
+| Speed-Up (vs BF16) | 1.0x | 0.6x | 2.3x | 0.03x |
+| Memory (GB) | 13.1 | 7.6 | 5.0 | 4.6 |
+
+#### Fine-Tuning (CPU)
+
+| Data Type | AMP BF16 | INT8 | NF4 | FP4 |
+|---|---|---|---|---|
+| Speed-Up (vs AMP BF16) | 1.0x | 0.38x | 0.07x | 0.07x |
+| Memory (GB) | 40 | 9 | 6.6 | 6.6 |
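
The tables report memory in absolute GB, so the reduction relative to each baseline has to be worked out by hand. A minimal sketch of that arithmetic follows; the numbers are copied from the two tables in this diff, while the dictionary layout and the `memory_reduction` helper are hypothetical names introduced only for illustration and are not part of the commit.

```python
# Values copied from the "Inference (CPU)" and "Fine-Tuning (CPU)" tables.
# The structure and names below are illustrative assumptions, not project code.
inference = {
    "BF16": {"speedup": 1.0, "memory_gb": 13.1},
    "INT8": {"speedup": 0.6, "memory_gb": 7.6},
    "NF4": {"speedup": 2.3, "memory_gb": 5.0},
    "FP4": {"speedup": 0.03, "memory_gb": 4.6},
}

fine_tuning = {
    "AMP BF16": {"speedup": 1.0, "memory_gb": 40.0},
    "INT8": {"speedup": 0.38, "memory_gb": 9.0},
    "NF4": {"speedup": 0.07, "memory_gb": 6.6},
    "FP4": {"speedup": 0.07, "memory_gb": 6.6},
}


def memory_reduction(table, baseline):
    """Return baseline memory divided by each data type's memory, rounded to 2 places."""
    base = table[baseline]["memory_gb"]
    return {name: round(base / row["memory_gb"], 2) for name, row in table.items()}


if __name__ == "__main__":
    for title, table, baseline in (
        ("Inference", inference, "BF16"),
        ("Fine-tuning", fine_tuning, "AMP BF16"),
    ):
        ratios = memory_reduction(table, baseline)
        for name, ratio in ratios.items():
            speedup = table[name]["speedup"]
            print(f"{title:12s} {name:9s} memory /{ratio:<5} speed x{speedup}")
```

Read this way, NF4 is the only quantized type in the inference table that is both smaller and faster than BF16, while in the fine-tuning table every quantized type trades a large memory saving for a slowdown.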