Deleted duplicate sentence (#26394)

Commit a8531f3b (unverified), authored Sep 26, 2023 by titi, committed by GitHub Sep 26, 2023

Showing with 0 additions and 2 deletions

docs/source/en/perf_infer_gpu_one.md +0 -2

--- a/docs/source/en/perf_infer_gpu_one.md
+++ b/docs/source/en/perf_infer_gpu_one.md
@@ -68,8 +68,6 @@ You can benefit from considerable speedups for fine-tuning and inference, especi

 To overcome this, one should use Flash Attention without padding tokens in the sequence for training (e.g., by packing a dataset, i.e., concatenating sequences until reaching the maximum sequence length. An example is provided [here](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py#L516).

-Below is the expected speedup you can get for a simple forward pass on [tiiuae/falcon-7b](https://hf.co/tiiuae/falcon-7b) with a sequence length of 4096 and various batch sizes without padding tokens:
-
 Below is the expected speedup you can get for a simple forward pass on [tiiuae/falcon-7b](https://hf.co/tiiuae/falcon-7b) with a sequence length of 4096 and various batch sizes, without padding tokens:

 <div style="text-align: center">