Commit 96a982ad authored by OlivierDehaene's avatar OlivierDehaene
Browse files

fix: better warmup error

parent f9910d13
......@@ -670,7 +670,7 @@ class FlashCausalLM(Model):
self.device,
)
_, batch = self.generate_token(batch)
except Exception as e:
except torch.cuda.OutOfMemoryError as e:
raise RuntimeError(
f"Not enough memory to handle {len(batch.input_ids)} prefill tokens. "
f"You need to decrease `--max-batch-prefill-tokens`"
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment