Commit d3fe8571 (unverified), authored by Yan Ma, committed by GitHub

update doc for online fp8 quantization (#37851)


Signed-off-by: Yan Ma <yan.ma@intel.com>
parent f85e479e
@@ -137,6 +137,3 @@ llm = LLM("facebook/opt-125m", quantization="fp8")
 result = llm.generate("Hello, my name is")
 print(result[0].outputs[0].text)
 ```
-!!! warning
-Currently, we load the model at original precision before quantizing down to 8-bits, so you need enough memory to load the whole model.