Unverified commit af1ed38f authored by Merve Noyan, committed by GitHub

Safetensors conceptual guide (#905)

IDK what else to add to this guide. I looked for relevant code in the TGI
codebase and saw that safetensors is used in quantization as well (maybe I
could add that?)
parent b03d2621
@@ -21,6 +21,8 @@
- sections:
  - local: conceptual/streaming
    title: Streaming
  - local: conceptual/safetensors
    title: Safetensors
  - local: conceptual/flash_attention
    title: Flash Attention
  title: Conceptual Guides

# Safetensors
Safetensors is a model serialization format for deep learning models. It is [faster](https://huggingface.co/docs/safetensors/speed) and safer than other serialization formats such as pickle, which is used under the hood in many deep learning libraries: loading a pickle file can execute arbitrary code, whereas a safetensors file contains only a JSON header and raw tensor bytes.
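
To make the safety argument concrete, here is a minimal sketch of the file layout using only the Python standard library. It assumes the layout described in the safetensors documentation: an 8-byte little-endian header length, a JSON header mapping each tensor name to its `dtype`, `shape`, and `data_offsets`, then the raw tensor bytes. The `serialize` and `deserialize` helpers and the small `DTYPE_SIZES` table are hypothetical and cover only a few dtypes; in practice you would use the `safetensors` package (for example `safetensors.torch.save_file` and `load_file`).

```python
import json
import math
import struct

# Hypothetical table: element sizes for the few safetensors dtype codes this sketch handles.
DTYPE_SIZES = {"F32": 4, "F16": 2, "I64": 8}


def serialize(tensors):
    """Hypothetical helper: pack {name: (dtype, shape, raw_bytes)} into the safetensors layout."""
    header = {}
    offset = 0
    for name, (dtype, shape, data) in tensors.items():
        # Offsets are relative to the start of the byte buffer that follows the header.
        header[name] = {"dtype": dtype, "shape": shape, "data_offsets": [offset, offset + len(data)]}
        offset += len(data)
    header_bytes = json.dumps(header).encode("utf-8")
    # Pad the JSON header with spaces so the data buffer starts on an 8-byte boundary.
    header_bytes += b" " * (-len(header_bytes) % 8)
    body = b"".join(data for _, _, data in tensors.values())
    # First 8 bytes: header length as an unsigned little-endian 64-bit integer.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + body


def deserialize(blob):
    """Hypothetical helper: parse the layout back; only JSON and raw bytes are read, no code runs."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional string-to-string metadata, not a tensor
    buffer = blob[8 + header_len:]
    tensors = {}
    for name, info in header.items():
        begin, end = info["data_offsets"]
        # Reject entries whose byte range does not match their declared dtype and shape.
        expected = math.prod(info["shape"]) * DTYPE_SIZES[info["dtype"]]
        if end - begin != expected:
            raise ValueError(f"{name}: byte length does not match dtype and shape")
        tensors[name] = (info["dtype"], info["shape"], buffer[begin:end])
    return tensors


weights = {"w": ("F32", [2, 2], struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))}
blob = serialize(weights)
restored = deserialize(blob)
print(struct.unpack("<4f", restored["w"][2]))  # (1.0, 2.0, 3.0, 4.0)
```

Because reading the file only involves `json.loads` and byte slicing, a malicious file can at worst fail validation; it has no path to run code the way an unpickled object can.
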

TGI depends on the safetensors format mainly to enable [tensor parallelism sharding](./tensor_parallelism): a safetensors file can be read lazily, so each shard loads only its own slice of each tensor. When serving a given model repository, TGI looks for safetensors weights; if there are none, it converts the PyTorch weights to the safetensors format.

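Lazy, per-shard loading works because every tensor's bytes sit at offsets recorded in the header. The sketch below is standard-library only, and its `write_single_tensor` and `read_row_shard` helpers are hypothetical. It assumes a row-major `F32` matrix split evenly along its first dimension, so each rank can seek straight to its block of rows and read nothing else. Splitting along columns is not contiguous on disk and needs strided reads, which this sketch does not cover. The safetensors library itself exposes slicing through `safe_open(...)` and `get_slice(...)`.

```python
import io
import json
import struct


def write_single_tensor(f, name, rows, cols, values):
    """Hypothetical helper: write one row-major F32 matrix in the safetensors layout."""
    data = struct.pack(f"<{rows * cols}f", *values)
    header = {name: {"dtype": "F32", "shape": [rows, cols], "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    header_bytes += b" " * (-len(header_bytes) % 8)  # space padding to an 8-byte boundary
    f.write(struct.pack("<Q", len(header_bytes)) + header_bytes + data)


def read_row_shard(f, name, rank, world_size):
    """Hypothetical helper: read only this rank's block of rows; other ranks' bytes are skipped."""
    f.seek(0)
    (header_len,) = struct.unpack("<Q", f.read(8))
    info = json.loads(f.read(header_len))[name]
    if info["dtype"] != "F32":
        raise ValueError("this sketch only handles F32")
    rows, cols = info["shape"]
    if rows % world_size != 0:
        raise ValueError("this sketch assumes rows split evenly across ranks")
    rows_per_rank = rows // world_size
    row_bytes = cols * 4  # 4 bytes per F32 element
    # Absolute file offset of this rank's first row: prefix + header + tensor start + skipped rows.
    start = 8 + header_len + info["data_offsets"][0] + rank * rows_per_rank * row_bytes
    f.seek(start)
    flat = struct.unpack(f"<{rows_per_rank * cols}f", f.read(rows_per_rank * row_bytes))
    return [list(flat[i * cols:(i + 1) * cols]) for i in range(rows_per_rank)]


buf = io.BytesIO()
write_single_tensor(buf, "weight", 4, 3, [float(i) for i in range(12)])
shard = read_row_shard(buf, "weight", rank=1, world_size=2)
print(shard)  # [[6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
```
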
You can learn more about safetensors by reading the [safetensors documentation](https://huggingface.co/docs/safetensors/index).