Commit e656f638 (unverified), authored by Albert, committed by GitHub

[Doc] fix the incorrect module path of tensorize_vllm_model (#13863)

parent 145944cb
@@ -27,7 +27,7 @@ https://github.com/coreweave/tensorizer
 To serialize a model, install vLLM from source, then run something
 like this from the root level of this repository:
-python -m examples.offline_inference.tensorize_vllm_model \
+python -m examples.other.tensorize_vllm_model \
 --model facebook/opt-125m \
 serialize \
 --serialized-directory s3://my-bucket \
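A note on the hunk above: `python -m` looks up a dotted module name relative to the entries on `sys.path`, which include the current directory, so the module path has to mirror the script's location under the repository root. The minimal sketch below shows that mapping for the old and new paths. `module_to_relpath` is a hypothetical helper written for illustration, not part of vLLM, and it assumes the target is a plain `.py` module rather than a package.

```python
from pathlib import PurePosixPath


def module_to_relpath(module: str) -> PurePosixPath:
    """Map a dotted module name to the relative .py path it corresponds to.

    Hypothetical helper: it only mirrors the naming convention and does
    not check that the file exists.
    """
    return PurePosixPath(*module.split(".")).with_suffix(".py")


# Module path before and after this commit.
old = module_to_relpath("examples.offline_inference.tensorize_vllm_model")
new = module_to_relpath("examples.other.tensorize_vllm_model")

print(old)
print(new)
```

If the printed path for a module name does not exist relative to the directory the command is run from, `python -m` will most likely fail with a "No module named ..." error, which is the kind of breakage this documentation fix addresses.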
@@ -47,7 +47,7 @@ providing a `--keyfile` argument.
 To deserialize a model, you can run something like this from the root
 level of this repository:
-python -m examples.offline_inference.tensorize_vllm_model \
+python -m examples.other.tensorize_vllm_model \
 --model EleutherAI/gpt-j-6B \
 --dtype float16 \
 deserialize \
@@ -65,11 +65,11 @@ shard's rank. Sharded models serialized with this script will be named as
 model-rank-%03d.tensors
 For more information on the available arguments for serializing, run
-`python -m examples.offline_inference.tensorize_vllm_model serialize --help`.
+`python -m examples.other.tensorize_vllm_model serialize --help`.
 Or for deserializing:
-`python -m examples.offline_inference.tensorize_vllm_model deserialize --help`.
+`python -m examples.other.tensorize_vllm_model deserialize --help`.
 Once a model is serialized, tensorizer can be invoked with the `LLM` class
 directly to load models:
@@ -90,7 +90,7 @@ TensorizerConfig arguments desired.
 In order to see all of the available arguments usable to configure
 loading with tensorizer that are given to `TensorizerConfig`, run:
-`python -m examples.offline_inference.tensorize_vllm_model deserialize --help`
+`python -m examples.other.tensorize_vllm_model deserialize --help`
 under the `tensorizer options` section. These can also be used for
 deserialization in this example script, although `--tensorizer-uri` and
......
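The shard naming pattern quoted in the diff, `model-rank-%03d.tensors`, is a printf-style template filled in with the shard's rank. The following is a minimal sketch of how such a template expands, assuming ranks are non-negative integers starting at 0. `shard_filename` is a hypothetical helper for illustration only and is not taken from the script.

```python
# Template as quoted in the diff; %03d zero-pads the rank to three digits.
SHARD_TEMPLATE = "model-rank-%03d.tensors"


def shard_filename(rank: int) -> str:
    """Return the file name for a given shard rank (hypothetical helper)."""
    return SHARD_TEMPLATE % rank


# File names for a two-way sharded model (assumed ranks 0 and 1).
names = [shard_filename(rank) for rank in range(2)]
print(names)
```

Zero-padding keeps the shard files in rank order when a directory listing is sorted lexicographically, at least up to 1000 shards.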