Inference Setup
-----------------------
The entrypoint for inference with DeepSpeed is ``deepspeed.init_inference()``.

Example usage:

.. code-block:: python

    engine = deepspeed.init_inference(model=net, config=config)

The ``DeepSpeedInferenceConfig`` controls every aspect of initializing the
``InferenceEngine``. The config should be passed as a dictionary to
``init_inference``, but its parameters can also be supplied individually as
keyword arguments.

.. _DeepSpeedInferenceConfig:
.. autopydantic_model:: deepspeed.inference.config.DeepSpeedInferenceConfig

.. _DeepSpeedTPConfig:
.. autopydantic_model:: deepspeed.inference.config.DeepSpeedTPConfig

.. _DeepSpeedMoEConfig:
.. autopydantic_model:: deepspeed.inference.config.DeepSpeedMoEConfig

.. _QuantizationConfig:
.. autopydantic_model:: deepspeed.inference.config.QuantizationConfig

.. _InferenceCheckpointConfig:
.. autopydantic_model:: deepspeed.inference.config.InferenceCheckpointConfig


Example config:

.. code-block:: python

    config = {
        "kernel_inject": True,
        "tensor_parallel": {"tp_size": 4},
        "dtype": "fp16",
        "enable_cuda_graph": False
    }
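A config like the one above can be handed to ``init_inference`` either as one
dictionary or as individual keyword arguments. A minimal sketch of both call
styles follows; ``net`` is a placeholder for any ``torch.nn.Module``, and
actually running it requires a DeepSpeed install with CUDA devices (four GPUs
for ``tp_size=4``):

.. code-block:: python

    import torch
    import deepspeed

    # Placeholder model; substitute your own torch.nn.Module.
    net = torch.nn.Linear(1024, 1024)

    config = {
        "kernel_inject": True,
        "tensor_parallel": {"tp_size": 4},
        "dtype": "fp16",
        "enable_cuda_graph": False
    }

    # Pass the settings as a single dictionary via ``config``...
    engine = deepspeed.init_inference(model=net, config=config)

    # ...or pass the same settings as keyword arguments.
    engine = deepspeed.init_inference(
        model=net,
        kernel_inject=True,
        tensor_parallel={"tp_size": 4},
        dtype="fp16",
        enable_cuda_graph=False,
    )

Both forms populate the same ``DeepSpeedInferenceConfig``; nested sections such
as ``tensor_parallel`` are given as plain dictionaries in either style.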

.. autofunction:: deepspeed.init_inference