Inference Setup
-----------------------
The entrypoint for inference with DeepSpeed is ``deepspeed.init_inference()``.

Example usage:

.. code-block:: python

    engine = deepspeed.init_inference(model=net, config=config)

The ``DeepSpeedInferenceConfig`` controls every aspect of initializing the
``InferenceEngine``. The config should be passed as a dictionary to
``init_inference``, but its parameters can also be supplied individually as
keyword arguments.

.. _DeepSpeedInferenceConfig:
.. autopydantic_model:: deepspeed.inference.config.DeepSpeedInferenceConfig

.. _DeepSpeedTPConfig:
.. autopydantic_model:: deepspeed.inference.config.DeepSpeedTPConfig

.. _DeepSpeedMoEConfig:
.. autopydantic_model:: deepspeed.inference.config.DeepSpeedMoEConfig

.. _QuantizationConfig:
.. autopydantic_model:: deepspeed.inference.config.QuantizationConfig

.. _InferenceCheckpointConfig:
.. autopydantic_model:: deepspeed.inference.config.InferenceCheckpointConfig


Example config:

.. code-block:: python

    config = {
        "kernel_inject": True,
        "tensor_parallel": {"tp_size": 4},
        "dtype": "fp16",
        "enable_cuda_graph": False
    }
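A config like the one above can be handed to ``init_inference`` either as one
dictionary or as individual keyword arguments. A minimal sketch of both call
styles follows; ``net`` is a placeholder for any ``torch.nn.Module``, and
actually running it requires a DeepSpeed install with CUDA devices (four GPUs
for ``tp_size=4``):

.. code-block:: python

    import torch
    import deepspeed

    # Placeholder model; substitute your own torch.nn.Module.
    net = torch.nn.Linear(1024, 1024)

    config = {
        "kernel_inject": True,
        "tensor_parallel": {"tp_size": 4},
        "dtype": "fp16",
        "enable_cuda_graph": False
    }

    # Pass the settings as a single dictionary via ``config``...
    engine = deepspeed.init_inference(model=net, config=config)

    # ...or pass the same settings as keyword arguments.
    engine = deepspeed.init_inference(
        model=net,
        kernel_inject=True,
        tensor_parallel={"tp_size": 4},
        dtype="fp16",
        enable_cuda_graph=False,
    )

Both forms populate the same ``DeepSpeedInferenceConfig``; nested sections such
as ``tensor_parallel`` are given as plain dictionaries in either style.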

.. autofunction:: deepspeed.init_inference