If your model imports modules that initialize CUDA, consider lazy-importing it to avoid errors like `RuntimeError: Cannot re-initialize CUDA in forked subprocess`:
```python
fromvllmimportModelRegistry
# The entrypoint of your plugin
defregister():
fromvllmimportModelRegistry
ModelRegistry.register_model(
ModelRegistry.register_model(
"YourModelForCausalLM",
"your_code:YourModelForCausalLM"
)
)
```
!!! warning
If your model is a multimodal model, ensure the model class implements the [SupportsMultiModal][vllm.model_executor.models.interfaces.SupportsMultiModal] interface.
Read more about that [here][supports-multimodal].
!!! note
Although you can directly put these code snippets in your script using `vllm.LLM`, the recommended way is to place these snippets in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server.
For more information on adding entry points to your package, please check the [official documentation](https://setuptools.pypa.io/en/latest/userguide/entry_point.html).