---
title: Registering a Model to vLLM
---
[](){ #new-model-registration }

vLLM relies on a model registry to determine how to run each model.
The pre-registered architectures are listed under [supported models][supported-models].

If your model is not on this list, you must register it with vLLM.
This page explains how to do so.

## Built-in models

To add a model directly to the vLLM library, start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source][build-from-source].
This gives you the ability to modify the codebase and test your model.

After you have implemented your model (see [tutorial][new-model-basic]), put it into the <gh-dir:vllm/model_executor/models> directory.
Then, add your model class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is automatically registered upon importing vLLM.
Finally, update our [list of supported models][supported-models] to promote your model!

!!! warning
    The list of models in each section should be maintained in alphabetical order.

## Out-of-tree models

You can load an external model using a plugin without modifying the vLLM codebase.

!!! info
    [vLLM's Plugin System][plugin-system]

To register the model, use the following code:

```python
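from vllm import ModelRegistry

# `your_code` is a placeholder for the module that defines your model class.
from your_code import YourModelForCausalLM

# Register the class under its architecture name.
ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)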
```

If your model imports modules that initialize CUDA, consider lazy-importing it to avoid errors like `RuntimeError: Cannot re-initialize CUDA in forked subprocess`:

```python
from vllm import ModelRegistry

ModelRegistry.register_model(
    "YourModelForCausalLM",
    "your_code:YourModelForCausalLM"
)
```
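
To show why the string form defers the import, here is a minimal sketch of how a `"module:ClassName"` reference can be resolved on demand. This is not vLLM's actual implementation, and `resolve_qualified_name` is a hypothetical helper. It only illustrates that nothing from the target module is imported until the class is actually needed.

```python
import importlib


def resolve_qualified_name(qualname: str):
    """Import `module` and return `attr` for a 'module:attr' string."""
    module_name, sep, attr_name = qualname.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attr', got {qualname!r}")
    # The import happens here, at resolution time, not at registration time.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library class so the sketch runs anywhere.
cls = resolve_qualified_name("collections:OrderedDict")
print(cls.__name__)
```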

!!! warning
    If your model is a multimodal model, ensure the model class implements the [SupportsMultiModal][vllm.model_executor.models.interfaces.SupportsMultiModal] interface.
    Read more about that [here][supports-multimodal].

!!! note
    Although you can put these code snippets directly in a script that uses `vllm.LLM`, the recommended way is to place them in a vLLM plugin. This ensures compatibility with vLLM features such as distributed inference and the API server.
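
A plugin that follows this recommendation might look like the sketch below. It assumes that vLLM discovers plugins through the `vllm.general_plugins` entry-point group and that `ModelRegistry.get_supported_archs()` is available; confirm both against the plugin system documentation for your vLLM version. The package layout (`your_code/__init__.py`, `your_code/modeling.py`) and the entry-point name `register_your_model` are hypothetical.

```python
# your_code/__init__.py (hypothetical package layout)


def register() -> None:
    """Function for vLLM to call when it loads this plugin."""
    # Imported inside the function so that importing this package is cheap.
    from vllm import ModelRegistry

    # Skip registration if the architecture is already known.
    if "YourModelForCausalLM" not in ModelRegistry.get_supported_archs():
        ModelRegistry.register_model(
            "YourModelForCausalLM",
            # Lazy "module:Class" form, so CUDA is not initialized on import.
            "your_code.modeling:YourModelForCausalLM",
        )


# Value to pass as `entry_points=` to `setuptools.setup(...)` in setup.py.
# ASSUMPTION: "vllm.general_plugins" is the entry-point group vLLM scans.
ENTRY_POINTS = {
    "vllm.general_plugins": ["register_your_model = your_code:register"],
}
```

Once the package is installed in the same environment as vLLM, the registration should run wherever plugins are loaded, rather than only in the script that happens to call `ModelRegistry.register_model`.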