# Multimodal LLM Deployment

## Table of Contents
- [Environment Setup](#environment-setup)
- [qwen-vl-chat](#qwen-vl-chat)
- [yi-vl-6b-chat](#yi-vl-6b-chat)
- [minicpm-v-v2_5-chat](#minicpm-v-v2_5-chat)
- [qwen-vl](#qwen-vl)

## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
pip install vllm
```

Here we provide examples for four models (smaller models were chosen to make the experiments easier to run): qwen-vl-chat, qwen-vl, yi-vl-6b-chat, and minicpm-v-v2_5-chat. These examples cover three different types of MLLMs: models where a single round of dialogue can contain multiple images (or no images), models where a single round of dialogue can contain only one image, and models where the entire dialogue revolves around a single image. They also illustrate how deployment and invocation differ across these types, as well as the differences between chat and base models among MLLMs.

If you are using qwen-audio-chat, simply replace the `<img>` tag with the `<audio>` tag.
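Each model section below follows the same pattern: start an OpenAI-compatible server with `swift deploy`, then call it from a client. As a quick orientation, here is a minimal client sketch assuming a qwen-vl-chat server started with `CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen-vl-chat` and listening on the default port 8000; the image URL is a placeholder, and the exact request format (e.g. the `<img>` tag embedded in the query) varies by model type, as the sections below show.

```python
from openai import OpenAI

# Assumes a server was started with:
#   CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen-vl-chat
# and that it exposes an OpenAI-compatible API on the default port 8000.
client = OpenAI(api_key='EMPTY', base_url='http://localhost:8000/v1')
model = client.models.list().data[0].id  # name of the deployed model

# qwen-vl-chat embeds images directly in the text via <img>...</img> tags;
# the other model types in this document use different conventions.
# The URL below is a placeholder; substitute a real image URL.
query = '<img>https://example.com/animal.png</img>What is in this image?'
resp = client.chat.completions.create(
    model=model,
    messages=[{'role': 'user', 'content': query}],
    max_tokens=256,
    temperature=0,
)
print(resp.choices[0].message.content)
```

The same client code works for the other models once the server is started with the corresponding `--model_type`; only the way images are passed in the request changes.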