# Vision-Language Web Demo

A chatbot demo with image input.

## Supported Models

- [InternLM/InternLM-XComposer](https://github.com/InternLM/InternLM-XComposer/tree/main)
- [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat)

## Quick Start

### internlm/internlm-xcomposer-7b

- Extract the LLM from the Hugging Face model
  ```shell
  python extract_xcomposer_llm.py
  # The LLM part will be saved to the internlm_model folder.
  ```
- Launch the demo
  ```shell
  python app.py --model-name internlm-xcomposer-7b --llm-ckpt internlm_model
  ```

### Qwen-VL-Chat

- Launch the demo
  ```shell
  python app.py --model-name qwen-vl-chat --hf-ckpt Qwen/Qwen-VL-Chat
  ```

## Limitations

- This demo extracts image features with the code from each model's own repository, which might not be very efficient.
- This demo only implements the chat function. To use the localization ability of Qwen-VL-Chat or the article generation function of InternLM-XComposer, you need to implement the corresponding pre- and post-processing yourself. Compared with chat, the difference lies in how the prompts are built and how the model output is used.
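
As an illustration of the kind of post-processing localization would need, here is a minimal sketch that parses bounding boxes out of a model reply. It is not part of this demo. It assumes the reply marks each object as `<ref>label</ref><box>(x1,y1),(x2,y2)</box>` with coordinates on a 0-1000 grid, which is how Qwen-VL's documentation describes its grounding output; check this against the model version you use. The function name `parse_boxes` and its parameters are hypothetical.

```python
import re

# Assumed reply format: <ref>label</ref><box>(x1,y1),(x2,y2)</box>
# with coordinates normalized to a 0-1000 grid (an assumption, see above).
BOX_PATTERN = re.compile(
    r"<ref>(?P<label>.*?)</ref>\s*"
    r"<box>\((?P<x1>\d+),(?P<y1>\d+)\),\((?P<x2>\d+),(?P<y2>\d+)\)</box>"
)


def parse_boxes(reply, image_width, image_height, grid=1000):
    """Return a list of {"label", "box"} dicts with pixel coordinates."""
    results = []
    for match in BOX_PATTERN.finditer(reply):
        x1, y1, x2, y2 = (int(match.group(k)) for k in ("x1", "y1", "x2", "y2"))
        results.append(
            {
                "label": match.group("label").strip(),
                # Rescale from the normalized grid to pixel coordinates.
                "box": (
                    x1 * image_width // grid,
                    y1 * image_height // grid,
                    x2 * image_width // grid,
                    y2 * image_height // grid,
                ),
            }
        )
    return results


if __name__ == "__main__":
    reply = "<ref>the dog</ref><box>(100,200),(500,800)</box>"
    print(parse_boxes(reply, image_width=640, image_height=480))
```

The returned pixel boxes could then be drawn on the input image before it is shown in the chat window. Building the localization prompt is a separate pre-processing step and is not covered here.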