# Running VTimeLLM Inference Offline
Follow the instructions below to run VTimeLLM inference on your local GPU machine.

Note: Our demo requires approximately 18 GB of GPU memory.
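
To check the memory requirement above before installing anything, a minimal sketch along these lines may help. It assumes an NVIDIA GPU with `nvidia-smi` on the `PATH`; `REQUIRED_MIB` and `has_enough_memory` are illustrative names, not part of VTimeLLM, and the 18 GB threshold is only the approximate figure quoted in the note.

```shell
# Sketch: check that each GPU reports at least ~18 GB of total memory.
# REQUIRED_MIB and has_enough_memory are illustrative, not part of VTimeLLM.
REQUIRED_MIB=$((18 * 1024))

# Returns 0 if the given total memory (in MiB) meets the requirement.
has_enough_memory() {
    [ "$1" -ge "$REQUIRED_MIB" ] 2>/dev/null
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per GPU: total memory in MiB, without header or units.
    nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits |
    while read -r total_mib; do
        if has_enough_memory "$total_mib"; then
            echo "GPU with ${total_mib} MiB: enough memory"
        else
            echo "GPU with ${total_mib} MiB: likely too small"
        fi
    done
else
    echo "nvidia-smi not found; cannot check GPU memory"
fi
```

This compares total memory only; other processes already using the GPU reduce what is actually available to the demo.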

### Set up the environment and clone the VTimeLLM repository

```shell
conda create --name=vtimellm python=3.10
conda activate vtimellm

git clone https://github.com/huangb23/VTimeLLM.git
cd VTimeLLM
pip install -r requirements.txt
```

### Download weights

* Download the CLIP model and the VTimeLLM model from [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/?p=%2Fcheckpoints&mode=list) and place them in the `checkpoints` directory.
* Download the [Vicuna v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) or [ChatGLM3-6b](https://huggingface.co/THUDM/chatglm3-6b) weights.
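
Before launching inference, it can be worth confirming that the downloads landed where expected. The helper below is a rough sketch, not part of the repository: `check_weights` is a hypothetical name, and it assumes the base model directory contains a `config.json`, as Hugging Face model repositories normally do. It does not verify individual checkpoint file names.

```shell
# Sketch: sanity-check the weight locations before launching inference.
# check_weights is an illustrative helper, not part of VTimeLLM.
# Usage: check_weights <checkpoints dir> <base model dir>
check_weights() {
    ckpt_dir=$1
    model_base=$2
    # The CLIP and VTimeLLM files from Tsinghua Cloud should be in here.
    if [ ! -d "$ckpt_dir" ] || [ -z "$(ls -A "$ckpt_dir" 2>/dev/null)" ]; then
        echo "missing or empty: $ckpt_dir"
        return 1
    fi
    # Assumption: the base model directory ships a config.json.
    if [ ! -f "$model_base/config.json" ]; then
        echo "no config.json under: $model_base"
        return 1
    fi
    echo "weights look present"
}
```

For example, `check_weights checkpoints <path to the Vicuna v1.5 weights>` run from the repository root prints `weights look present` when both locations are populated.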

### Run the inference code


```shell
python -m vtimellm.inference --model_base <path to the Vicuna v1.5 weights> 
```

Alternatively, you can hold multi-turn conversations in the [Jupyter Notebook](inference.ipynb). As with the command above, set `args.model_base` to the path of the Vicuna v1.5 weights.

To run the VTimeLLM-ChatGLM version, see the code in [inference_for_glm.ipynb](inference_for_glm.ipynb).

### Run the Gradio demo

An offline Gradio demo is also provided. Launch it as follows:

```shell
cd vtimellm
python demo_gradio.py --model_base <path to the Vicuna v1.5 weights> 
```