@@ -9,12 +9,12 @@ The implementation is based on llava, and is compatible with llava and mobileVLM
...
 Notice: The overall process of model inference for both **MobileVLM** and **MobileVLM_V2** models is the same, but the process of model conversion is a little different. Therefore, using **MobileVLM-1.7B** as an example, the different conversion step will be shown.
 ## Usage
-Build with cmake or run `make llava-cli` to build it.
+Build with cmake or run `make llama-llava-cli` to build it.
-After building, run: `./llava-cli` to see the usage. For example:
+After building, run: `./llama-llava-cli` to see the usage. For example:
 -p"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nWho is the author of this book? Answer the question using a single word or phrase. ASSISTANT:"
-3. Use `convert-image-encoder-to-gguf.py` with `--projector-type ldp` (for **V2** please use `--projector-type ldpv2`) to convert the LLaVA image encoder to GGUF:
+3. Use `convert_image_encoder_to_gguf.py` with `--projector-type ldp` (for **V2** please use `--projector-type ldpv2`) to convert the LLaVA image encoder to GGUF:
 Now both the LLaMA part and the image encoder are in the `MobileVLM-1.7B` directory.
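For context, the renamed conversion script would be invoked roughly as in the standard llava workflow; the `-m`, `--llava-projector`, and `--output-dir` arguments below are assumptions drawn from that workflow, not from this diff; only `--projector-type` is confirmed above.
```sh
# Sketch of the image-encoder conversion; paths and all flags except --projector-type are assumed.
python ./examples/llava/convert_image_encoder_to_gguf.py \
    -m path/to/MobileVLM-1.7B \
    --llava-projector path/to/MobileVLM-1.7B/llava.projector \
    --output-dir path/to/MobileVLM-1.7B \
    --projector-type ldp   # ldpv2 for MobileVLM_V2
```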
...
@@ -82,7 +82,7 @@ refer to `android/adb_run.sh`, modify resources' `name` and `path`
 ### case 1
 **input**
 ```sh
-/data/local/tmp/llava-cli \
+/data/local/tmp/llama-llava-cli \
 -m /data/local/tmp/ggml-model-q4_k.gguf \
 --mmproj /data/local/tmp/mmproj-model-f16.gguf \
 -t 4 \
...
@@ -102,7 +102,7 @@ llama_print_timings: total time = 34731.93 ms
 ### case 2
 **input**
 ```sh
-/data/local/tmp/llava-cli \
+/data/local/tmp/llama-llava-cli \
 -m /data/local/tmp/ggml-model-q4_k.gguf \
 --mmproj /data/local/tmp/mmproj-model-f16.gguf \
 -t 4 \
...
@@ -126,7 +126,7 @@ llama_print_timings: total time = 34570.79 ms
 #### llava-cli release-b2005
 **input**
 ```sh
-/data/local/tmp/llava-cli \
+/data/local/tmp/llama-llava-cli \
 -m /data/local/tmp/ggml-model-q4_k.gguf \
 --mmproj /data/local/tmp/mmproj-model-f16.gguf \
 -t 4 \
...
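Before the on-device runs in the cases above, the binary and model files have to be pushed to the device. A minimal sketch, assuming the build artifacts are in `build/bin` and the GGUF files in a local `MobileVLM-1.7B` directory; the `/data/local/tmp` targets come from the commands above, while the local paths and image name are assumptions:
```sh
# Sketch only: local source paths are assumptions; target paths match the cases above.
adb push build/bin/llama-llava-cli /data/local/tmp/
adb push MobileVLM-1.7B/ggml-model-q4_k.gguf /data/local/tmp/
adb push MobileVLM-1.7B/mmproj-model-f16.gguf /data/local/tmp/
adb push demo.jpg /data/local/tmp/demo.jpg
adb shell chmod +x /data/local/tmp/llama-llava-cli
```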
@@ -194,13 +194,13 @@ llama_print_timings: total time = 44411.01 ms / 377 tokens
 ## Orin compile and run
 ### compile
 ```sh
-make LLAMA_CUDA=1 CUDA_DOCKER_ARCH=sm_87 LLAMA_CUDA_F16=1 -j 32
+make GGML_CUDA=1 CUDA_DOCKER_ARCH=sm_87 GGML_CUDA_F16=1 -j 32
 ```
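A rough CMake equivalent of the Make invocation above, assuming the post-rename `GGML_CUDA` option names; `sm_87` maps to `CMAKE_CUDA_ARCHITECTURES=87` on Orin:
```sh
# Sketch only: the -D option names are assumed to follow the GGML_CUDA rename reflected above.
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_F16=ON -DCMAKE_CUDA_ARCHITECTURES=87
cmake --build build --config Release -j 32
```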
 ### run on Orin
 ### case 1
 **input**
 ```sh
-./llava-cli \
+./llama-llava-cli \
 -m /data/local/tmp/ggml-model-q4_k.gguf \
 --mmproj /data/local/tmp/mmproj-model-f16.gguf \
 --image /data/local/tmp/demo.jpeg \
...
@@ -224,7 +224,7 @@ llama_print_timings: total time = 1352.63 ms / 252 tokens
 ### case 2
 **input**
 ```sh
-./llava-cli \
+./llama-llava-cli \
 -m /data/local/tmp/ggml-model-q4_k.gguf \
 --mmproj /data/local/tmp/mmproj-model-f16.gguf \
 -p"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nWhat is in the image? ASSISTANT:"\
@@ -10,7 +10,7 @@ prompt="A chat between a curious user and an artificial intelligence assistant.
...
 # prompt="A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nWhat is in the image? ASSISTANT:"
# checkpoint_paths = [path for path in model_files if (path.endswith('.bin') and path.startswith('pytorch')) or (path.endswith('.safetensors') and path.startswith('model'))]
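As a rough illustration of the "modify resources' `name` and `path`" note attached to the `android/adb_run.sh` hunk above, the script's resource settings would be pointed at your own files; the variable names below are purely illustrative assumptions, not taken from the script:
```sh
# Hypothetical adb_run.sh resource settings; check the script for the real variable names.
model_dir="/data/local/tmp"
model_name="ggml-model-q4_k.gguf"
mmproj_name="mmproj-model-f16.gguf"
img_dir="/data/local/tmp"
img_name="demo.jpg"
```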