# LLaVA-OneVision: Easy Visual Task Transfer

## Paper

`LLaVA-OneVision: Easy Visual Task Transfer`

* https://arxiv.org/pdf/2408.03326

## Model Architecture

The model consists of three components: an LLM (Qwen-2), a vision encoder (SigLIP), and a projector (a two-layer MLP).

![Model architecture](readme_imgs/arch.png)

## Algorithm

The core ideas are the AnyRes visual representation and a task-transfer mechanism, which allow a single model to cover diverse visual scenarios (single-image, multi-image, and video).

![Algorithm overview](readme_imgs/alg.png)

## Environment Setup

See [README.md](../README.md).

## Dataset

None.

## Training

None.

## Inference

### Native

Single-image input:

```bash
python single_image.py
```

Interleaved image-text input:

```bash
python image-text.py
```

Video input:

```bash
python video.py
```

Note: edit the parameters (e.g., model and input paths) in each script before running. A minimal sketch of the underlying API calls is provided in the appendix at the end of this README.

## Results

![Results](readme_imgs/result.png)

### Accuracy

None.

## Application Scenarios

See [README.md](../README.md).

## Pretrained Weights

|Model|URL|
|:---:|:---:|
|lmms-lab/llava-onevision-qwen2-7b-ov| [hf](https://hf-mirror.com/lmms-lab/llava-onevision-qwen2-7b-ov) \| [SCNet](http://113.200.138.88:18080/aimodels/lmms-lab/llava-onevision-qwen2-7b-ov.git) |
|lmms-lab/llava-onevision-qwen2-0.5b-ov| [hf](https://hf-mirror.com/lmms-lab/llava-onevision-qwen2-0.5b-ov) \| [SCNet](http://113.200.138.88:18080/aimodels/lmms-lab/llava-onevision-qwen2-0.5b-ov.git) |
|lmms-lab/llava-onevision-qwen2-0.5b-si| [hf](https://hf-mirror.com/lmms-lab/llava-onevision-qwen2-0.5b-si) \| [SCNet](http://113.200.138.88:18080/aimodels/lmms-lab/llava-onevision-qwen2-0.5b-si.git) |
|lmms-lab/llava-onevision-qwen2-7b-si| [hf](https://hf-mirror.com/lmms-lab/llava-onevision-qwen2-7b-si) \| [SCNet](http://113.200.138.88:18080/aimodels/lmms-lab/llava-onevision-qwen2-7b-si.git) |
|lmms-lab/llava-onevision-qwen2-7b-ov-chat| [hf](https://hf-mirror.com/lmms-lab/llava-onevision-qwen2-7b-ov-chat) \| [SCNet](http://113.200.138.88:18080/aimodels/lmms-lab/llava-onevision-qwen2-7b-ov-chat.git) |
|lmms-lab/llava-onevision-qwen2-72b-ov-chat| [hf](https://hf-mirror.com/lmms-lab/llava-onevision-qwen2-72b-ov-chat) \| [SCNet](http://113.200.138.88:18080/aimodels/lmms-lab/llava-onevision-qwen2-72b-ov-chat.git) |

## Source Repository & Issue Reporting

* See [README.md](../README.md)

## References

* https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision.md
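
## Appendix: Minimal Single-Image Inference Sketch

For reference, the sketch below shows the API call sequence that the native inference scripts (`single_image.py`, etc.) are expected to wrap, adapted from the LLaVA-OneVision tutorial linked under References. The checkpoint name, local image path, and the `qwen_1_5` conversation template are assumptions; adjust them to match your environment, checkpoint, and LLaVA-NeXT version.

```python
# Minimal single-image inference with the LLaVA-NeXT codebase (sketch, not the repo's exact script).
import copy

import torch
from PIL import Image

from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token
from llava.model.builder import load_pretrained_model

pretrained = "lmms-lab/llava-onevision-qwen2-7b-ov"  # assumed checkpoint; see the table above
device = "cuda"

# Returns the tokenizer, model, image processor, and context length.
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, "llava_qwen", device_map="auto"
)
model.eval()

# Load a local image (placeholder path) and build the AnyRes image tensors.
image = Image.open("example.jpg")  # assumed input path
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [t.to(dtype=torch.float16, device=device) for t in image_tensor]

# Build the prompt with the Qwen chat template and the image placeholder token.
conv = copy.deepcopy(conv_templates["qwen_1_5"])
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nWhat is shown in this image?")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = (
    tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
    .unsqueeze(0)
    .to(device)
)

# Greedy decoding over the image + text prompt.
with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        image_sizes=[image.size],
        do_sample=False,
        max_new_tokens=256,
    )
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```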