# LLaVA-NeXT: Tackling Multi-image, Video, and 3D in Large Multimodal Models ## 论文 `LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models` * https://arxiv.org/pdf/2407.07895 ## 模型结构 参考[README.md](../README.md) ## 算法原理 在[README.md](../README.md)基础上,LLaVA-NeXT-Interleave的核心是通过统一的数据格式和联合训练策略,实现多模态任务的泛化与迁移。 ## 数据集 无 ## 训练 无 ## 推理 在 inference分支中 ### 原生 ```bash python ../playground/demo/interleave_demo.py --model_path path/to/ckpt ``` ### hf ```bash python inference_hf.py ``` 注意:运行前需要修改脚本中相应路径。 ## result ![alt text](readme_imgs/result.png) ### 精度 无 ## 应用场景 参考[README.md](../README.md) ## 预训练权重 |model|url| |:---:|:---:| |llava-next-interleave-qwen-7b|[hf](https://huggingface.co/lmms-lab/llava-next-interleave-qwen-7b) \| [SCNet](http://113.200.138.88:18080/aimodels/lmms-lab/llava-next-interleave-qwen-7b) | |llava-next-interleave-qwen-0.5b|[hf](https://hf-mirror.com/lmms-lab/llava-next-interleave-qwen-0.5b) \| [SCNet](http://113.200.138.88:18080/aimodels/lmms-lab/llava-next-interleave-qwen-0.5b.git) | |llava-interleave-qwen-0.5b-hf|[hf](https://huggingface.co/llava-hf/llava-interleave-qwen-0.5b-hf) \| [SCNet](http://113.200.138.88:18080/aimodels/llava-hf/llava-interleave-qwen-0.5b-hf) | |llava-interleave-qwen-7b-hf|[hf](https://huggingface.co/llava-hf/llava-interleave-qwen-7b-hf) \| [SCNet](http://113.200.138.88:18080/aimodels/llava-hf/llava-interleave-qwen-7b-hf) | |llava-interleave-qwen-7b-dpo-hf|[hf](https://huggingface.co/llava-hf/llava-interleave-qwen-7b-dpo-hf) \| [SCNet](http://113.200.138.88:18080/aimodels/llava-hf/llava-interleave-qwen-7b-dpo-hf) | 模型下载后保存至`ckpts`(需自行创建). ## 源码仓库及问题反馈 参考[README.md](../README.md) ## 参考资料 * https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA-NeXT-Interleave.md * https://llava-vl.github.io/blog/2024-06-16-llava-next-interleave/