# LLaVA-NeXT: A Strong Zero-shot Video Understanding Model

## Paper

`LLaVA-NeXT: A Strong Zero-shot Video Understanding Model`

* https://llava-vl.github.io/blog/2024-04-30-llava-next-video/

## Model Architecture

See [README.md](../README.md)

## Algorithm

See [README.md](../README.md)

## Dataset

None

## Training

None

## Inference

### Native

```bash
cd ..

bash scripts/video/demo/video_demo.sh /path/to/LLaVA-NeXT-Video-7B-DPO vicuna_v1 32 2 average no_token True playground/demo/xU25MMA2N4aVtYay.mp4
```

### hf

```bash
python inference_hf.py
```

## Result

![alt text](readme_imgs/result.png)

### Accuracy

None

## Application Scenarios

See [README.md](../README.md)

## Pretrained Weights

|model|url|
|:---:|:---:|
|LLaVA-NeXT-Video-7B-DPO|[hf](https://huggingface.co/lmms-lab/LLaVA-NeXT-Video-7B-DPO) \| [SCNet]() |
|LLaVA-NeXT-Video-7B-hf|[hf](https://huggingface.co/llava-hf/LLaVA-NeXT-Video-7B-hf) \| [SCNet]() |
|LLaVA-NeXT-Video-7B-32K-hf|[hf](https://huggingface.co/llava-hf/LLaVA-NeXT-Video-7B-32K-hf) \| [SCNet]() |
|LLaVA-NeXT-Video-7B-DPO-hf|[hf](https://huggingface.co/llava-hf/LLaVA-NeXT-Video-7B-DPO-hf) \| [SCNet]() |
|LLaVA-NeXT-Video-34B-hf|[hf](https://huggingface.co/llava-hf/LLaVA-NeXT-Video-34B-hf) \| [SCNet]() |
|LLaVA-NeXT-Video-34B-DPO-hf|[hf](https://huggingface.co/llava-hf/LLaVA-NeXT-Video-34B-DPO-hf) \| [SCNet]() |

After downloading, save the models to `ckpts` (you need to create this directory yourself).

## Source Repository and Issue Feedback

See [README.md](../README.md)

## References

* https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA-NeXT-Video.md
* https://llava-vl.github.io/blog/2024-04-30-llava-next-video/
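
### Downloading weights into `ckpts` (sketch)

The weights table above only says that models should be saved to `ckpts`. The following is a minimal sketch of one way to do that with `huggingface_hub.snapshot_download`. The repo ids are copied from the hf links in the table. The helper names (`REPO_IDS`, `local_dir_for`, `download`) and the `ckpts/<model name>` layout are assumptions made for illustration and are not part of this repository.

```python
from pathlib import Path

# Repo ids copied from the hf links in the pretrained-weights table.
REPO_IDS = {
    "LLaVA-NeXT-Video-7B-DPO": "lmms-lab/LLaVA-NeXT-Video-7B-DPO",
    "LLaVA-NeXT-Video-7B-hf": "llava-hf/LLaVA-NeXT-Video-7B-hf",
    "LLaVA-NeXT-Video-7B-32K-hf": "llava-hf/LLaVA-NeXT-Video-7B-32K-hf",
    "LLaVA-NeXT-Video-7B-DPO-hf": "llava-hf/LLaVA-NeXT-Video-7B-DPO-hf",
    "LLaVA-NeXT-Video-34B-hf": "llava-hf/LLaVA-NeXT-Video-34B-hf",
    "LLaVA-NeXT-Video-34B-DPO-hf": "llava-hf/LLaVA-NeXT-Video-34B-DPO-hf",
}


def local_dir_for(model_name: str, root: str = "ckpts") -> Path:
    """Return the target directory for a model (assumed layout: <root>/<model name>)."""
    if model_name not in REPO_IDS:
        raise KeyError(f"unknown model: {model_name}")
    return Path(root) / model_name


def download(model_name: str, root: str = "ckpts") -> Path:
    """Download one model from the Hugging Face Hub into <root>/<model name>."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(model_name, root)
    # The README notes that `ckpts` must be created manually; do it here.
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_IDS[model_name], local_dir=str(target))
    return target
```

Calling `download("LLaVA-NeXT-Video-7B-DPO")` would place the weights under `ckpts/LLaVA-NeXT-Video-7B-DPO`, which could then be passed as the first argument of `video_demo.sh` in place of `/path/to/LLaVA-NeXT-Video-7B-DPO`. If `inference_hf.py` expects a different directory layout, adjust `root` or the returned path accordingly.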