Update README.md

Commit 3a48097b authored Aug 13, 2024 by wangkx1

Showing with 12 additions and 1 deletion

README.md +12 -1

--- a/README.md
+++ b/README.md
 # Adapting Ollama to DCU Based on Open-Source Code

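+Known issues:
+
+1. Even after selecting multiple cards, e.g. `export HIP_VISIBLE_DEVICES=2,3,4,5`, ollama still prefers to load all models onto card 2. Occasionally, at random, a model is loaded onto another card;
+2. Our own test of the model scheduling logic of ollama-v0.3.4 on NV hardware: in a multi-card environment, a single model is not inferred across multiple cards. Instead, each card serves one model, and a model resides on exactly one card. For example, if 8 models are run at the same time, they are distributed evenly across 8 cards. Beyond 8 models, a single card hosts multiple models.
+3. v0.1.43 does not support gemma2;
+4. To try the newer ollama model scheduling strategy, use the v0.3.5 version under https://developer.hpccube.com/codes/OpenDAS/ollama and verify it yourself;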
+
+
+
 ## Adaptation Steps

 Project URL: http://developer.hpccube.com/codes/wangkx1/ollama_dcu.git
@@ -85,6 +94,8 @@ image.sourcefind.cn:5000/dcu/admin/base/pytorch   2.1.0-ubuntu20.04-dtk24.04.1-p

 ### **5. Enter the container**

+Change into the tutorial directory: `cd tutorial_ollama`
+

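 <font color=red>**Notes:**</font>
 - In the scripts with the `launch_` prefix, the line `export MY_CONTAINER="sg_t0"` sets the container name to `sg_t0`. Change it to a unique name of your own so that the script starts your own container. If the name duplicates an existing one, you will most likely enter someone else's container;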
@@ -201,4 +212,4 @@ ollama create llama3-zh -f ./xxx.mf
 ### **9. ollama + open-webui**


-See: [./tutorial_ollama/01-ollama_open-webui.md](./tutorial_ollama/01-ollama_open-webui.md)
\ No newline at end of file
+See: [./tutorial_ollama/01-ollama_open-webui.md](./tutorial_ollama/01-ollama_open-webui.md)
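
The placement behavior reported for ollama-v0.3.4 in the second known issue of this diff (one model per card, balanced across cards until every card holds a model, then doubling up) can be pictured with a small simulation. This is only a sketch under stated assumptions: `place_models` is a hypothetical helper written for illustration, the "least-loaded card first" rule is an assumed policy that happens to reproduce the observation, and none of it is taken from ollama's actual scheduler code.

```python
from collections import Counter


def place_models(model_names, num_cards):
    """Hypothetical illustration: put each model on exactly one card,
    always choosing the card that currently holds the fewest models."""
    load = [0] * num_cards          # number of models per card
    placement = {}                  # model name -> card index
    for name in model_names:
        # Pick the least-loaded card; ties go to the lowest index.
        card = min(range(num_cards), key=lambda i: load[i])
        placement[name] = card
        load[card] += 1
    return placement


# Assumed scenario: 10 models on an 8-card machine.
models = [f"model-{i}" for i in range(10)]
placement = place_models(models, num_cards=8)
per_card = Counter(placement.values())

for card in sorted(per_card):
    print(f"card {card}: {per_card[card]} model(s)")
```

In this scenario the first 8 models land on 8 distinct cards and the remaining 2 share cards with earlier models, which matches the observation in the README. It does not model the DCU behavior from the first known issue, where models tend to pile onto a single card.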