Commit acf54ee3 authored by chenych

Modify faiss in README and update dtk to 24.04.1

parent efffc63d
### Docker (Method 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
cd /your_code_path/FlagEmbedding_pytorch
pip install -e .
pip install peft
pip install faiss-1.7.2_dtk24.04_gitb7348e7df780-py3-none-any.whl
export LD_LIBRARY_PATH=/path/of/site-packages/faiss/:$LD_LIBRARY_PATH
```
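The `export LD_LIBRARY_PATH=/path/of/site-packages/faiss/` line needs the actual `site-packages` location of the installed wheel. A small sketch for resolving it automatically (it assumes the faiss wheel unpacks its libraries into a `faiss/` directory under `site-packages`):

```bash
# Resolve the interpreter's site-packages directory, then point the
# dynamic linker at the faiss libraries shipped inside the wheel.
SITE_PACKAGES=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
export LD_LIBRARY_PATH="${SITE_PACKAGES}/faiss:${LD_LIBRARY_PATH}"
echo "${LD_LIBRARY_PATH}"
```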
### Dockerfile (Method 2)
cd /your_code_path/FlagEmbedding_pytorch
pip install -e .
pip install peft
pip install faiss-1.7.2_dtk24.04_gitb7348e7df780-py3-none-any.whl
export LD_LIBRARY_PATH=/path/of/site-packages/faiss/:$LD_LIBRARY_PATH
```
### Anaconda (Method 3)
The specialized deep-learning libraries this project requires for DCU GPUs can be downloaded and installed from the [Guanghe](https://developer.hpccube.com/tool/) developer community.
```bash
DTK driver: dtk24.04.1
python: python3.10
torch: 2.1.0
```
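A quick way to confirm the interpreter and torch versions in the activated environment match the list above (torch is absent until the install steps below have been run, hence the fallback):

```bash
# Print the Python version; report the torch version if it is importable.
python3 --version
python3 -c "import torch; print('torch', torch.__version__)" 2>/dev/null \
  || echo "torch not installed"
```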
```bash
cd /your_code_path/FlagEmbedding_pytorch
pip install -e .
pip install peft
pip install faiss-1.7.2_dtk24.04_gitb7348e7df780-py3-none-any.whl
export LD_LIBRARY_PATH=/path/of/site-packages/faiss/:$LD_LIBRARY_PATH
```
## Adapted Projects
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
# Finetune
In this example, we show how to finetune the baai-general-embedding with your data.
## Environment Setup
See [Environment Setup](../../README.md#环境配置)
With this method, you only need to construct some example data; you do not need to fine-tune a base model. For example, you can merge models from [huggingface](https://huggingface.co/Shitao) using example data tailored to your own task. Usage is as follows:
```python
from LM_Cocktail.LM_Cocktail import mix_models, mix_models_with_data

example_data = [
    {"query": "How does one become an actor in the Telugu Film Industry?", "pos": [" How do I become an actor in Telugu film industry?"], "neg": [" What is the story of Moses and Ramesses?", " Does caste system affect economic growth of India?"]},
    ...
]
```
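`mix_models_with_data` searches for merging weights on the example data; the merge itself is a weighted average in parameter space. A minimal illustrative sketch of that core step (plain dicts of floats stand in for real `state_dict`s; this is not the LM_Cocktail implementation):

```python
def mix_weights(state_dicts, weights):
    """Weighted average of corresponding parameters across models."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Two toy "models" with a single scalar parameter each.
model_a = {"layer.weight": 1.0}
model_b = {"layer.weight": 3.0}
print(mix_weights([model_a, model_b], [0.5, 0.5]))  # {'layer.weight': 2.0}
```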
--logging_steps 10 \
--save_steps 1000 \
--query_instruction_for_retrieval ""
### Hard Negatives
# python -m FlagEmbedding.baai_general_embedding.finetune.hn_mine \
# --model_name_or_path BAAI/bge-base-en-v1.5 \
# --input_file toy_finetune_data.jsonl \
# --output_file toy_finetune_data_minedHN.jsonl \
# --range_for_sampling 2-200 \
# --negative_number 15 \
# --use_gpu_for_searching
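The commented `hn_mine` command above mines hard negatives by retrieving each query's nearest passages and sampling negatives from a rank window (here `--range_for_sampling 2-200`), skipping the known positives. A minimal sketch of that sampling step (function and variable names are hypothetical, not the FlagEmbedding implementation):

```python
import random

def sample_hard_negatives(ranked_passages, positives, rank_range=(2, 200),
                          negative_number=15, seed=0):
    """Sample negatives from a rank window, excluding known positives."""
    lo, hi = rank_range
    window = [p for p in ranked_passages[lo:hi] if p not in positives]
    rng = random.Random(seed)
    return rng.sample(window, min(negative_number, len(window)))

# Toy ranking: passage 'p0' is the true positive at rank 0.
ranked = [f"p{i}" for i in range(50)]
negs = sample_hard_negatives(ranked, positives={"p0"}, rank_range=(2, 20),
                             negative_number=5)
print(len(negs))  # 5
```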
--dataloader_drop_last True \
--max_seq_length 512 \
--logging_steps 10 \
--dataloader_num_workers 12
peft