Commit acf54ee3 authored by chenych

Modify faiss in README and update dtk to 24.04.1

parent efffc63d
### Docker (Method 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
cd /your_code_path/FlagEmbedding_pytorch
pip install -e .
pip install peft
pip install faiss-1.7.2_dtk24.04_gitb7348e7df780-py3-none-any.whl
export LD_LIBRARY_PATH=/path/of/site-packages/faiss/:$LD_LIBRARY_PATH
```
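The `export LD_LIBRARY_PATH=/path/of/site-packages/faiss/` line needs the actual `site-packages` location of the installed wheel. A small sketch for resolving it automatically (it assumes the faiss wheel unpacks its libraries into a `faiss/` directory under `site-packages`):

```bash
# Resolve the interpreter's site-packages directory, then point the
# dynamic linker at the faiss libraries shipped inside the wheel.
SITE_PACKAGES=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
export LD_LIBRARY_PATH="${SITE_PACKAGES}/faiss:${LD_LIBRARY_PATH}"
echo "${LD_LIBRARY_PATH}"
```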
### Dockerfile (Method 2)
cd /your_code_path/FlagEmbedding_pytorch
pip install -e .
pip install peft
pip install faiss-1.7.2_dtk24.04_gitb7348e7df780-py3-none-any.whl
export LD_LIBRARY_PATH=/path/of/site-packages/faiss/:$LD_LIBRARY_PATH
```
### Anaconda (Method 3)
The specialized deep-learning libraries this project requires for DCU GPUs can be downloaded and installed from the [Guanghe](https://developer.hpccube.com/tool/) developer community.
```bash
DTK driver: dtk24.04.1
python: python3.10
torch: 2.1.0
```
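A quick way to confirm the interpreter and torch versions in the activated environment match the list above (torch is absent until the install steps below have been run, hence the fallback):

```bash
# Print the Python version; report the torch version if it is importable.
python3 --version
python3 -c "import torch; print('torch', torch.__version__)" 2>/dev/null \
  || echo "torch not installed"
```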
```bash
cd /your_code_path/FlagEmbedding_pytorch
pip install -e .
pip install peft
pip install faiss-1.7.2_dtk24.04_gitb7348e7df780-py3-none-any.whl
export LD_LIBRARY_PATH=/path/of/site-packages/faiss/:$LD_LIBRARY_PATH
```
## Adapted Projects
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
# Finetune
In this example, we show how to finetune the baai-general-embedding with your data.
## Environment Setup
See [Environment Setup](../../README.md#环境配置)
With this method, you only need to construct some example data; you do not need to fine-tune a base model. For example, you can merge models from [huggingface](https://huggingface.co/Shitao) using example data tailored to your own task. Usage is as follows:
```python
from LM_Cocktail.LM_Cocktail import mix_models, mix_models_with_data

example_data = [
    {"query": "How does one become an actor in the Telugu Film Industry?", "pos": [" How do I become an actor in Telugu film industry?"], "neg": [" What is the story of Moses and Ramesses?", " Does caste system affect economic growth of India?"]},
    ...
]
```
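`mix_models_with_data` searches for merging weights on the example data; the merge itself is a weighted average in parameter space. A minimal illustrative sketch of that core step (plain dicts of floats stand in for real `state_dict`s; this is not the LM_Cocktail implementation):

```python
def mix_weights(state_dicts, weights):
    """Weighted average of corresponding parameters across models."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Two toy "models" with a single scalar parameter each.
model_a = {"layer.weight": 1.0}
model_b = {"layer.weight": 3.0}
print(mix_weights([model_a, model_b], [0.5, 0.5]))  # {'layer.weight': 2.0}
```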
--logging_steps 10 \
--save_steps 1000 \
--query_instruction_for_retrieval ""
### Hard Negatives
# python -m FlagEmbedding.baai_general_embedding.finetune.hn_mine \
# --model_name_or_path BAAI/bge-base-en-v1.5 \
# --input_file toy_finetune_data.jsonl \
# --output_file toy_finetune_data_minedHN.jsonl \
# --range_for_sampling 2-200 \
# --negative_number 15 \
# --use_gpu_for_searching
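The commented `hn_mine` command above mines hard negatives by retrieving each query's nearest passages and sampling negatives from a rank window (here `--range_for_sampling 2-200`), skipping the known positives. A minimal sketch of that sampling step (function and variable names are hypothetical, not the FlagEmbedding implementation):

```python
import random

def sample_hard_negatives(ranked_passages, positives, rank_range=(2, 200),
                          negative_number=15, seed=0):
    """Sample negatives from a rank window, excluding known positives."""
    lo, hi = rank_range
    window = [p for p in ranked_passages[lo:hi] if p not in positives]
    rng = random.Random(seed)
    return rng.sample(window, min(negative_number, len(window)))

# Toy ranking: passage 'p0' is the true positive at rank 0.
ranked = [f"p{i}" for i in range(50)]
negs = sample_hard_negatives(ranked, positives={"p0"}, rank_range=(2, 20),
                             negative_number=5)
print(len(negs))  # 5
```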
--dataloader_drop_last True \
--max_seq_length 512 \
--logging_steps 10 \
--dataloader_num_workers 12
peft