Commit acf54ee3 authored by chenych's avatar chenych

Modify faiss in README and update dtk to 24.04.1

parent efffc63d
......@@ -11,12 +11,15 @@
### Docker (Method 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
cd /your_code_path/FlagEmbedding_pytorch
pip install -e .
pip install peft
pip install faiss-1.7.2_dtk24.04_gitb7348e7df780-py3-none-any.whl
export LD_LIBRARY_PATH=/path/of/site-packages/faiss/:$LD_LIBRARY_PATH
```
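The `export LD_LIBRARY_PATH=/path/of/site-packages/faiss/...` line above leaves the site-packages path as a placeholder. A minimal sketch for locating it at runtime (the `faiss` subdirectory name follows the wheel above; this helper is illustrative, not part of the project):

```python
import os
import sysconfig

# Locate the active environment's site-packages directory, then the
# faiss subdirectory whose shared libraries LD_LIBRARY_PATH must cover.
site_packages = sysconfig.get_paths()["purelib"]
faiss_dir = os.path.join(site_packages, "faiss")

# Use the printed value in: export LD_LIBRARY_PATH=<faiss_dir>:$LD_LIBRARY_PATH
print(faiss_dir)
```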
### Dockerfile (Method 2)
......@@ -27,13 +30,16 @@ docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/op
cd /your_code_path/FlagEmbedding_pytorch
pip install -e .
pip install peft
pip install faiss-1.7.2_dtk24.04_gitb7348e7df780-py3-none-any.whl
export LD_LIBRARY_PATH=/path/of/site-packages/faiss/:$LD_LIBRARY_PATH
```
### Anaconda (Method 3)
The specialized deep learning libraries this project requires for DCU GPUs can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
```bash
DTK driver: dtk24.04
DTK driver: dtk24.04.1
python: python3.10
torch: 2.1.0
```
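Since the wheels above target Python 3.10, a quick interpreter check before `pip install` can save a failed build (a hedged sanity check, not part of the project):

```python
import sys

# The dtk24.04.1 wheels listed above are built for Python 3.10;
# warn if the active interpreter differs.
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")
if (major, minor) != (3, 10):
    print("Warning: expected Python 3.10 for the dtk24.04.1 wheels")
```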
......@@ -43,7 +49,10 @@ torch: 2.1.0
```bash
cd /your_code_path/FlagEmbedding_pytorch
pip install -e .
pip install peft
pip install faiss-1.7.2_dtk24.04_gitb7348e7df780-py3-none-any.whl
export LD_LIBRARY_PATH=/path/of/site-packages/faiss/:$LD_LIBRARY_PATH
```
## Adapted Projects
......
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
\ No newline at end of file
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
\ No newline at end of file
# Finetune
In this example, we show how to finetune the baai-general-embedding with your data.
## Environment Setup
See [Environment Setup](../../README.md#环境配置)
......@@ -98,7 +98,7 @@ model = mix_models(
With this method, you only need to construct some example data; you do not need to fine-tune a base model. For example, you can merge models from [huggingface](https://huggingface.co/Shitao) using example data tailored to your task. The usage is as follows:
```python
from LM_Cocktail import mix_models, mix_models_with_data
from LM_Cocktail.LM_Cocktail import mix_models, mix_models_with_data
example_data = [
{"query": "How does one become an actor in the Telugu Film Industry?", "pos": [" How do I become an actor in Telugu film industry?"], "neg": [" What is the story of Moses and Ramesses?", " Does caste system affect economic growth of India?"]},
......
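The `example_data` entries above follow a query/pos/neg record format, where each record is one JSON line. A minimal self-contained sketch of one such record and a round-trip check (field names and strings taken from the snippet above):

```python
import json

# One record in the format shown above: a query plus lists of positive
# and negative passages.
example = {
    "query": "How does one become an actor in the Telugu Film Industry?",
    "pos": [" How do I become an actor in Telugu film industry?"],
    "neg": [" What is the story of Moses and Ramesses?",
            " Does caste system affect economic growth of India?"],
}

# Round-trip through JSON, as each record is stored as one jsonl line.
record = json.loads(json.dumps(example))
print(sorted(record))
```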
......@@ -21,3 +21,13 @@ torchrun --nproc_per_node {number of gpus} \
--logging_steps 10 \
--save_steps 1000 \
--query_instruction_for_retrieval ""
### Hard Negatives
```bash
python -m FlagEmbedding.baai_general_embedding.finetune.hn_mine \
--model_name_or_path BAAI/bge-base-en-v1.5 \
--input_file toy_finetune_data.jsonl \
--output_file toy_finetune_data_minedHN.jsonl \
--range_for_sampling 2-200 \
--negative_number 15 \
--use_gpu_for_searching
```
\ No newline at end of file
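The `hn_mine` flags above can be illustrated with a small sketch (the function and ranking list here are hypothetical, not the project's implementation): `--range_for_sampling 2-200` restricts candidates to those ranked roughly 2–200 by similarity, and `--negative_number 15` draws 15 of them.

```python
import random

# Hypothetical sketch of hard-negative sampling: skip the top-ranked
# candidates (likely true positives) and draw negatives from a middle
# band, mirroring --range_for_sampling 2-200 and --negative_number 15.
def sample_hard_negatives(ranked_ids, low=2, high=200, k=15, seed=0):
    pool = ranked_ids[low:high]          # candidates ranked in [low, high)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))

negatives = sample_hard_negatives(list(range(300)))
```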
peft
\ No newline at end of file