# Reranker 与embedding模型不同,reranker使用问题和文档作为输入,直接输出相关性而不是embedding。 您可以输入查询语句和段落到reranker后直接得到相关性得分。并且分数可以通过sigmoid函数映射到[0,1]中的浮点值。 ## Model List - [BAAI/bge-reranker-base](http://113.200.138.88:18080/aimodels/bge-reranker-base) - [BAAI/bge-reranker-large](http://113.200.138.88:18080/aimodels/bge-reranker-large) - [BAAI/bge-reranker-v2-m3](http://113.200.138.88:18080/aimodels/baai/bge-reranker-v2-m3) - [BAAI/bge-reranker-v2-gemma](http://113.200.138.88:18080/aimodels/baai/bge-reranker-v2-gemma) - [BAAI/bge-reranker-v2-minicpm-layerwise](http://113.200.138.88:18080/aimodels/baai/bge-reranker-v2-minicpm-layerwise) 您可以根据个人场景和资源来选择所需模型: - 针对 **多种语言**,使用 [BAAI/bge-reranker-v2-m3](http://113.200.138.88:18080/aimodels/baai/bge-reranker-v2-m3) 和 [BAAI/bge-reranker-v2-gemma](http://113.200.138.88:18080/aimodels/baai/bge-reranker-v2-gemma) - 针对 **中文或者英文**, 使用 [BAAI/bge-reranker-v2-m3](http://113.200.138.88:18080/aimodels/baai/bge-reranker-v2-m3) 和 [BAAI/bge-reranker-v2-minicpm-layerwise](http://113.200.138.88:18080/aimodels/baai/bge-reranker-v2-minicpm-layerwise). - 针对 **效率**, 使用 [BAAI/bge-reranker-v2-m3](http://113.200.138.88:18080/aimodels/baai/bge-reranker-v2-m3) 和 底层[BAAI/bge-reranker-v2-minicpm-layerwise](http://113.200.138.88:18080/aimodels/baai/bge-reranker-v2-minicpm-layerwise). - 想要更好的效果, 建议 [BAAI/bge-reranker-v2-minicpm-layerwise](http://113.200.138.88:18080/aimodels/baai/bge-reranker-v2-minicpm-layerwise) 和 [BAAI/bge-reranker-v2-gemma](http://113.200.138.88:18080/aimodels/baai/bge-reranker-v2-gemma) ## Usage ### 使用 FlagEmbedding 1. 确认环境配置完成,请参考[环境配置](../../README.md#环境配置) 2. `We couldn't connect to 'https://huggingface.co' to load this file`报错,需要先修改huggingface镜像源,修改方法如下: ``` pip install -U huggingface_hub hf_transfer export HF_ENDPOINT=https://hf-mirror.com ``` #### 常规 reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3 ) 计算相关性得分(得分越高,相关性越强): ```python from FlagEmbedding import FlagReranker reranker = FlagReranker('BAAI/bge-reranker-v2-m3', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation score = reranker.compute_score(['query', 'passage']) print(score) # -5.65234375 # You can map the scores into 0-1 by set "normalize=True", which will apply sigmoid function to the score score = reranker.compute_score(['query', 'passage'], normalize=True) print(score) # 0.003497010252573502 scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]) print(scores) # [-8.1875, 5.26171875] # You can map the scores into 0-1 by set "normalize=True", which will apply sigmoid function to the score scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], normalize=True) print(scores) # [0.00027803096387751553, 0.9948403768236574] ``` #### 针对 LLM-based reranker ```python from FlagEmbedding import FlagLLMReranker reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation # reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', use_bf16=True) # You can also set use_bf16=True to speed up computation with a slight performance degradation score = reranker.compute_score(['query', 'passage']) print(score) scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]) print(scores) ``` #### 针对 LLM-based layerwise reranker ```python from FlagEmbedding import LayerWiseFlagLLMReranker reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation # reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_bf16=True) # You can also set use_bf16=True to speed up computation with a slight performance degradation score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28]) # Adjusting 'cutoff_layers' to pick which layers are used for computing the score. print(score) scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], cutoff_layers=[28]) print(scores) ``` ### 使用 Huggingface transformers #### 常规 reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3 ) Get relevance scores (higher scores indicate more relevance): ```python import torch from transformers import AutoModelForSequenceClassification, AutoTokenizer tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-m3') model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-v2-m3') model.eval() pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']] with torch.no_grad(): inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512) scores = model(**inputs, return_dict=True).logits.view(-1, ).float() print(scores) ``` #### 针对 LLM-based reranker ```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer def get_inputs(pairs, tokenizer, prompt=None, max_length=1024): if prompt is None: prompt = "Given a query A and a passage B, determine whether the passage contains an answer to the query by providing a prediction of either 'Yes' or 'No'." sep = "\n" prompt_inputs = tokenizer(prompt, return_tensors=None, add_special_tokens=False)['input_ids'] sep_inputs = tokenizer(sep, return_tensors=None, add_special_tokens=False)['input_ids'] inputs = [] for query, passage in pairs: query_inputs = tokenizer(f'A: {query}', return_tensors=None, add_special_tokens=False, max_length=max_length * 3 // 4, truncation=True) passage_inputs = tokenizer(f'B: {passage}', return_tensors=None, add_special_tokens=False, max_length=max_length, truncation=True) item = tokenizer.prepare_for_model( [tokenizer.bos_token_id] + query_inputs['input_ids'], sep_inputs + passage_inputs['input_ids'], truncation='only_second', max_length=max_length, padding=False, return_attention_mask=False, return_token_type_ids=False, add_special_tokens=False ) item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs item['attention_mask'] = [1] * len(item['input_ids']) inputs.append(item) return tokenizer.pad( inputs, padding=True, max_length=max_length + len(sep_inputs) + len(prompt_inputs), pad_to_multiple_of=8, return_tensors='pt', ) tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-gemma') model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2-gemma') yes_loc = tokenizer('Yes', add_special_tokens=False)['input_ids'][0] model.eval() pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']] with torch.no_grad(): inputs = get_inputs(pairs, tokenizer) scores = model(**inputs, return_dict=True).logits[:, -1, yes_loc].view(-1, ).float() print(scores) ``` #### 针对 LLM-based layerwise reranker ```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer def get_inputs(pairs, tokenizer, prompt=None, max_length=1024): if prompt is None: prompt = "Given a query A and a passage B, determine whether the passage contains an answer to the query by providing a prediction of either 'Yes' or 'No'." sep = "\n" prompt_inputs = tokenizer(prompt, return_tensors=None, add_special_tokens=False)['input_ids'] sep_inputs = tokenizer(sep, return_tensors=None, add_special_tokens=False)['input_ids'] inputs = [] for query, passage in pairs: query_inputs = tokenizer(f'A: {query}', return_tensors=None, add_special_tokens=False, max_length=max_length * 3 // 4, truncation=True) passage_inputs = tokenizer(f'B: {passage}', return_tensors=None, add_special_tokens=False, max_length=max_length, truncation=True) item = tokenizer.prepare_for_model( [tokenizer.bos_token_id] + query_inputs['input_ids'], sep_inputs + passage_inputs['input_ids'], truncation='only_second', max_length=max_length, padding=False, return_attention_mask=False, return_token_type_ids=False, add_special_tokens=False ) item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs item['attention_mask'] = [1] * len(item['input_ids']) inputs.append(item) return tokenizer.pad( inputs, padding=True, max_length=max_length + len(sep_inputs) + len(prompt_inputs), pad_to_multiple_of=8, return_tensors='pt', ) tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-minicpm-layerwise', trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2-minicpm-layerwise', trust_remote_code=True, torch_dtype=torch.bfloat16) model = model.to('cuda') model.eval() pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']] with torch.no_grad(): inputs = get_inputs(pairs, tokenizer).to(model.device) all_scores = model(**inputs, return_dict=True, cutoff_layers=[28]) all_scores = [scores[:, -1].view(-1, ).float() for scores in all_scores[0]] print(all_scores) ``` ## 微调 ### 数据格式 训练数据是一个json文件,其中每一行都是这样的字典: ``` {"query": str, "pos": List[str], "neg":List[str], "prompt": str} ``` `query` 是查询语句, `pos` 是正文本list, `neg` 是负文本list,`prompt`说明查询与文本的关系。 如果针对查询语句没有负文本,你可以随机从整个语料库中选取样本作为负样本,如[toy_finetune_data.jsonl](../../examples/finetune/toy_finetune_data.jsonl)。 ### Train 您可以跟随下面的步骤训练 reranker: **常规 reranker** (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3 ) 参考: ../../examples/reranker **针对 llm-based reranker** (bge-reranker-v2-gemma) ```shell torchrun --nproc_per_node {number of gpus} \ -m FlagEmbedding.llm_reranker.finetune_for_instruction.run \ --output_dir {path to save model} \ --model_name_or_path google/gemma-2b \ --train_data ./toy_finetune_data.jsonl \ --learning_rate 2e-4 \ --num_train_epochs 1 \ --per_device_train_batch_size 1 \ --gradient_accumulation_steps 16 \ --dataloader_drop_last True \ --query_max_len 512 \ --passage_max_len 512 \ --train_group_size 16 \ --logging_steps 1 \ --save_steps 2000 \ --save_total_limit 50 \ --ddp_find_unused_parameters False \ --gradient_checkpointing \ --deepspeed stage1.json \ --warmup_ratio 0.1 \ --bf16 \ --use_lora True \ --lora_rank 32 \ --lora_alpha 64 \ --use_flash_attn True \ --target_modules q_proj k_proj v_proj o_proj ``` **针对 llm-based layerwise reranker** (bge-reranker-v2-minicpm-layerwise) ```shell torchrun --nproc_per_node {number of gpus} \ -m FlagEmbedding.llm_reranker.finetune_for_layerwise.run \ --output_dir {path to save model} \ --model_name_or_path openbmb/MiniCPM-2B-dpo-bf16 \ --train_data ./toy_finetune_data.jsonl \ --learning_rate 2e-4 \ --num_train_epochs 1 \ --per_device_train_batch_size 1 \ --gradient_accumulation_steps 16 \ --dataloader_drop_last True \ --query_max_len 512 \ --passage_max_len 512 \ --train_group_size 16 \ --logging_steps 1 \ --save_steps 2000 \ --save_total_limit 50 \ --ddp_find_unused_parameters False \ --gradient_checkpointing \ --deepspeed stage1.json \ --warmup_ratio 0.1 \ --bf16 \ --use_lora True \ --lora_rank 32 \ --lora_alpha 64 \ --use_flash_attn True \ --target_modules q_proj k_proj v_proj o_proj \ --start_layer 8 \ --head_multi True \ --head_type simple \ --lora_extra_parameters linear_head \ --finetune_type from_raw_model # should be one of ['from_raw_model', 'from_finetuned_model'] ``` rerankers 通过 [google/gemma-2b](https://huggingface.co/google/gemma-2b) (针对 llm-based reranker) 和 [openbmb/MiniCPM-2B-dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16) (针对 llm-based layerwise reranker) 进行初始化,使用混合多语言数据集进行训练。 - [bge-m3-data](https://huggingface.co/datasets/Shitao/bge-m3-data) - [quora train data](https://huggingface.co/datasets/quora) - [fever train data](https://fever.ai/dataset/fever.html) ### 融合模型 微调之后,需要进行模型融合。 **针对 llm-based reranker** ```python from FlagEmbedding.llm_reranker.merge import merge_llm merge_llm('google/gemma-2b', 'lora_llm_output_path', 'merged_model_output_paths') ``` **针对 llm-based layerwise reranker** 如果基于原始模型进行的微调(openbmb/MiniCPM-2B-dpo-bf16) ```shell from FlagEmbedding.llm_reranker.merge import merge_layerwise_raw_llm merge_layerwise_raw_llm('openbmb/MiniCPM-2B-dpo-bf16', 'lora_llm_output_path', 'merged_model_output_paths') ``` 如果基于微调模型进行的微调(BAAI/bge-reranker-v2-minicpm-layerwise) ```shell from FlagEmbedding.llm_reranker.merge import merge_layerwise_finetuned_llm merge_layerwise_finetuned_llm('BAAI/bge-reranker-v2-minicpm-layerwise', 'lora_llm_output_path', 'merged_model_output_paths') ```