HIP_VISIBLE_DEVICES=5 python cli_demo.py
# Fine-tune the CogVLM2 model
[中文版README](./README_zh.md)

Run this demo to fine-tune the **language model** part of CogVLM2 with LoRA.

## Note

+ This code only provides a fine-tuning example for the Hugging Face version of the `cogvlm2-llama3-chat-19B` model.
+ Only an example of fine-tuning the language model is provided.
+ Only a LoRA fine-tuning example is provided.
+ Only the dialogue (chat) model is covered.
+ Fine-tuning with `zero3` is not supported yet; with `zero3` the model may fail to load.
## Minimum configuration
- We have only tested fine-tuning on A100 GPUs with 80GB of memory. With `zero2`, fine-tuning requires at least 73GB of GPU memory per card on 8 GPUs.
- Tensor parallelism (splitting the model across multiple GPUs for fine-tuning) is not supported yet.
## Start fine-tuning
1. Download the dataset and install dependencies
In this demo, developers can use the [CogVLM-SFT-311K](https://huggingface.co/datasets/THUDM/CogVLM-SFT-311K) open-source
dataset provided by us, or build their own dataset in the same format for fine-tuning.
The data format is as follows:
+ The dataset consists of two folders, `images` and `labels` (in CogVLM-SFT-311K they are `labels_en` and `labels_zh`,
corresponding to English and Chinese labels respectively).
In the fine-tuning code, you can modify these two lines to change the folder names.
```python
self.image_dir = os.path.join(root_dir, 'images')
self.label_dir = os.path.join(root_dir, 'labels_en') # or 'labels_zh' or 'labels' can be modified by yourself
```
+ Image files are stored in the `images` folder, and the corresponding label files are stored in the `labels` folder. Image
and label file names correspond one-to-one. Image files are in `jpg` format, and label files are in `json` format.
+ Each label file contains a single conversation. The conversation consists of two roles, `user` and `assistant`, and each
turn consists of two fields, `role` and `content`, as shown below.
```
{
  "conversations": [
    {
      "role": "user",
      "content": "What can be inferred about the zebras' behavior and surroundings?"
    },
    {
      "role": "assistant",
      "content": "Based on the image, we can infer that the two zebras are likely seeking relief from the sun's heat, as they are standing side by side under the branches of a thorny tree. This shade-providing tree offers some respite from the sun, possibly during the hottest part of the day. The zebras are in a green field with grass, providing them with an ideal environment to graze and eat while staying near their source of shelter. This shows that the zebras' behavior is influenced by the conditions and available resources in their surroundings. It also highlights that these animals adopt strategies to adapt to the fluctuating conditions of their environment, such as cooperation and seeking shelter, to survive and thrive in their natural habitat."
    }
  ]
}
```
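If you build your own dataset, a minimal sketch like the one below can sanity-check the layout before training. It assumes the default `images`/`labels_en` directory names and the dataset path used by `peft_lora.py`; adjust both to your own setup.
```python
import json
import os

# Paths follow the defaults in peft_lora.py; change them to match your dataset.
root_dir = "CogVLM-SFT-311K/llava_instruction_multi_conversations_formate"
image_dir = os.path.join(root_dir, "images")
label_dir = os.path.join(root_dir, "labels_en")  # or "labels_zh" / "labels"

for name in sorted(os.listdir(image_dir)):
    label_path = os.path.join(label_dir, name.replace(".jpg", ".json"))
    # Every image must have a label file with the same base name.
    assert os.path.exists(label_path), f"missing label for {name}"
    with open(label_path, "r") as f:
        conversations = json.load(f)["conversations"]
    # Turns must alternate user/assistant and carry role/content fields.
    assert len(conversations) % 2 == 0, f"odd number of turns in {label_path}"
    assert all("role" in turn and "content" in turn for turn in conversations)
print("dataset layout looks consistent")
```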
Before starting fine-tuning, you need to install the relevant dependencies. You also need to install the dependencies listed in [basic_demo](../basic_demo/requirements.txt).
```bash
pip install -r requirements.txt
```
**Note**: `mpi4py` may require additional Linux system packages. Please install them according to your system
environment.
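For reference, `mpi4py` typically needs an MPI implementation and its development headers before the pip install succeeds. Package names vary by distribution; on Debian/Ubuntu, something along these lines is common:
```bash
sudo apt-get update
sudo apt-get install -y libopenmpi-dev openmpi-bin
pip install -r requirements.txt
```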
2. Run the fine-tuning program
We provide `peft_lora.py`, a fine-tuning script for multiple GPUs on a single machine (a single GPU also works).
You can start fine-tuning by running the following command:
```bash
deepspeed peft_lora.py --ds_config ds_config.yaml
```
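The hyperparameters of `peft_lora.py` (learning rate, batch size, LoRA rank, dataset path, and so on) are also exposed as command-line flags, and the DeepSpeed launcher's `--include` option can restrict which GPUs are used. As an illustration only, a four-GPU run with a custom learning rate might look like this:
```bash
deepspeed --include localhost:0,1,2,3 peft_lora.py \
    --ds_config ds_config.yaml \
    --dataset_path CogVLM-SFT-311K/llava_instruction_multi_conversations_formate \
    --lr 1e-5 \
    --batch_size 1 \
    --lora_rank 8 \
    --save_path output
```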
The following shows the GPU memory usage observed during fine-tuning.
Parameter information:
+ `max_input_len`: 512
+ `max_output_len`: 512
+ `batch_size_per_gpus`: 1
+ `lora_target`: vision_expert_query_key_value
GPU memory usage:
```shell
+-------------------------------------------------------------+
| Processes:                                                   |
|  GPU   GI   CI        PID   Type   Process name  GPU Memory |
|        ID   ID                                    Usage      |
|=============================================================|
|    0   N/A  N/A    704914      C   python          72442MiB |
|    1   N/A  N/A    704915      C   python          72538MiB |
|    2   N/A  N/A    704916      C   python          72538MiB |
|    3   N/A  N/A    704917      C   python          72538MiB |
|    4   N/A  N/A    704918      C   python          72538MiB |
|    5   N/A  N/A    704919      C   python          72538MiB |
|    6   N/A  N/A    704920      C   python          72538MiB |
|    7   N/A  N/A    704921      C   python          72442MiB |
+-------------------------------------------------------------+
```
While training runs, the loss is logged to TensorBoard so that you can visually monitor its convergence:
```shell
tensorboard --logdir=output
```
**Note**: We strongly recommend fine-tuning in `BF16` format to avoid the loss becoming `NaN`.
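In the provided `ds_config.yaml` (shown later in this repository), `BF16` training is already switched on through its `bf16` block:
```yaml
bf16:
  enabled: true
```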
3. Inference with the fine-tuned model
Run `peft_infer.py` to generate text with the fine-tuned model. Set the path of the fine-tuned checkpoint as required by
the configuration at the top of the script, then run:
```shell
python peft_infer.py
```
This lets you run inference with the fine-tuned model.
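Concretely, the values to adjust are the two path constants at the top of `peft_infer.py`; the checkpoint directory name below is only an example of what `peft_lora.py` writes under its `--save_path`:
```python
MODEL_PATH = "THUDM/cogvlm2-llama3-chat-19B"  # base model path (the tokenizer is read from here)
PEFT_MODEL_PATH = "/output/checkpoint_epoch_0_step_50"  # example checkpoint produced by peft_lora.py
```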
# 微调 CogVLM2 模型
[Read this in English.](./README.md)
运行本demo来使用Lora微调 CogVLM2 中的**语言模型**部分。
## 注意
+ 本代码仅提供了 huggingface 版本模型 `cogvlm2-llama3-chat-19B` 的微调示例。
+ 仅提供了微调语言模型的示例。
+ 仅提供Lora微调示例。
+ 仅提供对话模型微调示例。
+ 暂不支持使用 `zero3` 微调,否则可能出现模型无法读取的情况。
## 最低配置
- 我们仅在具有80GB内存的A100 GPU上进行了微调测试。使用零冗余优化策略2(zero2)时,至少需要73GB的GPU内存,并且需要8个GPU。
- 暂不支持 Tensor 并行,即模型拆分到多张显卡微调。
## 开始微调
1. 下载数据集和安装依赖
本 demo 中,开发者可以使用由我们提供的 [CogVLM-SFT-311K](https://huggingface.co/datasets/THUDM/CogVLM-SFT-311K)
开源数据集,或自行构建相同格式的数据集进行微调。
数据格式如下:
+ 数据集由 `images` 和 `labels` 两个文件夹组成(在 CogVLM-SFT-311K 中为 `labels_en` 和 `labels_zh`,分别对应英文和中文标签)。
在微调代码中,你可以修改这两行代码来修改文件夹名称。
```python
self.image_dir = os.path.join(root_dir, 'images')
self.label_dir = os.path.join(root_dir, 'labels_en') # or 'labels_zh' or 'labels' 可以自行修改
```
+ `images` 文件夹中存放了图片文件,`labels`
文件夹中存放了对应的标签文件。图片和标签文件的名称一一对应。图片文件的格式为 `jpg`,标签文件的格式为 `json`。
+ 每个标签文件中包含了一段对话。对话由 `user` 和 `assistant` 两个角色组成,每个角色的对话内容由 `role` 和 `content`
两个字段组成,如下所示。
```
{
  "conversations": [
    {
      "role": "user",
      "content": "What can be inferred about the zebras' behavior and surroundings?"
    },
    {
      "role": "assistant",
      "content": "Based on the image, we can infer that the two zebras are likely seeking relief from the sun's heat, as they are standing side by side under the branches of a thorny tree. This shade-providing tree offers some respite from the sun, possibly during the hottest part of the day. The zebras are in a green field with grass, providing them with an ideal environment to graze and eat while staying near their source of shelter. This shows that the zebras' behavior is influenced by the conditions and available resources in their surroundings. It also highlights that these animals adopt strategies to adapt to the fluctuating conditions of their environment, such as cooperation and seeking shelter, to survive and thrive in their natural habitat."
    }
  ]
}
```
在开始微调之前,需要安装相关的依赖。请注意,你还需要安装好 [basic_demo](../basic_demo/requirements.txt) 中的依赖。
```bash
pip install -r requirements.txt
```
**注意**: `mpi4py` 可能需要安装别的 Linux 依赖包。请根据您的系统环境自行安装。
2. 运行微调程序
我们提供了使用单机多卡(包含单卡)的微调脚本 `peft_lora.py`。您可以通过运行以下命令来启动微调。
```bash
deepspeed peft_lora.py --ds_config ds_config.yaml
```
以下展示了微调过程中的显存占用情况。
参数信息:
+ `max_input_len`: 512
+ `max_output_len`: 512
+ `batch_size_per_gpus`: 1
+ `lora_target`: vision_expert_query_key_value
显存占用情况:
```shell
+-------------------------------------------------------------+
| Processes:                                                   |
|  GPU   GI   CI        PID   Type   Process name  GPU Memory |
|        ID   ID                                    Usage      |
|=============================================================|
|    0   N/A  N/A    704914      C   python          72442MiB |
|    1   N/A  N/A    704915      C   python          72538MiB |
|    2   N/A  N/A    704916      C   python          72538MiB |
|    3   N/A  N/A    704917      C   python          72538MiB |
|    4   N/A  N/A    704918      C   python          72538MiB |
|    5   N/A  N/A    704919      C   python          72538MiB |
|    6   N/A  N/A    704920      C   python          72538MiB |
|    7   N/A  N/A    704921      C   python          72442MiB |
+-------------------------------------------------------------+
```
在代码运行中,Loss数据会被 tensorboard记录,方便可视化查看Loss收敛情况。
```shell
tensorboard --logdir=output
```
**注意**: 我们强烈推荐您使用 `BF16` 格式进行微调,以避免出现 Loss 为 `NaN`的问题。
3. 推理微调后的模型
运行 `peft_infer.py`,你可以使用微调后的模型生成文本。您需要按照代码中的配置要求,配置微调后的模型地址。然后运行:
```shell
python peft_infer.py
```
即可使用微调的模型进行推理。
train_micro_batch_size_per_gpu: 1
gradient_accumulation_steps: 1
steps_per_print: 50
gradient_clipping: 1.0
zero_optimization:
  stage: 2
  contiguous_gradients: false
  overlap_comm: true
  reduce_scatter: true
  reduce_bucket_size: 1000000000
  allgather_bucket_size: 100000000
  load_from_fp32_weights: false
  round_robin_gradients: false
  offload_optimizer:
    device: cpu
    pin_memory: true
zero_allow_untested_optimizer: true
bf16:
  enabled: true
activation_checkpointing:
  partition_activations: false
  contiguous_memory_optimization: false
  cpu_checkpointing: false
wall_clock_breakdown: true
"""
This is a simple chat demo that uses a CogVLM2 PEFT fine-tuned model from the CLI.
Just replace the model loading part with the PEFT model loading code.
"""
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer
## Loading PEFT model
MODEL_PATH = "THUDM/cogvlm2-llama3-chat-19B" # The path to the base model (read tokenizer only)
PEFT_MODEL_PATH = "/output/checkpoint_epoch_0_step_50" # The path to the PEFT model
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
TORCH_TYPE = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    PEFT_MODEL_PATH,
    torch_dtype=TORCH_TYPE,
    trust_remote_code=True,
    device_map="auto",
).to(DEVICE).eval()
## The following code is the same as the one in basic_demo/cli_demo.py
text_only_template = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {} ASSISTANT:"
while True:
    image_path = input("image path >>>>> ")
    if image_path == '':
        print('You did not enter image path, the following will be a plain text conversation.')
        image = None
        text_only_first_query = True
    else:
        image = Image.open(image_path).convert('RGB')

    history = []

    while True:
        query = input("Human:")
        if query == "clear":
            break

        if image is None:
            if text_only_first_query:
                query = text_only_template.format(query)
                text_only_first_query = False
            else:
                old_prompt = ''
                for _, (old_query, response) in enumerate(history):
                    old_prompt += old_query + " " + response + "\n"
                query = old_prompt + "USER: {} ASSISTANT:".format(query)

        if image is None:
            input_by_model = model.build_conversation_input_ids(
                tokenizer,
                query=query,
                history=history,
                template_version='chat'
            )
        else:
            input_by_model = model.build_conversation_input_ids(
                tokenizer,
                query=query,
                history=history,
                images=[image],
                template_version='chat'
            )

        inputs = {
            'input_ids': input_by_model['input_ids'].unsqueeze(0).to(DEVICE),
            'token_type_ids': input_by_model['token_type_ids'].unsqueeze(0).to(DEVICE),
            'attention_mask': input_by_model['attention_mask'].unsqueeze(0).to(DEVICE),
            'images': [[input_by_model['images'][0].to(DEVICE).to(TORCH_TYPE)]] if image is not None else None,
        }
        # add any transformers params here.
        gen_kwargs = {
            "max_new_tokens": 2048,
            "pad_token_id": 128002,  # avoid warning of llama3
        }
        with torch.no_grad():
            outputs = model.generate(**inputs, **gen_kwargs)
            outputs = outputs[:, inputs['input_ids'].shape[1]:]
            response = tokenizer.decode(outputs[0], skip_special_tokens=True)
            print("\nCogVLM2:", response)
        history.append((query, response))
import argparse
import gc
import json
import os
import random
import threading
import yaml
from PIL import Image
import psutil
import torch
from accelerate import Accelerator, DeepSpeedPlugin
from accelerate.utils import HfDeepSpeedConfig
from torch.utils.data import Dataset, DataLoader, random_split
from tqdm import tqdm
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    get_linear_schedule_with_warmup
)
from torch.utils.tensorboard import SummaryWriter
from peft import get_peft_model, LoraConfig, TaskType
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class ConversationDataset(Dataset):
    def __init__(self,
                 root_dir,
                 tokenizer,
                 model,
                 torch_type,
                 device='cuda',
                 input_length=1024,
                 output_length=1024
                 ):
        self.root_dir = root_dir
        self.tokenizer = tokenizer
        self.model = model
        self.image_dir = os.path.join(root_dir, 'images')
        self.label_dir = os.path.join(root_dir,
                                      'labels_en')  # can be changed to labels_en or labels_zh in the SFT-311K dataset
        self.filenames = os.listdir(self.image_dir)
        self.input_length = input_length
        self.output_length = output_length
        self.device = device
        self.torch_type = torch_type
        self.padding_len = 2303
        self.max_length = self.input_length + self.output_length + self.padding_len

    def __len__(self):
        return len(self.filenames)

    @staticmethod
    def custom_collate_fn(batch):
        batched_data = {}
        for key in batch[0].keys():
            if isinstance(batch[0][key], list):
                batched_data[key] = [batch_item[key] for batch_item in batch]
            elif isinstance(batch[0][key], torch.Tensor):
                batched_data[key] = torch.stack([item[key] for item in batch])
            else:
                raise ValueError("Unsupported datatype in custom collate_fn")
        return batched_data
    def __getitem__(self, idx):
        img_name = os.path.join(self.image_dir, self.filenames[idx])
        label_name = os.path.join(self.label_dir, self.filenames[idx].replace('.jpg', '.json'))
        image = Image.open(img_name).convert('RGB')
        with open(label_name, 'r') as f:
            label_data = json.load(f)

        num_rounds = len(label_data["conversations"]) // 2
        sampled_round_id = random.randint(0, num_rounds - 1)
        history = [(label_data["conversations"][(sampled_round_id - 1) * 2]["content"],
                    label_data["conversations"][(sampled_round_id - 1) * 2 + 1]["content"])] if (
                sampled_round_id > 0 and random.random() > 0.5) else None
        query = label_data["conversations"][sampled_round_id * 2]["content"]
        response = label_data["conversations"][sampled_round_id * 2 + 1]["content"]

        input_data = self.model.build_conversation_input_ids(
            tokenizer=self.tokenizer,
            query=query,
            history=history,
            images=[image],
            answer=response
        )

        def pad_to_len(unpadded_tensor, pad_to_length, pad_value=0):
            current_length = len(unpadded_tensor)
            if current_length >= pad_to_length:
                return unpadded_tensor[:pad_to_length]
            return torch.cat(
                (unpadded_tensor,
                 torch.full([pad_to_length - current_length],
                            fill_value=pad_value,
                            dtype=unpadded_tensor.dtype,
                            device=unpadded_tensor.device)), dim=0)

        input_data['input_ids'] = pad_to_len(
            input_data['input_ids'],
            self.max_length,
            pad_value=128002,
        )
        input_data['attention_mask'] = pad_to_len(
            input_data['attention_mask'],
            self.max_length,
            pad_value=0
        )
        input_data['token_type_ids'] = pad_to_len(
            input_data['token_type_ids'],
            self.max_length,
            pad_value=0
        )
        input_data['labels'] = pad_to_len(
            input_data['labels'],
            self.max_length,
            pad_value=-100
        )

        for data_key in input_data:
            if data_key in ['images']:
                input_data[data_key] = [data.to(self.device).to(self.torch_type) for data in
                                        input_data[data_key]]
            else:
                input_data[data_key] = input_data[data_key].to(self.device)

        return input_data
def b2mb(x):
    return int(x / 2 ** 20)


class TorchTracemalloc:
    def __enter__(self):
        gc.collect()
        torch.cuda.empty_cache()
        torch.cuda.reset_max_memory_allocated()
        self.begin = torch.cuda.memory_allocated()
        self.process = psutil.Process()
        self.cpu_begin = self.cpu_mem_used()
        self.peak_monitoring = True
        peak_monitor_thread = threading.Thread(target=self.peak_monitor_func)
        peak_monitor_thread.daemon = True
        peak_monitor_thread.start()
        return self

    def cpu_mem_used(self):
        return self.process.memory_info().rss

    def peak_monitor_func(self):
        self.cpu_peak = -1
        while True:
            self.cpu_peak = max(self.cpu_mem_used(), self.cpu_peak)
            if not self.peak_monitoring:
                break

    def __exit__(self, *exc):
        self.peak_monitoring = False
        gc.collect()
        torch.cuda.empty_cache()
        self.end = torch.cuda.memory_allocated()
        self.peak = torch.cuda.max_memory_allocated()
        self.used = b2mb(self.end - self.begin)
        self.peaked = b2mb(self.peak - self.begin)
        self.cpu_end = self.cpu_mem_used()
        self.cpu_used = b2mb(self.cpu_end - self.cpu_begin)
        self.cpu_peaked = b2mb(self.cpu_peak - self.cpu_begin)
def main():
    parser = argparse.ArgumentParser(description="Finetune a CogVLM model with LoRA")
    parser.add_argument("--lr", type=float, default=1e-7, help="Learning rate")
    parser.add_argument("--num_epochs", type=int, default=5, help="Number of epochs")
    parser.add_argument("--batch_size", type=int, default=2, help="Batch size")
    parser.add_argument("--torch_type", type=str, default="torch.bfloat16", help="Torch type")
    parser.add_argument("--save_step", type=int, default=100, help="Steps between checkpoints")
    parser.add_argument("--train_dataset_rate", type=float, default=0.8,
                        help="Proportion of dataset to use for training")
    parser.add_argument("--local_rank", type=int, default=-1, help="Local rank for distributed training")
    parser.add_argument("--lora_rank", type=int, default=8, help="Rank parameter for LoRA")
    parser.add_argument("--lora_alpha", type=int, default=32, help="Alpha parameter for LoRA")
    parser.add_argument("--lora_target", type=str, default=["vision_expert_query_key_value"],
                        help="Finetune Target for LoRA")  # you can change the target to other modules such as "language_expert_query_key_value"
    parser.add_argument("--lora_dropout", type=float, default=0.1, help="Dropout rate for LoRA")
    parser.add_argument("--warmup_steps", type=int, default=0,
                        help="Number of warmup steps for learning rate scheduler")
    parser.add_argument("--max_input_len", type=int, default=128, help="Maximum input length")
    parser.add_argument("--max_output_len", type=int, default=128, help="Maximum output length")
    parser.add_argument("--model_path", type=str,
                        default="THUDM/cogvlm2-llama3-chat-19B",
                        help="Path to the pretrained model")
    parser.add_argument("--dataset_path", type=str,
                        default="CogVLM-SFT-311K/llava_instruction_multi_conversations_formate",
                        help="Path to the conversation dataset")
    parser.add_argument("--save_path", type=str, default="output",
                        help="Path to save the finetuned model, must be an existing directory")
    parser.add_argument("--ds_config", type=str, default="ds_config.yaml",
                        help="DeepSpeed configuration file path")
    args = parser.parse_args()
    args.torch_type = eval(args.torch_type)
    with open(args.ds_config) as f:
        ds_config = yaml.safe_load(f)
    hf_ds_config = HfDeepSpeedConfig(ds_config)
    ds_plugin = DeepSpeedPlugin(hf_ds_config=hf_ds_config)
    accelerator = Accelerator(deepspeed_plugin=ds_plugin)

    tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(args.model_path, torch_dtype=args.torch_type, trust_remote_code=True)
    if len(tokenizer) != model.get_input_embeddings().weight.size(0):
        model.resize_token_embeddings(len(tokenizer))

    dataset = ConversationDataset(
        root_dir=args.dataset_path,
        tokenizer=tokenizer,
        model=model,
        torch_type=args.torch_type,
        input_length=args.max_input_len,
        output_length=args.max_output_len
    )
    train_size = int(args.train_dataset_rate * len(dataset))
    val_size = len(dataset) - train_size
    train_dataset, val_dataset = random_split(dataset, [train_size, val_size])
    train_dataloader = DataLoader(
        train_dataset,
        batch_size=args.batch_size,
        shuffle=True,
        collate_fn=dataset.custom_collate_fn,
    )
    eval_dataloader = DataLoader(
        val_dataset,
        batch_size=args.batch_size,
        shuffle=True,
        collate_fn=dataset.custom_collate_fn,
    )

    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=args.lora_rank,
        target_modules=args.lora_target,
        lora_alpha=args.lora_alpha,
        lora_dropout=args.lora_dropout,
    )
    model = get_peft_model(model, peft_config)

    optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr)
    lr_scheduler = get_linear_schedule_with_warmup(
        optimizer=optimizer,
        num_warmup_steps=args.warmup_steps,
        num_training_steps=(len(train_dataloader) * args.num_epochs),
    )

    model, train_dataloader, eval_dataloader, optimizer, lr_scheduler = accelerator.prepare(
        model, train_dataloader, eval_dataloader, optimizer, lr_scheduler
    )
    logger.info("Preparation done. Starting training...")
    writer = SummaryWriter(log_dir=args.save_path)
    for epoch in range(args.num_epochs):
        model.train()
        total_loss = 0.0
        for step, batch in enumerate(tqdm(train_dataloader)):
            outputs = model(
                input_ids=batch['input_ids'],
                token_type_ids=batch['token_type_ids'],
                attention_mask=batch['attention_mask'],
                images=batch['images'],
                labels=batch['labels']
            )
            loss = outputs.loss
            total_loss += loss.detach().float()
            accelerator.backward(loss)
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()

            if (step + 1) % args.save_step == 0:
                print(f"Epoch {epoch}, Step {step + 1}, Loss {loss.item()}")
                checkpoint_path = os.path.join(args.save_path, f'checkpoint_epoch_{epoch}_step_{step + 1}')
                model.save_pretrained(
                    save_directory=checkpoint_path,
                    safe_serialization=True
                )
            writer.add_scalar('Train/Loss', loss.item(), epoch * len(train_dataloader) + step)

        total_loss = accelerator.gather(total_loss)
        avg_loss = total_loss.mean().item() / len(train_dataloader)
        train_ppl = torch.exp(torch.tensor(avg_loss))
        writer.add_scalar('Train/Epoch_Loss', avg_loss, epoch)
        writer.add_scalar('Train/Perplexity', train_ppl, epoch)
        accelerator.print(f"Epoch {epoch}: Average Loss {avg_loss:.4f}, Perplexity {train_ppl:.4f}")
        model.eval()
        eval_loss = 0.0
        for _, batch in enumerate(tqdm(eval_dataloader)):
            inputs = {
                'input_ids': batch['input_ids'],
                'token_type_ids': batch['token_type_ids'],
                'attention_mask': batch['attention_mask'],
                'images': batch['images']
            }
            labels = batch['labels'].to(accelerator.device)
            with torch.no_grad():
                outputs = accelerator.unwrap_model(model)(
                    input_ids=inputs['input_ids'],
                    token_type_ids=inputs['token_type_ids'],
                    attention_mask=inputs['attention_mask'],
                    images=inputs['images'],
                    labels=labels
                )
                loss = outputs.loss
                eval_loss += loss.detach().float()

        eval_loss = accelerator.gather(eval_loss)
        avg_eval_loss = eval_loss.mean().item()
        writer.add_scalar('Eval/Perplexity', torch.exp(torch.tensor(avg_eval_loss)), epoch)
        writer.add_scalar('Eval/Epoch_Loss', avg_eval_loss, epoch)

    checkpoint_path = os.path.join(args.save_path, 'final_model')
    model.save_pretrained(
        save_directory=checkpoint_path,
        safe_serialization=True
    )


if __name__ == "__main__":
    main()
peft>=0.10.0
deepspeed>=0.14.2
mpi4py>=3.1.4
tensorboard>=2.16.2
# 模型唯一标识
modelCode = 857
# 模型名称
modelName=CogVLM2_pytorch
# 模型描述
modelDescription=CogVLM2是一个开源的多模态大型语言模型,旨在缩小开源模型与商业专有模型在多模态理解方面的能力差距,可用于OCR、视频理解、文档问答。
# 应用场景
appScenario=推理,OCR,金融,教育,交通,政府
# 框架类型
frameType=pytorch
<div align="center">
<img src="wechat.jpg" width="60%"/>
<p> 扫码关注公众号,加入「CogVLM交流群」 </p>
<p> Scan the QR code to follow the official account and join the "CogVLM Discussion Group" </p>
</div>
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 28.2.0, SVG Export Plug-In . SVG Version: 6.00 Build 0) -->
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
viewBox="0 0 841.89 368.6" style="enable-background:new 0 0 841.89 368.6;" xml:space="preserve">
<style type="text/css">
.st0{fill:#0039C6;}
</style>
<g id="图层_2">
<g>
<g>
<g>
<path class="st0" d="M248.93,129.06c-5.91,0-10.73,4.81-10.73,10.73c0,5.91,4.81,10.73,10.73,10.73
c5.91,0,10.73-4.81,10.73-10.73C259.66,133.87,254.85,129.06,248.93,129.06z M253.8,145.35c-3,0-5.44-2.44-5.44-5.44
c0-3,2.44-5.44,5.44-5.44c3,0,5.44,2.44,5.44,5.44C259.24,142.91,256.8,145.35,253.8,145.35z"/>
</g>
<g>
<path class="st0" d="M254.2,136.94c-1.57,0-2.85,1.28-2.85,2.85c0,1.57,1.28,2.85,2.85,2.85c1.57,0,2.85-1.28,2.85-2.85
C257.05,138.21,255.77,136.94,254.2,136.94z"/>
</g>
</g>
<g>
<path class="st0" d="M208.68,231.25c0.17-0.19,0.04-0.49-0.22-0.5c-2.58-0.04-12.29-0.76-17.26-8.87
c-2.87-4.67-3.34-10.48-1.21-15.56c0.15-0.36,0.31-0.71,0.48-1.06c0.04-0.09,0.04-0.19-0.01-0.27l-8.24-14.84
c-0.07-0.13-0.22-0.17-0.36-0.14c-16.55,3.25-26.19,17.25-23.27,31.06c0.79,3.75,2.34,7.08,4.74,10.14
c6.13,7.79,15.24,10.33,21.93,10.13C199.11,240.92,206.82,233.34,208.68,231.25z"/>
<path class="st0" d="M132,154.09c5.91-7.11,12.74-13.01,20.39-17.65l-13.3-23.96c-0.85-1.53-2.46-2.47-4.21-2.45l-39.43,0.33
c-1.93,0.02-3.14,2.09-2.2,3.78l29.7,53.49C126.3,161.42,129.69,156.87,132,154.09z"/>
<path class="st0" d="M236.57,187.09c-3.46,0.07-6.76-0.08-8.56,0.22c-1.5,0.25-5.39,0.99-9.35,4.7c-0.17,0.16-0.1,0.45,0.13,0.51
c2.7,0.71,5.19,1.94,7.61,3.41c5.17,3.14,9.88,11.04,11.34,14.76c0.09,0.24,0.42,0.26,0.54,0.03l14.1-26.81
c0.13-0.25-0.12-0.52-0.38-0.42C245.4,186.18,239.91,187.02,236.57,187.09z"/>
<path class="st0" d="M287.13,109.87l-37.27,0.3c-1.74,0.01-3.34,0.98-4.16,2.51l-1.03,2.21c2.98-0.07,19.89,1.25,34.09,18.93
l10.64-20.23C290.3,111.9,289.06,109.86,287.13,109.87z"/>
<path class="st0" d="M181.85,273.71l6.91,12.44c0.94,1.69,3.34,1.76,4.37,0.13l10.74-16.92
C196.95,271.98,189.37,273.52,181.85,273.71z"/>
</g>
<path class="st0" d="M275.98,141.57c-11.98-19.96-28.93-19.62-28.93-19.62c-7.8-1.52-11.78-10.35-29.73-14.73
c-16.78-4.1-26.14,0.92-19.42,3.7c10.84,4.49,17.02,22.38,20.19,35.86c4.05,17.22,12.86,17.54,22.61,16.49
c7.58-0.81,22.75-8.68,30.51-10.3C279.68,151.2,282.32,152.14,275.98,141.57z M248.33,154.19c-7.95,0-14.4-6.45-14.4-14.4
c0-7.95,6.45-14.4,14.4-14.4c7.95,0,14.4,6.45,14.4,14.4C262.73,147.74,256.28,154.19,248.33,154.19z"/>
<path class="st0" d="M227.8,229.49"/>
<g>
<path class="st0" d="M197.52,207.04c-2.14,5.11,0.26,12.88,10.79,13.58c0.65,0,0.98-0.44,0.98-0.98c0-0.54-0.47-0.8-0.98-0.98
c-0.51-0.18-5.36-1.85-4.3-7.53c0.35-1.89,2.04-3.59,4.11-4.37c-1.17-2.17-1.18-5.1,0.48-7.91
C205.79,198.84,200.28,200.43,197.52,207.04z"/>
<path class="st0" d="M230.96,217.37c0-5.61-3.7-11.56-6.6-13.45c-2.52-0.87-6.69,0.5-9.04,4c2.76,1.8,4.53,4.69,4.83,12.13
c0.38,9.63-6.91,17.91-16.68,22.63C217.93,239.9,230.96,228,230.96,217.37z"/>
<path class="st0" d="M143.77,233.5c-27.87-35.61-11.16-83.24,29.46-99.36c-50.57,17.04-65.67,72.87-37.27,107.96
c18.85,23.29,51.28,24.08,68.65,18.04C183.53,258.39,160.91,255.4,143.77,233.5z"/>
</g>
<path class="st0" d="M216.65,183.11c0.24-0.1,0.24-0.44,0-0.55c-0.49-0.23-1.44-0.41-3.18,0.03c-3.37,0.84-11.64,2.17-10.99-1.85
c0.46-2.86,5.04-3.77,8.39-9.25c1.66-2.71-0.48-7.17-4.83-7.17c-3.35,0-6.79,1.52-10.54,5.49c-5.24,5.54-6.28,10.74-4.25,14.56
c1.92,3.63,6.43,5.19,13.96,3.94C210.35,187.47,211.93,185.19,216.65,183.11z"/>
<path class="st0" d="M213.33,198.84c0,0-4.04,1.45-3.58,7.66c1.5-0.38,1.96-0.23,4,0.75c0.94-2.77,3.34-5.1,6.89-5.1
c0.64,0,1.27,0.07,1.89,0.19C220.14,200.56,217.67,199.06,213.33,198.84z"/>
<path class="st0" d="M242.58,175.96c-20.04,4.22-20.33-12.96-35.09-16.28c-19.87-4.46-45.31,7.42-56.48,29.07
c-4.54,8.8-8.82,26.05,2.32,40.74c-7-19.04,3.56-33.05,10.37-38.54c6.41-5.16,14.25-8.42,21.99-10.39
c0.13-0.03,0.22-0.15,0.22-0.28c0.13-4.18,2.26-8.65,6.29-12.91c4.27-4.52,8.65-6.71,13.37-6.71c3.46,0,6.5,1.76,8.15,4.7
c1.52,2.71,1.52,5.94,0.01,8.41c-0.9,1.47-1.87,2.67-2.83,3.66c-0.18,0.18-0.05,0.49,0.2,0.5c9.28,0.58,16.91,2.84,23.72,2.32
c25.69-1.96,34.84-15.65,38.14-18.17C275.97,159.82,253.41,173.68,242.58,175.96z"/>
<path class="st0" d="M217.22,232.36c-7.51,7.51-19.81,18.1-43.6,13.65c-17.26-3.23-30.49-27.15-26.64-56.01
c2.06-15.44,14.03-29.84,25.26-34.49c24.56-10.16,37.47,0.67,43.44,7.36c4.23,4.74,9.07,14.53,24.71,14.4
c11.94-0.1,32.62-14.15,38.36-19.45c1.96-1.81-0.18-2.81-3.43-2.52c-12.41,1.11-23.91,14.36-40.8,13.64
c-8.34-0.35-18.84-6.08-22.54-18.44c-4.45-14.86-4.54-17.96-9.65-18.84c-5.35-0.92-13.51-1.42-25.31,2.26
c-15.27,4.78-39.22,28.17-40.75,56.07c-2.65,48.25,18.91,67.59,52.15,71.69c3.76,0.46,11.99-0.54,15.65-1.33
c5.13-2.19,9.52-4.96,12.83-7.93c8.29-7.44,13.33-15.72,14.52-27.68c0.4-4.01-0.57-9.34-1.11-11.68
C230.12,215.57,225.21,224.37,217.22,232.36z"/>
</g>
<g>
<g>
<path class="st0" d="M313.84,231.39c-5.58-3.09-9.92-7.49-13.01-13.18c-3.09-5.69-4.64-12.28-4.64-19.76V166.1
c0-7.39,1.54-13.9,4.64-19.54c3.09-5.64,7.43-10.01,13.01-13.1c5.58-3.09,12.06-4.64,19.45-4.64c7.38,0,13.86,1.47,19.45,4.42
c5.58,2.95,9.92,7.09,13.01,12.43c3.09,5.34,4.64,11.51,4.64,18.49c0,0.5-0.16,0.9-0.48,1.2c-0.32,0.3-0.72,0.45-1.18,0.45
l-19.65,1.35c-1.11,0-1.66-0.55-1.66-1.65c0-4.69-1.29-8.43-3.88-11.23c-2.58-2.79-6-4.19-10.24-4.19
c-4.25,0-7.66,1.42-10.24,4.27c-2.58,2.85-3.88,6.56-3.88,11.16v33.99c0,4.59,1.29,8.29,3.88,11.08c2.58,2.8,6,4.19,10.24,4.19
c4.24,0,7.66-1.4,10.24-4.19c2.58-2.79,3.88-6.49,3.88-11.08c0-1.1,0.55-1.65,1.66-1.65l19.65,1.05c0.46,0,0.85,0.15,1.18,0.45
c0.32,0.3,0.48,0.65,0.48,1.05c0,7.09-1.55,13.33-4.64,18.72c-3.09,5.39-7.43,9.56-13.01,12.5c-5.58,2.95-12.07,4.42-19.45,4.42
C325.9,236.03,319.42,234.48,313.84,231.39z"/>
<path class="st0" d="M388.23,228.84c-5.81-4.79-9.69-11.28-11.63-19.47c-1.11-4.09-1.66-8.58-1.66-13.48
c0-5.49,0.6-10.33,1.8-14.52c2.12-7.88,6.07-14.05,11.83-18.49c5.77-4.44,12.76-6.66,20.97-6.66c8.12,0,14.99,2.22,20.62,6.66
c5.63,4.44,9.55,10.56,11.76,18.34c1.29,4.49,1.94,9.28,1.94,14.37c0,4.59-0.51,8.98-1.52,13.18
c-1.94,8.39-5.81,15.02-11.63,19.92c-5.81,4.89-12.92,7.34-21.32,7.34C401.11,236.03,394.05,233.63,388.23,228.84z
M416.05,211.99c1.75-1.85,3.05-4.37,3.88-7.56c0.55-2.6,0.83-5.44,0.83-8.54c0-2.99-0.32-5.89-0.97-8.68
c-0.74-3.09-1.99-5.49-3.74-7.19c-1.75-1.7-3.97-2.55-6.64-2.55c-5.35,0-8.86,3.24-10.52,9.73c-0.55,2.4-0.83,5.29-0.83,8.68
c0,3.1,0.28,5.94,0.83,8.54c0.74,3.2,2.01,5.72,3.81,7.56c1.8,1.85,4.04,2.77,6.71,2.77
C412.09,214.76,414.3,213.84,416.05,211.99z"/>
<path class="st0" d="M493.63,157.94c0.32-0.35,0.71-0.52,1.18-0.52h19.65c0.46,0,0.85,0.18,1.18,0.52
c0.32,0.35,0.48,0.77,0.48,1.27v66.93c0,13.68-3.65,23.46-10.93,29.35c-7.29,5.89-16.52,8.83-27.68,8.83
c-4.15,0-8.54-0.4-13.15-1.2c-0.92-0.1-1.38-0.75-1.38-1.95l0.69-18.57c0-0.7,0.18-1.17,0.55-1.42c0.37-0.25,0.83-0.28,1.38-0.07
c3.78,0.9,7.29,1.35,10.52,1.35c5.26,0,9.41-1.3,12.46-3.89c3.04-2.6,4.57-6.69,4.57-12.28l-0.97,1.05
c-3.32,3.59-8.12,5.39-14.4,5.39c-6.09,0-11.67-1.45-16.75-4.34c-5.08-2.89-8.72-7.99-10.93-15.27
c-1.48-4.79-2.21-10.68-2.21-17.67c0-7.69,0.88-13.97,2.63-18.87c2.12-6.19,5.51-11.13,10.17-14.82
c4.66-3.69,10.08-5.54,16.26-5.54c6.64,0,11.76,2.2,15.36,6.59c0.18,0.2,0.37,0.28,0.55,0.22c0.18-0.05,0.28-0.22,0.28-0.52
v-3.29C493.15,158.72,493.31,158.29,493.63,157.94z M493.15,195.3c0-2.7-0.09-4.82-0.28-6.36c-0.19-1.55-0.55-3.02-1.11-4.42
c-0.74-2.19-1.92-3.92-3.53-5.17c-1.62-1.25-3.58-1.87-5.88-1.87c-4.34,0-7.43,2.35-9.27,7.04c-1.38,2.8-2.08,6.49-2.08,11.08
c0,4.89,0.6,8.49,1.8,10.78c0.83,2.1,2.08,3.79,3.74,5.09c1.66,1.3,3.64,1.95,5.95,1.95c4.71,0,7.84-2.29,9.41-6.89
C492.73,204.23,493.15,200.49,493.15,195.3z"/>
<path class="st0" d="M549.07,233.33l-27.96-101.22l-0.14-0.6c0-1,0.51-1.5,1.52-1.5h21.18c1.01,0,1.66,0.5,1.94,1.5l15.92,68.13
c0.09,0.3,0.23,0.45,0.41,0.45c0.18,0,0.32-0.15,0.42-0.45l15.64-68.13c0.28-1,0.92-1.5,1.94-1.5h20.76c0.55,0,0.97,0.2,1.25,0.6
c0.28,0.4,0.32,0.9,0.14,1.5L573.7,233.33c-0.28,1-0.88,1.5-1.8,1.5h-21.04C549.94,234.83,549.34,234.33,549.07,233.33z"/>
<path class="st0" d="M608.51,234.3c-0.32-0.35-0.48-0.77-0.48-1.27V131.81c0-0.5,0.16-0.92,0.48-1.27
c0.32-0.35,0.71-0.52,1.18-0.52h19.65c0.46,0,0.85,0.18,1.18,0.52c0.32,0.35,0.48,0.77,0.48,1.27v81.01
c0,0.5,0.23,0.75,0.69,0.75h44.15c0.46,0,0.85,0.18,1.18,0.52c0.32,0.35,0.48,0.77,0.48,1.27v17.67c0,0.5-0.16,0.92-0.48,1.27
c-0.32,0.35-0.72,0.52-1.18,0.52h-66.16C609.23,234.83,608.83,234.66,608.51,234.3z"/>
<path class="st0" d="M747.27,130.02h19.52c0.46,0,0.85,0.18,1.18,0.52c0.32,0.35,0.48,0.77,0.48,1.27v101.22
c0,0.5-0.16,0.92-0.48,1.27c-0.32,0.35-0.72,0.52-1.18,0.52h-19.65c-0.46,0-0.85-0.17-1.18-0.52c-0.32-0.35-0.48-0.77-0.48-1.27
v-60.19c0-0.4-0.09-0.6-0.28-0.6c-0.19,0-0.37,0.15-0.55,0.45l-11.9,20.66c-0.37,0.8-1.02,1.2-1.94,1.2h-9.83
c-0.92,0-1.57-0.4-1.94-1.2l-12.04-20.81c-0.19-0.3-0.37-0.45-0.55-0.45c-0.19,0-0.28,0.2-0.28,0.6v60.34
c0,0.5-0.16,0.92-0.48,1.27c-0.32,0.35-0.72,0.52-1.18,0.52h-19.65c-0.46,0-0.85-0.17-1.18-0.52c-0.32-0.35-0.48-0.77-0.48-1.27
V131.81c0-0.5,0.16-0.92,0.48-1.27c0.32-0.35,0.71-0.52,1.18-0.52h19.52c0.83,0,1.47,0.4,1.94,1.2l19.24,32.64
c0.28,0.6,0.55,0.6,0.83,0l18.96-32.64C745.7,130.42,746.34,130.02,747.27,130.02z"/>
</g>
</g>
</g>
<g id="图层_1">
</g>
</svg>