# FinGPT Contribution Guidelines 🚀
Welcome to the FinGPT project! We are thrilled to have you here 🌟. Your contributions are instrumental in shaping the intersection of finance and AI, making it even more amazing. 📈✨ Let's embark on this journey together.
## Code of Conduct 🤝
Before diving in, please take a moment to review our Code of Conduct. It sets the tone for our community and emphasizes the importance of respect and inclusivity. [Read the Code of Conduct](LICENSE.md).
## Contribution Types 🦠🚀📚
### Bug Reports 🐞
If you encounter any bugs during your journey, don't fret! We have the Bug Busters ready to help. To report a bug, follow these steps:
1. Check if the bug has already been reported in [GitHub Issues](https://github.com/AI4Finance-Foundation/FinGPT/issues).
2. If it's a new bug, open a new issue with a concise description and provide detailed, step-by-step instructions to reproduce it.
### Feature Requests 💡
Do you have visionary ideas that could elevate FinGPT? Share them with us! When submitting a feature request, be sure to include:
1. A clear and vivid description of the feature you envision.
2. A discussion of the potential impact and benefits.
### Documentation 📖
For those with a penchant for words and an eye for detail, consider contributing to our documentation. You can make the documentation more enlightening for everyone. 🧙📜
### Code Contributions 💻
Calling all AI heroes and wizards! You are the secret sauce behind the FinGPT project. To contribute code and save the financial world:
1. **Fork the Repository**: Click the "Fork" button on the top right of the repository's page. This creates your own copy of the project.
2. **Clone your Fork**: In your terminal, use the following command to clone your fork to your local machine:
```bash
git clone https://github.com/YourUsername/FinGPT.git
```
3. **Create a New Branch**: Make a new branch for your adventures. This helps keep the main codebase clean:
```bash
git checkout -b your-feature-branch
```
4. **Work Your Magic**: Implement your code or changes.
5. **Commit and Push**: Use these commands to commit your changes and push them to your fork:
```bash
git commit -m "Your commit message"
git push origin your-feature-branch
```
6. **Create a Pull Request**: Go to the original FinGPT repository and click "New Pull Request." Select your branch, write a description, and submit.
## Seeking Assistance ❓🙋‍♀️
If you find yourself stuck or have questions, remember that our support team is your sidekick. Don't hesitate to reach out. We are here to guide you through the process and provide any necessary assistance.
## Getting Started 🚀🚀
Are you ready to make a mark on the FinGPT project? Grab your cape and join us in our mission to make finance and AI even more incredible. Your contributions are the magic that fuels our journey.
🔗 [FinGPT GitHub Repository](https://github.com/AI4Finance-Foundation/FinGPT)
### May your contributions be as amazing as you are! 🌌🚀
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
---
license: apache-2.0
language:
- zh
tags:
- finance
---
This repository contains DISC-FinLLM, the version built on Baichuan-13B-Chat as the base model.
<div align="center">
[Demo](https://finllm.fudan-disc.com) | [Tech Report](https://arxiv.org/abs/2309.11325)
</div>
**Please note that due to the ongoing development of the project, the model weights in this repository may differ from those in our currently deployed demo.**
DISC-FinLLM is a large language model for the financial domain, designed to provide users with professional, intelligent, and comprehensive **financial consulting services** in financial scenarios. It is developed and open sourced by the [Fudan University Data Intelligence and Social Computing Laboratory (Fudan-DISC)](http://fudan-disc.com). It is a multi-expert smart financial system composed of four modules for different financial scenarios: financial consulting, financial text analysis, financial calculation, and financial knowledge retrieval and question answering. These modules showed clear advantages in four evaluations covering financial NLP tasks, human test questions, data analysis, and current-affairs analysis, demonstrating that DISC-FinLLM can provide strong support for a wide range of financial applications. DISC-FinLLM can serve different application scenarios and implement different functions:
* **Financial Consultation:** This module can hold multi-turn dialogues with users on financial topics in the Chinese financial context and explain financial concepts to users. It is trained on the financial consulting instructions portion of the dataset.
* **Financial Text Analysis:** This module can help users complete NLP tasks such as information extraction, sentiment analysis, text classification, and text generation on financial texts. It is trained on the financial task instructions in the dataset.
* **Financial Calculation:** This module can help users complete tasks involving mathematical calculations. Beyond basic calculations such as interest rates and growth rates, it also supports statistical analysis and financial model calculations, including the Black-Scholes option pricing model and the EDF expected default probability model. This module is partially trained on the financial computing instructions in the dataset.
* **Financial Knowledge Retrieval Q&A:** This module can provide users with investment advice, current-affairs analysis, and policy interpretation based on financial news, research reports, and related policy documents. It is partially trained on the retrieval-enhanced instructions in the dataset.
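As a concrete illustration of the kind of computation the Financial Calculation module targets, here is a minimal, self-contained Black-Scholes call-price sketch (the standard textbook formula, not code from DISC-FinLLM):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call.
    S: spot price, K: strike, T: years to expiry, r: risk-free rate, sigma: volatility."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# At-the-money call: spot 100, strike 100, 1 year, 5% rate, 20% volatility
price = black_scholes_call(100, 100, 1.0, 0.05, 0.20)  # ≈ 10.45
```

A model answering a question like "price a 1-year at-the-money call at 20% vol" would need to carry out exactly this arithmetic, which is why such calculations are delegated to a dedicated module rather than to free-form generation.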
Check our [HOME](https://github.com/FudanDISC/DISC-FinLLM) for more information.
# DISC-Fin-SFT Dataset
DISC-FinLLM is a large financial model trained on the high-quality financial dataset DISC-Fin-SFT. We constructed this dataset and performed LoRA instruction fine-tuning on the general-domain Chinese large model Baichuan-13B-Chat. DISC-Fin-SFT contains about 250,000 samples in total, divided into four sub-datasets: financial consulting instructions, financial task instructions, financial computing instructions, and retrieval-enhanced instructions.
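LoRA fine-tuning freezes the base weight matrix W and learns a low-rank update scaled by alpha/r, so the adapted layer effectively computes with W + (alpha/r)·B·A. A toy numeric sketch of that update with tiny hand-picked matrices (illustrative only; the actual training applied this to the Baichuan-13B-Chat weights):

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_adapt(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the LoRA-adapted weight matrix."""
    scale = alpha / r
    BA = matmul(B, A)  # low-rank product, same shape as W
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]

# 2x2 frozen weight, rank-1 update (r=1, alpha=2)
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # 2x1 "down" factor
A = [[3.0, 4.0]]     # 1x2 "up" factor
W_adapted = lora_adapt(W, A, B, alpha=2.0, r=1)
# BA = [[3, 4], [6, 8]], scaled by 2 → W_adapted = [[7, 8], [12, 17]]
```

Only A and B (a few million parameters at rank r) are trained, which is what makes instruction-tuning a 13B model on 250k samples tractable.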
| Dataset | Samples | Input Length | Output Length |
|----------------:|----------------:|------------------------------------------------------------:|-----------------------------------------------------------:|
| Financial Consulting Instructions | 63k | 26 | 369 |
| Financial Task Instructions | 110k | 676 | 35 |
| Financial Computing Instructions | 57k | 73 | 190 |
| Retrieval-enhanced Instructions | 20k | 1031 | 521 |
| DISC-Fin-SFT | 246k | 351 | 198 |
# Usage with Hugging Face Transformers
```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers.generation.utils import GenerationConfig
>>> tokenizer = AutoTokenizer.from_pretrained("Go4miii/DISC-FinLLM", use_fast=False, trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("Go4miii/DISC-FinLLM", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
>>> model.generation_config = GenerationConfig.from_pretrained("Go4miii/DISC-FinLLM")
>>> messages = []
>>> messages.append({"role": "user", "content": "请解释一下什么是银行不良资产?"})  # "Please explain what non-performing bank assets are."
>>> response = model.chat(tokenizer, messages)
>>> print(response)
```
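Under the hood, `model.chat` assembles the prompt with the repository's `build_chat_input`, which reserves room for `max_new_tokens` within the 4096-token `model_max_length` and drops the oldest dialogue rounds first. A simplified, dependency-free sketch of that budgeting logic (the real code additionally prepends system-prompt tokens and role-marker token IDs):

```python
def fit_history(rounds, max_input_tokens):
    """Keep the newest dialogue rounds whose combined token counts fit the budget.
    rounds: list of per-round token-id lists, oldest first."""
    history = []
    for round_tokens in reversed(rounds):           # walk newest round first
        if not history or len(history) + len(round_tokens) <= max_input_tokens:
            history = round_tokens + history        # concatenate on the left
            if len(history) < max_input_tokens:
                continue                            # budget left, keep going
        break
    return history[-max_input_tokens:]              # final left truncation

rounds = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]          # three rounds, oldest first
print(fit_history(rounds, 6))                       # oldest round [1, 2, 3] is dropped
```

The practical consequence is that in long conversations the earliest turns silently fall out of the context window, while the latest user turn is always kept (truncated from the left if it alone exceeds the budget).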
## Disclaimer
DISC-FinLLM, like all current large language models, has limitations that cannot yet be overcome. Although it can provide services for many tasks and scenarios in the financial field, its output should be used for reference only and cannot replace professional financial analysts and experts. We hope users of DISC-FinLLM will evaluate the model critically. We are not responsible for any problems, risks, or adverse consequences arising from the use of DISC-FinLLM.
## Citation
If our project has been helpful for your research and work, please kindly cite our work as follows:
```
@misc{yue2023disclawllm,
title={DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services},
author={Shengbin Yue and Wei Chen and Siyuan Wang and Bingxuan Li and Chenchen Shen and Shujun Liu and Yuxuan Zhou and Yao Xiao and Song Yun and Xuanjing Huang and Zhongyu Wei},
year={2023},
eprint={2309.11325},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
## License
The use of the source code in this repository complies with the Apache 2.0 License.
{
"epoch": 2.0,
"train_loss": 0.5708327819994277,
"train_runtime": 105327.89,
"train_samples_per_second": 4.797,
"train_steps_per_second": 0.019
}
{
"_from_model_config": true,
"_name_or_path": "/DISC-FinLLM/FinLLM",
"architectures": [
"BaichuanForCausalLM"
],
"auto_map": {
"AutoConfig": "configuration_baichuan.BaichuanConfig",
"AutoModel": "modeling_baichuan.BaichuanForCausalLM",
"AutoModelForCausalLM": "modeling_baichuan.BaichuanForCausalLM"
},
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13696,
"model_max_length": 4096,
"model_type": "baichuan",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"pad_token_id": 0,
"rms_norm_eps": 1e-06,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.31.0",
"use_cache": false,
"vocab_size": 64000
}
# Copyright (c) 2023, Baichuan Intelligent Technology. All rights reserved.
from transformers.configuration_utils import PretrainedConfig
class BaichuanConfig(PretrainedConfig):
model_type = "baichuan"
keys_to_ignore_at_inference = ["past_key_values"]
def __init__(
self,
vocab_size=64000,
hidden_size=5120,
intermediate_size=13696,
num_hidden_layers=40,
num_attention_heads=40,
hidden_act="silu",
model_max_length=4096,
initializer_range=0.02,
rms_norm_eps=1e-6,
use_cache=True,
pad_token_id=0,
bos_token_id=1,
eos_token_id=2,
tie_word_embeddings=False,
gradient_checkpointing=False,
**kwargs,
):
self.vocab_size = vocab_size
self.model_max_length = model_max_length
self.hidden_size = hidden_size
self.intermediate_size = intermediate_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.hidden_act = hidden_act
self.initializer_range = initializer_range
self.rms_norm_eps = rms_norm_eps
self.use_cache = use_cache
        self.gradient_checkpointing = gradient_checkpointing
super().__init__(
pad_token_id=pad_token_id,
bos_token_id=bos_token_id,
eos_token_id=eos_token_id,
tie_word_embeddings=tie_word_embeddings,
**kwargs,
)
{
"assistant_token_id": 196,
"bos_token_id": 1,
"do_sample": true,
"eos_token_id": 2,
"max_new_tokens": 2048,
"pad_token_id": 0,
"repetition_penalty": 1.1,
"temperature": 0.3,
"top_k": 5,
"top_p": 0.85,
"transformers_version": "4.31.0",
"user_token_id": 195
}
from typing import List
from queue import Queue
import torch
def build_chat_input(model, tokenizer, messages: List[dict], max_new_tokens: int=0):
def _parse_messages(messages, split_role="user"):
system, rounds = "", []
round = []
for i, message in enumerate(messages):
if message["role"] == "system":
assert i == 0
system = message["content"]
continue
if message["role"] == split_role and round:
rounds.append(round)
round = []
round.append(message)
if round:
rounds.append(round)
return system, rounds
max_new_tokens = max_new_tokens or model.generation_config.max_new_tokens
max_input_tokens = model.config.model_max_length - max_new_tokens
system, rounds = _parse_messages(messages, split_role="user")
system_tokens = tokenizer.encode(system)
max_history_tokens = max_input_tokens - len(system_tokens)
history_tokens = []
for round in rounds[::-1]:
round_tokens = []
for message in round:
if message["role"] == "user":
round_tokens.append(model.generation_config.user_token_id)
else:
round_tokens.append(model.generation_config.assistant_token_id)
round_tokens.extend(tokenizer.encode(message["content"]))
if len(history_tokens) == 0 or len(history_tokens) + len(round_tokens) <= max_history_tokens:
history_tokens = round_tokens + history_tokens # concat left
if len(history_tokens) < max_history_tokens:
continue
break
input_tokens = system_tokens + history_tokens
if messages[-1]["role"] != "assistant":
input_tokens.append(model.generation_config.assistant_token_id)
input_tokens = input_tokens[-max_input_tokens:] # truncate left
return torch.LongTensor([input_tokens]).to(model.device)
class TextIterStreamer:
def __init__(self, tokenizer, skip_prompt=False, skip_special_tokens=False):
self.tokenizer = tokenizer
self.skip_prompt = skip_prompt
self.skip_special_tokens = skip_special_tokens
self.tokens = []
self.text_queue = Queue()
self.next_tokens_are_prompt = True
def put(self, value):
if self.skip_prompt and self.next_tokens_are_prompt:
self.next_tokens_are_prompt = False
else:
if len(value.shape) > 1:
value = value[0]
self.tokens.extend(value.tolist())
self.text_queue.put(
self.tokenizer.decode(self.tokens, skip_special_tokens=self.skip_special_tokens))
def end(self):
self.text_queue.put(None)
def __iter__(self):
return self
def __next__(self):
value = self.text_queue.get()
if value is None:
raise StopIteration()
else:
return value
# Copyright (c) 2023, Baichuan Intelligent Technology. All rights reserved.
import math
from threading import Thread
from typing import List, Optional, Tuple, Union
import torch
import torch.utils.checkpoint
from torch.nn import CrossEntropyLoss
from transformers import PreTrainedModel
from transformers.activations import ACT2FN
from transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast
from transformers.utils import logging
from transformers.generation.utils import GenerationConfig
from .configuration_baichuan import BaichuanConfig
from .generation_utils import build_chat_input, TextIterStreamer
logger = logging.get_logger(__name__)
def _get_interleave(n):
def _get_interleave_power_of_2(n):
start = (2 ** (-2 ** -(math.log2(n) - 3)))
ratio = start
return [start * ratio ** i for i in range(n)]
if math.log2(n).is_integer():
return _get_interleave_power_of_2(n)
else:
closest_power_of_2 = 2 ** math.floor(math.log2(n))
return _get_interleave_power_of_2(closest_power_of_2) + \
_get_interleave(2 * closest_power_of_2)[0::2][:n - closest_power_of_2]
def _fill_with_neg_inf(t):
"""FP16-compatible function that fills a tensor with -inf."""
return t.float().fill_(float("-inf")).type_as(t)
def _gen_alibi_mask(n_head, max_pos):
"""used in inference only"""
slopes = torch.Tensor(_get_interleave(n_head))
alibi = slopes.unsqueeze(1).unsqueeze(1) * torch.arange(max_pos).unsqueeze(0).unsqueeze(0).expand(
n_head, -1, -1)
alibi = alibi.view(n_head, 1, max_pos)
alibi_mask = torch.triu(
_fill_with_neg_inf(torch.zeros([max_pos, max_pos])), 1
)
alibi_mask = alibi_mask.unsqueeze(0) + alibi
return alibi_mask
def _buffered_future_mask(tensor, maxpos, alibi, attn_heads):
"""used in training only"""
dim = tensor.size(1)
_future_mask = torch.triu(
_fill_with_neg_inf(torch.zeros([maxpos, maxpos])), 1
)
_future_mask = _future_mask.unsqueeze(0) + alibi
_future_mask = _future_mask.to(tensor)
return _future_mask[:tensor.shape[0] * attn_heads, :maxpos, :maxpos]
class RMSNorm(torch.nn.Module):
def __init__(self, hidden_size, epsilon=1e-6):
super().__init__()
self.weight = torch.nn.Parameter(torch.empty(hidden_size))
self.epsilon = epsilon
def forward(self, hidden_states):
variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
hidden_states = hidden_states * torch.rsqrt(variance + self.epsilon)
# convert into half-precision
if self.weight.dtype in [torch.float16, torch.bfloat16]:
hidden_states = hidden_states.to(self.weight.dtype)
return self.weight * hidden_states
class MLP(torch.nn.Module):
def __init__(
self,
hidden_size: int,
intermediate_size: int,
hidden_act: str,
):
super().__init__()
self.gate_proj = torch.nn.Linear(hidden_size, intermediate_size, bias=False)
self.down_proj = torch.nn.Linear(intermediate_size, hidden_size, bias=False)
self.up_proj = torch.nn.Linear(hidden_size, intermediate_size, bias=False)
self.act_fn = ACT2FN[hidden_act]
def forward(self, x):
return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
class BaichuanAttention(torch.nn.Module):
def __init__(self, config: BaichuanConfig):
super().__init__()
self.config = config
self.hidden_size = config.hidden_size
self.num_heads = config.num_attention_heads
self.head_dim = self.hidden_size // self.num_heads
self.max_position_embeddings = config.model_max_length
if (self.head_dim * self.num_heads) != self.hidden_size:
raise ValueError(
f"hidden_size {self.hidden_size} is not divisible by num_heads {self.num_heads}"
)
self.W_pack = torch.nn.Linear(self.hidden_size, 3 * self.hidden_size, bias=False)
self.o_proj = torch.nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2).contiguous()
def forward(
self,
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
past_key_value: Optional[Tuple[torch.Tensor]] = None,
output_attentions: bool = False,
use_cache: bool = False,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
bsz, q_len, _ = hidden_states.size()
proj = self.W_pack(hidden_states)
proj = proj.unflatten(-1, (3, self.hidden_size)).unsqueeze(0).transpose(0, -2).squeeze(-2)
query_states = proj[0].view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
key_states = proj[1].view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
value_states = proj[2].view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
kv_seq_len = key_states.shape[-2]
if past_key_value is not None:
kv_seq_len += past_key_value[0].shape[-2]
if past_key_value is not None:
# reuse k, v, self_attention
key_states = torch.cat([past_key_value[0], key_states], dim=2)
value_states = torch.cat([past_key_value[1], value_states], dim=2)
past_key_value = (key_states, value_states) if use_cache else None
attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
if attention_mask is not None:
if q_len == 1: # inference with cache
if len(attention_mask.size()) == 4:
attention_mask = attention_mask[:, :, -1:, :]
else:
attention_mask = attention_mask[:, -1:, :]
attn_weights = attn_weights + attention_mask
attn_weights = torch.max(attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min))
attn_weights = torch.nn.functional.softmax(attn_weights, dim=-1)
attn_output = torch.matmul(attn_weights, value_states)
attn_output = attn_output.transpose(1, 2)
attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
attn_output = self.o_proj(attn_output)
if not output_attentions:
attn_weights = None
return attn_output, attn_weights, past_key_value
class BaichuanLayer(torch.nn.Module):
def __init__(self, config: BaichuanConfig):
super().__init__()
self.hidden_size = config.hidden_size
self.self_attn = BaichuanAttention(config=config)
self.mlp = MLP(
hidden_size=self.hidden_size,
intermediate_size=config.intermediate_size,
hidden_act=config.hidden_act,
)
self.input_layernorm = RMSNorm(config.hidden_size, epsilon=config.rms_norm_eps)
self.post_attention_layernorm = RMSNorm(config.hidden_size, epsilon=config.rms_norm_eps)
def forward(
self,
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
past_key_value: Optional[Tuple[torch.Tensor]] = None,
output_attentions: Optional[bool] = False,
use_cache: Optional[bool] = False,
) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
residual = hidden_states
hidden_states = self.input_layernorm(hidden_states)
# Self Attention
hidden_states, self_attn_weights, present_key_value = self.self_attn(
hidden_states=hidden_states,
attention_mask=attention_mask,
past_key_value=past_key_value,
output_attentions=output_attentions,
use_cache=use_cache,
)
hidden_states = residual + hidden_states
# Fully Connected
residual = hidden_states
hidden_states = self.post_attention_layernorm(hidden_states)
hidden_states = self.mlp(hidden_states)
hidden_states = residual + hidden_states
outputs = (hidden_states,)
if use_cache:
outputs += (present_key_value,)
return outputs
class BaichuanPreTrainedModel(PreTrainedModel):
config_class = BaichuanConfig
base_model_prefix = "model"
supports_gradient_checkpointing = True
_no_split_modules = ["BaichuanLayer"]
_keys_to_ignore_on_load_unexpected = [r"decoder\.version"]
def _init_weights(self, module):
std = self.config.initializer_range
if isinstance(module, torch.nn.Linear):
module.weight.data.normal_(mean=0.0, std=std)
if module.bias is not None:
module.bias.data.zero_()
elif isinstance(module, torch.nn.Embedding):
module.weight.data.normal_(mean=0.0, std=std)
if module.padding_idx is not None:
module.weight.data[module.padding_idx].zero_()
def _set_gradient_checkpointing(self, module, value=False):
if isinstance(module, BaichuanModel):
module.gradient_checkpointing = value
class BaichuanModel(BaichuanPreTrainedModel):
def __init__(self, config: BaichuanConfig):
super().__init__(config)
self.padding_idx = config.pad_token_id
self.vocab_size = config.vocab_size
self.n_head = config.num_attention_heads
self.embed_tokens = torch.nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
self.layers = torch.nn.ModuleList([BaichuanLayer(config) for _ in range(config.num_hidden_layers)])
self.norm = RMSNorm(config.hidden_size, epsilon=config.rms_norm_eps)
self.gradient_checkpointing = config.gradient_checkpointing
self.post_init()
self.max_cache_pos = config.model_max_length
self.first_run = True
self.alibi_mask = None
def get_input_embeddings(self):
return self.embed_tokens
def set_input_embeddings(self, value):
self.embed_tokens = value
def get_alibi_mask(self, tensor, seq_length_with_past):
if self.training:
slopes = torch.Tensor(_get_interleave(self.n_head))
alibi = slopes.unsqueeze(1).unsqueeze(1) * torch.arange(seq_length_with_past).unsqueeze(0).unsqueeze(0).expand(
self.n_head,
-1, -1)
alibi = alibi.view(self.n_head, 1, seq_length_with_past)
mask = _buffered_future_mask(tensor, seq_length_with_past, alibi, self.n_head)
else:
if self.first_run:
self.first_run = False
self.register_buffer("future_mask", _gen_alibi_mask(self.n_head, self.max_cache_pos).to(tensor), persistent=False)
if seq_length_with_past > self.max_cache_pos:
self.max_cache_pos = seq_length_with_past
self.register_buffer("future_mask", _gen_alibi_mask(self.n_head, self.max_cache_pos).to(tensor), persistent=False)
mask = self.future_mask[:self.n_head, :seq_length_with_past, :seq_length_with_past]
return mask
def forward(
self,
input_ids: torch.LongTensor = None,
attention_mask: Optional[torch.Tensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = False,
output_attentions: Optional[bool] = False,
output_hidden_states: Optional[bool] = False,
return_dict: Optional[bool] = True,
) -> Union[Tuple, BaseModelOutputWithPast]:
if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot provide both input_ids and inputs_embeds simultaneously")
elif input_ids is not None:
batch_size, seq_length = input_ids.shape
elif inputs_embeds is not None:
batch_size, seq_length, _ = inputs_embeds.shape
else:
raise ValueError("You need to provide input_ids or inputs_embeds")
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
seq_length_with_past = seq_length
if past_key_values is not None:
past_key_values_length = past_key_values[0][0].shape[2]
seq_length_with_past = seq_length_with_past + past_key_values_length
if inputs_embeds is None:
inputs_embeds = self.embed_tokens(input_ids)
if self.training:
if self.alibi_mask is None or self.alibi_mask.shape[-1] != seq_length_with_past:
self.alibi_mask = self.get_alibi_mask(inputs_embeds, seq_length_with_past)
alibi_mask = self.alibi_mask
else:
alibi_mask = self.get_alibi_mask(inputs_embeds, seq_length_with_past)
if attention_mask is not None:
if len(attention_mask.shape) == 2:
expanded_mask = attention_mask.to(alibi_mask.dtype)
expanded_mask = torch.tril(torch.gt(expanded_mask[:, :, None] * expanded_mask[:, None, :], 0)
) * torch.eq(expanded_mask[:, :, None] - expanded_mask[:, None, :], 0)
else:
expanded_mask = attention_mask
bsz = inputs_embeds.size(0)
src_len, tgt_len = alibi_mask.size()[-2:]
expanded_mask = expanded_mask.unsqueeze(1).expand(bsz, 1, src_len, tgt_len).to(alibi_mask.dtype)
inverted_mask = 1.0 - expanded_mask
inverted_mask = inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(alibi_mask.dtype).min)
attention_mask = inverted_mask + alibi_mask.unsqueeze(0)
else:
attention_mask = alibi_mask
hidden_states = inputs_embeds
if self.gradient_checkpointing and self.training:
if use_cache:
logger.warning_once(
"`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
)
use_cache = False
# decoder layers
all_hidden_states = () if output_hidden_states else None
all_self_attns = () if output_attentions else None
next_decoder_cache = () if use_cache else None
for idx, decoder_layer in enumerate(self.layers):
if output_hidden_states:
all_hidden_states += (hidden_states,)
past_key_value = past_key_values[idx] if past_key_values is not None else None
if self.gradient_checkpointing and self.training:
def create_custom_forward(module):
def custom_forward(*inputs):
# None for past_key_value
return module(*inputs, output_attentions, None)
return custom_forward
layer_outputs = torch.utils.checkpoint.checkpoint(
create_custom_forward(decoder_layer),
hidden_states,
attention_mask,
None,
)
else:
layer_outputs = decoder_layer(
hidden_states,
attention_mask=attention_mask,
past_key_value=past_key_value,
output_attentions=output_attentions,
use_cache=use_cache,
)
hidden_states = layer_outputs[0]
if use_cache:
next_decoder_cache += (layer_outputs[2 if output_attentions else 1],)
if output_attentions:
all_self_attns += (layer_outputs[1],)
hidden_states = self.norm(hidden_states)
# add hidden states from the last decoder layer
if output_hidden_states:
all_hidden_states += (hidden_states,)
next_cache = next_decoder_cache if use_cache else None
if not return_dict:
return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
return BaseModelOutputWithPast(
last_hidden_state=hidden_states,
past_key_values=next_cache,
hidden_states=all_hidden_states,
attentions=all_self_attns,
)
class BaichuanForCausalLM(BaichuanPreTrainedModel):
def __init__(self, config):
super().__init__(config)
self.model = BaichuanModel(config)
self.lm_head = torch.nn.Linear(config.hidden_size, config.vocab_size, bias=False)
# Initialize weights and apply final processing
self.post_init()
def get_input_embeddings(self):
return self.model.embed_tokens
def set_input_embeddings(self, value):
self.model.embed_tokens = value
def get_output_embeddings(self):
return self.lm_head
def set_output_embeddings(self, new_embeddings):
self.lm_head = new_embeddings
def set_decoder(self, decoder):
self.model = decoder
def get_decoder(self):
return self.model
def forward(
self,
input_ids: torch.LongTensor = None,
attention_mask: Optional[torch.Tensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
labels: Optional[torch.LongTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = False,
output_hidden_states: Optional[bool] = False,
return_dict: Optional[bool] = True,
**kwargs
) -> Union[Tuple, CausalLMOutputWithPast]:
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
# decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
outputs = self.model(
input_ids=input_ids,
attention_mask=attention_mask,
past_key_values=past_key_values,
inputs_embeds=inputs_embeds,
use_cache=use_cache,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
return_dict=return_dict,
)
hidden_states = outputs[0]
logits = self.lm_head(hidden_states)
loss = None
if labels is not None:
# Shift so that tokens < n predict n
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
# Flatten the tokens
loss_fct = CrossEntropyLoss()
shift_logits = shift_logits.view(-1, self.config.vocab_size)
shift_labels = shift_labels.view(-1)
# Enable model parallelism
shift_labels = shift_labels.to(shift_logits.device)
loss = loss_fct(shift_logits, shift_labels)
if not return_dict:
output = (logits,) + outputs[1:]
return (loss,) + output if loss is not None else output
return CausalLMOutputWithPast(
loss=loss,
logits=logits,
past_key_values=outputs.past_key_values,
hidden_states=outputs.hidden_states,
attentions=outputs.attentions,
)
def prepare_inputs_for_generation(
self,
input_ids: torch.LongTensor,
past_key_values: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.Tensor] = None,
inputs_embeds: Optional[torch.Tensor] = None,
**kwargs
):
if past_key_values:
input_ids = input_ids[:, -1:]
# if `inputs_embeds` are passed, we only want to use them in the 1st generation step
if inputs_embeds is not None and past_key_values is None:
model_inputs = {"inputs_embeds": inputs_embeds}
else:
model_inputs = {"input_ids": input_ids}
model_inputs.update(
{
"past_key_values": past_key_values,
"use_cache": kwargs.get("use_cache"),
"attention_mask": attention_mask
}
)
return model_inputs
@staticmethod
def _reorder_cache(past_key_values, beam_idx):
return tuple(
tuple(past_state.index_select(0, beam_idx) for past_state in layer_past)
for layer_past in past_key_values
)
def quantize(self, bits: int):
try:
from .quantizer import QLinear
except ImportError:
            raise ImportError("Needs QLinear to run quantize.")
for layer in self.model.layers:
layer.self_attn.W_pack = QLinear(
bits=bits,
weight=layer.self_attn.W_pack.weight,
bias = None,
)
layer.self_attn.o_proj = QLinear(
bits=bits,
weight=layer.self_attn.o_proj.weight,
bias = None,
)
layer.mlp.gate_proj = QLinear(
bits=bits,
weight=layer.mlp.gate_proj.weight,
bias = None,
)
layer.mlp.down_proj = QLinear(
bits=bits,
weight=layer.mlp.down_proj.weight,
bias = None,
)
layer.mlp.up_proj = QLinear(
bits=bits,
weight=layer.mlp.up_proj.weight,
bias = None,
)
return self
@torch.no_grad()
    def chat(self, tokenizer, messages: List[dict], stream=False,
             generation_config: Optional[GenerationConfig] = None):
generation_config = generation_config or self.generation_config
input_ids = build_chat_input(self, tokenizer, messages, generation_config.max_new_tokens)
if stream:
streamer = TextIterStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
Thread(target=self.generate, kwargs=dict(
inputs=input_ids, streamer=streamer,
generation_config=generation_config,
)).start()
return streamer
else:
outputs = self.generate(input_ids, generation_config=generation_config)
response = tokenizer.decode(outputs[0][len(input_ids[0]):], skip_special_tokens=True)
return response
{
"metadata": {
"total_size": 26529802240
},
"weight_map": {
"lm_head.weight": "pytorch_model-00003-of-00003.bin",
"model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.W_pack.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.W_pack.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.self_attn.W_pack.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.self_attn.W_pack.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.self_attn.W_pack.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.self_attn.W_pack.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.self_attn.W_pack.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.self_attn.W_pack.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.self_attn.W_pack.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.self_attn.W_pack.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.W_pack.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.norm.weight": "pytorch_model-00003-of-00003.bin"
}
}
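The index above is the standard sharded-checkpoint map: `metadata.total_size` gives the total byte count of the weights, and `weight_map` sends each parameter name to the shard file that holds it. A short sketch of reading such an index to see which parameters live in each shard (toy data, stdlib `json` only):

```python
import json
from collections import defaultdict

# Toy index in the same shape as the file above (two entries only).
index_text = json.dumps({
    "metadata": {"total_size": 26529802240},
    "weight_map": {
        "lm_head.weight": "pytorch_model-00003-of-00003.bin",
        "model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
    },
})

index = json.loads(index_text)
shards = defaultdict(list)
for param, shard_file in index["weight_map"].items():
    shards[shard_file].append(param)  # group parameter names per shard
```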
# Copyright (c) 2023, Baichuan Intelligent Technology. All rights reserved.
import torch
from typing import List
import bz2
import base64
import ctypes
from transformers.utils import logging
logger = logging.get_logger(__name__)
try:
from cpm_kernels.kernels.base import LazyKernelCModule, KernelFunction, round_up
class Kernel:
def __init__(self, code: bytes, function_names: List[str]):
self.code = code
self._function_names = function_names
self._cmodule = LazyKernelCModule(self.code)
for name in self._function_names:
setattr(self, name, KernelFunction(self._cmodule, name))
quantization_code = "QlpoOTFBWSZTWX/mUzwAK6f///////////////////////////////7f////////////4C5duvi2D0Oj1ppVCJ2zQFYbnbsxmq20pAC7kEDb3Z3nWrextY9NZbavON7nveSRqszudmzAGGgkeh0Pewk881e3Tz13kW9YO7uA9AUUiAWLNW2HHWCE005Mdz3jHs1Ic7QNCQBNGgmE000DRNoGjUYmA0mEmJjIaI9JtT0JoaaMTaQ0aMjTTI1TzKMmETwyaJ6k8p4Ke1T0wk2aE0anpPSHppqNM1HqYzVGj0MpsTTUGpoCAAEyAAAmhpPSYowMk9U8mqb0mJtU8ETwCZT1DQ9R5R6htE9TTyRptQeoyHqA0B6g9T1AD1HpGQGgD1A0NPUAAAA0A1Mg00gmhKPU9E2SekHoJ5QHlNDEPUeoDEaBkAHqBoABoNABoAaGgBoAAAAAAA0AAAAAAAAEmoiIgmiD0maRip+qfpR+k9U/QKaZPUepiGeST1HqeU9TQ9JoANAMhoZPU0AAYnqaBoAANABoAAAADQGgAAADTQ0IAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAASJEE0AJo0GkxGJoZNKeBoTCnpNNpU9knqn+ppmUnom1PKZqTaaTTwTTFPNJ6pj1BG0eoaMgwQGkYAGk2gjT0jBqaY0RoDeqZoNEYT1NpsA/+iBrt+OVIiCKqfH7N/e67XZ2Dx9tPHyWbW4gAENNTtyzk+/WdoU604SoXU0JgfqgQxVmzbfdmaFcVxQAYINDyjTKU1FCUUzUuqqptg4SBgwIAHYE4NwQOrbY1bOF26LUVuxYr3Hp4paZXaqKU1UmXO3K+IXn2hURrgAegAaTANS+QBclUN6tpvhn85+uTPCLxzj34YO8MIMg45eRAEy9IYbKxeZTRnTy6GpPLtVGWKKK6iuDLa9wjtSmUQREX6wHfE3JeTVZdoj4Hg/3cHlBdw4c4BdGvigzZsubPr3eTi2hs6tZz3J9zUVm8qH+FPwSx4Tdr6by/OA88iLHk34rWNt7fT7NwqqqqqqqrGMYxjFcdqvY2mXyh42c2ccxhtyvBHojjUlyAKRgbvAB6nhls1wGLTOrfGMBsqRXl9Bl3sOlvafSA7sDrmAQI+mw90af+bvJ8mwjP+RKtjobGNzbfl76iTHMiIIUf9oIoygqSG2NLn0Ys/mZ+hzufu7epmzbvP1t7S0Xo8TKK7q6G5MA8vTgBb7Bf/2kITSLsH7Xmfydz7ahAt4YJbBuAQJI+1M8DLJCQH+UPbv212QWIhcCKhBrR2eryfQYIiIhKE0WtbOQ7OwM7OxtURGbF28NBndi9ejVDVA3dne37uDdzrwINS+O/0AzQTCgUjfCAwkkKFMT4Kr0aV3DicVAelGBesGYoCRcLKq5iBFR6SzOzrAwFWDFVYU2XT1oFaRJk2JBDOwVk1LFZZfwY7tQBYMGdECFA1cLZAg0IlfCTCMgZ4afRQBNvXSuMORVUTxTLSTgMFoUtaGLIr524yIM+INSFFIOHQ4TG5NZbd3Su3Nu9raSLd/ueibSYpAL0D42ZkAtD0pnXrfTxYPBw+mAt1cKPCPmDNMCDYCBiQwmANVhdDjBwsdIKyfH1slCvWbJC4QO8SBxi6A+GEpDBN6UQnPaEvBqFk3TwChKSowEENpyAueDIFs6OxxLRmFSUFpjWgYpECgDgfVBJjhg4GGcI9CD0S3igCrdziS3ZoYHlQE+7AELdvbebTVsdRvrPHCgiAbSYzUN0z0SCshLjaUaREEREQQRHNKAgAS9o0kukdJx0ulaJk0kINzlUYN0wWXLLsmRgSG1BEJNh5sCuVtIybGlKUW29BziJUTpqcA8UCCLtOGU0hH17BYTERfPKhCAwxJqSSSMd+umawlsykXZiKHesslqlVDKEHPzFhIWwJHTfcYCGE9dQK9sKixjNifLkW1iLnyZo57BBx2jksXPY
jcaA6Z6rlYTl9ocZHn2URKVXnY/Wsrc5l3aym6Uq7u9eu2szSbJgwhqPqfOR1JCCZl7/AehLVBSIXc9npUk8IDzrRCS9XKMeamSDmFxK6OQDhwNnxubbnQygQb4DEL6oD5qkkG6F03dyDAUJB/awNUoDCa3CmYy2QIsK0Z46BoX1N4kY8aGNFB8WZAfWvaHeUT4gYIjEsZBBARIFAk2jCTxAmpW03GtdW4WCN0bLJiiqY3ixmHAWRqqQKqgS2hlf8mwszkhUy3LDx3GLdo5AHGAgC4BogUAVgH4QM0AGAImwbS6gwANIep0rJIU3hBgaeKAEcnzfs+g/sJZnETvInDcAH5fE7azmr8EyIFx77caxbrDBC64CEU8wCqzAHPgkk4kiPREKYHn2HaoDBWCCrFBrhR+XpeNQkdbzCBHee2hW8EW373k/qd/PxGC2R+IO4vmNEAl1AE0l4bEvmnfd5/JYs5gl9XpgQIS7g/LAK7owBwgso9j0yEB9MRIBjqmkLdG5uED3tICA6PYXe4WItRawAenfJ0lCFupoGvajxuQC/5YQPnwFpgQBMNgBndpgVNJcyw+5vCJgHtWU0EDYk2HsvD8Qkg6ANAd8UQXGH/3X3gXgNDefHyaQ/wd93Xx87hWWtW0kPCQGR+KYiPeMQse27PdNLGwhlz8WJObSnEQyHJw1JmStJXTtIg0ZKEHrLZCXd1ljLGkkxtpsDofXUiBH0LLEM43kb2waJ26KZsJ9sBbxcAqzUgWxzogNFm4vSxjMR58r5Xm8H2+6ItGcNX2AK3GhDIMzSX3YyFsbNG0u0MxvZzGFv19k2E45tXrK+1OKUYRiH2OT2Fs7kqtxMDrANVp2nxreAZg02UaFEsuf6+urQi1PxvNOhuacrStndOnonV3e5Du+Xjp8mjhiHYPNexu7UKSbt0Gs2rPIVVVSFyQ7phtQ0ZOUySoyZA79muzuLBZaLAW20gZIeuJDacErguFE3e70svo0S0mRBMBu33rjqVrNEN9A5PHvOgukEPEgb0tYAMrvcvIXB5ydzJHXQ1n+t7BUI24oJtSCTAUet75rBpXL4ylQ4LGBpbQeQCiOku+8rq90o18ga4WEGBDhvHB0YYd/CDLIMdDh2cO/i/RppcEi3Zd+CCU8OdxAAiOgi5qeghJkUnO6YGZi5LEilo2WhSiEVsU2IK7unV2rXG61Q/LbUqGx72rn2Uzx/q/fzsCWUFCQyAA+XqfGVGvL1kml0MVpjJl1A9vYoYTSatnV1+z2czsdoc4QFWLILHn1S71/r3V1S/fJMgDlXX6DVv8+FeECNi1u8zf8K8r1Khq7twFu5xPfZJT+PLpYUZWgGNDG0Jlq4rsQy86u95xqTdO0TbSGBdDOUSyyGHQAmP5mgNfVvgeY2tPzlKbyrvnaZhgQ7aWeJjzbF4mjPlro1hYjmnWUshKxVsQ6pveK850taANOgIE/aJvr0IAC0g2H2d1agVwnBkAF1kl7IPZc8mBthvlYish4AqABgI9hw2cExRabO+8Xz31+enwlCxSbnfVFlqig3UKGBQiybpEBGQLIxuoUMVYLTt53sY+lPlxSAq9f3lfnVlFmiBFrOhAeAF/0/N6HI6/+rsQ2+D5U5fenadDmtFFgeZLLESwOgWWIlgWFo+uFROhke3lKQ4bf0mLH3XSOgtDGd73hfMwDM2aF7Lonl7AlbiPbV2zY2lvu1Vj7jzlmFYoKieH93wt3fLhBXgYUGJEjga5YWEVyE00qIYWXSKd0ZaZy+vuCQlhaz5ELs9n/pjuFAHpoDCMEEtseECQF+Rk58EyW3nzCdlyCeY5WPItdkDZ4egXmjfZTLSVT29ku6KCGxHbdTBD3z52SxkuXkpoaHyy3t25+JwX5zFdYawDASl7397IB2tunNbt2FygaTBIO5qrG0asQmxEVRGCn26UX6DewTmic/QqkLZjdCTqjQDGlxy4IODucyQlmE0zkwSkR02cZjZcA1MzMczZAf1hf
PnZT1IGtWIJGOcpzgYwCGyiNtoxRkupRElCCAgWJcE4igRJEQogPHYVAVBAEYDBkUEBIOSMK3KJNwQllpqWZARLCgMM8TkQoHOSZTDbSrjS6QtkYsQSloWSmQ4BlMjEJuuWh0ERMIVRLbcNDDQalLRQiEoBIUKZaiQpZQ1KoooVlNtjVVGAsG6WkNS84MJcoYIgjBrKaODOaUZG6QUZlCUGKy25MUVYGMWC+95zG4FRE0iyDRISulc0GQJt6m5u8WSQD4NAiDAMD9y0Q4TBGAaAIGe6PfdX9zl9Xginufp+HmPiAGfY8ZoDAarMoQAD9kA2OUJQV3lBq86RzpT8nbXPtqxsvN4YTDyOQgGEarV4Tc5h1yv2Npz+65PJpxO/Tefe5S5U1n8asAC3AQIACrUA5XacxgALbHvUfi9ApR956Do3PCWymCzTo7JjufU9DsGcQWqAFwwZfDzR+m6436pzvncYkARkLKOxX23RuLsQeK067Y/Fq8tB7igBMvb836/03fkV4qZ5YY4pFxADLifQb2iaUAwjesDs8Nhx5vnIw3rZOyb9+jyaYazgr2vbSKuf82URMcyf+99L2sWJHqW/I0PfaMR0KsULcnf9Lx/fJFzattuUwcjv8vdJed+FY1s49FrvJMbRVa82imzbdgSpDhEtleDphWrjgzVu59jsXKG/3f88zolkjqRQUk+Xm8F72190OzfqwfT5XAYbvq8WBzq/B+4rLP8j5PDfiytkicVOAAJ6QOe+hWqqwgfq61qtJ7jrsz89u1dDqsK/9Wur9Po5K1vHsXseRHoyF+LoewZ3uHaanw5S9LCW9Gj8k3e5ObY3NfjabO0cbzotaAPB3XIg+av5zaHst8ijMqapTpVtdwy211QZINMi1UCIHnAB3ZLFDZQuraVlNALggow5ygAhEo9EDHUCSm8+Hhev7eTufm8onZ7pATIUwBEBBUUEPBw/zcrl+pwtDJe2XApoPk8CJjTqtqbv7DYwZWFs/M8EhDcYE8AK8A+GfX/aQkYgSLdftV0Id/5gf3lOuNNC0799E3uYYtpMg6yABaJz5en+HpUfveNBXeYA8Whj8TtZK60F8V863ndv3PwKagCzpXtfv1APjaUgxkGLtptiZPR9vldS2Bfy0pT3RXWJlLCCj+GpAz28S4v0YQrYE7We9WpbVXz7KVTWEtoXM/UPZhYnpzdeokWJdNHQ6JQLxp7bOfci50rBcdOdhOqmyeC7B2rL6rxd969Xxc9L4zMrsqZ0+DoaPeSn8Y5QMLTOLpdvz1qaOO5xT1xPjgKnhTYa5pzi5U+bDcHXzYdxpgAbbhf/e8aBprxka5aM2J3lYXBG5G/r7CunzcPyjz2o79z8eDKkMvdO9WixswXLu3TkpoYcV0465fwUxoxC6L9Zwc+QsLDfqipk3wMSSRkBPM8Bxrwt0Mjr4IWW9Tw+Kw23yTbUyYJqrgNaq7saBKAdzYXMQ6mkrfqt72Lk0YwiZmIKkXUgChISCZMMrwdnjWbJDoR5ZXGxxAX5uRBfHBOk6JS8VVVWd56zxf8v3uR0/zON57e6BDuqIcQDJ7H0q5BNPaWbExYw2Bj4tRM9kB+JfynyyEfR/7ZiPXRFLmwpGGjLF9G6/J65mkUZEaKrUdBZYUxFKqGJL4LAbEfZjLi4GYXhv+x3ZpHkC3YADdMsKeYmfKgtzUd+Y7dVngbdcEFGAL3VqaYfYAYMtY3YKIQumTVXUFTFQyU0bqIeMgV2WOcZFXICpoMvueYVy0mHAiaeyNg1p5/QmSbYgyb7WQdUPfY3QeKc0hewGB2z2vH9t+pvy7B6P21pG+wXCMQHZl30TJonLPhQg8nka+raw1OLPUVWvIidrloKjcLH6/YAwepAoWEykQ9Bw2+YU/N5dbXnsNcPbubOszstYSwQYATYulLN0AHAgwb5t+VfATV6uhICgRgDGUaoVNNLc9ZMMW5+qKVhOyoRMLzJolo17ACLDPes+aoyeD5aIZm46
HHKV7KqGX1IGbYEEDaAh0Vj+43wIMep+e+gsP4UEgVjmMAWTPz2XZhQDA6/Vzbk0fK+v0+bNB12LRbfmsufKzRgw7Hp7b+J+N2LqWXdwWTvhQ2rIPjc2cgS2A4Ub7IflPitJFAPyFvbvHK+tXi0Zcbi6mO6HTaIydOeYDmSYUIACAZwJCEgueoJnU7W6WfGdWtl1TdD4WHQ8AgDnmNUD+2YrjxNum3+1R9B+XSiSGrVLcFrVC/Z9R7D8DslIGyMPXbJAFthAMNYs7OdlqPilZtnwtReItC2Ff5vD8mQHwayX/vh1LB+HwoefoZ6LWUKb7WH6D0FmEhEKgwAayAYsoKUCcPepjDQYfA2TMWHoiS1lspYmEi2HdFULic/ucQlrFCCwPxyDeITAUsiAUFggCtZuDuVPLvVtM4WCG6DlrLwBL1JAaQFWuf7/uHZ1WAHEBuz9BMrshS8OhZpwrmYpgUIFoauEJQxtrw2iu9bT1ZLik/F26jhZblz7739qomvexIWc5hKq/GfFAebrnq/23mGuisbZhiROtNdFBDwqCBc2zrTYMfhMPwIF0s37CzzvYKeLjIfQZ3D2N6o+FRgDOkDGFGjCDiy9cJBVMOBWJ1AjDIxTAz/LwSRYuyzhHyDiECf0P53hWshYcMslf0PC0tWfLlUztN1xTxhwgkAudx+IE+NuS3phgEhRBo5lXEG6KhGydUzSU2WphfuFy0VkjH2AIPddbJ679s70tkL1rBEEEEmFgwK5pRCB6ZC5EX7ZCkCTI1pQUDJAwhQoosjBZFAjelFmydnwH9j46Ei5DD9ZaOvgT54UpSh4mD7FR2rjbJjFFdyOauUAjNr/DYBQJkLsUsd2mAXDIMHOuu8ULJhkx21G0UL7fnlqIPfiwdblRpcEaxVjru+6bHpdvj38qAOr1rUACbHrKGDWLFjGCBGYoGREGZBh4aGauRARRTmJdfJBWYoCDdFrBtCgYo6H8NyRIvFfbeTFjxF9riIiIiJABkRljjGMYx1mizcSoJ9AAFqKHXgBBgYnYjs06fFb2fl/bceQ8TeN4h1jrKPd/Pbtl3dl3fnbu7u7u7u7u7u7u7u7u79ZxeoA2gbgjyqd70779v47Lsepzo6y18vJkhQMaDKDNhYbWPpJA6hsD3pzguE4gtOhzrtDoDA3oMbPVBY/3fi0DbkWt7GQwMw2BtpNpeKt+v6KytGxxqCQ8JoLCGKIALFxqwIOeI7fqckjnW8eHjcW3xehEp2SWhvmrtDDdoBSOn6jSjQCgLuhd+EBOwr3q9GbUewJDA4QvH+DpFwt+JbtP30yJTy10KFMLT8MmAGUKkqn3DQHSmTACxjEheIpDhGuZT/WrsHgP+ly7Bsto8UYb2bBvwPRV1O/WaEbmIEMEbQtfphLgUDADF7nayfXs1CXBxYOi1aG36B7rr5EX31tzoym2bTIWw0maxvM3Gs+KAOSMztimS4oGQokBRf5dGKNykDp8tH9chWc9k7/6I+SxG5cZSnx52CFhoDqaZ8wBethxjRVKaRfCZTeBpi6ZNdZFjROy9x6tdgMem0rtuH6wbAz9tKvlhJ0JUP1e+2xVgroJFw8tQxLPdwVnLVMDu+mmfk9b5mK3qMNwiMyBqFaajMIgCDBYUXbdKwwVVhoMXL5YLkI5FFviIkYQTNamuapRILAqCSAYSsIOOVAtAUUrDwBSthRBgyVAM1wBrIQhhTlJKQIwFnj+b+aXuJyerhwx7HxQLofddtH71c6UuefecFIrANhfgkaIt5KL4iV43tMeP17BD8D7Dl8+AQTGQfz/rp3JWOfDodJOcvDAquYl1QQiHknUmAQ3lYpRUtJEUowXnnJnOZjZzdINlj+y7lXBb2uPR6a2E5AC3S6dBaJxYl1qyRXwQ15QflVkAK8AmAwql/n4frTztb/XRXV9J3eXRfv0MuB1OShRrtbrfdudwKxsAYC+QHiNISbAQu46ffUU/Flrw68uJ5L+7p69JjfglHs5PSd
0bjADZeFsIWCqy0kQ20m3CskYLPShb0aoDdHoJBUQVEirAUgeRTtUBwAa0INXTIBPMHp9AongtXzSfuWCFQfDtzRuYRVG3WIXUjEg7b2vBZKT4ESq2tTcMyGXlqZN+uJ3CaGHEJB/3Q6/xrGIGIxyzCG5tLlSXx61sy0Bra4IFaYrjF1zJj5JPK/SslbN65uYffnqtyIX9zren+rrSsXVVhq8VZ6DFpnBVlD48AoMeltsyGSZSpdUjR6bM9J+oHRVmhpp2HBv+N4PXeS76ctP4LOLvreBzzyCr2v1K7eBo+dr2gwZ2x9k6EpHd7pNRl6Pv+IgXtj4WmtlEUQxkzWOVcT6jcLrhax5PVvgurz9q7DtdWriVdnpnTlTrQqdvWN6ZNr4OdpMM/T5Gg8irLXS/YOgvhteS49VEj8+IfNiPOf8MfMkUw+lYehdNxKZnNbjIoJiqRY1KVGIOWpRtq4m6GCyiypZKKzWBQq5j8RYJE0NCiyjJmgUmDBi8BoJgMVJYXMF4aGDL2XQ4HDKaRGaGhctNBrShK0bSU1BpFoRaTkkCCUWaDCx1MUXQCaGRhgoqhCHmzrFyZwUFG27KVdmNgbChCbZNAMghZRoXKM0CMEXaUTZswtBpLoCkxONrpa2wL0qn0mw2eV0yXs1MGgGSTcAo/GELIbpoe+8gKSqpV0ZIoIa4UCcM2EdVikuAPuDlU89YsXrb9Zb+Pr/F8NexBBbEwTQs9HmsQGBYPoK6bZKDvj9yyALrlOaMbLpKxRM+njvB4id/1Y1WPm3K2A0BVSlgWJNjYxne6JZ8mZfv7w1Nm3/GFOiwonktduZaRH2loGGhNBUlQiHENkybM8pBim0iaXcpE8dAF4GodlriMfOGH6hHY20huVvSlLDBRKHQ4Y3SyKrmCcy7ZZMDyNqVWWwpS+RHQaYnmEURGCKmQc8ARghpQffVMwK2vz6V97O+59X5foz4jUfN33Z49cKeKObXDE1rNvV2QaDOLOi+R0fl+RM8jVQ7QgNiDMzMgUCLlYO71Vn7X7vF0UcSZX1pu+s+xC4MZXNQCl0/rb68aAY3rOJ/jaw7EOYIIlln6V+oFpwZLOUjUVHfe6pdjXgAqsD219Ri16edZ03hcjePW71C29Wy0nTw5YIfs/Y9sNovb+v8vA1P7beB5bQmvEv59b+BnUs8yqQ5/cLKV0EZRMOGHmpsMrPidWDXTyP3fuO+w/9+kbujeEbdg+n4WXJQBn1kL3Py/M1JnkOu70oufaRPG6bsd6SUhq1TALBZAhKpoyMIvkQGRAzJD+udGR9e+WlVzjlJeqELl+D2smL4vG6BUFpiKHDwqftFBbX+9VV338vNg+5kL11bd1yrZaYZrGW36mrUIRi/MVgrNNITCj++zpFSOrRLE+Prlr3mYOP1TtXvtpOwLP5Kmt+3zZvXSsOXW+ix6mXS5mb1MnTvW0u8yHF356RuzXUyeGiLTe+IvXvKmJrEymIxQT9QMSU8WTHgnJi1BgP/WoqICgO21v9Hiw8IaXJY1619oEj/3cb/7R/nddLm6VA5xoN0t3XY6Hiep4VGnzs/Od0hj8f39YuAC5HvfwvWuOeV5fz820AAGglyrLFDjUrv//M/fwNdsEvj0MrTXrV8vLZfMvKMAzJ0/Sda/28/N0QniGmKhoagYUYMGp8IFDrOoi40L48r/SLxfSSDw9TM4P4vUeHE+iTmchyj7Vmwp7m7dejVSNZx+2Is5jzuf+HmHr2aml3fWein0wnXnxne72A86Cc3hrzXgbfc7lNQiJuGMljn2Y8pgXjrTczIy1teeafy8Tz8vmzBWAAFXfojX/x4Kv/YFNprgURbUBytnsI9/0WeuKmZjrWcumUGQgRDIEUsAwZkQMwPsGTJjpTEw7YAwCs7Oxn2XE+hexXn+z/L7HC65bJhCR3SxMdHngfkGgqJnhYzTGjw9StB6E4VI6SgkdNEdesLFW0cgxeYq7YABEPlMspZSBtZDQYZMvK9Cb
u/UzXvja7MLlO4BfVYkMH5dwAfQ3u9WEkCoveLyp86iGmleemxREJQ0NoFyWpMxsNQCuuLGCdP703Uv1a3JeT7vfpxp8J+o/ft+J70dz7dV+1QEcxyT6REE6vsl2+0Yd8ayjKWBg2j8pRTeGhVxiYZDc6/YatrSzsw56wbWzGkp3FLpa8+60pan1LSvb+rcfyjTyEM7yC5BVyZL4r0qVCMZRc+AMHxlyZMP5QQiFATNqpVSdy8i66S7oSIl4APKPMzOTus/KeI8rrY6qBkuRSWT0y7LGvNz4KBjigkR4r0v9/bluxFmxePnvZRhpjgezOiX6bPa5LZkzsaLjmf6NzPP1ZfH9p7j4MsQL0YMETXjeb/5lAYcJWU1RECXppb+33HdO5Etl4xLXPxfV8cGZ43FFYXKVoMFQHssoAIzyiClcZR8W8vqiACqmcw8DAwzLM+FeLFaAYRiJ1DFqKh2Fcs+6Zd6erYKNpF09oZhCZNX4DO1OL94JPGTBXIPMmPjmDb0GlmwFaWG2CUqSjhc20YNd6Wwzu52BklGYvDcMnERi4Yh1wqwcOlqiLatNe4rj8FcXDxqMSsgYP5/FnSoTq2VVKttXQ3Gxq0q0Shp+qCbIAeWxu1Ynpd88H5zJfn/V+v+5/N7nyR7Q+n02bmML7aF1Sg+a32Ud2eQx2a8dQqTABf2SKJgvKADJgAJV8Rd0Wt1oIVj9nr/ZfC7fkbdqnS9R4eIbqH2HVNjOYdggfFeSAHKIkaC5R2rzEzdxs7dDCzizsiB7OluhJplyBBWKXPmS0tsUNnNs2D8zfW/QTSAr0EcsnQ/YPZBD4D0rHa3rkC2DHq+G97XfliTeY63fQow3RQpyKsCFgdUC2sF7aep4TmSDjlnDDpfIUJ3Ne7AMT4D7xpuM+j1hXBxYcyIpO3bvLubMhwY3Lrr6KfLP4PF0tpDjMOew5rBbSSUJPAfRMkDCSBum/B7S97oYaYZS56rtu79Vh408mfXcm6HcL0Qe7fRiqav0GhPcuxMpZIm/WHpICgBUirY8aK56MaW53+L/x+BbXNrjaySqntSLsoHFEiExu5hX7+yaqu7Ss2LrWVpPp9L8fuVDJdVcPqIQRFv/gWlUadkCUYMxFQf26Nlq3czS1/zwLAGILGRazcevp3q9/0O/YUWwXKvQTQghgHliLIIbcY0XxVr/9oV2++gsQ57NkRK084MjYapPJJ6Gd7WONsJRq6iIJo0GH/kO9e74wvERAiMW7UqLI+2obG59Xcazzvdk2UIhBDN4V/KqrwHJ9EpMftxjsugftMee96M9+G1DfnomWt7OmvNC5TP5/Fa50GNfJjieHFJ0mwlIothDYzg3BQyahykpudGZEmgiK9ViiKhI9ypBUuKuau8PitJWe1r0kVIrV4VRDTDa74vSvBytKDcNCzJ66Oq5G+hTTGgbpBMS6pJTOmrIjb0m9HsPvrI3rQhSkRYc1aEmn4+CFS9MpIxTpLccqtp+dpwTDqQfFDvleEeOfwGuSJEiR4QBtGkWjWrKysrJEiRI3Pd252xBk1NTBRRRZZZZZZZZZe4EJvbjqWGaaZgEypipYBc9da7d615Ozv+0TPBMoiPZt+OB7H2evtWBqyXzg9jgyNarCYQHxeABDu8KyT59xFO4fpXed3nMVTnQhwffnGz0DpW+c5RkbdjYgCQgDV6Sk3OZyVhq5u3M66CH4jQq6byDLwIv8D7ipARoPE7/rm7y2+93QALi1QT9F/QCxMDOQkHeUdC+o3NN9GXve/W1Ua/wcVgmxFD1YTuKB+xQIiSdMyXLjSbjWwNfsJH8DqADRWZHIyjHLolbAN4CAMrT3YQqcfwcVf9TtpcgPfzwWRN7XWJzrS1KzOVWXccRQ+9TusY64JEtzfyHJnKixBwcbgCBAgQiIiIiiqp3Pje3Y4/hFGgiIiqrTGMYxtsZSR3dlixYyrLVZTH79fh8yNTc4ezofRU9vjHOIATEYEQNb4IG7bzkD59jIzRN
Inn9c62cuu1ZkYpfHu7uokt8nd1Hc6ApKjEt2qqbEG2l6oUPERCkrFLjmUay3EPnj2vUe43MqIYdrm3PZT7WrLfnw7y9is1SEtuI3OsO3EW80l8imWVq1Yje2a7qnbRVNK7eZSUzwnE6j9CLm24oqbZ35UTokBKroRjwJNyCBEACLMRjnOy84O5zJREd0g8Xa+y0W7O3tcCI+46EvAjDUyqYnOCQAfEhYjlWVo9HFVl0Fk1g6rWywYXLyW9gmyJHKcFdans6g078Q9ryUjaXacP7/PvwauCguS3VK61FsSTIa5RZd+GJqurSiskfDyz7d0Bd7WxYHfJfTrpTamo87sRYMCEdyYaUdCzhu3027ABTtQCAnwKi9q3KK/rIpk6zEjGHEvADnOwuJ1nOvPr8XZNswFPZ07G/LauwBMG1tOWNT76s7Jw1OxxW1BImaJT6XUIQ/1VPRP6UZLBjAVwit2h7xS6TLbCUnzPvqOrOfrbFh/ZAFnP7jW/zIMkMNMUk5C20iKshen2HLTcv3ge8jBXRbUso7c88qlYXXozqDXWcHg21XXWzupu9YmNN2aY8W/tJ3ru1cs4YtK5b/YBitp4WYoOvZCpCIC0Ju2+xw3MABgLVFBetW9KA2pqTQMLlkKFfMNANN6+JBLD7W6/i0AiMi2fIgslxtlD+bdgBbDk1FxvsbR+npU23xUVtnBjvadzYRwqwnvWSPbrgxgFM01Y2yuGIJh4HBXDlmKSUokWxg39HUAD4u4+D8ivAiXNQkqnkKxTsDkVM+u/s6rx/w/VPZ1yL9nnzJm2YZ9Wl+9izPDiRnfzWU5Eo5duybQnktKu3b+J3pVuuBmmnebBXfiZtkpUjLRKvtuhD3GDAd3t8lPpMQgVQmkICwxxqhUhLQMPWxbwjlswPn5rmN8Fi0j25H0DYQMgIsU4+OvNxfxINfZR+ndisEVJrn6M1cgs+qsqW2AYv5gIBUG2nAI2sRJdPp0pkIFsJQ9DC0Exajuxg+5pGLShRHi9wPxlNGkITynkwYgPc5Bjm1ceZiqsTuXbr2ZrcqBszMKehW3A7cYHig2nqO46ef4275H+NjUxZ7Yxj0XWdJ+CBStOyj3EqZrP6f8049HRTOibY6aHBkysu7Zy/0S6gyH3v1st5NJVth4dqmwuarDr5z62e9OpPUqH6te3WRJmOs5XNggNsBgGGgo4SSlh/wYAXsqj3aHIiODcmQbAbQltCKcIoU5klptJHQ0l2P4Tgjad8WBWp9XyPm/j3QYeU5tV+GSJ4bCaYcK2PA4Spq7rr4bGK2La8fhcB+ZpbeVZdDoKcxwCBZQgvQmADvnSmoonhrOe7esVg+7JS5aUYwMCekjlC6YlQHUxfh1evKIB8OGrutYZ4YX41h6Jq6hHuvnBsJnjhYHY81i95iJiJTU6/T7VS3gB1qH0ACm35YBe58z7ceWShP5goYAvCcHOTphatcimJSi7e8cPtVNlLBeanev47WzlgmaIlrfg8PQALIwuyc+Ce7PTEdI6IMaL62wH5dzYaANEsRgmxYif+uWKupAwqrJ4eXO3BFsHrOiYQRSnB5GwA01qir3ZWamHuBtKIrzLS3by/XYFMY2AJEnhaR7ycHZFV8q2AKplu2J5dsQ24LL0qZisABXaOzHlwBFOQv0vOYWldhDsVt5f3Y4pEAsNwPQChB5QmJB9EYeqbx1Mx3plDVGMY02NMYxjG228wkHXLQBuctwIzDl0DNb2d3Zr2eV57mni8HxuT3pPieEQB9MdPlRq2ASoAJ5D34BKD2+jwhMSM3k9e3pXf6aOC4LK2IgIYJ4xQMEhhPzy+0BRQRAMTrG+uVq2FlPAAWvayCMW6HdOctiAZvYzmADuOlcPkF5QWJAaMRsb5I0Onl1kWwDFstny1tu3cPUt/f34gagGAiIG0z+LwJMwuBjAAO0oXQ+j2OhzkkDWu/H1iOt9LZS2d9xud3NjEIOUBcEGiLbYAIhuk6kG3QiZ7Vx448qOR
0823ux6gaDAo/m7VGENCDY55QyihE8PY2c3FAOq0eB5VrR2rVOD8Pk54g10gYFruoShyCA600IlGADNkNWFwSUq26fo1MfJozZb8ivAWwKtUCnsIy1VVc6gilxgZXuOpIn5NqpQ4t1rnTCc+zVGQ8dLhuE4NDF7wA+sXOKNy3yzCWV69Yg3C0AUAEgSDmXcoIVu+dFgcdgdaEhA+iWl1AC/p9ikx5Lmxupjb3zEXwOwav5pXeGFu/i1uQdRtu2CBnIi7j7vIXJ+0+JkKDrtuikSysRrZuAkIPGGIXa2KOvhm+tzKtliPPcIGhgwSePz0mjUO5L7zzmcZMHoTM00cmhmTJXLHXXVL0wJj4s1MzRHFFiZHJnI5xbqYKxtqajjQWsuDBeCnFPf3bjFXVC0XXPfJZnZvcUOvlJ5TfVc9np7+YKcF8Pr101cACqIsDSQrhevDLMRutoELrdyRd4yc4EBhnWVGVUo4LsLWMYimrKjHNShUXacMGzWd1rteL0aqM9Wd9vU8jWwVgD0CDq0ypYdiu5V1wDsEFjDwLXJ6pe46MvOgOONLlAwPQwQmNUX+2AdnCCSJdjtaAefC8AY7bANwtVktFIQWVBQ95dSmjz8VnKFc5xsXgOQl3TQHPvghbPELlyOR3/IjaKbR4oXeqF4EjmEktr0SghMIXS60jhlBQIfEIJnyehMgiETwigxDpiHows1RgnEalhk2EzYwRLmRwajUmIaCFSzCXWStGaaJgaMaFOidK9crUyN2ZuYmDCMxbjQvOVrOaRTDXXVeCjhum+v9g5xzwDtdCQ0k+kA7IgR/IB4DE2B6gEv0Dv6l1YUCwQl4cgIQLDp7+vyQ0Ua6AogR/cA0tRku3sTszsBxdKvDwb0HSuapgWAtRzrmM+GLTWgg8og8IOyt6ZvFLTvQ6TdIU4jAZ9qJLorPPx8ToMIzve9bunjAzUZTwZAuejvlIVhEDGHZ43P+c2vnuH0s6xLjGN5IxE0xoW1w0CkEhDEzZIIIKKKJQkS+HFVRzrtPvD4ASgRgCszCJ7egCW+IZ1AZrFQIbETEL8gYz6s0SYtQwYi6Qsmdq1IQVCNcDQEDNHPNnw9vKmss525+DcQrAWHAQARzWHlAGPJFvL0qtVnM2mDSOxfDb56lUUmGI9SmNfCBxBRJtxwA+2eJCOmpSpXLFbYv8diZyMpTv2LEbyMNcTJr20IxsYzUrvRbyu5dvYHUZsRs8gfCLXUEVYi8a2a9PXF+ZtLPx0ZOLRblX8XTa0QJJSoa+VKRIKD5RCmFKYOIiBoFAUCXYIXCCWZKNExSIoiMUmCpS01EkRLAsoE0NCxCz8oQK0iCYNZrgS0sWA4zJgpKMgxYZxIN0k6OoboxHmMgmKyNy3rUrA2BW11g0yU50ArBdUNYm7rW6l+FmQDmsfUcr8Nxpt6ME1pzmPW2YuvyqQA1FEqGKaOFgPS4YwF0qjqJ96aNghQyxO4ETMPCpx6cPhE1xsRksh7qapVjAG7QQVa6blYCqhJolWKylASeNpfutZRkWEfehrAM1hps1M6VN9y+8pnOeOL3eSrvGKkr3kEDbExtsYADtYMAhLoFzWdZo6F3T89cLurlkYDQ8iWVgjINJHQatNc/BZZPPYhX7J3dX5zJTnZ1pJIV4y+k2MF25BTUhIvz2okmED6ax7KgYdJtMkMMjHiBpMVmJIippQbqyHkJreoQDGrZe8QH4qNpIBqEHFpVTrJVwkLCu5ds3+pbccosPAGFjP4J0AB15EXRr4rcAbXmibqr2600yb4dM8VbMHACFOCBZhZIxpWCMkDUZIBUQoKpooWCkAnBzOK5na/LqSSLTATYIaabQCteZkFlqs0bDPpuWAcNiRn6GWSnwrsatNVFIK0+WUGVX3p1UghXmamW9amFzoPHfP2Z3WLhW9ZEaq0DQiqOJyRC17MYwQA84eUDjyR/GOBNpNoO1pV6NwwsBZoAgBWz+M+YS5GC+Su1IEB0A5in0LwPQxXq
7joeDPBdd3DzF6z96RTojxR29u8vE3GnO6jAa0MBmCuoxyYl/SDsbSpYIlMINttOUZndGWJ2JgBs8s7bw1GhnALOxFBnZayRRjt4bSvH+Ma9WNZSaKBoUDtDEQNIMt5XAZJIvEFZSahWUgL7ADIBAjZYJVAK8NHljSCRbLZdxbuCkFfrZVirL+GkBWYaJFCoglTaEWtiguhCVZNjj+c9eMUMbOVJQmcHOmKmRIKboAMkAbohUflNANgubKuhTXDGSlSKY0PetmdL+7bQoIJCVRY+osfasgH1NADQYBBoYd+dccoSIhapDyYkRkhkYGAZDWCMlJReDHnRJZKAxUYiJmPGYriVoGAkdW2QI785BQQakRBFiFEknMOMGpw8jj8a7sLaWrGrZ5gDnB2Ys6AFHfczh5BvVw8R6n1P4QHEbDeIf/i7kinChIP/Mpng="
kernels = Kernel(
bz2.decompress(base64.b64decode(quantization_code)),
[
"int4_to_fp16",
"fp16_to_int4",
"int8_to_fp16",
"fp16_to_int8",
"int4_to_bf16",
"bf16_to_int4",
"int8_to_bf16",
"bf16_to_int8",
],
)
except Exception as exception:
kernels = None
logger.warning(f"Failed to load kernels: {exception}")
def quant4(weight: torch.Tensor, scale: torch.Tensor):
stream = torch.cuda.current_stream()
num_row = weight.size(0)
num_chan_fp16 = weight.size(1)
# 4bit
num_chan_int = num_chan_fp16 // 8
qweight = torch.zeros((num_row, num_chan_int), dtype=torch.int32, device=weight.device)
# round-to-nearest symmetric quantization, clipped to the packable 4-bit range
intweight = torch.clip(torch.round(weight.to(scale.dtype) / scale[:, None]), -16, 15).to(dtype=torch.int32)
for j in range(num_chan_int):
qweight[:, j] = ((intweight[:, j*8+7] & 0x0f) << 28) \
| ((intweight[:, j*8+6] & 0x0f) << 24) \
| ((intweight[:, j*8+5] & 0x0f) << 20) \
| ((intweight[:, j*8+4] & 0x0f) << 16) \
| ((intweight[:, j*8+3] & 0x0f) << 12) \
| ((intweight[:, j*8+2] & 0x0f) << 8) \
| ((intweight[:, j*8+1] & 0x0f) << 4) \
| ((intweight[:, j*8] & 0x0f))
return qweight
def dequant4(qweight: torch.Tensor, scale: torch.Tensor, input: torch.Tensor):
stream = torch.cuda.current_stream()
num_row = qweight.size(0)
num_chan_int = qweight.size(1)
# 4bit
num_chan_fp16 = num_chan_int * 8
out = torch.empty((num_row, num_chan_fp16), dtype=input.dtype, device=qweight.device)
blockDim = (128, 1, 1)
gridDim = ((num_chan_int + blockDim[0] - 1) // blockDim[0], num_row, 1)
if input.dtype == torch.bfloat16:
kernels.int4_to_bf16(
gridDim,
blockDim,
0,
stream,
[ctypes.c_void_p(out.data_ptr()), ctypes.c_void_p(qweight.data_ptr()),
ctypes.c_void_p(scale.data_ptr()), ctypes.c_int32(num_row), ctypes.c_int32(num_chan_int), ctypes.c_int32(num_chan_fp16)],
)
elif input.dtype == torch.float16:
kernels.int4_to_fp16(
gridDim,
blockDim,
0,
stream,
[ctypes.c_void_p(out.data_ptr()), ctypes.c_void_p(qweight.data_ptr()),
ctypes.c_void_p(scale.data_ptr()), ctypes.c_int32(num_row), ctypes.c_int32(num_chan_int), ctypes.c_int32(num_chan_fp16)],
)
return out
class QLinear(torch.nn.Module):
def __init__(self, bits: int, weight: torch.Tensor, bias=None):
super().__init__()
self.quant_bits = bits
self.scale = weight.abs().max(dim=-1).values / ((2 ** (bits - 1)) - 1)
self.scale = self.scale.to(torch.float32)
if self.quant_bits == 4:
self.weight = quant4(weight, self.scale)
elif self.quant_bits == 8:
self.weight = torch.round(weight.to(self.scale.dtype) / self.scale[:, None]).to(torch.int8)
if self.quant_bits == 8:
self.weight = self.weight.T
self.bias = bias
def forward(self, input):
if self.quant_bits == 4:
assert input.dtype in (torch.bfloat16, torch.float16)
if self.weight.device != input.device:
self.weight = self.weight.to(input.device)
self.scale = self.scale.to(input.device)
if self.quant_bits == 4:
self.scale = self.scale.to(input.dtype)
rweight = dequant4(self.weight, self.scale, input).T
output = torch.matmul(input, rweight)
elif self.quant_bits == 8:
rweight = self.weight.to(input.dtype) * self.scale.to(input.dtype)
output = torch.matmul(input, rweight)
if self.bias is not None:
output = output + self.bias
return output
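The nibble-packing scheme used by `quant4` above stores eight signed 4-bit values in each `int32`, lowest nibble first, with `& 0x0f` masking on the way in and sign extension on the way out. A pure-Python sketch of that bit layout (an illustration only; the repository's actual unpacking runs in the compressed CUDA kernels loaded earlier, not in Python):

```python
def pack8(nibbles):
    """Pack eight signed 4-bit values into one 32-bit word,
    mirroring the shift/mask pattern in quant4 (index 0 -> lowest nibble)."""
    word = 0
    for k, v in enumerate(nibbles):
        word |= (v & 0x0F) << (4 * k)
    return word

def unpack8(word):
    """Recover the eight signed nibbles, sign-extending from 4 bits."""
    out = []
    for k in range(8):
        v = (word >> (4 * k)) & 0x0F
        out.append(v - 16 if v >= 8 else v)
    return out

vals = [-8, -3, -1, 0, 1, 3, 5, 7]
assert unpack8(pack8(vals)) == vals  # round-trips within the 4-bit range
```

Values outside [-8, 7] wrap under the `& 0x0F` mask, which is why the quantized weights are clipped before packing.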
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": true
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": true
},
"pad_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": true
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": true
}
}
# Copyright (c) 2023, Baichuan Intelligent Technology. All rights reserved.
import os
from shutil import copyfile
from typing import Any, Dict, List, Optional, Tuple
import sentencepiece as spm
from transformers.tokenization_utils import AddedToken, PreTrainedTokenizer
from transformers.utils import logging
logger = logging.get_logger(__name__)
VOCAB_FILES_NAMES = {"vocab_file": "tokenizer.model"}
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {},
"tokenizer_file": {},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {}
class BaichuanTokenizer(PreTrainedTokenizer):
"""
Construct a Baichuan tokenizer. Based on byte-level Byte-Pair-Encoding.
Args:
vocab_file (`str`):
Path to the vocabulary file.
"""
vocab_files_names = VOCAB_FILES_NAMES
pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
model_input_names = ["input_ids", "attention_mask"]
def __init__(
self,
vocab_file,
unk_token="<unk>",
bos_token="<s>",
eos_token="</s>",
pad_token=None,
sp_model_kwargs: Optional[Dict[str, Any]] = None,
add_bos_token=True,
add_eos_token=False,
clean_up_tokenization_spaces=False,
**kwargs,
):
self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
bos_token = AddedToken(bos_token, lstrip=False, rstrip=False) if isinstance(bos_token, str) else bos_token
eos_token = AddedToken(eos_token, lstrip=False, rstrip=False) if isinstance(eos_token, str) else eos_token
unk_token = AddedToken(unk_token, lstrip=False, rstrip=False) if isinstance(unk_token, str) else unk_token
pad_token = AddedToken(pad_token, lstrip=False, rstrip=False) if isinstance(pad_token, str) else pad_token
self.vocab_file = vocab_file
self.add_bos_token = add_bos_token
self.add_eos_token = add_eos_token
self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
self.sp_model.Load(vocab_file)
super().__init__(
bos_token=bos_token,
eos_token=eos_token,
unk_token=unk_token,
pad_token=pad_token,
add_bos_token=add_bos_token,
add_eos_token=add_eos_token,
sp_model_kwargs=self.sp_model_kwargs,
clean_up_tokenization_spaces=clean_up_tokenization_spaces,
**kwargs,
)
def __getstate__(self):
state = self.__dict__.copy()
state["sp_model"] = None
return state
def __setstate__(self, d):
self.__dict__ = d
self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
self.sp_model.Load(self.vocab_file)
@property
def vocab_size(self):
"""Returns vocab size"""
return self.sp_model.get_piece_size()
def get_vocab(self):
"""Returns vocab as a dict"""
vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
vocab.update(self.added_tokens_encoder)
return vocab
def _tokenize(self, text):
"""Returns a tokenized string."""
return self.sp_model.encode(text, out_type=str)
def _convert_token_to_id(self, token):
"""Converts a token (str) to an id using the vocab."""
return self.sp_model.piece_to_id(token)
def _convert_id_to_token(self, index):
"""Converts an index (integer) to a token (str) using the vocab."""
token = self.sp_model.IdToPiece(index)
return token
def convert_tokens_to_string(self, tokens):
"""Converts a sequence of tokens (string) to a single string."""
current_sub_tokens = []
out_string = ""
prev_is_special = False
for i, token in enumerate(tokens):
# make sure that special tokens are not decoded using sentencepiece model
if token in self.all_special_tokens:
if not prev_is_special and i != 0:
out_string += " "
out_string += self.sp_model.decode(current_sub_tokens) + token
prev_is_special = True
current_sub_tokens = []
else:
current_sub_tokens.append(token)
prev_is_special = False
out_string += self.sp_model.decode(current_sub_tokens)
return out_string
def save_vocabulary(self, save_directory, filename_prefix: Optional[str] = None) -> Tuple[str]:
"""
Save the vocabulary and special tokens file to a directory.
Args:
save_directory (`str`):
The directory in which to save the vocabulary.
Returns:
`Tuple(str)`: Paths to the files saved.
"""
if not os.path.isdir(save_directory):
logger.error(f"Vocabulary path ({save_directory}) should be a directory")
return
out_vocab_file = os.path.join(
save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
)
if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file) and os.path.isfile(self.vocab_file):
copyfile(self.vocab_file, out_vocab_file)
elif not os.path.isfile(self.vocab_file):
with open(out_vocab_file, "wb") as fi:
content_spiece_model = self.sp_model.serialized_model_proto()
fi.write(content_spiece_model)
return (out_vocab_file,)
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
bos_token_id = [self.bos_token_id] if self.add_bos_token else []
eos_token_id = [self.eos_token_id] if self.add_eos_token else []
output = bos_token_id + token_ids_0 + eos_token_id
if token_ids_1 is not None:
output = output + bos_token_id + token_ids_1 + eos_token_id
return output
def get_special_tokens_mask(
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
) -> List[int]:
"""
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer `prepare_for_model` method.
Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
already_has_special_tokens (`bool`, *optional*, defaults to `False`):
Whether or not the token list is already formatted with special tokens for the model.
Returns:
`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
"""
if already_has_special_tokens:
return super().get_special_tokens_mask(
token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
)
bos_token_id = [1] if self.add_bos_token else []
eos_token_id = [1] if self.add_eos_token else []
if token_ids_1 is None:
return bos_token_id + ([0] * len(token_ids_0)) + eos_token_id
return (
bos_token_id
+ ([0] * len(token_ids_0))
+ eos_token_id
+ bos_token_id
+ ([0] * len(token_ids_1))
+ eos_token_id
)
def create_token_type_ids_from_sequences(
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
"""
Creates a mask from the two sequences passed, to be used in a sequence-pair classification task. The
sequence-pair mask has the following format:
```
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence | second sequence |
```
If token_ids_1 is None, only the first portion of the mask (0s) is returned.
Args:
token_ids_0 (`List[int]`):
List of ids.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
Returns:
`List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
"""
bos_token_id = [self.bos_token_id] if self.add_bos_token else []
eos_token_id = [self.eos_token_id] if self.add_eos_token else []
output = [0] * len(bos_token_id + token_ids_0 + eos_token_id)
if token_ids_1 is not None:
output += [1] * len(bos_token_id + token_ids_1 + eos_token_id)
return output
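The segment mask documented in `create_token_type_ids_from_sequences` above reduces to a length calculation: each segment contributes its token count plus one slot per enabled BOS/EOS token. A minimal sketch of that logic (a hypothetical helper operating on lengths only, not the tokenizer's actual method):

```python
def token_type_ids(len0, len1=None, add_bos=True, add_eos=False):
    """Length-only mirror of the token-type-id layout:
    0s for the first sequence (plus its special tokens), 1s for the second."""
    n_special = int(add_bos) + int(add_eos)
    out = [0] * (len0 + n_special)
    if len1 is not None:
        out += [1] * (len1 + n_special)
    return out

# With the tokenizer defaults (add_bos_token=True, add_eos_token=False):
assert token_type_ids(3, 2) == [0, 0, 0, 0, 1, 1, 1]
```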
{
"add_bos_token": false,
"add_eos_token": false,
"auto_map": {
"AutoTokenizer": [
"tokenization_baichuan.BaichuanTokenizer",
null
]
},
"bos_token": {
"__type": "AddedToken",
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": true
},
"clean_up_tokenization_spaces": false,
"eos_token": {
"__type": "AddedToken",
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": true
},
"model_max_length": 4096,
"pad_token": {
"__type": "AddedToken",
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": true
},
"padding_side": "right",
"sp_model_kwargs": {},
"split_special_tokens": false,
"tokenizer_class": "BaichuanTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": true
}
}
{
"epoch": 2.0,
"train_loss": 0.5708327819994277,
"train_runtime": 105327.89,
"train_samples_per_second": 4.797,
"train_steps_per_second": 0.019
}
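As a rough consistency check on the summary metrics above (plain arithmetic on the reported numbers, which are rounded in the JSON), runtime × throughput approximately recovers the step and sample counts:

```python
# numbers copied from train_results above
train_runtime_s = 105327.89
train_steps_per_second = 0.019
train_samples_per_second = 4.797

approx_steps = train_runtime_s * train_steps_per_second      # ~2000, within rounding
approx_samples = train_runtime_s * train_samples_per_second  # ~505,000 samples seen
```

The step estimate is consistent, to rounding of the reported rate, with the `total_steps` of 1974 recorded in the trainer log that follows.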
{"current_steps": 10, "total_steps": 1974, "loss": 0.9175, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.9996834033646177e-05, "epoch": 0.01, "percentage": 0.51, "elapsed_time": "0:08:53", "remaining_time": "1 day, 5:06:00"}
{"current_steps": 20, "total_steps": 1974, "loss": 0.7595, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.998733693645213e-05, "epoch": 0.02, "percentage": 1.01, "elapsed_time": "0:17:16", "remaining_time": "1 day, 4:07:31"}
{"current_steps": 30, "total_steps": 1974, "loss": 0.7375, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.997151111381707e-05, "epoch": 0.03, "percentage": 1.52, "elapsed_time": "0:25:37", "remaining_time": "1 day, 3:40:03"}
{"current_steps": 40, "total_steps": 1974, "loss": 0.7227, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.9949360574062774e-05, "epoch": 0.04, "percentage": 2.03, "elapsed_time": "0:33:53", "remaining_time": "1 day, 3:18:52"}
{"current_steps": 50, "total_steps": 1974, "loss": 0.7147, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.9920890927418316e-05, "epoch": 0.05, "percentage": 2.53, "elapsed_time": "0:42:08", "remaining_time": "1 day, 3:01:22"}
{"current_steps": 60, "total_steps": 1974, "loss": 0.7102, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.988610938459917e-05, "epoch": 0.06, "percentage": 3.04, "elapsed_time": "0:50:24", "remaining_time": "1 day, 2:47:46"}
{"current_steps": 70, "total_steps": 1974, "loss": 0.7056, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.9845024754980876e-05, "epoch": 0.07, "percentage": 3.55, "elapsed_time": "0:58:37", "remaining_time": "1 day, 2:34:35"}
{"current_steps": 80, "total_steps": 1974, "loss": 0.7128, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.979764744436784e-05, "epoch": 0.08, "percentage": 4.05, "elapsed_time": "1:06:53", "remaining_time": "1 day, 2:23:46"}
{"current_steps": 90, "total_steps": 1974, "loss": 0.6982, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.9743989452357756e-05, "epoch": 0.09, "percentage": 4.56, "elapsed_time": "1:15:09", "remaining_time": "1 day, 2:13:11"}
{"current_steps": 100, "total_steps": 1974, "loss": 0.7258, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.968406436930243e-05, "epoch": 0.1, "percentage": 5.07, "elapsed_time": "1:23:27", "remaining_time": "1 day, 2:04:01"}
{"current_steps": 110, "total_steps": 1974, "loss": 0.7095, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.961788737286559e-05, "epoch": 0.11, "percentage": 5.57, "elapsed_time": "1:31:45", "remaining_time": "1 day, 1:54:50"}
{"current_steps": 120, "total_steps": 1974, "loss": 0.7048, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.954547522417877e-05, "epoch": 0.12, "percentage": 6.08, "elapsed_time": "1:40:03", "remaining_time": "1 day, 1:45:50"}
{"current_steps": 130, "total_steps": 1974, "loss": 0.6805, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.946684626359607e-05, "epoch": 0.13, "percentage": 6.59, "elapsed_time": "1:48:21", "remaining_time": "1 day, 1:36:59"}
{"current_steps": 140, "total_steps": 1974, "loss": 0.6798, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.938202040604898e-05, "epoch": 0.14, "percentage": 7.09, "elapsed_time": "1:56:36", "remaining_time": "1 day, 1:27:37"}
{"current_steps": 150, "total_steps": 1974, "loss": 0.7134, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.929101913600238e-05, "epoch": 0.15, "percentage": 7.6, "elapsed_time": "2:04:53", "remaining_time": "1 day, 1:18:36"}
{"current_steps": 160, "total_steps": 1974, "loss": 0.6895, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.919386550201299e-05, "epoch": 0.16, "percentage": 8.11, "elapsed_time": "2:13:08", "remaining_time": "1 day, 1:09:32"}
{"current_steps": 170, "total_steps": 1974, "loss": 0.705, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.909058411089174e-05, "epoch": 0.17, "percentage": 8.61, "elapsed_time": "2:21:25", "remaining_time": "1 day, 1:00:44"}
{"current_steps": 180, "total_steps": 1974, "loss": 0.6712, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.8981201121471356e-05, "epoch": 0.18, "percentage": 9.12, "elapsed_time": "2:29:41", "remaining_time": "1 day, 0:51:58"}
{"current_steps": 190, "total_steps": 1974, "loss": 0.6744, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.886574423798097e-05, "epoch": 0.19, "percentage": 9.63, "elapsed_time": "2:37:59", "remaining_time": "1 day, 0:43:26"}
{"current_steps": 200, "total_steps": 1974, "loss": 0.6675, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.874424270302927e-05, "epoch": 0.2, "percentage": 10.13, "elapsed_time": "2:46:13", "remaining_time": "1 day, 0:34:29"}
{"current_steps": 210, "total_steps": 1974, "loss": 0.6726, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.861672729019797e-05, "epoch": 0.21, "percentage": 10.64, "elapsed_time": "3:06:14", "remaining_time": "1 day, 2:04:28"}
{"current_steps": 220, "total_steps": 1974, "loss": 0.6821, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.848323029624761e-05, "epoch": 0.22, "percentage": 11.14, "elapsed_time": "3:14:32", "remaining_time": "1 day, 1:50:57"}
{"current_steps": 230, "total_steps": 1974, "loss": 0.7133, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.834378553293748e-05, "epoch": 0.23, "percentage": 11.65, "elapsed_time": "3:22:48", "remaining_time": "1 day, 1:37:50"}
{"current_steps": 240, "total_steps": 1974, "loss": 0.6681, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.81984283184619e-05, "epoch": 0.24, "percentage": 12.16, "elapsed_time": "3:31:01", "remaining_time": "1 day, 1:24:41"}
{"current_steps": 250, "total_steps": 1974, "loss": 0.682, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.804719546850487e-05, "epoch": 0.25, "percentage": 12.66, "elapsed_time": "3:39:18", "remaining_time": "1 day, 1:12:19"}
{"current_steps": 260, "total_steps": 1974, "loss": 0.6755, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.789012528691558e-05, "epoch": 0.26, "percentage": 13.17, "elapsed_time": "3:47:36", "remaining_time": "1 day, 1:00:28"}
{"current_steps": 270, "total_steps": 1974, "loss": 0.68, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.772725755600682e-05, "epoch": 0.27, "percentage": 13.68, "elapsed_time": "3:55:51", "remaining_time": "1 day, 0:48:32"}
{"current_steps": 280, "total_steps": 1974, "loss": 0.6663, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.755863352647909e-05, "epoch": 0.28, "percentage": 14.18, "elapsed_time": "4:04:07", "remaining_time": "1 day, 0:36:58"}
{"current_steps": 290, "total_steps": 1974, "loss": 0.6672, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.738429590697271e-05, "epoch": 0.29, "percentage": 14.69, "elapsed_time": "4:12:21", "remaining_time": "1 day, 0:25:24"}
{"current_steps": 300, "total_steps": 1974, "loss": 0.6466, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.720428885325069e-05, "epoch": 0.3, "percentage": 15.2, "elapsed_time": "4:20:38", "remaining_time": "1 day, 0:14:20"}
{"current_steps": 310, "total_steps": 1974, "loss": 0.6668, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.701865795701505e-05, "epoch": 0.31, "percentage": 15.7, "elapsed_time": "4:28:57", "remaining_time": "1 day, 0:03:40"}
{"current_steps": 320, "total_steps": 1974, "loss": 0.6591, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.684682059461469e-05, "epoch": 0.32, "percentage": 16.21, "elapsed_time": "4:37:10", "remaining_time": "23:52:39"}
{"current_steps": 330, "total_steps": 1974, "loss": 0.661, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.665063509461097e-05, "epoch": 0.33, "percentage": 16.72, "elapsed_time": "4:45:27", "remaining_time": "23:42:06"}
{"current_steps": 340, "total_steps": 1974, "loss": 0.6749, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.644896598002736e-05, "epoch": 0.34, "percentage": 17.22, "elapsed_time": "4:53:43", "remaining_time": "23:31:35"}
{"current_steps": 350, "total_steps": 1974, "loss": 0.6654, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.624186432907437e-05, "epoch": 0.35, "percentage": 17.73, "elapsed_time": "5:01:59", "remaining_time": "23:21:16"}
{"current_steps": 360, "total_steps": 1974, "loss": 0.6716, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.602938259590072e-05, "epoch": 0.36, "percentage": 18.24, "elapsed_time": "5:10:16", "remaining_time": "23:11:05"}
{"current_steps": 370, "total_steps": 1974, "loss": 0.6796, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.581157459730783e-05, "epoch": 0.37, "percentage": 18.74, "elapsed_time": "5:18:35", "remaining_time": "23:01:09"}
{"current_steps": 380, "total_steps": 1974, "loss": 0.6794, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.558849549911931e-05, "epoch": 0.39, "percentage": 19.25, "elapsed_time": "5:26:55", "remaining_time": "22:51:20"}
{"current_steps": 390, "total_steps": 1974, "loss": 0.6651, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.536020180220871e-05, "epoch": 0.4, "percentage": 19.76, "elapsed_time": "5:35:12", "remaining_time": "22:41:26"}
{"current_steps": 400, "total_steps": 1974, "loss": 0.6823, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.515032675559024e-05, "epoch": 0.41, "percentage": 20.26, "elapsed_time": "5:43:27", "remaining_time": "22:31:30"}
{"current_steps": 410, "total_steps": 1974, "loss": 0.6677, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.4912285699446786e-05, "epoch": 0.42, "percentage": 20.77, "elapsed_time": "6:04:58", "remaining_time": "23:12:15"}
{"current_steps": 420, "total_steps": 1974, "loss": 0.6704, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.4669201313179155e-05, "epoch": 0.43, "percentage": 21.28, "elapsed_time": "6:13:18", "remaining_time": "23:01:12"}
{"current_steps": 430, "total_steps": 1974, "loss": 0.6481, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.442113516454638e-05, "epoch": 0.44, "percentage": 21.78, "elapsed_time": "6:21:37", "remaining_time": "22:50:18"}
{"current_steps": 440, "total_steps": 1974, "loss": 0.665, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.416815008307488e-05, "epoch": 0.45, "percentage": 22.29, "elapsed_time": "6:29:58", "remaining_time": "22:39:35"}
{"current_steps": 450, "total_steps": 1974, "loss": 0.6658, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.391031014414514e-05, "epoch": 0.46, "percentage": 22.8, "elapsed_time": "6:38:16", "remaining_time": "22:28:49"}
{"current_steps": 460, "total_steps": 1974, "loss": 0.6699, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.364768065276284e-05, "epoch": 0.47, "percentage": 23.3, "elapsed_time": "6:46:39", "remaining_time": "22:18:26"}
{"current_steps": 470, "total_steps": 1974, "loss": 0.6664, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.338032812701867e-05, "epoch": 0.48, "percentage": 23.81, "elapsed_time": "6:55:02", "remaining_time": "22:08:08"}
{"current_steps": 480, "total_steps": 1974, "loss": 0.6817, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.310832028124069e-05, "epoch": 0.49, "percentage": 24.32, "elapsed_time": "7:03:24", "remaining_time": "21:57:52"}
{"current_steps": 490, "total_steps": 1974, "loss": 0.6791, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.283172600884393e-05, "epoch": 0.5, "percentage": 24.82, "elapsed_time": "7:11:47", "remaining_time": "21:47:43"}
{"current_steps": 500, "total_steps": 1974, "loss": 0.6679, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.2550615364881194e-05, "epoch": 0.51, "percentage": 25.33, "elapsed_time": "7:20:10", "remaining_time": "21:37:38"}
{"current_steps": 510, "total_steps": 1974, "loss": 0.6476, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.226505954829973e-05, "epoch": 0.52, "percentage": 25.84, "elapsed_time": "7:28:32", "remaining_time": "21:27:33"}
{"current_steps": 520, "total_steps": 1974, "loss": 0.6721, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.197513088390813e-05, "epoch": 0.53, "percentage": 26.34, "elapsed_time": "7:36:52", "remaining_time": "21:17:30"}
{"current_steps": 530, "total_steps": 1974, "loss": 0.6563, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.1680902804058095e-05, "epoch": 0.54, "percentage": 26.85, "elapsed_time": "7:45:10", "remaining_time": "21:07:22"}
{"current_steps": 540, "total_steps": 1974, "loss": 0.6391, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.138244983004574e-05, "epoch": 0.55, "percentage": 27.36, "elapsed_time": "7:53:26", "remaining_time": "20:57:14"}
{"current_steps": 550, "total_steps": 1974, "loss": 0.6672, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.107984755323697e-05, "epoch": 0.56, "percentage": 27.86, "elapsed_time": "8:01:42", "remaining_time": "20:47:11"}
{"current_steps": 560, "total_steps": 1974, "loss": 0.6497, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.077317261592194e-05, "epoch": 0.57, "percentage": 28.37, "elapsed_time": "8:09:57", "remaining_time": "20:37:08"}
{"current_steps": 570, "total_steps": 1974, "loss": 0.668, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.04625026919033e-05, "epoch": 0.58, "percentage": 28.88, "elapsed_time": "8:18:10", "remaining_time": "20:27:04"}
{"current_steps": 580, "total_steps": 1974, "loss": 0.6682, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.0147916466823174e-05, "epoch": 0.59, "percentage": 29.38, "elapsed_time": "8:26:24", "remaining_time": "20:17:07"}
{"current_steps": 590, "total_steps": 1974, "loss": 0.6352, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.982949361823388e-05, "epoch": 0.6, "percentage": 29.89, "elapsed_time": "8:34:38", "remaining_time": "20:07:14"}
{"current_steps": 600, "total_steps": 1974, "loss": 0.6698, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.950731479541743e-05, "epoch": 0.61, "percentage": 30.4, "elapsed_time": "8:42:55", "remaining_time": "19:57:30"}
{"current_steps": 610, "total_steps": 1974, "loss": 0.6549, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.918146159895882e-05, "epoch": 0.62, "percentage": 30.9, "elapsed_time": "9:01:46", "remaining_time": "20:11:25"}
{"current_steps": 620, "total_steps": 1974, "loss": 0.6516, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.8852016560078605e-05, "epoch": 0.63, "percentage": 31.41, "elapsed_time": "9:10:12", "remaining_time": "20:01:35"}
{"current_steps": 630, "total_steps": 1974, "loss": 0.6629, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.851906311972943e-05, "epoch": 0.64, "percentage": 31.91, "elapsed_time": "9:18:37", "remaining_time": "19:51:44"}
{"current_steps": 640, "total_steps": 1974, "loss": 0.6764, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.821647502051616e-05, "epoch": 0.65, "percentage": 32.42, "elapsed_time": "9:26:58", "remaining_time": "19:41:46"}
{"current_steps": 650, "total_steps": 1974, "loss": 0.6415, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.787708866250794e-05, "epoch": 0.66, "percentage": 32.93, "elapsed_time": "9:35:20", "remaining_time": "19:31:54"}
{"current_steps": 660, "total_steps": 1974, "loss": 0.6463, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.7534440830144466e-05, "epoch": 0.67, "percentage": 33.43, "elapsed_time": "9:43:42", "remaining_time": "19:22:06"}
{"current_steps": 670, "total_steps": 1974, "loss": 0.6508, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.71886183083464e-05, "epoch": 0.68, "percentage": 33.94, "elapsed_time": "9:52:04", "remaining_time": "19:12:19"}
{"current_steps": 680, "total_steps": 1974, "loss": 0.6411, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.683970868611123e-05, "epoch": 0.69, "percentage": 34.45, "elapsed_time": "10:00:24", "remaining_time": "19:02:31"}
{"current_steps": 690, "total_steps": 1974, "loss": 0.6266, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.648780033432891e-05, "epoch": 0.7, "percentage": 34.95, "elapsed_time": "10:08:43", "remaining_time": "18:52:44"}
{"current_steps": 700, "total_steps": 1974, "loss": 0.6409, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.613298238339955e-05, "epoch": 0.71, "percentage": 35.46, "elapsed_time": "10:17:01", "remaining_time": "18:42:58"}
{"current_steps": 710, "total_steps": 1974, "loss": 0.6594, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.5775344700658705e-05, "epoch": 0.72, "percentage": 35.97, "elapsed_time": "10:25:18", "remaining_time": "18:33:13"}
{"current_steps": 720, "total_steps": 1974, "loss": 0.6427, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.5414977867616006e-05, "epoch": 0.73, "percentage": 36.47, "elapsed_time": "10:33:40", "remaining_time": "18:23:38"}
{"current_steps": 730, "total_steps": 1974, "loss": 0.6462, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.505197315701292e-05, "epoch": 0.74, "percentage": 36.98, "elapsed_time": "10:41:59", "remaining_time": "18:14:01"}
{"current_steps": 740, "total_steps": 1974, "loss": 0.6277, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.468642250970547e-05, "epoch": 0.75, "percentage": 37.49, "elapsed_time": "10:50:18", "remaining_time": "18:04:25"}
{"current_steps": 750, "total_steps": 1974, "loss": 0.6551, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.431841851137764e-05, "epoch": 0.76, "percentage": 37.99, "elapsed_time": "10:58:35", "remaining_time": "17:54:48"}
{"current_steps": 760, "total_steps": 1974, "loss": 0.6402, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.394805436909157e-05, "epoch": 0.77, "percentage": 38.5, "elapsed_time": "11:06:52", "remaining_time": "17:45:15"}
{"current_steps": 770, "total_steps": 1974, "loss": 0.6515, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.357542388768033e-05, "epoch": 0.78, "percentage": 39.01, "elapsed_time": "11:15:12", "remaining_time": "17:35:47"}
{"current_steps": 780, "total_steps": 1974, "loss": 0.6489, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.3200621445989226e-05, "epoch": 0.79, "percentage": 39.51, "elapsed_time": "11:23:30", "remaining_time": "17:26:18"}
{"current_steps": 790, "total_steps": 1974, "loss": 0.6568, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.282374197297185e-05, "epoch": 0.8, "percentage": 40.02, "elapsed_time": "11:31:50", "remaining_time": "17:16:53"}
{"current_steps": 800, "total_steps": 1974, "loss": 0.615, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.2444880923646674e-05, "epoch": 0.81, "percentage": 40.53, "elapsed_time": "11:40:07", "remaining_time": "17:07:26"}
{"current_steps": 810, "total_steps": 1974, "loss": 0.6447, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.20641342549205e-05, "epoch": 0.82, "percentage": 41.03, "elapsed_time": "12:02:31", "remaining_time": "17:18:17"}
{"current_steps": 820, "total_steps": 1974, "loss": 0.6159, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.168159840128472e-05, "epoch": 0.83, "percentage": 41.54, "elapsed_time": "12:10:54", "remaining_time": "17:08:37"}
{"current_steps": 830, "total_steps": 1974, "loss": 0.6347, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.129737025039068e-05, "epoch": 0.84, "percentage": 42.05, "elapsed_time": "12:19:18", "remaining_time": "16:58:59"}
{"current_steps": 840, "total_steps": 1974, "loss": 0.6361, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.091154711851022e-05, "epoch": 0.85, "percentage": 42.55, "elapsed_time": "12:27:42", "remaining_time": "16:49:24"}
{"current_steps": 850, "total_steps": 1974, "loss": 0.6504, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.052422672588765e-05, "epoch": 0.86, "percentage": 43.06, "elapsed_time": "12:36:05", "remaining_time": "16:39:48"}
{"current_steps": 860, "total_steps": 1974, "loss": 0.6467, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.013550717198948e-05, "epoch": 0.87, "percentage": 43.57, "elapsed_time": "12:44:27", "remaining_time": "16:30:14"}
{"current_steps": 870, "total_steps": 1974, "loss": 0.6364, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.9745486910657993e-05, "epoch": 0.88, "percentage": 44.07, "elapsed_time": "12:52:51", "remaining_time": "16:20:44"}
{"current_steps": 880, "total_steps": 1974, "loss": 0.6361, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.9354264725175185e-05, "epoch": 0.89, "percentage": 44.58, "elapsed_time": "13:01:16", "remaining_time": "16:11:15"}
{"current_steps": 890, "total_steps": 1974, "loss": 0.6441, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.8961939703243122e-05, "epoch": 0.9, "percentage": 45.09, "elapsed_time": "13:09:39", "remaining_time": "16:01:47"}
{"current_steps": 900, "total_steps": 1974, "loss": 0.6404, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.856861121188735e-05, "epoch": 0.91, "percentage": 45.59, "elapsed_time": "13:18:01", "remaining_time": "15:52:19"}
{"current_steps": 910, "total_steps": 1974, "loss": 0.6307, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.8174378872289446e-05, "epoch": 0.92, "percentage": 46.1, "elapsed_time": "13:26:21", "remaining_time": "15:42:49"}
{"current_steps": 920, "total_steps": 1974, "loss": 0.6484, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.777934253455522e-05, "epoch": 0.93, "percentage": 46.61, "elapsed_time": "13:34:42", "remaining_time": "15:33:21"}
{"current_steps": 930, "total_steps": 1974, "loss": 0.6237, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.7383602252424985e-05, "epoch": 0.94, "percentage": 47.11, "elapsed_time": "13:43:00", "remaining_time": "15:23:53"}
{"current_steps": 940, "total_steps": 1974, "loss": 0.6161, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.6987258257932175e-05, "epoch": 0.95, "percentage": 47.62, "elapsed_time": "13:51:20", "remaining_time": "15:14:29"}
{"current_steps": 950, "total_steps": 1974, "loss": 0.6381, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.6590410936016895e-05, "epoch": 0.96, "percentage": 48.13, "elapsed_time": "13:59:40", "remaining_time": "15:05:05"}
{"current_steps": 960, "total_steps": 1974, "loss": 0.6366, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.619316079910063e-05, "epoch": 0.97, "percentage": 48.63, "elapsed_time": "14:07:59", "remaining_time": "14:55:41"}
{"current_steps": 970, "total_steps": 1974, "loss": 0.6202, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.5795608461628802e-05, "epoch": 0.98, "percentage": 49.14, "elapsed_time": "14:16:20", "remaining_time": "14:46:21"}
{"current_steps": 980, "total_steps": 1974, "loss": 0.6334, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.5397854614587334e-05, "epoch": 0.99, "percentage": 49.65, "elapsed_time": "14:24:38", "remaining_time": "14:36:59"}
{"current_steps": 990, "total_steps": 1974, "loss": 0.5954, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.5e-05, "epoch": 1.0, "percentage": 50.15, "elapsed_time": "14:32:58", "remaining_time": "14:27:41"}
{"current_steps": 1000, "total_steps": 1974, "loss": 0.4963, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.460214538541267e-05, "epoch": 1.01, "percentage": 50.66, "elapsed_time": "14:41:17", "remaining_time": "14:18:23"}
{"current_steps": 1010, "total_steps": 1974, "loss": 0.487, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.4204391538371207e-05, "epoch": 1.02, "percentage": 51.17, "elapsed_time": "15:03:28", "remaining_time": "14:22:20"}
{"current_steps": 1020, "total_steps": 1974, "loss": 0.489, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.3806839200899377e-05, "epoch": 1.03, "percentage": 51.67, "elapsed_time": "15:11:48", "remaining_time": "14:12:48"}
{"current_steps": 1030, "total_steps": 1974, "loss": 0.4805, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.3409589063983117e-05, "epoch": 1.04, "percentage": 52.18, "elapsed_time": "15:20:06", "remaining_time": "14:03:17"}
{"current_steps": 1040, "total_steps": 1974, "loss": 0.4907, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.3012741742067838e-05, "epoch": 1.05, "percentage": 52.68, "elapsed_time": "15:28:27", "remaining_time": "13:53:49"}
{"current_steps": 1050, "total_steps": 1974, "loss": 0.4719, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.261639774757503e-05, "epoch": 1.06, "percentage": 53.19, "elapsed_time": "15:36:47", "remaining_time": "13:44:22"}
{"current_steps": 1060, "total_steps": 1974, "loss": 0.4914, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.2220657465444782e-05, "epoch": 1.07, "percentage": 53.7, "elapsed_time": "15:45:07", "remaining_time": "13:34:56"}
{"current_steps": 1070, "total_steps": 1974, "loss": 0.4775, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.182562112771056e-05, "epoch": 1.08, "percentage": 54.2, "elapsed_time": "15:53:27", "remaining_time": "13:25:32"}
{"current_steps": 1080, "total_steps": 1974, "loss": 0.4935, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.143138878811265e-05, "epoch": 1.09, "percentage": 54.71, "elapsed_time": "16:01:47", "remaining_time": "13:16:09"}
{"current_steps": 1090, "total_steps": 1974, "loss": 0.5082, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.1038060296756883e-05, "epoch": 1.1, "percentage": 55.22, "elapsed_time": "16:10:09", "remaining_time": "13:06:48"}
{"current_steps": 1100, "total_steps": 1974, "loss": 0.4868, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.064573527482482e-05, "epoch": 1.11, "percentage": 55.72, "elapsed_time": "16:18:28", "remaining_time": "12:57:26"}
{"current_steps": 1110, "total_steps": 1974, "loss": 0.4988, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.025451308934201e-05, "epoch": 1.12, "percentage": 56.23, "elapsed_time": "16:26:44", "remaining_time": "12:48:03"}
{"current_steps": 1120, "total_steps": 1974, "loss": 0.4653, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.9864492828010526e-05, "epoch": 1.13, "percentage": 56.74, "elapsed_time": "16:35:02", "remaining_time": "12:38:43"}
{"current_steps": 1130, "total_steps": 1974, "loss": 0.4915, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.9475773274112354e-05, "epoch": 1.14, "percentage": 57.24, "elapsed_time": "16:43:23", "remaining_time": "12:29:25"}
{"current_steps": 1140, "total_steps": 1974, "loss": 0.4763, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.9088452881489787e-05, "epoch": 1.16, "percentage": 57.75, "elapsed_time": "16:51:43", "remaining_time": "12:20:09"}
{"current_steps": 1150, "total_steps": 1974, "loss": 0.4807, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.8702629749609324e-05, "epoch": 1.17, "percentage": 58.26, "elapsed_time": "17:00:05", "remaining_time": "12:10:54"}
{"current_steps": 1160, "total_steps": 1974, "loss": 0.4653, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.8318401598715284e-05, "epoch": 1.18, "percentage": 58.76, "elapsed_time": "17:08:24", "remaining_time": "12:01:39"}
{"current_steps": 1170, "total_steps": 1974, "loss": 0.4778, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.793586574507951e-05, "epoch": 1.19, "percentage": 59.27, "elapsed_time": "17:16:43", "remaining_time": "11:52:24"}
{"current_steps": 1180, "total_steps": 1974, "loss": 0.4839, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.7555119076353338e-05, "epoch": 1.2, "percentage": 59.78, "elapsed_time": "17:25:04", "remaining_time": "11:43:12"}
{"current_steps": 1190, "total_steps": 1974, "loss": 0.4718, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.7176258027028152e-05, "epoch": 1.21, "percentage": 60.28, "elapsed_time": "17:33:23", "remaining_time": "11:33:59"}
{"current_steps": 1200, "total_steps": 1974, "loss": 0.4793, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.6799378554010773e-05, "epoch": 1.22, "percentage": 60.79, "elapsed_time": "17:41:44", "remaining_time": "11:24:49"}
{"current_steps": 1210, "total_steps": 1974, "loss": 0.4825, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.6424576112319672e-05, "epoch": 1.23, "percentage": 61.3, "elapsed_time": "18:04:33", "remaining_time": "11:24:47"}
{"current_steps": 1220, "total_steps": 1974, "loss": 0.4857, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.6051945630908426e-05, "epoch": 1.24, "percentage": 61.8, "elapsed_time": "18:12:53", "remaining_time": "11:15:26"}
{"current_steps": 1230, "total_steps": 1974, "loss": 0.4802, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.5681581488622367e-05, "epoch": 1.25, "percentage": 62.31, "elapsed_time": "18:21:13", "remaining_time": "11:06:06"}
{"current_steps": 1240, "total_steps": 1974, "loss": 0.4812, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.5313577490294538e-05, "epoch": 1.26, "percentage": 62.82, "elapsed_time": "18:29:36", "remaining_time": "10:56:49"}
{"current_steps": 1250, "total_steps": 1974, "loss": 0.4682, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.4948026842987084e-05, "epoch": 1.27, "percentage": 63.32, "elapsed_time": "18:38:01", "remaining_time": "10:47:33"}
{"current_steps": 1260, "total_steps": 1974, "loss": 0.4974, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.4585022132384008e-05, "epoch": 1.28, "percentage": 63.83, "elapsed_time": "18:46:25", "remaining_time": "10:38:18"}
{"current_steps": 1270, "total_steps": 1974, "loss": 0.4737, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.4224655299341304e-05, "epoch": 1.29, "percentage": 64.34, "elapsed_time": "18:54:44", "remaining_time": "10:29:01"}
{"current_steps": 1280, "total_steps": 1974, "loss": 0.4877, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.3867017616600456e-05, "epoch": 1.3, "percentage": 64.84, "elapsed_time": "19:03:02", "remaining_time": "10:19:44"}
{"current_steps": 1290, "total_steps": 1974, "loss": 0.4753, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.3512199665671094e-05, "epoch": 1.31, "percentage": 65.35, "elapsed_time": "19:11:23", "remaining_time": "10:10:30"}
{"current_steps": 1300, "total_steps": 1974, "loss": 0.4638, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.316029131388878e-05, "epoch": 1.32, "percentage": 65.86, "elapsed_time": "19:19:45", "remaining_time": "10:01:17"}
{"current_steps": 1310, "total_steps": 1974, "loss": 0.4626, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.2811381691653607e-05, "epoch": 1.33, "percentage": 66.36, "elapsed_time": "19:28:04", "remaining_time": "9:52:03"}
{"current_steps": 1320, "total_steps": 1974, "loss": 0.4786, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.2465559169855535e-05, "epoch": 1.34, "percentage": 66.87, "elapsed_time": "19:36:26", "remaining_time": "9:42:52"}
{"current_steps": 1330, "total_steps": 1974, "loss": 0.4717, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.212291133749206e-05, "epoch": 1.35, "percentage": 67.38, "elapsed_time": "19:44:49", "remaining_time": "9:33:42"}
{"current_steps": 1340, "total_steps": 1974, "loss": 0.4803, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.178352497948384e-05, "epoch": 1.36, "percentage": 67.88, "elapsed_time": "19:53:08", "remaining_time": "9:24:31"}
{"current_steps": 1350, "total_steps": 1974, "loss": 0.4803, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.1447486054694112e-05, "epoch": 1.37, "percentage": 68.39, "elapsed_time": "20:01:28", "remaining_time": "9:15:20"}
{"current_steps": 1360, "total_steps": 1974, "loss": 0.4739, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.1114879674157233e-05, "epoch": 1.38, "percentage": 68.9, "elapsed_time": "20:09:47", "remaining_time": "9:06:11"}
{"current_steps": 1370, "total_steps": 1974, "loss": 0.471, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.0785790079522001e-05, "epoch": 1.39, "percentage": 69.4, "elapsed_time": "20:18:06", "remaining_time": "8:57:01"}
{"current_steps": 1380, "total_steps": 1974, "loss": 0.4799, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.046030062171512e-05, "epoch": 1.4, "percentage": 69.91, "elapsed_time": "20:26:25", "remaining_time": "8:47:53"}
{"current_steps": 1390, "total_steps": 1974, "loss": 0.4689, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.0138493739830352e-05, "epoch": 1.41, "percentage": 70.42, "elapsed_time": "20:34:44", "remaining_time": "8:38:46"}
{"current_steps": 1400, "total_steps": 1974, "loss": 0.4599, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.820450940248544e-06, "epoch": 1.42, "percentage": 70.92, "elapsed_time": "20:43:03", "remaining_time": "8:29:39"}
{"current_steps": 1410, "total_steps": 1974, "loss": 0.5019, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.506252775993882e-06, "epoch": 1.43, "percentage": 71.43, "elapsed_time": "21:01:37", "remaining_time": "8:24:39"}
{"current_steps": 1420, "total_steps": 1974, "loss": 0.4764, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.195978826331697e-06, "epoch": 1.44, "percentage": 71.94, "elapsed_time": "21:10:01", "remaining_time": "8:15:29"}
{"current_steps": 1430, "total_steps": 1974, "loss": 0.4579, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 8.889707676612791e-06, "epoch": 1.45, "percentage": 72.44, "elapsed_time": "21:18:24", "remaining_time": "8:06:19"}
{"current_steps": 1440, "total_steps": 1974, "loss": 0.4592, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 8.587516898369589e-06, "epoch": 1.46, "percentage": 72.95, "elapsed_time": "21:26:47", "remaining_time": "7:57:11"}
{"current_steps": 1450, "total_steps": 1974, "loss": 0.4861, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 8.289483029668972e-06, "epoch": 1.47, "percentage": 73.45, "elapsed_time": "21:35:09", "remaining_time": "7:48:02"}
{"current_steps": 1460, "total_steps": 1974, "loss": 0.4753, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.99568155572701e-06, "epoch": 1.48, "percentage": 73.96, "elapsed_time": "21:43:33", "remaining_time": "7:38:55"}
{"current_steps": 1470, "total_steps": 1974, "loss": 0.4929, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.706186889790209e-06, "epoch": 1.49, "percentage": 74.47, "elapsed_time": "21:51:52", "remaining_time": "7:29:47"}
{"current_steps": 1480, "total_steps": 1974, "loss": 0.4594, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.421072354288302e-06, "epoch": 1.5, "percentage": 74.97, "elapsed_time": "22:00:14", "remaining_time": "7:20:40"}
{"current_steps": 1490, "total_steps": 1974, "loss": 0.4912, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.140410162263414e-06, "epoch": 1.51, "percentage": 75.48, "elapsed_time": "22:08:35", "remaining_time": "7:11:34"}
{"current_steps": 1500, "total_steps": 1974, "loss": 0.4681, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.86427139908008e-06, "epoch": 1.52, "percentage": 75.99, "elapsed_time": "22:16:55", "remaining_time": "7:02:28"}
{"current_steps": 1510, "total_steps": 1974, "loss": 0.4816, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.5927260044209655e-06, "epoch": 1.53, "percentage": 76.49, "elapsed_time": "22:25:16", "remaining_time": "6:53:22"}
{"current_steps": 1520, "total_steps": 1974, "loss": 0.4723, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.3258427545727e-06, "epoch": 1.54, "percentage": 77.0, "elapsed_time": "22:33:37", "remaining_time": "6:44:18"}
{"current_steps": 1530, "total_steps": 1974, "loss": 0.4856, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.063689245006443e-06, "epoch": 1.55, "percentage": 77.51, "elapsed_time": "22:41:56", "remaining_time": "6:35:13"}
{"current_steps": 1540, "total_steps": 1974, "loss": 0.4829, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.806331873257462e-06, "epoch": 1.56, "percentage": 78.01, "elapsed_time": "22:50:16", "remaining_time": "6:26:10"}
{"current_steps": 1550, "total_steps": 1974, "loss": 0.4741, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.553835822108152e-06, "epoch": 1.57, "percentage": 78.52, "elapsed_time": "22:58:38", "remaining_time": "6:17:07"}
{"current_steps": 1560, "total_steps": 1974, "loss": 0.4654, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.306265043078693e-06, "epoch": 1.58, "percentage": 79.03, "elapsed_time": "23:06:58", "remaining_time": "6:08:04"}
{"current_steps": 1570, "total_steps": 1974, "loss": 0.4668, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0636822402296165e-06, "epoch": 1.59, "percentage": 79.53, "elapsed_time": "23:15:15", "remaining_time": "5:59:02"}
{"current_steps": 1580, "total_steps": 1974, "loss": 0.4723, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.826148854280277e-06, "epoch": 1.6, "percentage": 80.04, "elapsed_time": "23:23:36", "remaining_time": "5:50:00"}
{"current_steps": 1590, "total_steps": 1974, "loss": 0.4639, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.593725047047293e-06, "epoch": 1.61, "percentage": 80.55, "elapsed_time": "23:31:55", "remaining_time": "5:40:59"}
{"current_steps": 1600, "total_steps": 1974, "loss": 0.4777, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.3664696862069505e-06, "epoch": 1.62, "percentage": 81.05, "elapsed_time": "23:40:12", "remaining_time": "5:31:58"}
{"current_steps": 1610, "total_steps": 1974, "loss": 0.4546, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.144440330385347e-06, "epoch": 1.63, "percentage": 81.56, "elapsed_time": "1 day, 0:03:58", "remaining_time": "5:26:27"}
{"current_steps": 1620, "total_steps": 1974, "loss": 0.4543, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.927693214580075e-06, "epoch": 1.64, "percentage": 82.07, "elapsed_time": "1 day, 0:12:13", "remaining_time": "5:17:20"}
{"current_steps": 1630, "total_steps": 1974, "loss": 0.4543, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.71628323591722e-06, "epoch": 1.65, "percentage": 82.57, "elapsed_time": "1 day, 0:20:27", "remaining_time": "5:08:13"}
{"current_steps": 1640, "total_steps": 1974, "loss": 0.4659, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.5102639397471214e-06, "epoch": 1.66, "percentage": 83.08, "elapsed_time": "1 day, 0:28:43", "remaining_time": "4:59:07"}
{"current_steps": 1650, "total_steps": 1974, "loss": 0.485, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.3096875060825845e-06, "epoch": 1.67, "percentage": 83.59, "elapsed_time": "1 day, 0:36:59", "remaining_time": "4:50:01"}
{"current_steps": 1660, "total_steps": 1974, "loss": 0.4768, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.11460473638282e-06, "epoch": 1.68, "percentage": 84.09, "elapsed_time": "1 day, 0:45:17", "remaining_time": "4:40:57"}
{"current_steps": 1670, "total_steps": 1974, "loss": 0.4635, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.925065040686642e-06, "epoch": 1.69, "percentage": 84.6, "elapsed_time": "1 day, 0:53:32", "remaining_time": "4:31:52"}
{"current_steps": 1680, "total_steps": 1974, "loss": 0.4681, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.741116425097995e-06, "epoch": 1.7, "percentage": 85.11, "elapsed_time": "1 day, 1:01:47", "remaining_time": "4:22:48"}
{"current_steps": 1690, "total_steps": 1974, "loss": 0.4492, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.5628054796271063e-06, "epoch": 1.71, "percentage": 85.61, "elapsed_time": "1 day, 1:10:02", "remaining_time": "4:13:45"}
{"current_steps": 1700, "total_steps": 1974, "loss": 0.4664, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.390177366390273e-06, "epoch": 1.72, "percentage": 86.12, "elapsed_time": "1 day, 1:18:18", "remaining_time": "4:04:42"}
{"current_steps": 1710, "total_steps": 1974, "loss": 0.48, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.22327580817136e-06, "epoch": 1.73, "percentage": 86.63, "elapsed_time": "1 day, 1:26:34", "remaining_time": "3:55:40"}
{"current_steps": 1720, "total_steps": 1974, "loss": 0.4616, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.0621430773477947e-06, "epoch": 1.74, "percentage": 87.13, "elapsed_time": "1 day, 1:34:47", "remaining_time": "3:46:38"}
{"current_steps": 1730, "total_steps": 1974, "loss": 0.4854, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.906819985183908e-06, "epoch": 1.75, "percentage": 87.64, "elapsed_time": "1 day, 1:43:03", "remaining_time": "3:37:38"}
{"current_steps": 1740, "total_steps": 1974, "loss": 0.4846, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.7573458714944063e-06, "epoch": 1.76, "percentage": 88.15, "elapsed_time": "1 day, 1:51:18", "remaining_time": "3:28:37"}
{"current_steps": 1750, "total_steps": 1974, "loss": 0.4552, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.6137585946804674e-06, "epoch": 1.77, "percentage": 88.65, "elapsed_time": "1 day, 1:59:35", "remaining_time": "3:19:37"}
{"current_steps": 1760, "total_steps": 1974, "loss": 0.4615, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.4760945221410638e-06, "epoch": 1.78, "percentage": 89.16, "elapsed_time": "1 day, 2:07:51", "remaining_time": "3:10:38"}
{"current_steps": 1770, "total_steps": 1974, "loss": 0.4735, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.3443885210619428e-06, "epoch": 1.79, "percentage": 89.67, "elapsed_time": "1 day, 2:16:10", "remaining_time": "3:01:39"}
{"current_steps": 1780, "total_steps": 1974, "loss": 0.4705, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.2186739495845477e-06, "epoch": 1.8, "percentage": 90.17, "elapsed_time": "1 day, 2:24:26", "remaining_time": "2:52:41"}
{"current_steps": 1790, "total_steps": 1974, "loss": 0.4653, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.0989826483571552e-06, "epoch": 1.81, "percentage": 90.68, "elapsed_time": "1 day, 2:32:40", "remaining_time": "2:43:42"}
{"current_steps": 1800, "total_steps": 1974, "loss": 0.4632, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.85344932470364e-07, "epoch": 1.82, "percentage": 91.19, "elapsed_time": "1 day, 2:40:51", "remaining_time": "2:34:44"}
{"current_steps": 1810, "total_steps": 1974, "loss": 0.481, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 8.77789583778979e-07, "epoch": 1.83, "percentage": 91.69, "elapsed_time": "1 day, 2:59:37", "remaining_time": "2:26:45"}
{"current_steps": 1820, "total_steps": 1974, "loss": 0.4773, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.763438436122122e-07, "epoch": 1.84, "percentage": 92.2, "elapsed_time": "1 day, 3:07:54", "remaining_time": "2:17:44"}
{"current_steps": 1830, "total_steps": 1974, "loss": 0.4791, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.810334058740736e-07, "epoch": 1.85, "percentage": 92.71, "elapsed_time": "1 day, 3:16:15", "remaining_time": "2:08:45"}
{"current_steps": 1840, "total_steps": 1974, "loss": 0.4793, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.918824105356797e-07, "epoch": 1.86, "percentage": 93.21, "elapsed_time": "1 day, 3:24:34", "remaining_time": "1:59:46"}
{"current_steps": 1850, "total_steps": 1974, "loss": 0.4647, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.08913437521169e-07, "epoch": 1.87, "percentage": 93.72, "elapsed_time": "1 day, 3:32:49", "remaining_time": "1:50:47"}
{"current_steps": 1860, "total_steps": 1974, "loss": 0.4765, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.3214750098869995e-07, "epoch": 1.88, "percentage": 94.22, "elapsed_time": "1 day, 3:41:06", "remaining_time": "1:41:48"}
{"current_steps": 1870, "total_steps": 1974, "loss": 0.4787, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.616040440080432e-07, "epoch": 1.89, "percentage": 94.73, "elapsed_time": "1 day, 3:49:26", "remaining_time": "1:32:50"}
{"current_steps": 1880, "total_steps": 1974, "loss": 0.4723, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.973009336361021e-07, "epoch": 1.9, "percentage": 95.24, "elapsed_time": "1 day, 3:57:44", "remaining_time": "1:23:53"}
{"current_steps": 1890, "total_steps": 1974, "loss": 0.453, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.392544563915883e-07, "epoch": 1.91, "percentage": 95.74, "elapsed_time": "1 day, 4:06:01", "remaining_time": "1:14:56"}
{"current_steps": 1900, "total_steps": 1974, "loss": 0.4561, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.8747931413001795e-07, "epoch": 1.93, "percentage": 96.25, "elapsed_time": "1 day, 4:14:18", "remaining_time": "1:05:59"}
{"current_steps": 1910, "total_steps": 1974, "loss": 0.4612, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.4198862032005488e-07, "epoch": 1.94, "percentage": 96.76, "elapsed_time": "1 day, 4:22:35", "remaining_time": "0:57:03"}
{"current_steps": 1920, "total_steps": 1974, "loss": 0.4774, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.0279389672218365e-07, "epoch": 1.95, "percentage": 97.26, "elapsed_time": "1 day, 4:30:53", "remaining_time": "0:48:07"}
{"current_steps": 1930, "total_steps": 1974, "loss": 0.4635, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.990507047049676e-08, "epoch": 1.96, "percentage": 97.77, "elapsed_time": "1 day, 4:39:09", "remaining_time": "0:39:11"}
{"current_steps": 1940, "total_steps": 1974, "loss": 0.4761, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.3330471558378213e-08, "epoch": 1.97, "percentage": 98.28, "elapsed_time": "1 day, 4:47:22", "remaining_time": "0:30:16"}
{"current_steps": 1950, "total_steps": 1974, "loss": 0.4593, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.3076830728713252e-08, "epoch": 1.98, "percentage": 98.78, "elapsed_time": "1 day, 4:55:39", "remaining_time": "0:21:21"}
{"current_steps": 1960, "total_steps": 1974, "loss": 0.4736, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.149277769132658e-09, "epoch": 1.99, "percentage": 99.29, "elapsed_time": "1 day, 5:03:51", "remaining_time": "0:12:27"}
{"current_steps": 1970, "total_steps": 1974, "loss": 0.4486, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.551340212760377e-09, "epoch": 2.0, "percentage": 99.8, "elapsed_time": "1 day, 5:12:08", "remaining_time": "0:03:33"}
{"current_steps": 1974, "total_steps": 1974, "loss": null, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.0, "percentage": 100.0, "elapsed_time": "1 day, 5:15:27", "remaining_time": "0:00:00"}
{
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 2.0,
"global_step": 1974,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.01,
"learning_rate": 4.9996834033646177e-05,
"loss": 0.9175,
"step": 10
},
{
"epoch": 0.02,
"learning_rate": 4.998733693645213e-05,
"loss": 0.7595,
"step": 20
},
{
"epoch": 0.03,
"learning_rate": 4.997151111381707e-05,
"loss": 0.7375,
"step": 30
},
{
"epoch": 0.04,
"learning_rate": 4.9949360574062774e-05,
"loss": 0.7227,
"step": 40
},
{
"epoch": 0.05,
"learning_rate": 4.9920890927418316e-05,
"loss": 0.7147,
"step": 50
},
{
"epoch": 0.06,
"learning_rate": 4.988610938459917e-05,
"loss": 0.7102,
"step": 60
},
{
"epoch": 0.07,
"learning_rate": 4.9845024754980876e-05,
"loss": 0.7056,
"step": 70
},
{
"epoch": 0.08,
"learning_rate": 4.979764744436784e-05,
"loss": 0.7128,
"step": 80
},
{
"epoch": 0.09,
"learning_rate": 4.9743989452357756e-05,
"loss": 0.6982,
"step": 90
},
{
"epoch": 0.1,
"learning_rate": 4.968406436930243e-05,
"loss": 0.7258,
"step": 100
},
{
"epoch": 0.11,
"learning_rate": 4.961788737286559e-05,
"loss": 0.7095,
"step": 110
},
{
"epoch": 0.12,
"learning_rate": 4.954547522417877e-05,
"loss": 0.7048,
"step": 120
},
{
"epoch": 0.13,
"learning_rate": 4.946684626359607e-05,
"loss": 0.6805,
"step": 130
},
{
"epoch": 0.14,
"learning_rate": 4.938202040604898e-05,
"loss": 0.6798,
"step": 140
},
{
"epoch": 0.15,
"learning_rate": 4.929101913600238e-05,
"loss": 0.7134,
"step": 150
},
{
"epoch": 0.16,
"learning_rate": 4.919386550201299e-05,
"loss": 0.6895,
"step": 160
},
{
"epoch": 0.17,
"learning_rate": 4.909058411089174e-05,
"loss": 0.705,
"step": 170
},
{
"epoch": 0.18,
"learning_rate": 4.8981201121471356e-05,
"loss": 0.6712,
"step": 180
},
{
"epoch": 0.19,
"learning_rate": 4.886574423798097e-05,
"loss": 0.6744,
"step": 190
},
{
"epoch": 0.2,
"learning_rate": 4.874424270302927e-05,
"loss": 0.6675,
"step": 200
},
{
"epoch": 0.21,
"learning_rate": 4.861672729019797e-05,
"loss": 0.6726,
"step": 210
},
{
"epoch": 0.22,
"learning_rate": 4.848323029624761e-05,
"loss": 0.6821,
"step": 220
},
{
"epoch": 0.23,
"learning_rate": 4.834378553293748e-05,
"loss": 0.7133,
"step": 230
},
{
"epoch": 0.24,
"learning_rate": 4.81984283184619e-05,
"loss": 0.6681,
"step": 240
},
{
"epoch": 0.25,
"learning_rate": 4.804719546850487e-05,
"loss": 0.682,
"step": 250
},
{
"epoch": 0.26,
"learning_rate": 4.789012528691558e-05,
"loss": 0.6755,
"step": 260
},
{
"epoch": 0.27,
"learning_rate": 4.772725755600682e-05,
"loss": 0.68,
"step": 270
},
{
"epoch": 0.28,
"learning_rate": 4.755863352647909e-05,
"loss": 0.6663,
"step": 280
},
{
"epoch": 0.29,
"learning_rate": 4.738429590697271e-05,
"loss": 0.6672,
"step": 290
},
{
"epoch": 0.3,
"learning_rate": 4.720428885325069e-05,
"loss": 0.6466,
"step": 300
},
{
"epoch": 0.31,
"learning_rate": 4.701865795701505e-05,
"loss": 0.6668,
"step": 310
},
{
"epoch": 0.32,
"learning_rate": 4.684682059461469e-05,
"loss": 0.6591,
"step": 320
},
{
"epoch": 0.33,
"learning_rate": 4.665063509461097e-05,
"loss": 0.661,
"step": 330
},
{
"epoch": 0.34,
"learning_rate": 4.644896598002736e-05,
"loss": 0.6749,
"step": 340
},
{
"epoch": 0.35,
"learning_rate": 4.624186432907437e-05,
"loss": 0.6654,
"step": 350
},
{
"epoch": 0.36,
"learning_rate": 4.602938259590072e-05,
"loss": 0.6716,
"step": 360
},
{
"epoch": 0.37,
"learning_rate": 4.581157459730783e-05,
"loss": 0.6796,
"step": 370
},
{
"epoch": 0.39,
"learning_rate": 4.558849549911931e-05,
"loss": 0.6794,
"step": 380
},
{
"epoch": 0.4,
"learning_rate": 4.536020180220871e-05,
"loss": 0.6651,
"step": 390
},
{
"epoch": 0.41,
"learning_rate": 4.515032675559024e-05,
"loss": 0.6823,
"step": 400
},
{
"epoch": 0.42,
"learning_rate": 4.4912285699446786e-05,
"loss": 0.6677,
"step": 410
},
{
"epoch": 0.43,
"learning_rate": 4.4669201313179155e-05,
"loss": 0.6704,
"step": 420
},
{
"epoch": 0.44,
"learning_rate": 4.442113516454638e-05,
"loss": 0.6481,
"step": 430
},
{
"epoch": 0.45,
"learning_rate": 4.416815008307488e-05,
"loss": 0.665,
"step": 440
},
{
"epoch": 0.46,
"learning_rate": 4.391031014414514e-05,
"loss": 0.6658,
"step": 450
},
{
"epoch": 0.47,
"learning_rate": 4.364768065276284e-05,
"loss": 0.6699,
"step": 460
},
{
"epoch": 0.48,
"learning_rate": 4.338032812701867e-05,
"loss": 0.6664,
"step": 470
},
{
"epoch": 0.49,
"learning_rate": 4.310832028124069e-05,
"loss": 0.6817,
"step": 480
},
{
"epoch": 0.5,
"learning_rate": 4.283172600884393e-05,
"loss": 0.6791,
"step": 490
},
{
"epoch": 0.51,
"learning_rate": 4.2550615364881194e-05,
"loss": 0.6679,
"step": 500
},
{
"epoch": 0.52,
"learning_rate": 4.226505954829973e-05,
"loss": 0.6476,
"step": 510
},
{
"epoch": 0.53,
"learning_rate": 4.197513088390813e-05,
"loss": 0.6721,
"step": 520
},
{
"epoch": 0.54,
"learning_rate": 4.1680902804058095e-05,
"loss": 0.6563,
"step": 530
},
{
"epoch": 0.55,
"learning_rate": 4.138244983004574e-05,
"loss": 0.6391,
"step": 540
},
{
"epoch": 0.56,
"learning_rate": 4.107984755323697e-05,
"loss": 0.6672,
"step": 550
},
{
"epoch": 0.57,
"learning_rate": 4.077317261592194e-05,
"loss": 0.6497,
"step": 560
},
{
"epoch": 0.58,
"learning_rate": 4.04625026919033e-05,
"loss": 0.668,
"step": 570
},
{
"epoch": 0.59,
"learning_rate": 4.0147916466823174e-05,
"loss": 0.6682,
"step": 580
},
{
"epoch": 0.6,
"learning_rate": 3.982949361823388e-05,
"loss": 0.6352,
"step": 590
},
{
"epoch": 0.61,
"learning_rate": 3.950731479541743e-05,
"loss": 0.6698,
"step": 600
},
{
"epoch": 0.62,
"learning_rate": 3.918146159895882e-05,
"loss": 0.6549,
"step": 610
},
{
"epoch": 0.63,
"learning_rate": 3.8852016560078605e-05,
"loss": 0.6516,
"step": 620
},
{
"epoch": 0.64,
"learning_rate": 3.851906311972943e-05,
"loss": 0.6629,
"step": 630
},
{
"epoch": 0.65,
"learning_rate": 3.821647502051616e-05,
"loss": 0.6764,
"step": 640
},
{
"epoch": 0.66,
"learning_rate": 3.787708866250794e-05,
"loss": 0.6415,
"step": 650
},
{
"epoch": 0.67,
"learning_rate": 3.7534440830144466e-05,
"loss": 0.6463,
"step": 660
},
{
"epoch": 0.68,
"learning_rate": 3.71886183083464e-05,
"loss": 0.6508,
"step": 670
},
{
"epoch": 0.69,
"learning_rate": 3.683970868611123e-05,
"loss": 0.6411,
"step": 680
},
{
"epoch": 0.7,
"learning_rate": 3.648780033432891e-05,
"loss": 0.6266,
"step": 690
},
{
"epoch": 0.71,
"learning_rate": 3.613298238339955e-05,
"loss": 0.6409,
"step": 700
},
{
"epoch": 0.72,
"learning_rate": 3.5775344700658705e-05,
"loss": 0.6594,
"step": 710
},
{
"epoch": 0.73,
"learning_rate": 3.5414977867616006e-05,
"loss": 0.6427,
"step": 720
},
{
"epoch": 0.74,
"learning_rate": 3.505197315701292e-05,
"loss": 0.6462,
"step": 730
},
{
"epoch": 0.75,
"learning_rate": 3.468642250970547e-05,
"loss": 0.6277,
"step": 740
},
{
"epoch": 0.76,
"learning_rate": 3.431841851137764e-05,
"loss": 0.6551,
"step": 750
},
{
"epoch": 0.77,
"learning_rate": 3.394805436909157e-05,
"loss": 0.6402,
"step": 760
},
{
"epoch": 0.78,
"learning_rate": 3.357542388768033e-05,
"loss": 0.6515,
"step": 770
},
{
"epoch": 0.79,
"learning_rate": 3.3200621445989226e-05,
"loss": 0.6489,
"step": 780
},
{
"epoch": 0.8,
"learning_rate": 3.282374197297185e-05,
"loss": 0.6568,
"step": 790
},
{
"epoch": 0.81,
"learning_rate": 3.2444880923646674e-05,
"loss": 0.615,
"step": 800
},
{
"epoch": 0.82,
"learning_rate": 3.20641342549205e-05,
"loss": 0.6447,
"step": 810
},
{
"epoch": 0.83,
"learning_rate": 3.168159840128472e-05,
"loss": 0.6159,
"step": 820
},
{
"epoch": 0.84,
"learning_rate": 3.129737025039068e-05,
"loss": 0.6347,
"step": 830
},
{
"epoch": 0.85,
"learning_rate": 3.091154711851022e-05,
"loss": 0.6361,
"step": 840
},
{
"epoch": 0.86,
"learning_rate": 3.052422672588765e-05,
"loss": 0.6504,
"step": 850
},
{
"epoch": 0.87,
"learning_rate": 3.013550717198948e-05,
"loss": 0.6467,
"step": 860
},
{
"epoch": 0.88,
"learning_rate": 2.9745486910657993e-05,
"loss": 0.6364,
"step": 870
},
{
"epoch": 0.89,
"learning_rate": 2.9354264725175185e-05,
"loss": 0.6361,
"step": 880
},
{
"epoch": 0.9,
"learning_rate": 2.8961939703243122e-05,
"loss": 0.6441,
"step": 890
},
{
"epoch": 0.91,
"learning_rate": 2.856861121188735e-05,
"loss": 0.6404,
"step": 900
},
{
"epoch": 0.92,
"learning_rate": 2.8174378872289446e-05,
"loss": 0.6307,
"step": 910
},
{
"epoch": 0.93,
"learning_rate": 2.777934253455522e-05,
"loss": 0.6484,
"step": 920
},
{
"epoch": 0.94,
"learning_rate": 2.7383602252424985e-05,
"loss": 0.6237,
"step": 930
},
{
"epoch": 0.95,
"learning_rate": 2.6987258257932175e-05,
"loss": 0.6161,
"step": 940
},
{
"epoch": 0.96,
"learning_rate": 2.6590410936016895e-05,
"loss": 0.6381,
"step": 950
},
{
"epoch": 0.97,
"learning_rate": 2.619316079910063e-05,
"loss": 0.6366,
"step": 960
},
{
"epoch": 0.98,
"learning_rate": 2.5795608461628802e-05,
"loss": 0.6202,
"step": 970
},
{
"epoch": 0.99,
"learning_rate": 2.5397854614587334e-05,
"loss": 0.6334,
"step": 980
},
{
"epoch": 1.0,
"learning_rate": 2.5e-05,
"loss": 0.5954,
"step": 990
},
{
"epoch": 1.01,
"learning_rate": 2.460214538541267e-05,
"loss": 0.4963,
"step": 1000
},
{
"epoch": 1.02,
"learning_rate": 2.4204391538371207e-05,
"loss": 0.487,
"step": 1010
},
{
"epoch": 1.03,
"learning_rate": 2.3806839200899377e-05,
"loss": 0.489,
"step": 1020
},
{
"epoch": 1.04,
"learning_rate": 2.3409589063983117e-05,
"loss": 0.4805,
"step": 1030
},
{
"epoch": 1.05,
"learning_rate": 2.3012741742067838e-05,
"loss": 0.4907,
"step": 1040
},
{
"epoch": 1.06,
"learning_rate": 2.261639774757503e-05,
"loss": 0.4719,
"step": 1050
},
{
"epoch": 1.07,
"learning_rate": 2.2220657465444782e-05,
"loss": 0.4914,
"step": 1060
},
{
"epoch": 1.08,
"learning_rate": 2.182562112771056e-05,
"loss": 0.4775,
"step": 1070
},
{
"epoch": 1.09,
"learning_rate": 2.143138878811265e-05,
"loss": 0.4935,
"step": 1080
},
{
"epoch": 1.1,
"learning_rate": 2.1038060296756883e-05,
"loss": 0.5082,
"step": 1090
},
{
"epoch": 1.11,
"learning_rate": 2.064573527482482e-05,
"loss": 0.4868,
"step": 1100
},
{
"epoch": 1.12,
"learning_rate": 2.025451308934201e-05,
"loss": 0.4988,
"step": 1110
},
{
"epoch": 1.13,
"learning_rate": 1.9864492828010526e-05,
"loss": 0.4653,
"step": 1120
},
{
"epoch": 1.14,
"learning_rate": 1.9475773274112354e-05,
"loss": 0.4915,
"step": 1130
},
{
"epoch": 1.16,
"learning_rate": 1.9088452881489787e-05,
"loss": 0.4763,
"step": 1140
},
{
"epoch": 1.17,
"learning_rate": 1.8702629749609324e-05,
"loss": 0.4807,
"step": 1150
},
{
"epoch": 1.18,
"learning_rate": 1.8318401598715284e-05,
"loss": 0.4653,
"step": 1160
},
{
"epoch": 1.19,
"learning_rate": 1.793586574507951e-05,
"loss": 0.4778,
"step": 1170
},
{
"epoch": 1.2,
"learning_rate": 1.7555119076353338e-05,
"loss": 0.4839,
"step": 1180
},
{
"epoch": 1.21,
"learning_rate": 1.7176258027028152e-05,
"loss": 0.4718,
"step": 1190
},
{
"epoch": 1.22,
"learning_rate": 1.6799378554010773e-05,
"loss": 0.4793,
"step": 1200
},
{
"epoch": 1.23,
"learning_rate": 1.6424576112319672e-05,
"loss": 0.4825,
"step": 1210
},
{
"epoch": 1.24,
"learning_rate": 1.6051945630908426e-05,
"loss": 0.4857,
"step": 1220
},
{
"epoch": 1.25,
"learning_rate": 1.5681581488622367e-05,
"loss": 0.4802,
"step": 1230
},
{
"epoch": 1.26,
"learning_rate": 1.5313577490294538e-05,
"loss": 0.4812,
"step": 1240
},
{
"epoch": 1.27,
"learning_rate": 1.4948026842987084e-05,
"loss": 0.4682,
"step": 1250
},
{
"epoch": 1.28,
"learning_rate": 1.4585022132384008e-05,
"loss": 0.4974,
"step": 1260
},
{
"epoch": 1.29,
"learning_rate": 1.4224655299341304e-05,
"loss": 0.4737,
"step": 1270
},
{
"epoch": 1.3,
"learning_rate": 1.3867017616600456e-05,
"loss": 0.4877,
"step": 1280
},
{
"epoch": 1.31,
"learning_rate": 1.3512199665671094e-05,
"loss": 0.4753,
"step": 1290
},
{
"epoch": 1.32,
"learning_rate": 1.316029131388878e-05,
"loss": 0.4638,
"step": 1300
},
{
"epoch": 1.33,
"learning_rate": 1.2811381691653607e-05,
"loss": 0.4626,
"step": 1310
},
{
"epoch": 1.34,
"learning_rate": 1.2465559169855535e-05,
"loss": 0.4786,
"step": 1320
},
{
"epoch": 1.35,
"learning_rate": 1.212291133749206e-05,
"loss": 0.4717,
"step": 1330
},
{
"epoch": 1.36,
"learning_rate": 1.178352497948384e-05,
"loss": 0.4803,
"step": 1340
},
{
"epoch": 1.37,
"learning_rate": 1.1447486054694112e-05,
"loss": 0.4803,
"step": 1350
},
{
"epoch": 1.38,
"learning_rate": 1.1114879674157233e-05,
"loss": 0.4739,
"step": 1360
},
{
"epoch": 1.39,
"learning_rate": 1.0785790079522001e-05,
"loss": 0.471,
"step": 1370
},
{
"epoch": 1.4,
"learning_rate": 1.046030062171512e-05,
"loss": 0.4799,
"step": 1380
},
{
"epoch": 1.41,
"learning_rate": 1.0138493739830352e-05,
"loss": 0.4689,
"step": 1390
},
{
"epoch": 1.42,
"learning_rate": 9.820450940248544e-06,
"loss": 0.4599,
"step": 1400
},
{
"epoch": 1.43,
"learning_rate": 9.506252775993882e-06,
"loss": 0.5019,
"step": 1410
},
{
"epoch": 1.44,
"learning_rate": 9.195978826331697e-06,
"loss": 0.4764,
"step": 1420
},
{
"epoch": 1.45,
"learning_rate": 8.889707676612791e-06,
"loss": 0.4579,
"step": 1430
},
{
"epoch": 1.46,
"learning_rate": 8.587516898369589e-06,
"loss": 0.4592,
"step": 1440
},
{
"epoch": 1.47,
"learning_rate": 8.289483029668972e-06,
"loss": 0.4861,
"step": 1450
},
{
"epoch": 1.48,
"learning_rate": 7.99568155572701e-06,
"loss": 0.4753,
"step": 1460
},
{
"epoch": 1.49,
"learning_rate": 7.706186889790209e-06,
"loss": 0.4929,
"step": 1470
},
{
"epoch": 1.5,
"learning_rate": 7.421072354288302e-06,
"loss": 0.4594,
"step": 1480
},
{
"epoch": 1.51,
"learning_rate": 7.140410162263414e-06,
"loss": 0.4912,
"step": 1490
},
{
"epoch": 1.52,
"learning_rate": 6.86427139908008e-06,
"loss": 0.4681,
"step": 1500
},
{
"epoch": 1.53,
"learning_rate": 6.5927260044209655e-06,
"loss": 0.4816,
"step": 1510
},
{
"epoch": 1.54,
"learning_rate": 6.3258427545727e-06,
"loss": 0.4723,
"step": 1520
},
{
"epoch": 1.55,
"learning_rate": 6.063689245006443e-06,
"loss": 0.4856,
"step": 1530
},
{
"epoch": 1.56,
"learning_rate": 5.806331873257462e-06,
"loss": 0.4829,
"step": 1540
},
{
"epoch": 1.57,
"learning_rate": 5.553835822108152e-06,
"loss": 0.4741,
"step": 1550
},
{
"epoch": 1.58,
"learning_rate": 5.306265043078693e-06,
"loss": 0.4654,
"step": 1560
},
{
"epoch": 1.59,
"learning_rate": 5.0636822402296165e-06,
"loss": 0.4668,
"step": 1570
},
{
"epoch": 1.6,
"learning_rate": 4.826148854280277e-06,
"loss": 0.4723,
"step": 1580
},
{
"epoch": 1.61,
"learning_rate": 4.593725047047293e-06,
"loss": 0.4639,
"step": 1590
},
{
"epoch": 1.62,
"learning_rate": 4.3664696862069505e-06,
"loss": 0.4777,
"step": 1600
},
{
"epoch": 1.63,
"learning_rate": 4.144440330385347e-06,
"loss": 0.4546,
"step": 1610
},
{
"epoch": 1.64,
"learning_rate": 3.927693214580075e-06,
"loss": 0.4543,
"step": 1620
},
{
"epoch": 1.65,
"learning_rate": 3.71628323591722e-06,
"loss": 0.4543,
"step": 1630
},
{
"epoch": 1.66,
"learning_rate": 3.5102639397471214e-06,
"loss": 0.4659,
"step": 1640
},
{
"epoch": 1.67,
"learning_rate": 3.3096875060825845e-06,
"loss": 0.485,
"step": 1650
},
{
"epoch": 1.68,
"learning_rate": 3.11460473638282e-06,
"loss": 0.4768,
"step": 1660
},
{
"epoch": 1.69,
"learning_rate": 2.925065040686642e-06,
"loss": 0.4635,
"step": 1670
},
{
"epoch": 1.7,
"learning_rate": 2.741116425097995e-06,
"loss": 0.4681,
"step": 1680
},
{
"epoch": 1.71,
"learning_rate": 2.5628054796271063e-06,
"loss": 0.4492,
"step": 1690
},
{
"epoch": 1.72,
"learning_rate": 2.390177366390273e-06,
"loss": 0.4664,
"step": 1700
},
{
"epoch": 1.73,
"learning_rate": 2.22327580817136e-06,
"loss": 0.48,
"step": 1710
},
{
"epoch": 1.74,
"learning_rate": 2.0621430773477947e-06,
"loss": 0.4616,
"step": 1720
},
{
"epoch": 1.75,
"learning_rate": 1.906819985183908e-06,
"loss": 0.4854,
"step": 1730
},
{
"epoch": 1.76,
"learning_rate": 1.7573458714944063e-06,
"loss": 0.4846,
"step": 1740
},
{
"epoch": 1.77,
"learning_rate": 1.6137585946804674e-06,
"loss": 0.4552,
"step": 1750
},
{
"epoch": 1.78,
"learning_rate": 1.4760945221410638e-06,
"loss": 0.4615,
"step": 1760
},
{
"epoch": 1.79,
"learning_rate": 1.3443885210619428e-06,
"loss": 0.4735,
"step": 1770
},
{
"epoch": 1.8,
"learning_rate": 1.2186739495845477e-06,
"loss": 0.4705,
"step": 1780
},
{
"epoch": 1.81,
"learning_rate": 1.0989826483571552e-06,
"loss": 0.4653,
"step": 1790
},
{
"epoch": 1.82,
"learning_rate": 9.85344932470364e-07,
"loss": 0.4632,
"step": 1800
},
{
"epoch": 1.83,
"learning_rate": 8.77789583778979e-07,
"loss": 0.481,
"step": 1810
},
{
"epoch": 1.84,
"learning_rate": 7.763438436122122e-07,
"loss": 0.4773,
"step": 1820
},
{
"epoch": 1.85,
"learning_rate": 6.810334058740736e-07,
"loss": 0.4791,
"step": 1830
},
{
"epoch": 1.86,
"learning_rate": 5.918824105356797e-07,
"loss": 0.4793,
"step": 1840
},
{
"epoch": 1.87,
"learning_rate": 5.08913437521169e-07,
"loss": 0.4647,
"step": 1850
},
{
"epoch": 1.88,
"learning_rate": 4.3214750098869995e-07,
"loss": 0.4765,
"step": 1860
},
{
"epoch": 1.89,
"learning_rate": 3.616040440080432e-07,
"loss": 0.4787,
"step": 1870
},
{
"epoch": 1.9,
"learning_rate": 2.973009336361021e-07,
"loss": 0.4723,
"step": 1880
},
{
"epoch": 1.91,
"learning_rate": 2.392544563915883e-07,
"loss": 0.453,
"step": 1890
},
{
"epoch": 1.93,
"learning_rate": 1.8747931413001795e-07,
"loss": 0.4561,
"step": 1900
},
{
"epoch": 1.94,
"learning_rate": 1.4198862032005488e-07,
"loss": 0.4612,
"step": 1910
},
{
"epoch": 1.95,
"learning_rate": 1.0279389672218365e-07,
"loss": 0.4774,
"step": 1920
},
{
"epoch": 1.96,
"learning_rate": 6.990507047049676e-08,
"loss": 0.4635,
"step": 1930
},
{
"epoch": 1.97,
"learning_rate": 4.3330471558378213e-08,
"loss": 0.4761,
"step": 1940
},
{
"epoch": 1.98,
"learning_rate": 2.3076830728713252e-08,
"loss": 0.4593,
"step": 1950
},
{
"epoch": 1.99,
"learning_rate": 9.149277769132658e-09,
"loss": 0.4736,
"step": 1960
},
{
"epoch": 2.0,
"learning_rate": 1.551340212760377e-09,
"loss": 0.4486,
"step": 1970
},
{
"epoch": 2.0,
"step": 1974,
"total_flos": 3.437359013145084e+19,
"train_loss": 0.5708327819994277,
"train_runtime": 105327.89,
"train_samples_per_second": 4.797,
"train_steps_per_second": 0.019
}
],
"max_steps": 1974,
"num_train_epochs": 2,
"total_flos": 3.437359013145084e+19,
"trial_name": null,
"trial_params": null
}
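The `log_history` above can be summarized programmatically. A minimal sketch (assuming the JSON is saved as `trainer_state.json`; `summarize_log_history` is a hypothetical helper, not part of the Trainer API) that extracts the loss curve and final training summary:

```python
import json

def summarize_log_history(state: dict) -> dict:
    """Extract step/loss pairs and the final training summary
    from a Hugging Face trainer_state.json-style dict."""
    # Per-step logging entries carry a "loss" key; the final summary entry does not.
    entries = [e for e in state["log_history"] if "loss" in e]
    losses = [e["loss"] for e in entries]
    summary = state["log_history"][-1]  # last entry holds train_loss, runtime, etc.
    return {
        "first_loss": losses[0],
        "last_loss": losses[-1],
        "min_loss": min(losses),
        "train_loss": summary.get("train_loss"),
    }

# Tiny synthetic state for illustration; with the real file, use:
#     with open("trainer_state.json") as f:
#         state = json.load(f)
state = {
    "log_history": [
        {"epoch": 0.01, "learning_rate": 5e-05, "loss": 0.9175, "step": 10},
        {"epoch": 2.0, "learning_rate": 1.55e-09, "loss": 0.4486, "step": 1970},
        {"epoch": 2.0, "step": 1974, "train_loss": 0.5708},
    ]
}
print(summarize_log_history(state))
```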
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig
from peft import PeftModel, PeftConfig  # only needed if a LoRA adapter is loaded unmerged

# Local path to the fine-tuned DISC-FinLLM checkpoint
model_path = "/home/wanglch/projects/DISC-FinLLM/FinLLM"

# Load the model in fp16 and shard it across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)
model.generation_config = GenerationConfig.from_pretrained(model_path)

# The model's custom tokenizer requires the slow (non-fast) implementation
tokenizer = AutoTokenizer.from_pretrained(
    model_path, use_fast=False, trust_remote_code=True
)

# model.chat is a custom method exposed via trust_remote_code
messages = [
    # "Please explain what non-performing bank assets are."
    {"role": "user", "content": "请解释一下什么是银行不良资产?"},
]
response = model.chat(tokenizer, messages)
print(response)