# Vary
**An open-source multimodal OCR large model**
## Paper
- Paper: [Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models](https://arxiv.org/abs/2312.06109)
- [Vary pretrained weights on Hugging Face](https://huggingface.co/Haoran-megvii/Vary); you can also contact the author for the model weights:
`weihaoran18@mails.ucas.ac.cn`
- This project provides the weights [here](https://pan.baidu.com/s/1CjlRmq0_q-NSJez2BKrghg);
the extraction code can be requested by leaving a message in this repository.
- [Download the CLIP-ViT-L](https://huggingface.co/openai/clip-vit-large-patch14/)
## Model Structure
### Vary model structure
<div align="center">
<img align="center" src=image/model.png>
</div>
## Algorithm
Vary comes in two configurations: Vary-tiny and Vary-base. Vary-tiny is designed to "write" the new vision vocabulary, while Vary-base puts that vocabulary to use. Concretely, Vary-tiny consists mainly of a vocabulary network and a tiny OPT-125M, with a linear layer added between the two modules to align the channel dimensions. Since Vary-tiny focuses on fine-grained perception, it has no text-input branch. The new vision vocabulary network is expected to excel at processing artificial images (i.e., documents and charts) to make up for CLIP's shortcomings, while not becoming noise for CLIP when natural images are tokenized. Therefore, during vocabulary generation, Vary-tiny is trained with artificial document and chart data as positive samples and natural images as negative samples. Once this process is complete, the vocabulary network is extracted and added to a large model, producing Vary-base. The new and old vocabulary networks have independent input embedding layers and are integrated before the LLM. At this stage, the weights of both the new and old vision vocabulary networks are frozen while the other modules are unfrozen.
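A minimal PyTorch-style sketch of this two-vocabulary design is given below. It is illustrative only: the class, argument names, and tensor shapes are assumptions, although the two projector names mirror the `transformer.mm_projector` / `transformer.mm_projector_vary` entries visible in the weight map further down this page.
```python
import torch
import torch.nn as nn

class VaryBaseSketch(nn.Module):
    """Toy sketch of Vary-base: two vision vocabularies feed one LLM (dims are assumptions)."""

    def __init__(self, clip_vit, new_vocab_net, llm, clip_dim=1024, new_dim=1024, hidden=4096):
        super().__init__()
        self.clip_vit = clip_vit              # original CLIP-ViT-L vocabulary network
        self.new_vocab_net = new_vocab_net    # new vocabulary "written" by Vary-tiny
        # independent input embedding (projection) layers for the two vocabularies
        self.mm_projector = nn.Linear(clip_dim, hidden)
        self.mm_projector_vary = nn.Linear(new_dim, hidden)
        self.llm = llm                        # e.g. Qwen-7B for Vary-base

    def forward(self, image, text_embeds):
        # both vocabulary networks are kept frozen at the Vary-base stage
        with torch.no_grad():
            old_tokens = self.clip_vit(image)       # (B, 256, clip_dim)
            new_tokens = self.new_vocab_net(image)  # (B, 256, new_dim)
        vision_tokens = torch.cat(
            [self.mm_projector_vary(new_tokens), self.mm_projector(old_tokens)], dim=1
        )
        # the merged vision tokens are placed in front of the text embeddings before the LLM
        return self.llm(inputs_embeds=torch.cat([vision_tokens, text_embeds], dim=1))
```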
## Environment Setup
### Docker (Method 1)
Note: before setting up the environment, change the model paths in Vary/vary/demo/run_qwen_vary.py and Vary/vary/model/vary_qwen_vary.py to your local model paths, and change the model path in the model's config.json to a local path as well; only then run `pip install -e .` (a sketch of the config.json edit follows the commands below).
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu22.04-dtk23.10.1-py310
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=64G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name vary <your imageID> bash
docker exec -it vary bash
cd /path/your_code_data/Vary
pip install -e .
pip install ninja
```
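As a hedged sketch of the path edit mentioned in the note above: the two config.json fields that typically need to point at local copies are `_name_or_path` and `vision_tower` (both appear in the config reproduced near the end of this page). The directory names below are placeholders.
```python
import json
from pathlib import Path

# Placeholder locations; substitute your own local paths.
ckpt_dir = Path("/path/your_code_data/models--HaoranWei--vary-llava80k")
clip_dir = "/path/your_code_data/vit-large-patch14"

cfg_path = ckpt_dir / "config.json"
cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
cfg["_name_or_path"] = str(ckpt_dir)  # point the checkpoint at its local directory
cfg["vision_tower"] = clip_dir        # point at the locally downloaded CLIP-ViT-L
cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False), encoding="utf-8")
print("updated", cfg_path)
```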
### Dockerfile (Method 2)
```
cd /path/your_code_data/Vary/docker
docker build --no-cache -t vary:latest .
docker run --shm-size=64G --name vary -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v /path/your_code_data/:/path/your_code_data/ -it vary bash
```
### Anaconda (Method 3)
The special deep-learning libraries this project needs for DCU cards can be downloaded from the [光合](https://developer.hpccube.com/tool/) developer community.
```
DTK driver: dtk23.10
python: 3.10
torch: 2.1
torchvision: 0.16.0
deepspeed: 0.12.3
```
```
conda create -n vary python=3.10
conda activate vary
cd /path/your_code_data/Vary
pip install -e .
pip install ninja
```
`Tip: the versions of the DCU-related tools above (DTK driver, python, torch, deepspeed, etc.) must correspond to one another exactly.`
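A quick sanity check that the environment actually carries the pinned versions. This is a minimal sketch; it assumes the DCU build of PyTorch exposes devices through the usual `torch.cuda` interface.
```python
import torch
import torchvision
import deepspeed

# Compare against the pinned versions listed above.
print("torch      :", torch.__version__)        # expect 2.1.x
print("torchvision:", torchvision.__version__)  # expect 0.16.0
print("deepspeed  :", deepspeed.__version__)    # expect 0.12.3
print("device available:", torch.cuda.is_available())
```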
## Dataset
None; no dataset has been released yet.
## Training
You need to build your own dataset (a rough sketch of one possible sample format follows the two commands below).
1. For Vary-base
```Shell
deepspeed Vary/train/train_qwen_vary.py --deepspeed /Vary/zero_config/zero2.json \
  --model_name_or_path /Qwen-7B/path/ \
  --vision_tower /vit-large-patch14/path/ \
  --freeze_vision_tower True \
  --freeze_lm_model False \
  --vision_select_layer -2 \
  --use_im_start_end True \
  --bf16 True \
  --per_device_eval_batch_size 4 \
  --gradient_accumulation_steps 1 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 5000 \
  --save_total_limit 1 \
  --weight_decay 0. \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 --tf32 True \
  --model_max_length 4096 \
  --gradient_checkpointing True \
  --dataloader_num_workers 4 \
  --report_to none \
  --per_device_train_batch_size 4 \
  --num_train_epochs 1 \
  --learning_rate 5e-5 \
  --datasets data_name1+data_name2+data_name3 \
  --output_dir /path/to/output/
```
2. For Vary-tiny
```Shell
deepspeed Vary/train/train_opt.py --deepspeed /Vary/zero_config/zero2.json \
  --model_name_or_path /opt125m/path/ \
  --conversation_version opt \
  --freeze_vision_tower False \
  --freeze_lm_model False \
  --use_im_start_end True \
  --bf16 True \
  --per_device_eval_batch_size 4 \
  --gradient_accumulation_steps 1 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 5000 \
  --save_total_limit 1 \
  --weight_decay 0. \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 --tf32 True \
  --model_max_length 4096 \
  --gradient_checkpointing True \
  --dataloader_num_workers 4 \
  --report_to none \
  --per_device_train_batch_size 16 \
  --num_train_epochs 1 \
  --learning_rate 5e-5 \
  --datasets data_name1+data_name2+data_name3 \
  --output_dir /path/to/output/
```
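The `--datasets data_name1+data_name2+data_name3` flag selects datasets registered inside the training code, and no dataset schema is published here. Purely as a sketch, a LLaVA-style conversation record for an OCR sample might look like the following; the field names and file layout are assumptions, not a confirmed format.
```python
import json

# Hypothetical single training sample in a LLaVA-style conversation format.
sample = {
    "image": "doc_00001.png",
    "conversations": [
        {"from": "human", "value": "<image>\nProvide the ocr results of this image."},
        {"from": "gpt", "value": "The ground-truth text of the page goes here ..."},
    ],
}

with open("my_ocr_dataset.json", "w", encoding="utf-8") as f:
    json.dump([sample], f, ensure_ascii=False, indent=2)
```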
## Inference
**The files must be arranged strictly according to this repository's directory layout.**
Note: edit `--image-file` in run.sh to change the OCR input image (a small batch-processing sketch follows the command below).
```
python /home/wanglch/projects/Vary/vary/demo/run_qwen_vary.py --model-name /home/wanglch/projects/Vary/cache/models--HaoranWei--vary-llava80k --image-file /home/wanglch/projects/Vary/image/pic.jpg
```
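To run OCR over a folder of images, the same command can simply be looped. A small wrapper sketch using only the flags shown above (the paths are the ones from the example command):
```python
import subprocess
from pathlib import Path

MODEL = "/home/wanglch/projects/Vary/cache/models--HaoranWei--vary-llava80k"
DEMO = "/home/wanglch/projects/Vary/vary/demo/run_qwen_vary.py"

# Invoke the documented demo command once per image file.
for img in sorted(Path("/home/wanglch/projects/Vary/image").glob("*.jpg")):
    subprocess.run(
        ["python", DEMO, "--model-name", MODEL, "--image-file", str(img)],
        check=True,
    )
```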
Note: edit line 57 of vary/demo/run_qwen_vary.py to switch between tasks:
```
qs = 'Provide the ocr results of this image.'    # OCR task
qs = 'Detect the ** in this image.'              # detection task ('**' is a placeholder for the target)
qs = 'Convert the document to markdown format.'  # document/formula to Markdown
qs = 'Describe this image within 100 words.'     # multimodal description
```
### Inference script
```
bash run.sh
```
## Results
### English document
<div align="center">
<img align="center" src=image/pic3.jpg>
</div>
### English document OCR result
<div align="center">
<img align="center" src=assets/ocr_en.png>
</div>
### Chinese document
<div align="center">
<img align="center" src=image/pic2.jpg>
</div>
### Chinese document OCR result
<div align="center">
<img align="center" src=assets/ocr_cn.png>
</div>
### License plate recognition
<div align="center">
<img align="center" src=image/car.png>
</div>
### License plate recognition result
<div align="center">
<img align="center" src=assets/car_number.png>
</div>
### Content recognition
<div align="center">
<img align="center" src=image/pic.jpg>
</div>
### Content recognition result
<div align="center">
<img align="center" src=assets/pic_result.png>
</div>
### Accuracy
## Application Scenarios
`Finance, education, government, scientific research, transportation, broadcast media`
### Algorithm Category
`Image-text OCR`
## Pretrained Weights
- [Vary pretrained weights on Hugging Face](https://huggingface.co/Haoran-megvii/Vary); you can also contact the author for the model weights:
`weihaoran18@mails.ucas.ac.cn`
- This project provides the weights [here](https://pan.baidu.com/s/1CjlRmq0_q-NSJez2BKrghg);
the extraction code can be requested by leaving a message in this repository.
- [Download the CLIP-ViT-L](https://huggingface.co/openai/clip-vit-large-patch14/)
## References
- Upstream repository of this project: [Ucas-HaoranWei/Vary](https://github.com/Ucas-HaoranWei/Vary)
<h3><a href="">Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models</a></h3>
<a href="https://varybase.github.io/"><img src="https://img.shields.io/badge/Project-Page-Green"></a>
<a href="https://arxiv.org/abs/2312.06109"><img src="https://img.shields.io/badge/Paper-PDF-orange"></a>
<a href="http://region-31.seetacloud.com:22701/"><img src="https://img.shields.io/badge/demo-blue"></a>
<a href="https://zhuanlan.zhihu.com/p/671420712"><img src="https://img.shields.io/badge/zhihu-yellow"></a>
[Haoran Wei*](https://scholar.google.com/citations?user=J4naK0MAAAAJ&hl=en), Lingyu Kong*, Jinyue Chen, Liang Zhao, [Zheng Ge](https://joker316701882.github.io/), [Jinrong Yang](https://yancie-yjr.github.io/), [Jianjian Sun](https://scholar.google.com/citations?user=MVZrGkYAAAAJ&hl=en), Chunrui Han, [Xiangyu Zhang](https://scholar.google.com/citations?user=yuB-cfoAAAAJ&hl=en)
<p align="center">
<img src="assets/logo.jpg" style="width: 200px" align=center>
</p>
## Release
- [2024/4/21] 🔥🔥🔥 For OneChart, we have released the web demo in [Project Page](https://onechartt.github.io/). Have fun!!
- [2024/4/21] 🔥🔥🔥 We present a Vary-tiny LAVIS codebase (for training from scratch) and the Vary-600k dataset (300K English and 300K Chinese pages) [here](https://github.com/Ucas-HaoranWei/Vary-tiny-600k) !!!
- [2024/4/15]🔥🔥🔥We release a chart parsing model OneChart [here](https://github.com/LingyvKong/OneChart).
- [2024/4/12]🔥🔥🔥We will release a chart parsing model based on Vary-tiny next week. The model supports both English and Chinese charts.
- [2024/3/16]🔥🔥🔥I found many friends very interested in Vary-tiny (OPT-125M), so I open-sourced it [here](https://huggingface.co/HaoranWei/Vary-tiny-opt125M/tree/main), a PDF-dense OCR and object detection version.
- [2024/1/23]🔥🔥🔥We release the Vary-toy [here](https://github.com/Ucas-HaoranWei/Vary-toy). Besides, we show the super good Vary-family results [here](https://github.com/Ucas-HaoranWei/Vary-family).
- [2023/12/29]🔥🔥🔥We will release a new model (a small-size Vary, about 2B) at the beginning of next month and introduce a new feature (object detection). Our online demo will be temporarily closed to prepare for the deployment of the new model.
- [2023/12/11] We released the online demo, have fun!
- [2023/12/11] We released the codes of Vary (train and inference)!
[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE)
[![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE)
**Usage and License Notices**: The data, code, and checkpoint are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of LLaMA, Vicuna, GPT-4, Qwen, and LLaVA.
## Contents
- [Install](#install)
- [Vary Weights](#vary-weights)
- [Demo](#Demo)
- [Train](#train)
## Install
1. Clone this repository and navigate to the Vary folder
```bash
git clone https://github.com/Ucas-HaoranWei/Vary.git
cd Vary
```
2. Install Package
```Shell
conda create -n vary python=3.10 -y
conda activate vary
pip install -e .
```
3. Install Flash-Attention
```
pip install ninja
pip install flash-attn --no-build-isolation
```
## Vary Weights
- If you are in urgent need of weights for your research recently, please contact me by email.
- Download the CLIP-VIT-L in [Hugging Face](https://huggingface.co/openai/clip-vit-large-patch14/tree/main)
## Demo
1. Update the CLIP-VIT path in the code (/cache/vit-large-patch14/) to your own path.
2. Run the demo:
```Shell
python vary/demo/run_qwen_vary.py --model-name /vary/model/path/ --image-file /an/image/file.png
```
## Train
- We currently do not plan to open-source the weights of the intermediate model.
- However, we release the training code, so you can train on your own dataset. If you want to do this, you can try the following:
1. For Vary-base (one machine, if you have multiple machines you need to prepare your host file)
```Shell
deepspeed Vary/train/train_qwen_vary.py --deepspeed /Vary/zero_config/zero2.json \
  --model_name_or_path /Qwen-7B/path/ \
  --vision_tower /vit-large-patch14/path/ \
  --freeze_vision_tower True \
  --freeze_lm_model False \
  --vision_select_layer -2 \
  --use_im_start_end True \
  --bf16 True \
  --per_device_eval_batch_size 4 \
  --gradient_accumulation_steps 1 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 5000 \
  --save_total_limit 1 \
  --weight_decay 0. \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 --tf32 True \
  --model_max_length 4096 \
  --gradient_checkpointing True \
  --dataloader_num_workers 4 \
  --report_to none \
  --per_device_train_batch_size 4 \
  --num_train_epochs 1 \
  --learning_rate 5e-5 \
  --datasets data_name1+data_name2+data_name3 \
  --output_dir /path/to/output/
```
2. For Vary-tiny
```Shell
deepspeed Vary/train/train_opt.py --deepspeed /Vary/zero_config/zero2.json \
  --model_name_or_path /opt125m/path/ \
  --conversation_version opt \
  --freeze_vision_tower False \
  --freeze_lm_model False \
  --use_im_start_end True \
  --bf16 True \
  --per_device_eval_batch_size 4 \
  --gradient_accumulation_steps 1 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 5000 \
  --save_total_limit 1 \
  --weight_decay 0. \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 --tf32 True \
  --model_max_length 4096 \
  --gradient_checkpointing True \
  --dataloader_num_workers 4 \
  --report_to none \
  --per_device_train_batch_size 16 \
  --num_train_epochs 1 \
  --learning_rate 5e-5 \
  --datasets data_name1+data_name2+data_name3 \
  --output_dir /path/to/output/
```
## Contact
If you have any questions related to the code or the paper, feel free to email (`weihaoran18@mails.ucas.ac.cn`).
## Acknowledgement
- [LLaVA](https://github.com/lm-sys/FastChat): the codebase we built upon!
- [Qwen](https://github.com/QwenLM/Qwen): the LLM base model of Vary, which is good at both English and Chinese!
## Citation
If you find our work useful in your research, please consider citing Vary:
```bibtex
@article{wei2023vary,
title={Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models},
author={Wei, Haoran and Kong, Lingyu and Chen, Jinyue and Zhao, Liang and Ge, Zheng and Yang, Jinrong and Sun, Jianjian and Han, Chunrui and Zhang, Xiangyu},
journal={arXiv preprint arXiv:2312.06109},
year={2023}
}
@article{wei2024small,
title={Small Language Model Meets with Reinforced Vision Vocabulary},
author={Wei, Haoran and Kong, Lingyu and Chen, Jinyue and Zhao, Liang and Ge, Zheng and Yu, En and Sun, Jianjian and Han, Chunrui and Zhang, Xiangyu},
journal={arXiv preprint arXiv:2401.12503},
year={2024}
}
```
{
"_name_or_path": "/Vary/cache/models--HaoranWei--vary-llava80k/",
"architectures": [
"MMGPTQwenForCausalLM"
],
"attn_dropout_prob": 0.0,
"auto_map": {
"AutoConfig": "configuration_qwen.QWenConfig",
"AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
},
"bf16": false,
"emb_dropout_prob": 0.0,
"fp16": false,
"fp32": false,
"freeze_vision_tower": false,
"hidden_size": 4096,
"im_end_token": 151858,
"im_patch_token": 151859,
"im_start_token": 151857,
"image_token_len": 256,
"initializer_range": 0.02,
"intermediate_size": 22016,
"kv_channels": 128,
"layer_norm_epsilon": 1e-06,
"max_position_embeddings": 8192,
"model_type": "mmgpt",
"no_bias": true,
"num_attention_heads": 32,
"num_hidden_layers": 32,
"onnx_safe": null,
"rotary_emb_base": 10000,
"rotary_pct": 1.0,
"scale_attn_weights": true,
"seq_length": 2048,
"tie_word_embeddings": false,
"tokenizer_type": "QWenTokenizer",
"torch_dtype": "bfloat16",
"transformers_version": "4.32.1",
"use_cache": true,
"use_dynamic_ntk": true,
"use_flash_attn": false,
"use_im_start_end": true,
"use_logn_attn": true,
"vision_select_layer": -2,
"vision_tower": "/mnt/host0/vit-large-patch14",
"visual": {
"heads": 16,
"image_size": 448,
"image_start_id": 151857,
"layers": 48,
"mlp_ratio": 4.9231,
"output_dim": 4096,
"patch_size": 14,
"width": 1664
},
"vocab_size": 151860
}
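The JSON above is presumably the checkpoint's model config (config.json). A small sketch of the fields worth checking before inference, the same ones the Docker note earlier says to localize, plus the vision-token settings:
```python
import json

with open("config.json", encoding="utf-8") as f:
    cfg = json.load(f)

print(cfg["_name_or_path"])    # checkpoint directory recorded at export time
print(cfg["vision_tower"])     # must point at a local CLIP-ViT-L copy
print(cfg["image_token_len"])  # 256 vision tokens per image
print(cfg["im_start_token"], cfg["im_patch_token"], cfg["im_end_token"])  # image special tokens
```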
{
"chat_format": "chatml",
"do_sample": true,
"eos_token_id": 151643,
"max_new_tokens": 512,
"max_window_size": 6144,
"pad_token_id": 151643,
"top_k": 0,
"top_p": 0.4,
"transformers_version": "4.32.1"
}
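The block above is presumably the generation config (generation_config.json). A hedged sketch of how the same sampling settings would be passed to transformers' `generate`; the `model` and `inputs` objects are assumed to already exist:
```python
from transformers import GenerationConfig

# Mirror of the generation settings above: sampling with nucleus p=0.4, up to 512 new tokens.
gen_cfg = GenerationConfig(
    do_sample=True,
    top_k=0,
    top_p=0.4,
    max_new_tokens=512,
    eos_token_id=151643,
    pad_token_id=151643,
)
# outputs = model.generate(**inputs, generation_config=gen_cfg)
```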
{
"metadata": {
"total_size": 16247298560
},
"weight_map": {
"lm_head.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.0.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.21.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.mlp.w2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.22.attn.c_attn.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.22.attn.c_attn.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.22.attn.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.22.ln_1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.22.ln_2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.22.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.22.mlp.w1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.22.mlp.w2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.attn.c_attn.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.attn.c_attn.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.attn.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.ln_1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.ln_2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.mlp.w1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.mlp.w2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.attn.c_attn.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.attn.c_attn.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.attn.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.ln_1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.ln_2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.mlp.w1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.mlp.w2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.attn.c_attn.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.attn.c_attn.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.attn.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.ln_1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.ln_2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.mlp.w1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.mlp.w2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.attn.c_attn.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.attn.c_attn.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.attn.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.ln_1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.ln_2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.mlp.w1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.mlp.w2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.attn.c_attn.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.attn.c_attn.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.attn.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.ln_1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.ln_2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.mlp.w1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.mlp.w2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.attn.c_attn.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.attn.c_attn.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.attn.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.ln_1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.ln_2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.mlp.w1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.mlp.w2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.attn.c_attn.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.attn.c_attn.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.attn.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.ln_1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.ln_2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.mlp.w1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.mlp.w2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.3.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.30.attn.c_attn.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.attn.c_attn.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.attn.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.ln_1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.ln_2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.mlp.w1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.mlp.w2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.attn.c_attn.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.attn.c_attn.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.attn.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.ln_1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.ln_2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.mlp.w1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.mlp.w2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.4.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.ln_1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.ln_2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.mlp.w1.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.mlp.w2.weight": "pytorch_model-00001-of-00002.bin",
"transformer.ln_f.weight": "pytorch_model-00002-of-00002.bin",
"transformer.mm_projector.bias": "pytorch_model-00002-of-00002.bin",
"transformer.mm_projector.weight": "pytorch_model-00002-of-00002.bin",
"transformer.mm_projector_vary.bias": "pytorch_model-00002-of-00002.bin",
"transformer.mm_projector_vary.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.embeddings.class_embedding": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.embeddings.patch_embedding.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.embeddings.position_embedding.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.post_layernorm.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.post_layernorm.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.pre_layrnorm.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower.vision_model.pre_layrnorm.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.0.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.1.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.10.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.11.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.2.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.3.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.4.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.5.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.6.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.7.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.8.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.attn.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.attn.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.attn.qkv.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.attn.qkv.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.attn.rel_pos_h": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.attn.rel_pos_w": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.mlp.lin1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.mlp.lin1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.mlp.lin2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.mlp.lin2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.norm1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.norm1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.norm2.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.blocks.9.norm2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.neck.0.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.neck.1.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.neck.1.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.neck.2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.neck.3.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.neck.3.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.net_2.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.net_3.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.patch_embed.proj.bias": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.patch_embed.proj.weight": "pytorch_model-00002-of-00002.bin",
"transformer.vision_tower_high.pos_embed": "pytorch_model-00002-of-00002.bin",
"transformer.wte.weight": "pytorch_model-00001-of-00002.bin"
}
}
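For reference, a minimal sketch of how a sharded-checkpoint index like the map above can be queried. It assumes the standard Hugging Face layout, i.e. the map is the `weight_map` section of a `pytorch_model.bin.index.json` stored alongside the shard files; the checkpoint path below is hypothetical.

```python
import json
import os

# Hypothetical local path to the downloaded Vary checkpoint directory.
ckpt_dir = "/path/to/vary-weights"

# Load the shard index (assumed to follow the usual HF "weight_map" layout).
with open(os.path.join(ckpt_dir, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)
weight_map = index["weight_map"]

# Look up which shard file holds a given parameter, e.g. the high-resolution
# vision tower's positional embedding listed above.
name = "transformer.vision_tower_high.pos_embed"
print(name, "->", weight_map[name])  # expected: pytorch_model-00002-of-00002.bin
```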
# Copyright (c) Alibaba Cloud.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
"""Tokenization classes for QWen."""
import base64
import logging
import os
import unicodedata
from typing import Collection, Dict, List, Set, Tuple, Union
import tiktoken
from transformers import PreTrainedTokenizer, AddedToken
logger = logging.getLogger(__name__)
VOCAB_FILES_NAMES = {"vocab_file": "qwen.tiktoken"}
PAT_STR = r"""(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"""
ENDOFTEXT = "<|endoftext|>"
IMSTART = "<|im_start|>"
IMEND = "<|im_end|>"
# as the default behavior is changed to allow special tokens in
# regular texts, the surface forms of special tokens need to be
# as different as possible to minimize the impact
EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))
SPECIAL_TOKENS = (
ENDOFTEXT,
IMSTART,
IMEND,
) + EXTRAS
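# Each non-empty line of the tiktoken BPE file (qwen.tiktoken) is expected to be
# "<base64-encoded token bytes> <rank>"; the helper below decodes it into a
# {token_bytes: rank} mapping used as mergeable_ranks.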
def _load_tiktoken_bpe(tiktoken_bpe_file: str) -> Dict[bytes, int]:
with open(tiktoken_bpe_file, "rb") as f:
contents = f.read()
return {
base64.b64decode(token): int(rank)
for token, rank in (line.split() for line in contents.splitlines() if line)
}
class QWenTokenizer(PreTrainedTokenizer):
"""QWen tokenizer."""
vocab_files_names = VOCAB_FILES_NAMES
def __init__(
self,
vocab_file,
errors="replace",
image_start_tag='<img>',
image_end_tag='</img>',
image_pad_tag='<imgpad>',
ref_start_tag='<ref>',
ref_end_tag='</ref>',
box_start_tag='<box>',
box_end_tag='</box>',
quad_start_tag='<quad>',
quad_end_tag='</quad>',
**kwargs,
):
super().__init__(**kwargs)
self.image_start_tag = image_start_tag
self.image_end_tag = image_end_tag
self.image_pad_tag = image_pad_tag
self.ref_start_tag = ref_start_tag
self.ref_end_tag = ref_end_tag
self.box_start_tag = box_start_tag
self.box_end_tag = box_end_tag
self.quad_start_tag = quad_start_tag
self.quad_end_tag = quad_end_tag
self.IMAGE_ST = (
ref_start_tag, ref_end_tag,
box_start_tag, box_end_tag,
quad_start_tag, quad_end_tag,
image_start_tag, image_end_tag,
image_pad_tag
)
self.errors = errors # how to handle errors in decoding
self.mergeable_ranks = _load_tiktoken_bpe(vocab_file) # type: dict[bytes, int]
self.special_tokens = {
token: index
for index, token in enumerate(
SPECIAL_TOKENS + self.IMAGE_ST, start=len(self.mergeable_ranks)
)
}
self.img_start_id = self.special_tokens[self.image_start_tag]
self.img_end_id = self.special_tokens[self.image_end_tag]
self.img_pad_id = self.special_tokens[self.image_pad_tag]
self.ref_start_id = self.special_tokens[self.ref_start_tag]
self.ref_end_id = self.special_tokens[self.ref_end_tag]
self.box_start_id = self.special_tokens[self.box_start_tag]
self.box_end_id = self.special_tokens[self.box_end_tag]
self.quad_start_id = self.special_tokens[self.quad_start_tag]
self.quad_end_id = self.special_tokens[self.quad_end_tag]
enc = tiktoken.Encoding(
"Qwen",
pat_str=PAT_STR,
mergeable_ranks=self.mergeable_ranks,
special_tokens=self.special_tokens,
)
assert (
len(self.mergeable_ranks) + len(self.special_tokens) == enc.n_vocab
), f"{len(self.mergeable_ranks) + len(self.special_tokens)} != {enc.n_vocab} in encoding"
self.decoder = {
v: k for k, v in self.mergeable_ranks.items()
} # type: dict[int, bytes|str]
self.decoder.update({v: k for k, v in self.special_tokens.items()})
self.tokenizer = enc # type: tiktoken.Encoding
self.eod_id = self.tokenizer.eot_token
self.im_start_id = self.special_tokens[IMSTART]
self.im_end_id = self.special_tokens[IMEND]
def __len__(self) -> int:
return self.tokenizer.n_vocab
def get_vocab(self) -> Dict[bytes, int]:
return self.mergeable_ranks
def convert_tokens_to_ids(
self, tokens: Union[bytes, str, List[Union[bytes, str]]]
) -> List[int]:
ids = []
if isinstance(tokens, (str, bytes)):
if tokens in self.special_tokens:
return self.special_tokens[tokens]
else:
return self.mergeable_ranks.get(tokens)
for token in tokens:
if token in self.special_tokens:
ids.append(self.special_tokens[token])
else:
ids.append(self.mergeable_ranks.get(token))
return ids
def _add_tokens(self, new_tokens: Union[List[str], List[AddedToken]], special_tokens: bool = False) -> int:
if not special_tokens and new_tokens:
raise ValueError('Adding regular tokens is not supported')
for token in new_tokens:
surface_form = token.content if isinstance(token, AddedToken) else token
if surface_form not in SPECIAL_TOKENS:
raise ValueError('Adding unknown special tokens is not supported')
return 0
def save_vocabulary(self, save_directory: str, **kwargs) -> Tuple[str]:
"""
        Save only the vocabulary of the tokenizer (the tiktoken BPE ranks).
Returns:
`Tuple(str)`: Paths to the files saved.
"""
file_path = os.path.join(save_directory, "qwen.tiktoken")
with open(file_path, "w", encoding="utf8") as w:
for k, v in self.mergeable_ranks.items():
line = base64.b64encode(k).decode("utf8") + " " + str(v) + "\n"
w.write(line)
return (file_path,)
def tokenize(
self,
text: str,
allowed_special: Union[Set, str] = "all",
disallowed_special: Union[Collection, str] = (),
**kwargs,
) -> List[Union[bytes, str]]:
"""
        Converts a string into a sequence of tokens.
Args:
text (`str`):
The sequence to be encoded.
allowed_special (`Literal["all"]` or `set`):
The surface forms of the tokens to be encoded as special tokens in regular texts.
                Defaults to "all".
disallowed_special (`Literal["all"]` or `Collection`):
The surface forms of the tokens that should not be in regular texts and trigger errors.
                Defaults to an empty tuple.
kwargs (additional keyword arguments, *optional*):
Will be passed to the underlying model specific encode method.
Returns:
`List[bytes|str]`: The list of tokens.
"""
tokens = []
text = unicodedata.normalize("NFC", text)
# this implementation takes a detour: text -> token id -> token surface forms
for t in self.tokenizer.encode(
text, allowed_special=allowed_special, disallowed_special=disallowed_special
):
tokens.append(self.decoder[t])
return tokens
def convert_tokens_to_string(self, tokens: List[Union[bytes, str]]) -> str:
"""
        Converts a sequence of tokens into a single string.
"""
text = ""
temp = b""
for t in tokens:
if isinstance(t, str):
if temp:
text += temp.decode("utf-8", errors=self.errors)
temp = b""
text += t
elif isinstance(t, bytes):
temp += t
else:
                raise TypeError("token should only be of type bytes or str")
if temp:
text += temp.decode("utf-8", errors=self.errors)
return text
@property
def vocab_size(self):
return self.tokenizer.n_vocab
def _convert_id_to_token(self, index: int) -> Union[bytes, str]:
"""Converts an id to a token, special tokens included"""
if index in self.decoder:
return self.decoder[index]
raise ValueError("unknown ids")
def _convert_token_to_id(self, token: Union[bytes, str]) -> int:
"""Converts a token to an id using the vocab, special tokens included"""
if token in self.special_tokens:
return self.special_tokens[token]
if token in self.mergeable_ranks:
return self.mergeable_ranks[token]
raise ValueError("unknown token")
def _tokenize(self, text: str, **kwargs):
"""
        Converts a string into a sequence of tokens (string), using the tokenizer. Splits into words for word-based
        vocabularies or sub-words for sub-word-based vocabularies (BPE/SentencePiece/WordPiece).
        Does NOT take care of added tokens.
"""
raise NotImplementedError
def _decode(
self,
token_ids: Union[int, List[int]],
skip_special_tokens: bool = False,
errors: str = None,
**kwargs,
) -> str:
if isinstance(token_ids, int):
token_ids = [token_ids]
if skip_special_tokens:
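            # Special-token ids start at len(mergeable_ranks) and <|endoftext|> is the
            # first of them, so filtering ids < eod_id keeps only regular BPE tokens.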
token_ids = [i for i in token_ids if i < self.eod_id]
return self.tokenizer.decode(token_ids, errors=errors or self.errors)
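For reference, a minimal usage sketch of the `QWenTokenizer` defined above. The module name follows the `auto_map` in the tokenizer config below; the `qwen.tiktoken` path is hypothetical and must point to the vocabulary file shipped with the downloaded weights.

```python
# Minimal sketch, assuming tokenization_qwen.py and qwen.tiktoken are available locally.
from tokenization_qwen import QWenTokenizer

tok = QWenTokenizer(vocab_file="/path/to/vary-weights/qwen.tiktoken")

text = "<img><imgpad></img>OCR this document."
tokens = tok.tokenize(text)              # bytes for regular tokens, str for special tokens
ids = tok.convert_tokens_to_ids(tokens)  # integer ids (special tokens sit above the BPE ranks)
print(len(tok), tok.img_pad_id)          # total vocab size and the <imgpad> id
print(tok.convert_tokens_to_string(tokens))
```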
{
"auto_map": {
"AutoTokenizer": [
"tokenization_qwen.QWenTokenizer",
null
]
},
"clean_up_tokenization_spaces": true,
"model_max_length": 4096,
"padding_side": "right",
"tokenizer_class": "QWenTokenizer"
}
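For reference, a minimal sketch of loading the tokenizer through `transformers` via the `auto_map` above. It assumes `tokenizer_config.json`, `tokenization_qwen.py` and `qwen.tiktoken` sit together in a local model directory (the path is hypothetical); `trust_remote_code=True` is required so that the custom `QWenTokenizer` class is imported.

```python
from transformers import AutoTokenizer

# Hypothetical local directory containing the Vary/Qwen tokenizer files.
tok = AutoTokenizer.from_pretrained("/path/to/vary-weights", trust_remote_code=True)
print(tok.model_max_length)  # 4096, as configured above
```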
{
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 1.0,
"eval_steps": 500,
"global_step": 873,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.0,
"learning_rate": 3.7037037037037036e-07,
"loss": 1.6596,
"step": 1
},
{
"epoch": 0.0,
"learning_rate": 7.407407407407407e-07,
"loss": 1.7984,
"step": 2
},
{
"epoch": 0.0,
"learning_rate": 1.111111111111111e-06,
"loss": 1.7185,
"step": 3
},
{
"epoch": 0.0,
"learning_rate": 1.4814814814814815e-06,
"loss": 1.7882,
"step": 4
},
{
"epoch": 0.01,
"learning_rate": 1.8518518518518519e-06,
"loss": 1.7184,
"step": 5
},
{
"epoch": 0.01,
"learning_rate": 2.222222222222222e-06,
"loss": 1.7167,
"step": 6
},
{
"epoch": 0.01,
"learning_rate": 2.5925925925925925e-06,
"loss": 1.6061,
"step": 7
},
{
"epoch": 0.01,
"learning_rate": 2.962962962962963e-06,
"loss": 1.7068,
"step": 8
},
{
"epoch": 0.01,
"learning_rate": 3.3333333333333333e-06,
"loss": 1.4754,
"step": 9
},
{
"epoch": 0.01,
"learning_rate": 3.7037037037037037e-06,
"loss": 1.6545,
"step": 10
},
{
"epoch": 0.01,
"learning_rate": 4.074074074074074e-06,
"loss": 1.6856,
"step": 11
},
{
"epoch": 0.01,
"learning_rate": 4.444444444444444e-06,
"loss": 1.6343,
"step": 12
},
{
"epoch": 0.01,
"learning_rate": 4.814814814814815e-06,
"loss": 1.5836,
"step": 13
},
{
"epoch": 0.02,
"learning_rate": 5.185185185185185e-06,
"loss": 1.6395,
"step": 14
},
{
"epoch": 0.02,
"learning_rate": 5.555555555555557e-06,
"loss": 1.7376,
"step": 15
},
{
"epoch": 0.02,
"learning_rate": 5.925925925925926e-06,
"loss": 1.4839,
"step": 16
},
{
"epoch": 0.02,
"learning_rate": 6.296296296296297e-06,
"loss": 1.7036,
"step": 17
},
{
"epoch": 0.02,
"learning_rate": 6.666666666666667e-06,
"loss": 1.7867,
"step": 18
},
{
"epoch": 0.02,
"learning_rate": 7.0370370370370375e-06,
"loss": 1.6244,
"step": 19
},
{
"epoch": 0.02,
"learning_rate": 7.4074074074074075e-06,
"loss": 1.6133,
"step": 20
},
{
"epoch": 0.02,
"learning_rate": 7.77777777777778e-06,
"loss": 1.5338,
"step": 21
},
{
"epoch": 0.03,
"learning_rate": 8.148148148148148e-06,
"loss": 1.4873,
"step": 22
},
{
"epoch": 0.03,
"learning_rate": 8.518518518518519e-06,
"loss": 1.566,
"step": 23
},
{
"epoch": 0.03,
"learning_rate": 8.888888888888888e-06,
"loss": 1.5234,
"step": 24
},
{
"epoch": 0.03,
"learning_rate": 9.25925925925926e-06,
"loss": 1.5517,
"step": 25
},
{
"epoch": 0.03,
"learning_rate": 9.62962962962963e-06,
"loss": 1.6403,
"step": 26
},
{
"epoch": 0.03,
"learning_rate": 1e-05,
"loss": 1.5553,
"step": 27
},
{
"epoch": 0.03,
"learning_rate": 9.99996552545612e-06,
"loss": 1.4303,
"step": 28
},
{
"epoch": 0.03,
"learning_rate": 9.999862102299874e-06,
"loss": 1.5491,
"step": 29
},
{
"epoch": 0.03,
"learning_rate": 9.99968973195745e-06,
"loss": 1.5947,
"step": 30
},
{
"epoch": 0.04,
"learning_rate": 9.999448416805802e-06,
"loss": 1.6149,
"step": 31
},
{
"epoch": 0.04,
"learning_rate": 9.999138160172624e-06,
"loss": 1.6031,
"step": 32
},
{
"epoch": 0.04,
"learning_rate": 9.998758966336296e-06,
"loss": 1.4953,
"step": 33
},
{
"epoch": 0.04,
"learning_rate": 9.998310840525835e-06,
"loss": 1.3904,
"step": 34
},
{
"epoch": 0.04,
"learning_rate": 9.99779378892081e-06,
"loss": 1.5396,
"step": 35
},
{
"epoch": 0.04,
"learning_rate": 9.997207818651273e-06,
"loss": 1.5207,
"step": 36
},
{
"epoch": 0.04,
"learning_rate": 9.996552937797646e-06,
"loss": 1.4792,
"step": 37
},
{
"epoch": 0.04,
"learning_rate": 9.995829155390613e-06,
"loss": 1.549,
"step": 38
},
{
"epoch": 0.04,
"learning_rate": 9.995036481411005e-06,
"loss": 1.5933,
"step": 39
},
{
"epoch": 0.05,
"learning_rate": 9.994174926789648e-06,
"loss": 1.5199,
"step": 40
},
{
"epoch": 0.05,
"learning_rate": 9.993244503407227e-06,
"loss": 1.4438,
"step": 41
},
{
"epoch": 0.05,
"learning_rate": 9.99224522409411e-06,
"loss": 1.5248,
"step": 42
},
{
"epoch": 0.05,
"learning_rate": 9.991177102630173e-06,
"loss": 1.4997,
"step": 43
},
{
"epoch": 0.05,
"learning_rate": 9.99004015374462e-06,
"loss": 1.4953,
"step": 44
},
{
"epoch": 0.05,
"learning_rate": 9.988834393115768e-06,
"loss": 1.5326,
"step": 45
},
{
"epoch": 0.05,
"learning_rate": 9.987559837370832e-06,
"loss": 1.4969,
"step": 46
},
{
"epoch": 0.05,
"learning_rate": 9.986216504085709e-06,
"loss": 1.4415,
"step": 47
},
{
"epoch": 0.05,
"learning_rate": 9.984804411784717e-06,
"loss": 1.5773,
"step": 48
},
{
"epoch": 0.06,
"learning_rate": 9.983323579940351e-06,
"loss": 1.4887,
"step": 49
},
{
"epoch": 0.06,
"learning_rate": 9.981774028973013e-06,
"loss": 1.4998,
"step": 50
},
{
"epoch": 0.06,
"learning_rate": 9.980155780250728e-06,
"loss": 1.5108,
"step": 51
},
{
"epoch": 0.06,
"learning_rate": 9.97846885608885e-06,
"loss": 1.5011,
"step": 52
},
{
"epoch": 0.06,
"learning_rate": 9.976713279749754e-06,
"loss": 1.4729,
"step": 53
},
{
"epoch": 0.06,
"learning_rate": 9.97488907544252e-06,
"loss": 1.4965,
"step": 54
},
{
"epoch": 0.06,
"learning_rate": 9.972996268322594e-06,
"loss": 1.5399,
"step": 55
},
{
"epoch": 0.06,
"learning_rate": 9.971034884491436e-06,
"loss": 1.4602,
"step": 56
},
{
"epoch": 0.07,
"learning_rate": 9.969004950996175e-06,
"loss": 1.4677,
"step": 57
},
{
"epoch": 0.07,
"learning_rate": 9.96690649582922e-06,
"loss": 1.4103,
"step": 58
},
{
"epoch": 0.07,
"learning_rate": 9.964739547927892e-06,
"loss": 1.4356,
"step": 59
},
{
"epoch": 0.07,
"learning_rate": 9.962504137173997e-06,
"loss": 1.502,
"step": 60
},
{
"epoch": 0.07,
"learning_rate": 9.96020029439345e-06,
"loss": 1.5595,
"step": 61
},
{
"epoch": 0.07,
"learning_rate": 9.957828051355817e-06,
"loss": 1.4218,
"step": 62
},
{
"epoch": 0.07,
"learning_rate": 9.955387440773902e-06,
"loss": 1.4533,
"step": 63
},
{
"epoch": 0.07,
"learning_rate": 9.952878496303274e-06,
"loss": 1.4632,
"step": 64
},
{
"epoch": 0.07,
"learning_rate": 9.950301252541824e-06,
"loss": 1.3781,
"step": 65
},
{
"epoch": 0.08,
"learning_rate": 9.94765574502927e-06,
"loss": 1.4568,
"step": 66
},
{
"epoch": 0.08,
"learning_rate": 9.944942010246681e-06,
"loss": 1.4398,
"step": 67
},
{
"epoch": 0.08,
"learning_rate": 9.942160085615963e-06,
"loss": 1.4723,
"step": 68
},
{
"epoch": 0.08,
"learning_rate": 9.939310009499348e-06,
"loss": 1.4004,
"step": 69
},
{
"epoch": 0.08,
"learning_rate": 9.936391821198868e-06,
"loss": 1.3648,
"step": 70
},
{
"epoch": 0.08,
"learning_rate": 9.933405560955805e-06,
"loss": 1.4415,
"step": 71
},
{
"epoch": 0.08,
"learning_rate": 9.930351269950144e-06,
"loss": 1.4612,
"step": 72
},
{
"epoch": 0.08,
"learning_rate": 9.9272289903e-06,
"loss": 1.4325,
"step": 73
},
{
"epoch": 0.08,
"learning_rate": 9.924038765061042e-06,
"loss": 1.5181,
"step": 74
},
{
"epoch": 0.09,
"learning_rate": 9.92078063822589e-06,
"loss": 1.3426,
"step": 75
},
{
"epoch": 0.09,
"learning_rate": 9.917454654723522e-06,
"loss": 1.3409,
"step": 76
},
{
"epoch": 0.09,
"learning_rate": 9.914060860418644e-06,
"loss": 1.3939,
"step": 77
},
{
"epoch": 0.09,
"learning_rate": 9.910599302111057e-06,
"loss": 1.4446,
"step": 78
},
{
"epoch": 0.09,
"learning_rate": 9.907070027535022e-06,
"loss": 1.4905,
"step": 79
},
{
"epoch": 0.09,
"learning_rate": 9.903473085358589e-06,
"loss": 1.3647,
"step": 80
},
{
"epoch": 0.09,
"learning_rate": 9.899808525182935e-06,
"loss": 1.4727,
"step": 81
},
{
"epoch": 0.09,
"learning_rate": 9.896076397541676e-06,
"loss": 1.4646,
"step": 82
},
{
"epoch": 0.1,
"learning_rate": 9.892276753900173e-06,
"loss": 1.3373,
"step": 83
},
{
"epoch": 0.1,
"learning_rate": 9.888409646654818e-06,
"loss": 1.5058,
"step": 84
},
{
"epoch": 0.1,
"learning_rate": 9.884475129132312e-06,
"loss": 1.4937,
"step": 85
},
{
"epoch": 0.1,
"learning_rate": 9.880473255588937e-06,
"loss": 1.4754,
"step": 86
},
{
"epoch": 0.1,
"learning_rate": 9.876404081209796e-06,
"loss": 1.4366,
"step": 87
},
{
"epoch": 0.1,
"learning_rate": 9.872267662108064e-06,
"loss": 1.4724,
"step": 88
},
{
"epoch": 0.1,
"learning_rate": 9.868064055324204e-06,
"loss": 1.4958,
"step": 89
},
{
"epoch": 0.1,
"learning_rate": 9.863793318825186e-06,
"loss": 1.4369,
"step": 90
},
{
"epoch": 0.1,
"learning_rate": 9.859455511503691e-06,
"loss": 1.4184,
"step": 91
},
{
"epoch": 0.11,
"learning_rate": 9.855050693177286e-06,
"loss": 1.3802,
"step": 92
},
{
"epoch": 0.11,
"learning_rate": 9.850578924587614e-06,
"loss": 1.4848,
"step": 93
},
{
"epoch": 0.11,
"learning_rate": 9.846040267399548e-06,
"loss": 1.5789,
"step": 94
},
{
"epoch": 0.11,
"learning_rate": 9.841434784200341e-06,
"loss": 1.3857,
"step": 95
},
{
"epoch": 0.11,
"learning_rate": 9.83676253849877e-06,
"loss": 1.418,
"step": 96
},
{
"epoch": 0.11,
"learning_rate": 9.832023594724248e-06,
"loss": 1.4117,
"step": 97
},
{
"epoch": 0.11,
"learning_rate": 9.827218018225944e-06,
"loss": 1.5229,
"step": 98
},
{
"epoch": 0.11,
"learning_rate": 9.822345875271884e-06,
"loss": 1.4827,
"step": 99
},
{
"epoch": 0.11,
"learning_rate": 9.817407233048028e-06,
"loss": 1.3705,
"step": 100
},
{
"epoch": 0.12,
"learning_rate": 9.812402159657352e-06,
"loss": 1.4836,
"step": 101
},
{
"epoch": 0.12,
"learning_rate": 9.807330724118906e-06,
"loss": 1.4261,
"step": 102
},
{
"epoch": 0.12,
"learning_rate": 9.802192996366859e-06,
"loss": 1.3848,
"step": 103
},
{
"epoch": 0.12,
"learning_rate": 9.796989047249539e-06,
"loss": 1.3516,
"step": 104
},
{
"epoch": 0.12,
"learning_rate": 9.791718948528457e-06,
"loss": 1.4449,
"step": 105
},
{
"epoch": 0.12,
"learning_rate": 9.786382772877312e-06,
"loss": 1.4151,
"step": 106
},
{
"epoch": 0.12,
"learning_rate": 9.780980593880993e-06,
"loss": 1.4627,
"step": 107
},
{
"epoch": 0.12,
"learning_rate": 9.775512486034564e-06,
"loss": 1.3104,
"step": 108
},
{
"epoch": 0.12,
"learning_rate": 9.76997852474223e-06,
"loss": 1.3921,
"step": 109
},
{
"epoch": 0.13,
"learning_rate": 9.76437878631631e-06,
"loss": 1.4407,
"step": 110
},
{
"epoch": 0.13,
"learning_rate": 9.758713347976179e-06,
"loss": 1.5017,
"step": 111
},
{
"epoch": 0.13,
"learning_rate": 9.752982287847193e-06,
"loss": 1.4565,
"step": 112
},
{
"epoch": 0.13,
"learning_rate": 9.747185684959626e-06,
"loss": 1.4413,
"step": 113
},
{
"epoch": 0.13,
"learning_rate": 9.741323619247575e-06,
"loss": 1.3756,
"step": 114
},
{
"epoch": 0.13,
"learning_rate": 9.735396171547859e-06,
"loss": 1.3519,
"step": 115
},
{
"epoch": 0.13,
"learning_rate": 9.7294034235989e-06,
"loss": 1.4479,
"step": 116
},
{
"epoch": 0.13,
"learning_rate": 9.723345458039595e-06,
"loss": 1.4406,
"step": 117
},
{
"epoch": 0.14,
"learning_rate": 9.717222358408188e-06,
"loss": 1.4624,
"step": 118
},
{
"epoch": 0.14,
"learning_rate": 9.711034209141102e-06,
"loss": 1.4615,
"step": 119
},
{
"epoch": 0.14,
"learning_rate": 9.704781095571788e-06,
"loss": 1.4321,
"step": 120
},
{
"epoch": 0.14,
"learning_rate": 9.698463103929542e-06,
"loss": 1.4256,
"step": 121
},
{
"epoch": 0.14,
"learning_rate": 9.692080321338317e-06,
"loss": 1.4684,
"step": 122
},
{
"epoch": 0.14,
"learning_rate": 9.685632835815519e-06,
"loss": 1.3634,
"step": 123
},
{
"epoch": 0.14,
"learning_rate": 9.679120736270796e-06,
"loss": 1.5006,
"step": 124
},
{
"epoch": 0.14,
"learning_rate": 9.672544112504813e-06,
"loss": 1.3559,
"step": 125
},
{
"epoch": 0.14,
"learning_rate": 9.665903055208013e-06,
"loss": 1.3583,
"step": 126
},
{
"epoch": 0.15,
"learning_rate": 9.659197655959364e-06,
"loss": 1.3658,
"step": 127
},
{
"epoch": 0.15,
"learning_rate": 9.6524280072251e-06,
"loss": 1.3334,
"step": 128
},
{
"epoch": 0.15,
"learning_rate": 9.645594202357438e-06,
"loss": 1.408,
"step": 129
},
{
"epoch": 0.15,
"learning_rate": 9.638696335593304e-06,
"loss": 1.4738,
"step": 130
},
{
"epoch": 0.15,
"learning_rate": 9.63173450205302e-06,
"loss": 1.4019,
"step": 131
},
{
"epoch": 0.15,
"learning_rate": 9.624708797739002e-06,
"loss": 1.3439,
"step": 132
},
{
"epoch": 0.15,
"learning_rate": 9.617619319534427e-06,
"loss": 1.4233,
"step": 133
},
{
"epoch": 0.15,
"learning_rate": 9.610466165201912e-06,
"loss": 1.329,
"step": 134
},
{
"epoch": 0.15,
"learning_rate": 9.603249433382145e-06,
"loss": 1.3863,
"step": 135
},
{
"epoch": 0.16,
"learning_rate": 9.595969223592544e-06,
"loss": 1.3281,
"step": 136
},
{
"epoch": 0.16,
"learning_rate": 9.588625636225871e-06,
"loss": 1.443,
"step": 137
},
{
"epoch": 0.16,
"learning_rate": 9.58121877254886e-06,
"loss": 1.3968,
"step": 138
},
{
"epoch": 0.16,
"learning_rate": 9.573748734700806e-06,
"loss": 1.3448,
"step": 139
},
{
"epoch": 0.16,
"learning_rate": 9.566215625692168e-06,
"loss": 1.4587,
"step": 140
},
{
"epoch": 0.16,
"learning_rate": 9.558619549403148e-06,
"loss": 1.4847,
"step": 141
},
{
"epoch": 0.16,
"learning_rate": 9.550960610582251e-06,
"loss": 1.3366,
"step": 142
},
{
"epoch": 0.16,
"learning_rate": 9.543238914844844e-06,
"loss": 1.4567,
"step": 143
},
{
"epoch": 0.16,
"learning_rate": 9.535454568671705e-06,
"loss": 1.3963,
"step": 144
},
{
"epoch": 0.17,
"learning_rate": 9.527607679407545e-06,
"loss": 1.3651,
"step": 145
},
{
"epoch": 0.17,
"learning_rate": 9.519698355259537e-06,
"loss": 1.3715,
"step": 146
},
{
"epoch": 0.17,
"learning_rate": 9.51172670529582e-06,
"loss": 1.4307,
"step": 147
},
{
"epoch": 0.17,
"learning_rate": 9.503692839443988e-06,
"loss": 1.4342,
"step": 148
},
{
"epoch": 0.17,
"learning_rate": 9.495596868489588e-06,
"loss": 1.3459,
"step": 149
},
{
"epoch": 0.17,
"learning_rate": 9.487438904074581e-06,
"loss": 1.3055,
"step": 150
},
{
"epoch": 0.17,
"learning_rate": 9.47921905869581e-06,
"loss": 1.3297,
"step": 151
},
{
"epoch": 0.17,
"learning_rate": 9.47093744570344e-06,
"loss": 1.3391,
"step": 152
},
{
"epoch": 0.18,
"learning_rate": 9.462594179299408e-06,
"loss": 1.3502,
"step": 153
},
{
"epoch": 0.18,
"learning_rate": 9.45418937453583e-06,
"loss": 1.4598,
"step": 154
},
{
"epoch": 0.18,
"learning_rate": 9.445723147313434e-06,
"loss": 1.3428,
"step": 155
},
{
"epoch": 0.18,
"learning_rate": 9.437195614379947e-06,
"loss": 1.3898,
"step": 156
},
{
"epoch": 0.18,
"learning_rate": 9.428606893328493e-06,
"loss": 1.352,
"step": 157
},
{
"epoch": 0.18,
"learning_rate": 9.41995710259597e-06,
"loss": 1.4388,
"step": 158
},
{
"epoch": 0.18,
"learning_rate": 9.41124636146141e-06,
"loss": 1.3507,
"step": 159
},
{
"epoch": 0.18,
"learning_rate": 9.402474790044348e-06,
"loss": 1.3915,
"step": 160
},
{
"epoch": 0.18,
"learning_rate": 9.39364250930315e-06,
"loss": 1.4424,
"step": 161
},
{
"epoch": 0.19,
"learning_rate": 9.384749641033358e-06,
"loss": 1.3962,
"step": 162
},
{
"epoch": 0.19,
"learning_rate": 9.375796307866003e-06,
"loss": 1.2982,
"step": 163
},
{
"epoch": 0.19,
"learning_rate": 9.366782633265917e-06,
"loss": 1.3802,
"step": 164
},
{
"epoch": 0.19,
"learning_rate": 9.357708741530025e-06,
"loss": 1.4136,
"step": 165
},
{
"epoch": 0.19,
"learning_rate": 9.348574757785642e-06,
"loss": 1.3946,
"step": 166
},
{
"epoch": 0.19,
"learning_rate": 9.339380807988734e-06,
"loss": 1.2952,
"step": 167
},
{
"epoch": 0.19,
"learning_rate": 9.330127018922195e-06,
"loss": 1.3998,
"step": 168
},
{
"epoch": 0.19,
"learning_rate": 9.320813518194084e-06,
"loss": 1.3156,
"step": 169
},
{
"epoch": 0.19,
"learning_rate": 9.311440434235879e-06,
"loss": 1.4532,
"step": 170
},
{
"epoch": 0.2,
"learning_rate": 9.302007896300697e-06,
"loss": 1.3771,
"step": 171
},
{
"epoch": 0.2,
"learning_rate": 9.292516034461517e-06,
"loss": 1.4247,
"step": 172
},
{
"epoch": 0.2,
"learning_rate": 9.28296497960938e-06,
"loss": 1.3896,
"step": 173
},
{
"epoch": 0.2,
"learning_rate": 9.273354863451589e-06,
"loss": 1.4133,
"step": 174
},
{
"epoch": 0.2,
"learning_rate": 9.263685818509895e-06,
"loss": 1.3593,
"step": 175
},
{
"epoch": 0.2,
"learning_rate": 9.253957978118664e-06,
"loss": 1.3283,
"step": 176
},
{
"epoch": 0.2,
"learning_rate": 9.244171476423037e-06,
"loss": 1.3792,
"step": 177
},
{
"epoch": 0.2,
"learning_rate": 9.234326448377089e-06,
"loss": 1.3244,
"step": 178
},
{
"epoch": 0.21,
"learning_rate": 9.22442302974196e-06,
"loss": 1.3083,
"step": 179
},
{
"epoch": 0.21,
"learning_rate": 9.214461357083986e-06,
"loss": 1.3197,
"step": 180
},
{
"epoch": 0.21,
"learning_rate": 9.204441567772817e-06,
"loss": 1.4067,
"step": 181
},
{
"epoch": 0.21,
"learning_rate": 9.194363799979517e-06,
"loss": 1.3608,
"step": 182
},
{
"epoch": 0.21,
"learning_rate": 9.184228192674667e-06,
"loss": 1.3958,
"step": 183
},
{
"epoch": 0.21,
"learning_rate": 9.17403488562644e-06,
"loss": 1.3592,
"step": 184
},
{
"epoch": 0.21,
"learning_rate": 9.163784019398686e-06,
"loss": 1.362,
"step": 185
},
{
"epoch": 0.21,
"learning_rate": 9.153475735348973e-06,
"loss": 1.303,
"step": 186
},
{
"epoch": 0.21,
"learning_rate": 9.143110175626662e-06,
"loss": 1.3781,
"step": 187
},
{
"epoch": 0.22,
"learning_rate": 9.13268748317093e-06,
"loss": 1.4138,
"step": 188
},
{
"epoch": 0.22,
"learning_rate": 9.122207801708802e-06,
"loss": 1.438,
"step": 189
},
{
"epoch": 0.22,
"learning_rate": 9.111671275753175e-06,
"loss": 1.4004,
"step": 190
},
{
"epoch": 0.22,
"learning_rate": 9.101078050600823e-06,
"loss": 1.3989,
"step": 191
},
{
"epoch": 0.22,
"learning_rate": 9.090428272330381e-06,
"loss": 1.4085,
"step": 192
},
{
"epoch": 0.22,
"learning_rate": 9.079722087800353e-06,
"loss": 1.3661,
"step": 193
},
{
"epoch": 0.22,
"learning_rate": 9.06895964464707e-06,
"loss": 1.3699,
"step": 194
},
{
"epoch": 0.22,
"learning_rate": 9.058141091282656e-06,
"loss": 1.3042,
"step": 195
},
{
"epoch": 0.22,
"learning_rate": 9.047266576892993e-06,
"loss": 1.2713,
"step": 196
},
{
"epoch": 0.23,
"learning_rate": 9.036336251435647e-06,
"loss": 1.3376,
"step": 197
},
{
"epoch": 0.23,
"learning_rate": 9.025350265637816e-06,
"loss": 1.3499,
"step": 198
},
{
"epoch": 0.23,
"learning_rate": 9.014308770994235e-06,
"loss": 1.3658,
"step": 199
},
{
"epoch": 0.23,
"learning_rate": 9.003211919765102e-06,
"loss": 1.331,
"step": 200
},
{
"epoch": 0.23,
"learning_rate": 8.992059864973972e-06,
"loss": 1.2886,
"step": 201
},
{
"epoch": 0.23,
"learning_rate": 8.980852760405645e-06,
"loss": 1.3809,
"step": 202
},
{
"epoch": 0.23,
"learning_rate": 8.96959076060405e-06,
"loss": 1.3197,
"step": 203
},
{
"epoch": 0.23,
"learning_rate": 8.958274020870107e-06,
"loss": 1.4306,
"step": 204
},
{
"epoch": 0.23,
"learning_rate": 8.946902697259593e-06,
"loss": 1.3622,
"step": 205
},
{
"epoch": 0.24,
"learning_rate": 8.935476946580988e-06,
"loss": 1.3956,
"step": 206
},
{
"epoch": 0.24,
"learning_rate": 8.923996926393306e-06,
"loss": 1.3504,
"step": 207
},
{
"epoch": 0.24,
"learning_rate": 8.912462795003932e-06,
"loss": 1.3969,
"step": 208
},
{
"epoch": 0.24,
"learning_rate": 8.900874711466436e-06,
"loss": 1.4044,
"step": 209
},
{
"epoch": 0.24,
"learning_rate": 8.889232835578372e-06,
"loss": 1.3154,
"step": 210
},
{
"epoch": 0.24,
"learning_rate": 8.877537327879087e-06,
"loss": 1.3014,
"step": 211
},
{
"epoch": 0.24,
"learning_rate": 8.865788349647496e-06,
"loss": 1.3628,
"step": 212
},
{
"epoch": 0.24,
"learning_rate": 8.853986062899869e-06,
"loss": 1.3848,
"step": 213
},
{
"epoch": 0.25,
"learning_rate": 8.842130630387583e-06,
"loss": 1.382,
"step": 214
},
{
"epoch": 0.25,
"learning_rate": 8.83022221559489e-06,
"loss": 1.3952,
"step": 215
},
{
"epoch": 0.25,
"learning_rate": 8.818260982736662e-06,
"loss": 1.3529,
"step": 216
},
{
"epoch": 0.25,
"learning_rate": 8.80624709675611e-06,
"loss": 1.2393,
"step": 217
},
{
"epoch": 0.25,
"learning_rate": 8.794180723322537e-06,
"loss": 1.4167,
"step": 218
},
{
"epoch": 0.25,
"learning_rate": 8.782062028829028e-06,
"loss": 1.3627,
"step": 219
},
{
"epoch": 0.25,
"learning_rate": 8.769891180390168e-06,
"loss": 1.374,
"step": 220
},
{
"epoch": 0.25,
"learning_rate": 8.757668345839739e-06,
"loss": 1.385,
"step": 221
},
{
"epoch": 0.25,
"learning_rate": 8.745393693728395e-06,
"loss": 1.3289,
"step": 222
},
{
"epoch": 0.26,
"learning_rate": 8.733067393321354e-06,
"loss": 1.3307,
"step": 223
},
{
"epoch": 0.26,
"learning_rate": 8.72068961459605e-06,
"loss": 1.3902,
"step": 224
},
{
"epoch": 0.26,
"learning_rate": 8.708260528239788e-06,
"loss": 1.4119,
"step": 225
},
{
"epoch": 0.26,
"learning_rate": 8.695780305647405e-06,
"loss": 1.3628,
"step": 226
},
{
"epoch": 0.26,
"learning_rate": 8.683249118918895e-06,
"loss": 1.2731,
"step": 227
},
{
"epoch": 0.26,
"learning_rate": 8.670667140857034e-06,
"loss": 1.3873,
"step": 228
},
{
"epoch": 0.26,
"learning_rate": 8.658034544965003e-06,
"loss": 1.3426,
"step": 229
},
{
"epoch": 0.26,
"learning_rate": 8.645351505443997e-06,
"loss": 1.3858,
"step": 230
},
{
"epoch": 0.26,
"learning_rate": 8.632618197190817e-06,
"loss": 1.4124,
"step": 231
},
{
"epoch": 0.27,
"learning_rate": 8.619834795795458e-06,
"loss": 1.2791,
"step": 232
},
{
"epoch": 0.27,
"learning_rate": 8.607001477538697e-06,
"loss": 1.3503,
"step": 233
},
{
"epoch": 0.27,
"learning_rate": 8.594118419389648e-06,
"loss": 1.4009,
"step": 234
},
{
"epoch": 0.27,
"learning_rate": 8.581185799003334e-06,
"loss": 1.3875,
"step": 235
},
{
"epoch": 0.27,
"learning_rate": 8.568203794718228e-06,
"loss": 1.212,
"step": 236
},
{
"epoch": 0.27,
"learning_rate": 8.555172585553804e-06,
"loss": 1.4082,
"step": 237
},
{
"epoch": 0.27,
"learning_rate": 8.542092351208058e-06,
"loss": 1.3676,
"step": 238
},
{
"epoch": 0.27,
"learning_rate": 8.528963272055036e-06,
"loss": 1.3153,
"step": 239
},
{
"epoch": 0.27,
"learning_rate": 8.515785529142339e-06,
"loss": 1.3265,
"step": 240
},
{
"epoch": 0.28,
"learning_rate": 8.502559304188644e-06,
"loss": 1.2758,
"step": 241
},
{
"epoch": 0.28,
"learning_rate": 8.489284779581179e-06,
"loss": 1.365,
"step": 242
},
{
"epoch": 0.28,
"learning_rate": 8.475962138373212e-06,
"loss": 1.3663,
"step": 243
},
{
"epoch": 0.28,
"learning_rate": 8.46259156428154e-06,
"loss": 1.3429,
"step": 244
},
{
"epoch": 0.28,
"learning_rate": 8.449173241683934e-06,
"loss": 1.3457,
"step": 245
},
{
"epoch": 0.28,
"learning_rate": 8.43570735561662e-06,
"loss": 1.3705,
"step": 246
},
{
"epoch": 0.28,
"learning_rate": 8.422194091771709e-06,
"loss": 1.2759,
"step": 247
},
{
"epoch": 0.28,
"learning_rate": 8.408633636494643e-06,
"loss": 1.2924,
"step": 248
},
{
"epoch": 0.29,
"learning_rate": 8.395026176781627e-06,
"loss": 1.3508,
"step": 249
},
{
"epoch": 0.29,
"learning_rate": 8.381371900277045e-06,
"loss": 1.3227,
"step": 250
},
{
"epoch": 0.29,
"learning_rate": 8.367670995270883e-06,
"loss": 1.3489,
"step": 251
},
{
"epoch": 0.29,
"learning_rate": 8.353923650696119e-06,
"loss": 1.4203,
"step": 252
},
{
"epoch": 0.29,
"learning_rate": 8.340130056126126e-06,
"loss": 1.3484,
"step": 253
},
{
"epoch": 0.29,
"learning_rate": 8.326290401772057e-06,
"loss": 1.2958,
"step": 254
},
{
"epoch": 0.29,
"learning_rate": 8.312404878480222e-06,
"loss": 1.3201,
"step": 255
},
{
"epoch": 0.29,
"learning_rate": 8.298473677729453e-06,
"loss": 1.2972,
"step": 256
},
{
"epoch": 0.29,
"learning_rate": 8.284496991628465e-06,
"loss": 1.3441,
"step": 257
},
{
"epoch": 0.3,
"learning_rate": 8.270475012913212e-06,
"loss": 1.2578,
"step": 258
},
{
"epoch": 0.3,
"learning_rate": 8.25640793494422e-06,
"loss": 1.2723,
"step": 259
},
{
"epoch": 0.3,
"learning_rate": 8.24229595170393e-06,
"loss": 1.3463,
"step": 260
},
{
"epoch": 0.3,
"learning_rate": 8.228139257794012e-06,
"loss": 1.2866,
"step": 261
},
{
"epoch": 0.3,
"learning_rate": 8.213938048432697e-06,
"loss": 1.2462,
"step": 262
},
{
"epoch": 0.3,
"learning_rate": 8.19969251945207e-06,
"loss": 1.2395,
"step": 263
},
{
"epoch": 0.3,
"learning_rate": 8.185402867295373e-06,
"loss": 1.3298,
"step": 264
},
{
"epoch": 0.3,
"learning_rate": 8.171069289014307e-06,
"loss": 1.3464,
"step": 265
},
{
"epoch": 0.3,
"learning_rate": 8.156691982266299e-06,
"loss": 1.3382,
"step": 266
},
{
"epoch": 0.31,
"learning_rate": 8.142271145311784e-06,
"loss": 1.3392,
"step": 267
},
{
"epoch": 0.31,
"learning_rate": 8.127806977011476e-06,
"loss": 1.3726,
"step": 268
},
{
"epoch": 0.31,
"learning_rate": 8.113299676823614e-06,
"loss": 1.3787,
"step": 269
},
{
"epoch": 0.31,
"learning_rate": 8.098749444801226e-06,
"loss": 1.3239,
"step": 270
},
{
"epoch": 0.31,
"learning_rate": 8.08415648158935e-06,
"loss": 1.2717,
"step": 271
},
{
"epoch": 0.31,
"learning_rate": 8.069520988422292e-06,
"loss": 1.2725,
"step": 272
},
{
"epoch": 0.31,
"learning_rate": 8.054843167120827e-06,
"loss": 1.2724,
"step": 273
},
{
"epoch": 0.31,
"learning_rate": 8.040123220089437e-06,
"loss": 1.2731,
"step": 274
},
{
"epoch": 0.32,
"learning_rate": 8.025361350313506e-06,
"loss": 1.3407,
"step": 275
},
{
"epoch": 0.32,
"learning_rate": 8.010557761356523e-06,
"loss": 1.3206,
"step": 276
},
{
"epoch": 0.32,
"learning_rate": 7.99571265735728e-06,
"loss": 1.3384,
"step": 277
},
{
"epoch": 0.32,
"learning_rate": 7.980826243027052e-06,
"loss": 1.3022,
"step": 278
},
{
"epoch": 0.32,
"learning_rate": 7.965898723646777e-06,
"loss": 1.4177,
"step": 279
},
{
"epoch": 0.32,
"learning_rate": 7.950930305064224e-06,
"loss": 1.2906,
"step": 280
},
{
"epoch": 0.32,
"learning_rate": 7.935921193691153e-06,
"loss": 1.3392,
"step": 281
},
{
"epoch": 0.32,
"learning_rate": 7.920871596500473e-06,
"loss": 1.3754,
"step": 282
},
{
"epoch": 0.32,
"learning_rate": 7.905781721023384e-06,
"loss": 1.3237,
"step": 283
},
{
"epoch": 0.33,
"learning_rate": 7.890651775346512e-06,
"loss": 1.3333,
"step": 284
},
{
"epoch": 0.33,
"learning_rate": 7.875481968109052e-06,
"loss": 1.329,
"step": 285
},
{
"epoch": 0.33,
"learning_rate": 7.860272508499877e-06,
"loss": 1.3569,
"step": 286
},
{
"epoch": 0.33,
"learning_rate": 7.845023606254658e-06,
"loss": 1.3997,
"step": 287
},
{
"epoch": 0.33,
"learning_rate": 7.829735471652978e-06,
"loss": 1.2569,
"step": 288
},
{
"epoch": 0.33,
"learning_rate": 7.814408315515419e-06,
"loss": 1.3157,
"step": 289
},
{
"epoch": 0.33,
"learning_rate": 7.799042349200672e-06,
"loss": 1.3385,
"step": 290
},
{
"epoch": 0.33,
"learning_rate": 7.783637784602608e-06,
"loss": 1.3519,
"step": 291
},
{
"epoch": 0.33,
"learning_rate": 7.768194834147362e-06,
"loss": 1.3092,
"step": 292
},
{
"epoch": 0.34,
"learning_rate": 7.752713710790405e-06,
"loss": 1.4118,
"step": 293
},
{
"epoch": 0.34,
"learning_rate": 7.7371946280136e-06,
"loss": 1.4384,
"step": 294
},
{
"epoch": 0.34,
"learning_rate": 7.721637799822269e-06,
"loss": 1.319,
"step": 295
},
{
"epoch": 0.34,
"learning_rate": 7.706043440742235e-06,
"loss": 1.4091,
"step": 296
},
{
"epoch": 0.34,
"learning_rate": 7.690411765816864e-06,
"loss": 1.3586,
"step": 297
},
{
"epoch": 0.34,
"learning_rate": 7.674742990604101e-06,
"loss": 1.3887,
"step": 298
},
{
"epoch": 0.34,
"learning_rate": 7.659037331173498e-06,
"loss": 1.4584,
"step": 299
},
{
"epoch": 0.34,
"learning_rate": 7.643295004103232e-06,
"loss": 1.3274,
"step": 300
},
{
"epoch": 0.34,
"learning_rate": 7.627516226477123e-06,
"loss": 1.3145,
"step": 301
},
{
"epoch": 0.35,
"learning_rate": 7.611701215881635e-06,
"loss": 1.3378,
"step": 302
},
{
"epoch": 0.35,
"learning_rate": 7.595850190402877e-06,
"loss": 1.3808,
"step": 303
},
{
"epoch": 0.35,
"learning_rate": 7.579963368623602e-06,
"loss": 1.2816,
"step": 304
},
{
"epoch": 0.35,
"learning_rate": 7.564040969620179e-06,
"loss": 1.3451,
"step": 305
},
{
"epoch": 0.35,
"learning_rate": 7.548083212959588e-06,
"loss": 1.3767,
"step": 306
},
{
"epoch": 0.35,
"learning_rate": 7.532090318696382e-06,
"loss": 1.2972,
"step": 307
},
{
"epoch": 0.35,
"learning_rate": 7.516062507369655e-06,
"loss": 1.3871,
"step": 308
},
{
"epoch": 0.35,
"learning_rate": 7.500000000000001e-06,
"loss": 1.2995,
"step": 309
},
{
"epoch": 0.36,
"learning_rate": 7.483903018086466e-06,
"loss": 1.3784,
"step": 310
},
{
"epoch": 0.36,
"learning_rate": 7.467771783603492e-06,
"loss": 1.2722,
"step": 311
},
{
"epoch": 0.36,
"learning_rate": 7.4516065189978625e-06,
"loss": 1.2584,
"step": 312
},
{
"epoch": 0.36,
"learning_rate": 7.435407447185623e-06,
"loss": 1.3558,
"step": 313
},
{
"epoch": 0.36,
"learning_rate": 7.419174791549023e-06,
"loss": 1.3191,
"step": 314
},
{
"epoch": 0.36,
"learning_rate": 7.402908775933419e-06,
"loss": 1.3213,
"step": 315
},
{
"epoch": 0.36,
"learning_rate": 7.386609624644201e-06,
"loss": 1.2918,
"step": 316
},
{
"epoch": 0.36,
"learning_rate": 7.370277562443689e-06,
"loss": 1.2747,
"step": 317
},
{
"epoch": 0.36,
"learning_rate": 7.353912814548042e-06,
"loss": 1.3291,
"step": 318
},
{
"epoch": 0.37,
"learning_rate": 7.337515606624148e-06,
"loss": 1.3367,
"step": 319
},
{
"epoch": 0.37,
"learning_rate": 7.321086164786513e-06,
"loss": 1.3425,
"step": 320
},
{
"epoch": 0.37,
"learning_rate": 7.30462471559414e-06,
"loss": 1.2055,
"step": 321
},
{
"epoch": 0.37,
"learning_rate": 7.288131486047414e-06,
"loss": 1.2522,
"step": 322
},
{
"epoch": 0.37,
"learning_rate": 7.2716067035849595e-06,
"loss": 1.3425,
"step": 323
},
{
"epoch": 0.37,
"learning_rate": 7.25505059608051e-06,
"loss": 1.3494,
"step": 324
},
{
"epoch": 0.37,
"learning_rate": 7.23846339183977e-06,
"loss": 1.3205,
"step": 325
},
{
"epoch": 0.37,
"learning_rate": 7.221845319597258e-06,
"loss": 1.3643,
"step": 326
},
{
"epoch": 0.37,
"learning_rate": 7.2051966085131584e-06,
"loss": 1.2947,
"step": 327
},
{
"epoch": 0.38,
"learning_rate": 7.18851748817016e-06,
"loss": 1.2807,
"step": 328
},
{
"epoch": 0.38,
"learning_rate": 7.1718081885702905e-06,
"loss": 1.2995,
"step": 329
},
{
"epoch": 0.38,
"learning_rate": 7.155068940131741e-06,
"loss": 1.2976,
"step": 330
},
{
"epoch": 0.38,
"learning_rate": 7.138299973685694e-06,
"loss": 1.3224,
"step": 331
},
{
"epoch": 0.38,
"learning_rate": 7.121501520473137e-06,
"loss": 1.3198,
"step": 332
},
{
"epoch": 0.38,
"learning_rate": 7.104673812141676e-06,
"loss": 1.2902,
"step": 333
},
{
"epoch": 0.38,
"learning_rate": 7.087817080742337e-06,
"loss": 1.2266,
"step": 334
},
{
"epoch": 0.38,
"learning_rate": 7.070931558726373e-06,
"loss": 1.1909,
"step": 335
},
{
"epoch": 0.38,
"learning_rate": 7.054017478942048e-06,
"loss": 1.2823,
"step": 336
},
{
"epoch": 0.39,
"learning_rate": 7.037075074631441e-06,
"loss": 1.3414,
"step": 337
},
{
"epoch": 0.39,
"learning_rate": 7.0201045794272135e-06,
"loss": 1.2356,
"step": 338
},
{
"epoch": 0.39,
"learning_rate": 7.003106227349399e-06,
"loss": 1.3991,
"step": 339
},
{
"epoch": 0.39,
"learning_rate": 6.9860802528021705e-06,
"loss": 1.3584,
"step": 340
},
{
"epoch": 0.39,
"learning_rate": 6.969026890570612e-06,
"loss": 1.391,
"step": 341
},
{
"epoch": 0.39,
"learning_rate": 6.9519463758174745e-06,
"loss": 1.2975,
"step": 342
},
{
"epoch": 0.39,
"learning_rate": 6.934838944079944e-06,
"loss": 1.2221,
"step": 343
},
{
"epoch": 0.39,
"learning_rate": 6.917704831266381e-06,
"loss": 1.266,
"step": 344
},
{
"epoch": 0.4,
"learning_rate": 6.9005442736530745e-06,
"loss": 1.2379,
"step": 345
},
{
"epoch": 0.4,
"learning_rate": 6.883357507880985e-06,
"loss": 1.2859,
"step": 346
},
{
"epoch": 0.4,
"learning_rate": 6.866144770952474e-06,
"loss": 1.3748,
"step": 347
},
{
"epoch": 0.4,
"learning_rate": 6.848906300228047e-06,
"loss": 1.2665,
"step": 348
},
{
"epoch": 0.4,
"learning_rate": 6.831642333423068e-06,
"loss": 1.3045,
"step": 349
},
{
"epoch": 0.4,
"learning_rate": 6.814353108604488e-06,
"loss": 1.3242,
"step": 350
},
{
"epoch": 0.4,
"learning_rate": 6.797038864187564e-06,
"loss": 1.3638,
"step": 351
},
{
"epoch": 0.4,
"learning_rate": 6.77969983893257e-06,
"loss": 1.3131,
"step": 352
},
{
"epoch": 0.4,
"learning_rate": 6.762336271941499e-06,
"loss": 1.265,
"step": 353
},
{
"epoch": 0.41,
"learning_rate": 6.7449484026547705e-06,
"loss": 1.3479,
"step": 354
},
{
"epoch": 0.41,
"learning_rate": 6.7275364708479316e-06,
"loss": 1.3901,
"step": 355
},
{
"epoch": 0.41,
"learning_rate": 6.710100716628345e-06,
"loss": 1.3811,
"step": 356
},
{
"epoch": 0.41,
"learning_rate": 6.692641380431879e-06,
"loss": 1.303,
"step": 357
},
{
"epoch": 0.41,
"learning_rate": 6.675158703019594e-06,
"loss": 1.2914,
"step": 358
},
{
"epoch": 0.41,
"learning_rate": 6.657652925474424e-06,
"loss": 1.2605,
"step": 359
},
{
"epoch": 0.41,
"learning_rate": 6.640124289197845e-06,
"loss": 1.2949,
"step": 360
},
{
"epoch": 0.41,
"learning_rate": 6.622573035906557e-06,
"loss": 1.2792,
"step": 361
},
{
"epoch": 0.41,
"learning_rate": 6.604999407629137e-06,
"loss": 1.2529,
"step": 362
},
{
"epoch": 0.42,
"learning_rate": 6.5874036467027135e-06,
"loss": 1.2953,
"step": 363
},
{
"epoch": 0.42,
"learning_rate": 6.5697859957696195e-06,
"loss": 1.3389,
"step": 364
},
{
"epoch": 0.42,
"learning_rate": 6.552146697774049e-06,
"loss": 1.3,
"step": 365
},
{
"epoch": 0.42,
"learning_rate": 6.534485995958699e-06,
"loss": 1.3147,
"step": 366
},
{
"epoch": 0.42,
"learning_rate": 6.51680413386143e-06,
"loss": 1.2693,
"step": 367
},
{
"epoch": 0.42,
"learning_rate": 6.499101355311891e-06,
"loss": 1.2744,
"step": 368
},
{
"epoch": 0.42,
"learning_rate": 6.481377904428171e-06,
"loss": 1.3277,
"step": 369
},
{
"epoch": 0.42,
"learning_rate": 6.4636340256134224e-06,
"loss": 1.3469,
"step": 370
},
{
"epoch": 0.42,
"learning_rate": 6.445869963552496e-06,
"loss": 1.3045,
"step": 371
},
{
"epoch": 0.43,
"learning_rate": 6.428085963208567e-06,
"loss": 1.2607,
"step": 372
},
{
"epoch": 0.43,
"learning_rate": 6.410282269819756e-06,
"loss": 1.2707,
"step": 373
},
{
"epoch": 0.43,
"learning_rate": 6.392459128895747e-06,
"loss": 1.2619,
"step": 374
},
{
"epoch": 0.43,
"learning_rate": 6.374616786214402e-06,
"loss": 1.2985,
"step": 375
},
{
"epoch": 0.43,
"learning_rate": 6.356755487818371e-06,
"loss": 1.3691,
"step": 376
},
{
"epoch": 0.43,
"learning_rate": 6.338875480011698e-06,
"loss": 1.3009,
"step": 377
},
{
"epoch": 0.43,
"learning_rate": 6.3209770093564315e-06,
"loss": 1.3129,
"step": 378
},
{
"epoch": 0.43,
"learning_rate": 6.303060322669214e-06,
"loss": 1.2962,
"step": 379
},
{
"epoch": 0.44,
"learning_rate": 6.285125667017886e-06,
"loss": 1.3026,
"step": 380
},
{
"epoch": 0.44,
"learning_rate": 6.267173289718079e-06,
"loss": 1.2494,
"step": 381
},
{
"epoch": 0.44,
"learning_rate": 6.249203438329799e-06,
"loss": 1.2837,
"step": 382
},
{
"epoch": 0.44,
"learning_rate": 6.23121636065402e-06,
"loss": 1.2899,
"step": 383
},
{
"epoch": 0.44,
"learning_rate": 6.213212304729259e-06,
"loss": 1.2559,
"step": 384
},
{
"epoch": 0.44,
"learning_rate": 6.195191518828163e-06,
"loss": 1.2479,
"step": 385
},
{
"epoch": 0.44,
"learning_rate": 6.177154251454082e-06,
"loss": 1.3153,
"step": 386
},
{
"epoch": 0.44,
"learning_rate": 6.1591007513376425e-06,
"loss": 1.3329,
"step": 387
},
{
"epoch": 0.44,
"learning_rate": 6.141031267433316e-06,
"loss": 1.2713,
"step": 388
},
{
"epoch": 0.45,
"learning_rate": 6.122946048915991e-06,
"loss": 1.3049,
"step": 389
},
{
"epoch": 0.45,
"learning_rate": 6.1048453451775305e-06,
"loss": 1.3078,
"step": 390
},
{
"epoch": 0.45,
"learning_rate": 6.086729405823335e-06,
"loss": 1.2158,
"step": 391
},
{
"epoch": 0.45,
"learning_rate": 6.0685984806689055e-06,
"loss": 1.2278,
"step": 392
},
{
"epoch": 0.45,
"learning_rate": 6.05045281973639e-06,
"loss": 1.2554,
"step": 393
},
{
"epoch": 0.45,
"learning_rate": 6.032292673251143e-06,
"loss": 1.257,
"step": 394
},
{
"epoch": 0.45,
"learning_rate": 6.014118291638272e-06,
"loss": 1.2632,
"step": 395
},
{
"epoch": 0.45,
"learning_rate": 5.995929925519181e-06,
"loss": 1.2637,
"step": 396
},
{
"epoch": 0.45,
"learning_rate": 5.977727825708123e-06,
"loss": 1.4115,
"step": 397
},
{
"epoch": 0.46,
"learning_rate": 5.959512243208732e-06,
"loss": 1.3101,
"step": 398
},
{
"epoch": 0.46,
"learning_rate": 5.941283429210568e-06,
"loss": 1.2819,
"step": 399
},
{
"epoch": 0.46,
"learning_rate": 5.9230416350856505e-06,
"loss": 1.1701,
"step": 400
},
{
"epoch": 0.46,
"learning_rate": 5.904787112384991e-06,
"loss": 1.3451,
"step": 401
},
{
"epoch": 0.46,
"learning_rate": 5.886520112835128e-06,
"loss": 1.294,
"step": 402
},
{
"epoch": 0.46,
"learning_rate": 5.8682408883346535e-06,
"loss": 1.289,
"step": 403
},
{
"epoch": 0.46,
"learning_rate": 5.849949690950736e-06,
"loss": 1.3007,
"step": 404
},
{
"epoch": 0.46,
"learning_rate": 5.831646772915651e-06,
"loss": 1.2988,
"step": 405
},
{
"epoch": 0.47,
"learning_rate": 5.8133323866233005e-06,
"loss": 1.3402,
"step": 406
},
{
"epoch": 0.47,
"learning_rate": 5.795006784625728e-06,
"loss": 1.2917,
"step": 407
},
{
"epoch": 0.47,
"learning_rate": 5.776670219629643e-06,
"loss": 1.2742,
"step": 408
},
{
"epoch": 0.47,
"learning_rate": 5.75832294449293e-06,
"loss": 1.1589,
"step": 409
},
{
"epoch": 0.47,
"learning_rate": 5.739965212221168e-06,
"loss": 1.2647,
"step": 410
},
{
"epoch": 0.47,
"learning_rate": 5.7215972759641335e-06,
"loss": 1.2089,
"step": 411
},
{
"epoch": 0.47,
"learning_rate": 5.703219389012317e-06,
"loss": 1.2267,
"step": 412
},
{
"epoch": 0.47,
"learning_rate": 5.684831804793427e-06,
"loss": 1.2323,
"step": 413
},
{
"epoch": 0.47,
"learning_rate": 5.666434776868895e-06,
"loss": 1.2575,
"step": 414
},
{
"epoch": 0.48,
"learning_rate": 5.64802855893038e-06,
"loss": 1.2609,
"step": 415
},
{
"epoch": 0.48,
"learning_rate": 5.629613404796267e-06,
"loss": 1.2414,
"step": 416
},
{
"epoch": 0.48,
"learning_rate": 5.611189568408173e-06,
"loss": 1.2782,
"step": 417
},
{
"epoch": 0.48,
"learning_rate": 5.592757303827441e-06,
"loss": 1.2637,
"step": 418
},
{
"epoch": 0.48,
"learning_rate": 5.574316865231637e-06,
"loss": 1.2786,
"step": 419
},
{
"epoch": 0.48,
"learning_rate": 5.5558685069110444e-06,
"loss": 1.2729,
"step": 420
},
{
"epoch": 0.48,
"learning_rate": 5.537412483265156e-06,
"loss": 1.2916,
"step": 421
},
{
"epoch": 0.48,
"learning_rate": 5.518949048799176e-06,
"loss": 1.2674,
"step": 422
},
{
"epoch": 0.48,
"learning_rate": 5.500478458120493e-06,
"loss": 1.2308,
"step": 423
},
{
"epoch": 0.49,
"learning_rate": 5.482000965935182e-06,
"loss": 1.3994,
"step": 424
},
{
"epoch": 0.49,
"learning_rate": 5.463516827044492e-06,
"loss": 1.2365,
"step": 425
},
{
"epoch": 0.49,
"learning_rate": 5.445026296341325e-06,
"loss": 1.3897,
"step": 426
},
{
"epoch": 0.49,
"learning_rate": 5.4265296288067235e-06,
"loss": 1.3042,
"step": 427
},
{
"epoch": 0.49,
"learning_rate": 5.408027079506362e-06,
"loss": 1.2559,
"step": 428
},
{
"epoch": 0.49,
"learning_rate": 5.389518903587016e-06,
"loss": 1.2159,
"step": 429
},
{
"epoch": 0.49,
"learning_rate": 5.371005356273058e-06,
"loss": 1.3298,
"step": 430
},
{
"epoch": 0.49,
"learning_rate": 5.352486692862926e-06,
"loss": 1.2856,
"step": 431
},
{
"epoch": 0.49,
"learning_rate": 5.3339631687256085e-06,
"loss": 1.3069,
"step": 432
},
{
"epoch": 0.5,
"learning_rate": 5.3154350392971245e-06,
"loss": 1.2573,
"step": 433
},
{
"epoch": 0.5,
"learning_rate": 5.296902560077e-06,
"loss": 1.322,
"step": 434
},
{
"epoch": 0.5,
"learning_rate": 5.278365986624743e-06,
"loss": 1.3452,
"step": 435
},
{
"epoch": 0.5,
"learning_rate": 5.259825574556315e-06,
"loss": 1.2812,
"step": 436
},
{
"epoch": 0.5,
"learning_rate": 5.241281579540619e-06,
"loss": 1.2764,
"step": 437
},
{
"epoch": 0.5,
"learning_rate": 5.222734257295963e-06,
"loss": 1.2183,
"step": 438
},
{
"epoch": 0.5,
"learning_rate": 5.2041838635865336e-06,
"loss": 1.4043,
"step": 439
},
{
"epoch": 0.5,
"learning_rate": 5.1856306542188805e-06,
"loss": 1.3147,
"step": 440
},
{
"epoch": 0.51,
"learning_rate": 5.1670748850383734e-06,
"loss": 1.2799,
"step": 441
},
{
"epoch": 0.51,
"learning_rate": 5.148516811925684e-06,
"loss": 1.2555,
"step": 442
},
{
"epoch": 0.51,
"learning_rate": 5.129956690793255e-06,
"loss": 1.2698,
"step": 443
},
{
"epoch": 0.51,
"learning_rate": 5.111394777581769e-06,
"loss": 1.3233,
"step": 444
},
{
"epoch": 0.51,
"learning_rate": 5.0928313282566255e-06,
"loss": 1.2816,
"step": 445
},
{
"epoch": 0.51,
"learning_rate": 5.074266598804402e-06,
"loss": 1.2227,
"step": 446
},
{
"epoch": 0.51,
"learning_rate": 5.0557008452293275e-06,
"loss": 1.2858,
"step": 447
},
{
"epoch": 0.51,
"learning_rate": 5.037134323549763e-06,
"loss": 1.2475,
"step": 448
},
{
"epoch": 0.51,
"learning_rate": 5.0185672897946515e-06,
"loss": 1.2544,
"step": 449
},
{
"epoch": 0.52,
"learning_rate": 5e-06,
"loss": 1.3282,
"step": 450
},
{
"epoch": 0.52,
"learning_rate": 4.981432710205351e-06,
"loss": 1.2523,
"step": 451
},
{
"epoch": 0.52,
"learning_rate": 4.962865676450239e-06,
"loss": 1.2541,
"step": 452
},
{
"epoch": 0.52,
"learning_rate": 4.944299154770673e-06,
"loss": 1.3198,
"step": 453
},
{
"epoch": 0.52,
"learning_rate": 4.925733401195601e-06,
"loss": 1.3546,
"step": 454
},
{
"epoch": 0.52,
"learning_rate": 4.907168671743377e-06,
"loss": 1.2943,
"step": 455
},
{
"epoch": 0.52,
"learning_rate": 4.888605222418232e-06,
"loss": 1.2895,
"step": 456
},
{
"epoch": 0.52,
"learning_rate": 4.8700433092067474e-06,
"loss": 1.3898,
"step": 457
},
{
"epoch": 0.52,
"learning_rate": 4.8514831880743175e-06,
"loss": 1.2536,
"step": 458
},
{
"epoch": 0.53,
"learning_rate": 4.832925114961629e-06,
"loss": 1.2977,
"step": 459
},
{
"epoch": 0.53,
"learning_rate": 4.814369345781121e-06,
"loss": 1.2635,
"step": 460
},
{
"epoch": 0.53,
"learning_rate": 4.795816136413467e-06,
"loss": 1.442,
"step": 461
},
{
"epoch": 0.53,
"learning_rate": 4.777265742704039e-06,
"loss": 1.2602,
"step": 462
},
{
"epoch": 0.53,
"learning_rate": 4.758718420459383e-06,
"loss": 1.3203,
"step": 463
},
{
"epoch": 0.53,
"learning_rate": 4.740174425443687e-06,
"loss": 1.2639,
"step": 464
},
{
"epoch": 0.53,
"learning_rate": 4.7216340133752604e-06,
"loss": 1.2657,
"step": 465
},
{
"epoch": 0.53,
"learning_rate": 4.703097439923e-06,
"loss": 1.2655,
"step": 466
},
{
"epoch": 0.53,
"learning_rate": 4.684564960702877e-06,
"loss": 1.287,
"step": 467
},
{
"epoch": 0.54,
"learning_rate": 4.666036831274392e-06,
"loss": 1.2951,
"step": 468
},
{
"epoch": 0.54,
"learning_rate": 4.647513307137076e-06,
"loss": 1.2133,
"step": 469
},
{
"epoch": 0.54,
"learning_rate": 4.628994643726942e-06,
"loss": 1.3232,
"step": 470
},
{
"epoch": 0.54,
"learning_rate": 4.610481096412985e-06,
"loss": 1.2329,
"step": 471
},
{
"epoch": 0.54,
"learning_rate": 4.591972920493638e-06,
"loss": 1.259,
"step": 472
},
{
"epoch": 0.54,
"learning_rate": 4.573470371193277e-06,
"loss": 1.2908,
"step": 473
},
{
"epoch": 0.54,
"learning_rate": 4.554973703658676e-06,
"loss": 1.3604,
"step": 474
},
{
"epoch": 0.54,
"learning_rate": 4.53648317295551e-06,
"loss": 1.296,
"step": 475
},
{
"epoch": 0.55,
"learning_rate": 4.517999034064819e-06,
"loss": 1.3323,
"step": 476
},
{
"epoch": 0.55,
"learning_rate": 4.499521541879508e-06,
"loss": 1.2711,
"step": 477
},
{
"epoch": 0.55,
"learning_rate": 4.4810509512008245e-06,
"loss": 1.1972,
"step": 478
},
{
"epoch": 0.55,
"learning_rate": 4.462587516734845e-06,
"loss": 1.276,
"step": 479
},
{
"epoch": 0.55,
"learning_rate": 4.444131493088956e-06,
"loss": 1.3355,
"step": 480
},
{
"epoch": 0.55,
"learning_rate": 4.425683134768365e-06,
"loss": 1.3096,
"step": 481
},
{
"epoch": 0.55,
"learning_rate": 4.40724269617256e-06,
"loss": 1.2206,
"step": 482
},
{
"epoch": 0.55,
"learning_rate": 4.388810431591829e-06,
"loss": 1.2797,
"step": 483
},
{
"epoch": 0.55,
"learning_rate": 4.3703865952037354e-06,
"loss": 1.3047,
"step": 484
},
{
"epoch": 0.56,
"learning_rate": 4.351971441069622e-06,
"loss": 1.3317,
"step": 485
},
{
"epoch": 0.56,
"learning_rate": 4.333565223131107e-06,
"loss": 1.1875,
"step": 486
},
{
"epoch": 0.56,
"learning_rate": 4.315168195206574e-06,
"loss": 1.2847,
"step": 487
},
{
"epoch": 0.56,
"learning_rate": 4.296780610987685e-06,
"loss": 1.1669,
"step": 488
},
{
"epoch": 0.56,
"learning_rate": 4.278402724035868e-06,
"loss": 1.2493,
"step": 489
},
{
"epoch": 0.56,
"learning_rate": 4.260034787778833e-06,
"loss": 1.2311,
"step": 490
},
{
"epoch": 0.56,
"learning_rate": 4.241677055507071e-06,
"loss": 1.2648,
"step": 491
},
{
"epoch": 0.56,
"learning_rate": 4.223329780370359e-06,
"loss": 1.2404,
"step": 492
},
{
"epoch": 0.56,
"learning_rate": 4.2049932153742726e-06,
"loss": 1.2094,
"step": 493
},
{
"epoch": 0.57,
"learning_rate": 4.186667613376702e-06,
"loss": 1.303,
"step": 494
},
{
"epoch": 0.57,
"learning_rate": 4.1683532270843505e-06,
"loss": 1.2948,
"step": 495
},
{
"epoch": 0.57,
"learning_rate": 4.150050309049267e-06,
"loss": 1.2527,
"step": 496
},
{
"epoch": 0.57,
"learning_rate": 4.131759111665349e-06,
"loss": 1.3017,
"step": 497
},
{
"epoch": 0.57,
"learning_rate": 4.113479887164873e-06,
"loss": 1.2143,
"step": 498
},
{
"epoch": 0.57,
"learning_rate": 4.09521288761501e-06,
"loss": 1.3148,
"step": 499
},
{
"epoch": 0.57,
"learning_rate": 4.076958364914352e-06,
"loss": 1.3612,
"step": 500
},
{
"epoch": 0.57,
"learning_rate": 4.0587165707894326e-06,
"loss": 1.1973,
"step": 501
},
{
"epoch": 0.58,
"learning_rate": 4.04048775679127e-06,
"loss": 1.3197,
"step": 502
},
{
"epoch": 0.58,
"learning_rate": 4.022272174291878e-06,
"loss": 1.2792,
"step": 503
},
{
"epoch": 0.58,
"learning_rate": 4.004070074480821e-06,
"loss": 1.2688,
"step": 504
},
{
"epoch": 0.58,
"learning_rate": 3.985881708361729e-06,
"loss": 1.2597,
"step": 505
},
{
"epoch": 0.58,
"learning_rate": 3.967707326748857e-06,
"loss": 1.2096,
"step": 506
},
{
"epoch": 0.58,
"learning_rate": 3.94954718026361e-06,
"loss": 1.3257,
"step": 507
},
{
"epoch": 0.58,
"learning_rate": 3.931401519331095e-06,
"loss": 1.2747,
"step": 508
},
{
"epoch": 0.58,
"learning_rate": 3.913270594176665e-06,
"loss": 1.2471,
"step": 509
},
{
"epoch": 0.58,
"learning_rate": 3.895154654822471e-06,
"loss": 1.1345,
"step": 510
},
{
"epoch": 0.59,
"learning_rate": 3.87705395108401e-06,
"loss": 1.2536,
"step": 511
},
{
"epoch": 0.59,
"learning_rate": 3.858968732566685e-06,
"loss": 1.219,
"step": 512
},
{
"epoch": 0.59,
"learning_rate": 3.840899248662358e-06,
"loss": 1.3678,
"step": 513
},
{
"epoch": 0.59,
"learning_rate": 3.822845748545919e-06,
"loss": 1.2971,
"step": 514
},
{
"epoch": 0.59,
"learning_rate": 3.8048084811718377e-06,
"loss": 1.2638,
"step": 515
},
{
"epoch": 0.59,
"learning_rate": 3.786787695270743e-06,
"loss": 1.254,
"step": 516
},
{
"epoch": 0.59,
"learning_rate": 3.7687836393459828e-06,
"loss": 1.2319,
"step": 517
},
{
"epoch": 0.59,
"learning_rate": 3.7507965616702015e-06,
"loss": 1.2633,
"step": 518
},
{
"epoch": 0.59,
"learning_rate": 3.732826710281923e-06,
"loss": 1.2397,
"step": 519
},
{
"epoch": 0.6,
"learning_rate": 3.7148743329821146e-06,
"loss": 1.2457,
"step": 520
},
{
"epoch": 0.6,
"learning_rate": 3.6969396773307888e-06,
"loss": 1.1932,
"step": 521
},
{
"epoch": 0.6,
"learning_rate": 3.6790229906435706e-06,
"loss": 1.232,
"step": 522
},
{
"epoch": 0.6,
"learning_rate": 3.6611245199883037e-06,
"loss": 1.2936,
"step": 523
},
{
"epoch": 0.6,
"learning_rate": 3.6432445121816308e-06,
"loss": 1.2693,
"step": 524
},
{
"epoch": 0.6,
"learning_rate": 3.6253832137856e-06,
"loss": 1.3185,
"step": 525
},
{
"epoch": 0.6,
"learning_rate": 3.6075408711042536e-06,
"loss": 1.2861,
"step": 526
},
{
"epoch": 0.6,
"learning_rate": 3.5897177301802455e-06,
"loss": 1.3201,
"step": 527
},
{
"epoch": 0.6,
"learning_rate": 3.571914036791435e-06,
"loss": 1.2737,
"step": 528
},
{
"epoch": 0.61,
"learning_rate": 3.5541300364475067e-06,
"loss": 1.2632,
"step": 529
},
{
"epoch": 0.61,
"learning_rate": 3.5363659743865797e-06,
"loss": 1.2531,
"step": 530
},
{
"epoch": 0.61,
"learning_rate": 3.518622095571831e-06,
"loss": 1.2531,
"step": 531
},
{
"epoch": 0.61,
"learning_rate": 3.5008986446881088e-06,
"loss": 1.2296,
"step": 532
},
{
"epoch": 0.61,
"learning_rate": 3.4831958661385716e-06,
"loss": 1.2491,
"step": 533
},
{
"epoch": 0.61,
"learning_rate": 3.465514004041301e-06,
"loss": 1.242,
"step": 534
},
{
"epoch": 0.61,
"learning_rate": 3.4478533022259527e-06,
"loss": 1.2202,
"step": 535
},
{
"epoch": 0.61,
"learning_rate": 3.4302140042303813e-06,
"loss": 1.2563,
"step": 536
},
{
"epoch": 0.62,
"learning_rate": 3.4125963532972878e-06,
"loss": 1.1436,
"step": 537
},
{
"epoch": 0.62,
"learning_rate": 3.395000592370864e-06,
"loss": 1.3096,
"step": 538
},
{
"epoch": 0.62,
"learning_rate": 3.3774269640934447e-06,
"loss": 1.3111,
"step": 539
},
{
"epoch": 0.62,
"learning_rate": 3.3598757108021546e-06,
"loss": 1.237,
"step": 540
},
{
"epoch": 0.62,
"learning_rate": 3.342347074525578e-06,
"loss": 1.2538,
"step": 541
},
{
"epoch": 0.62,
"learning_rate": 3.3248412969804065e-06,
"loss": 1.1838,
"step": 542
},
{
"epoch": 0.62,
"learning_rate": 3.307358619568123e-06,
"loss": 1.2605,
"step": 543
},
{
"epoch": 0.62,
"learning_rate": 3.289899283371657e-06,
"loss": 1.2921,
"step": 544
},
{
"epoch": 0.62,
"learning_rate": 3.2724635291520697e-06,
"loss": 1.2745,
"step": 545
},
{
"epoch": 0.63,
"learning_rate": 3.2550515973452295e-06,
"loss": 1.2481,
"step": 546
},
{
"epoch": 0.63,
"learning_rate": 3.2376637280585025e-06,
"loss": 1.2874,
"step": 547
},
{
"epoch": 0.63,
"learning_rate": 3.2203001610674322e-06,
"loss": 1.2855,
"step": 548
},
{
"epoch": 0.63,
"learning_rate": 3.202961135812437e-06,
"loss": 1.2453,
"step": 549
},
{
"epoch": 0.63,
"learning_rate": 3.185646891395514e-06,
"loss": 1.2974,
"step": 550
},
{
"epoch": 0.63,
"learning_rate": 3.1683576665769344e-06,
"loss": 1.2085,
"step": 551
},
{
"epoch": 0.63,
"learning_rate": 3.1510936997719557e-06,
"loss": 1.192,
"step": 552
},
{
"epoch": 0.63,
"learning_rate": 3.1338552290475265e-06,
"loss": 1.2529,
"step": 553
},
{
"epoch": 0.63,
"learning_rate": 3.1166424921190174e-06,
"loss": 1.3066,
"step": 554
},
{
"epoch": 0.64,
"learning_rate": 3.0994557263469267e-06,
"loss": 1.3012,
"step": 555
},
{
"epoch": 0.64,
"learning_rate": 3.0822951687336215e-06,
"loss": 1.2765,
"step": 556
},
{
"epoch": 0.64,
"learning_rate": 3.065161055920057e-06,
"loss": 1.1885,
"step": 557
},
{
"epoch": 0.64,
"learning_rate": 3.0480536241825263e-06,
"loss": 1.2846,
"step": 558
},
{
"epoch": 0.64,
"learning_rate": 3.03097310942939e-06,
"loss": 1.2697,
"step": 559
},
{
"epoch": 0.64,
"learning_rate": 3.013919747197832e-06,
"loss": 1.2531,
"step": 560
},
{
"epoch": 0.64,
"learning_rate": 2.996893772650602e-06,
"loss": 1.2579,
"step": 561
},
{
"epoch": 0.64,
"learning_rate": 2.9798954205727886e-06,
"loss": 1.2048,
"step": 562
},
{
"epoch": 0.64,
"learning_rate": 2.96292492536856e-06,
"loss": 1.1587,
"step": 563
},
{
"epoch": 0.65,
"learning_rate": 2.9459825210579534e-06,
"loss": 1.2425,
"step": 564
},
{
"epoch": 0.65,
"learning_rate": 2.929068441273629e-06,
"loss": 1.2004,
"step": 565
},
{
"epoch": 0.65,
"learning_rate": 2.9121829192576647e-06,
"loss": 1.2269,
"step": 566
},
{
"epoch": 0.65,
"learning_rate": 2.8953261878583263e-06,
"loss": 1.2296,
"step": 567
},
{
"epoch": 0.65,
"learning_rate": 2.8784984795268644e-06,
"loss": 1.2511,
"step": 568
},
{
"epoch": 0.65,
"learning_rate": 2.861700026314308e-06,
"loss": 1.2154,
"step": 569
},
{
"epoch": 0.65,
"learning_rate": 2.844931059868261e-06,
"loss": 1.2472,
"step": 570
},
{
"epoch": 0.65,
"learning_rate": 2.828191811429709e-06,
"loss": 1.2231,
"step": 571
},
{
"epoch": 0.66,
"learning_rate": 2.811482511829842e-06,
"loss": 1.2338,
"step": 572
},
{
"epoch": 0.66,
"learning_rate": 2.7948033914868415e-06,
"loss": 1.1874,
"step": 573
},
{
"epoch": 0.66,
"learning_rate": 2.778154680402745e-06,
"loss": 1.1784,
"step": 574
},
{
"epoch": 0.66,
"learning_rate": 2.7615366081602306e-06,
"loss": 1.2601,
"step": 575
},
{
"epoch": 0.66,
"learning_rate": 2.74494940391949e-06,
"loss": 1.2372,
"step": 576
},
{
"epoch": 0.66,
"learning_rate": 2.7283932964150417e-06,
"loss": 1.2243,
"step": 577
},
{
"epoch": 0.66,
"learning_rate": 2.711868513952587e-06,
"loss": 1.2287,
"step": 578
},
{
"epoch": 0.66,
"learning_rate": 2.69537528440586e-06,
"loss": 1.2571,
"step": 579
},
{
"epoch": 0.66,
"learning_rate": 2.6789138352134885e-06,
"loss": 1.3302,
"step": 580
},
{
"epoch": 0.67,
"learning_rate": 2.6624843933758547e-06,
"loss": 1.2508,
"step": 581
},
{
"epoch": 0.67,
"learning_rate": 2.6460871854519594e-06,
"loss": 1.3273,
"step": 582
},
{
"epoch": 0.67,
"learning_rate": 2.6297224375563126e-06,
"loss": 1.2187,
"step": 583
},
{
"epoch": 0.67,
"learning_rate": 2.613390375355801e-06,
"loss": 1.2563,
"step": 584
},
{
"epoch": 0.67,
"learning_rate": 2.5970912240665815e-06,
"loss": 1.2393,
"step": 585
},
{
"epoch": 0.67,
"learning_rate": 2.5808252084509784e-06,
"loss": 1.2318,
"step": 586
},
{
"epoch": 0.67,
"learning_rate": 2.5645925528143778e-06,
"loss": 1.2727,
"step": 587
},
{
"epoch": 0.67,
"learning_rate": 2.54839348100214e-06,
"loss": 1.2413,
"step": 588
},
{
"epoch": 0.67,
"learning_rate": 2.5322282163965096e-06,
"loss": 1.2198,
"step": 589
},
{
"epoch": 0.68,
"learning_rate": 2.5160969819135368e-06,
"loss": 1.2257,
"step": 590
},
{
"epoch": 0.68,
"learning_rate": 2.5000000000000015e-06,
"loss": 1.1988,
"step": 591
},
{
"epoch": 0.68,
"learning_rate": 2.483937492630345e-06,
"loss": 1.2751,
"step": 592
},
{
"epoch": 0.68,
"learning_rate": 2.4679096813036202e-06,
"loss": 1.2558,
"step": 593
},
{
"epoch": 0.68,
"learning_rate": 2.4519167870404126e-06,
"loss": 1.2104,
"step": 594
},
{
"epoch": 0.68,
"learning_rate": 2.4359590303798243e-06,
"loss": 1.263,
"step": 595
},
{
"epoch": 0.68,
"learning_rate": 2.4200366313764e-06,
"loss": 1.3031,
"step": 596
},
{
"epoch": 0.68,
"learning_rate": 2.4041498095971253e-06,
"loss": 1.3141,
"step": 597
},
{
"epoch": 0.68,
"learning_rate": 2.388298784118366e-06,
"loss": 1.2409,
"step": 598
},
{
"epoch": 0.69,
"learning_rate": 2.3724837735228773e-06,
"loss": 1.2706,
"step": 599
},
{
"epoch": 0.69,
"learning_rate": 2.356704995896768e-06,
"loss": 1.3215,
"step": 600
},
{
"epoch": 0.69,
"learning_rate": 2.340962668826503e-06,
"loss": 1.248,
"step": 601
},
{
"epoch": 0.69,
"learning_rate": 2.3252570093959e-06,
"loss": 1.2072,
"step": 602
},
{
"epoch": 0.69,
"learning_rate": 2.309588234183137e-06,
"loss": 1.2875,
"step": 603
},
{
"epoch": 0.69,
"learning_rate": 2.293956559257766e-06,
"loss": 1.2191,
"step": 604
},
{
"epoch": 0.69,
"learning_rate": 2.2783622001777322e-06,
"loss": 1.1606,
"step": 605
},
{
"epoch": 0.69,
"learning_rate": 2.262805371986402e-06,
"loss": 1.3171,
"step": 606
},
{
"epoch": 0.7,
"learning_rate": 2.247286289209597e-06,
"loss": 1.2794,
"step": 607
},
{
"epoch": 0.7,
"learning_rate": 2.231805165852637e-06,
"loss": 1.2513,
"step": 608
},
{
"epoch": 0.7,
"learning_rate": 2.216362215397393e-06,
"loss": 1.2643,
"step": 609
},
{
"epoch": 0.7,
"learning_rate": 2.2009576507993273e-06,
"loss": 1.2275,
"step": 610
},
{
"epoch": 0.7,
"learning_rate": 2.1855916844845827e-06,
"loss": 1.2788,
"step": 611
},
{
"epoch": 0.7,
"learning_rate": 2.1702645283470238e-06,
"loss": 1.2536,
"step": 612
},
{
"epoch": 0.7,
"learning_rate": 2.1549763937453445e-06,
"loss": 1.2581,
"step": 613
},
{
"epoch": 0.7,
"learning_rate": 2.1397274915001254e-06,
"loss": 1.2904,
"step": 614
},
{
"epoch": 0.7,
"learning_rate": 2.1245180318909482e-06,
"loss": 1.2637,
"step": 615
},
{
"epoch": 0.71,
"learning_rate": 2.1093482246534896e-06,
"loss": 1.2643,
"step": 616
},
{
"epoch": 0.71,
"learning_rate": 2.0942182789766174e-06,
"loss": 1.2461,
"step": 617
},
{
"epoch": 0.71,
"learning_rate": 2.0791284034995296e-06,
"loss": 1.2346,
"step": 618
},
{
"epoch": 0.71,
"learning_rate": 2.064078806308848e-06,
"loss": 1.2702,
"step": 619
},
{
"epoch": 0.71,
"learning_rate": 2.0490696949357774e-06,
"loss": 1.2606,
"step": 620
},
{
"epoch": 0.71,
"learning_rate": 2.0341012763532243e-06,
"loss": 1.2199,
"step": 621
},
{
"epoch": 0.71,
"learning_rate": 2.0191737569729492e-06,
"loss": 1.2485,
"step": 622
},
{
"epoch": 0.71,
"learning_rate": 2.004287342642721e-06,
"loss": 1.2254,
"step": 623
},
{
"epoch": 0.71,
"learning_rate": 1.989442238643478e-06,
"loss": 1.1819,
"step": 624
},
{
"epoch": 0.72,
"learning_rate": 1.974638649686495e-06,
"loss": 1.3182,
"step": 625
},
{
"epoch": 0.72,
"learning_rate": 1.959876779910564e-06,
"loss": 1.2954,
"step": 626
},
{
"epoch": 0.72,
"learning_rate": 1.945156832879174e-06,
"loss": 1.2334,
"step": 627
},
{
"epoch": 0.72,
"learning_rate": 1.930479011577711e-06,
"loss": 1.1988,
"step": 628
},
{
"epoch": 0.72,
"learning_rate": 1.91584351841065e-06,
"loss": 1.236,
"step": 629
},
{
"epoch": 0.72,
"learning_rate": 1.9012505551987764e-06,
"loss": 1.3011,
"step": 630
},
{
"epoch": 0.72,
"learning_rate": 1.8867003231763847e-06,
"loss": 1.2476,
"step": 631
},
{
"epoch": 0.72,
"learning_rate": 1.872193022988526e-06,
"loss": 1.2689,
"step": 632
},
{
"epoch": 0.73,
"learning_rate": 1.8577288546882167e-06,
"loss": 1.1812,
"step": 633
},
{
"epoch": 0.73,
"learning_rate": 1.8433080177337043e-06,
"loss": 1.2817,
"step": 634
},
{
"epoch": 0.73,
"learning_rate": 1.8289307109856941e-06,
"loss": 1.2886,
"step": 635
},
{
"epoch": 0.73,
"learning_rate": 1.8145971327046274e-06,
"loss": 1.2686,
"step": 636
},
{
"epoch": 0.73,
"learning_rate": 1.8003074805479314e-06,
"loss": 1.1913,
"step": 637
},
{
"epoch": 0.73,
"learning_rate": 1.7860619515673034e-06,
"loss": 1.3015,
"step": 638
},
{
"epoch": 0.73,
"learning_rate": 1.771860742205988e-06,
"loss": 1.3085,
"step": 639
},
{
"epoch": 0.73,
"learning_rate": 1.7577040482960723e-06,
"loss": 1.2569,
"step": 640
},
{
"epoch": 0.73,
"learning_rate": 1.7435920650557808e-06,
"loss": 1.3083,
"step": 641
},
{
"epoch": 0.74,
"learning_rate": 1.7295249870867898e-06,
"loss": 1.2116,
"step": 642
},
{
"epoch": 0.74,
"learning_rate": 1.7155030083715362e-06,
"loss": 1.1131,
"step": 643
},
{
"epoch": 0.74,
"learning_rate": 1.7015263222705492e-06,
"loss": 1.212,
"step": 644
},
{
"epoch": 0.74,
"learning_rate": 1.6875951215197779e-06,
"loss": 1.256,
"step": 645
},
{
"epoch": 0.74,
"learning_rate": 1.6737095982279444e-06,
"loss": 1.291,
"step": 646
},
{
"epoch": 0.74,
"learning_rate": 1.6598699438738764e-06,
"loss": 1.2236,
"step": 647
},
{
"epoch": 0.74,
"learning_rate": 1.646076349303884e-06,
"loss": 1.157,
"step": 648
},
{
"epoch": 0.74,
"learning_rate": 1.6323290047291196e-06,
"loss": 1.314,
"step": 649
},
{
"epoch": 0.74,
"learning_rate": 1.618628099722957e-06,
"loss": 1.2609,
"step": 650
},
{
"epoch": 0.75,
"learning_rate": 1.604973823218376e-06,
"loss": 1.2084,
"step": 651
},
{
"epoch": 0.75,
"learning_rate": 1.5913663635053578e-06,
"loss": 1.2334,
"step": 652
},
{
"epoch": 0.75,
"learning_rate": 1.5778059082282932e-06,
"loss": 1.2386,
"step": 653
},
{
"epoch": 0.75,
"learning_rate": 1.56429264438338e-06,
"loss": 1.3237,
"step": 654
},
{
"epoch": 0.75,
"learning_rate": 1.550826758316068e-06,
"loss": 1.2669,
"step": 655
},
{
"epoch": 0.75,
"learning_rate": 1.5374084357184621e-06,
"loss": 1.1954,
"step": 656
},
{
"epoch": 0.75,
"learning_rate": 1.5240378616267887e-06,
"loss": 1.2389,
"step": 657
},
{
"epoch": 0.75,
"learning_rate": 1.510715220418823e-06,
"loss": 1.2534,
"step": 658
},
{
"epoch": 0.75,
"learning_rate": 1.4974406958113557e-06,
"loss": 1.1756,
"step": 659
},
{
"epoch": 0.76,
"learning_rate": 1.4842144708576606e-06,
"loss": 1.1699,
"step": 660
},
{
"epoch": 0.76,
"learning_rate": 1.4710367279449662e-06,
"loss": 1.2823,
"step": 661
},
{
"epoch": 0.76,
"learning_rate": 1.457907648791943e-06,
"loss": 1.2844,
"step": 662
},
{
"epoch": 0.76,
"learning_rate": 1.4448274144461965e-06,
"loss": 1.2512,
"step": 663
},
{
"epoch": 0.76,
"learning_rate": 1.431796205281773e-06,
"loss": 1.2727,
"step": 664
},
{
"epoch": 0.76,
"learning_rate": 1.4188142009966689e-06,
"loss": 1.2276,
"step": 665
},
{
"epoch": 0.76,
"learning_rate": 1.4058815806103542e-06,
"loss": 1.2762,
"step": 666
},
{
"epoch": 0.76,
"learning_rate": 1.3929985224613051e-06,
"loss": 1.3201,
"step": 667
},
{
"epoch": 0.77,
"learning_rate": 1.3801652042045416e-06,
"loss": 1.1758,
"step": 668
},
{
"epoch": 0.77,
"learning_rate": 1.367381802809185e-06,
"loss": 1.2246,
"step": 669
},
{
"epoch": 0.77,
"learning_rate": 1.3546484945560029e-06,
"loss": 1.1833,
"step": 670
},
{
"epoch": 0.77,
"learning_rate": 1.3419654550349987e-06,
"loss": 1.1716,
"step": 671
},
{
"epoch": 0.77,
"learning_rate": 1.329332859142967e-06,
"loss": 1.2747,
"step": 672
},
{
"epoch": 0.77,
"learning_rate": 1.3167508810811058e-06,
"loss": 1.2786,
"step": 673
},
{
"epoch": 0.77,
"learning_rate": 1.3042196943525942e-06,
"loss": 1.2917,
"step": 674
},
{
"epoch": 0.77,
"learning_rate": 1.2917394717602123e-06,
"loss": 1.2502,
"step": 675
},
{
"epoch": 0.77,
"learning_rate": 1.2793103854039518e-06,
"loss": 1.2029,
"step": 676
},
{
"epoch": 0.78,
"learning_rate": 1.2669326066786458e-06,
"loss": 1.2311,
"step": 677
},
{
"epoch": 0.78,
"learning_rate": 1.2546063062716069e-06,
"loss": 1.1958,
"step": 678
},
{
"epoch": 0.78,
"learning_rate": 1.242331654160263e-06,
"loss": 1.313,
"step": 679
},
{
"epoch": 0.78,
"learning_rate": 1.2301088196098332e-06,
"loss": 1.2361,
"step": 680
},
{
"epoch": 0.78,
"learning_rate": 1.2179379711709738e-06,
"loss": 1.2073,
"step": 681
},
{
"epoch": 0.78,
"learning_rate": 1.205819276677464e-06,
"loss": 1.2211,
"step": 682
},
{
"epoch": 0.78,
"learning_rate": 1.1937529032438905e-06,
"loss": 1.2087,
"step": 683
},
{
"epoch": 0.78,
"learning_rate": 1.1817390172633402e-06,
"loss": 1.2407,
"step": 684
},
{
"epoch": 0.78,
"learning_rate": 1.1697777844051105e-06,
"loss": 1.201,
"step": 685
},
{
"epoch": 0.79,
"learning_rate": 1.1578693696124193e-06,
"loss": 1.2321,
"step": 686
},
{
"epoch": 0.79,
"learning_rate": 1.1460139371001339e-06,
"loss": 1.2191,
"step": 687
},
{
"epoch": 0.79,
"learning_rate": 1.1342116503525059e-06,
"loss": 1.2591,
"step": 688
},
{
"epoch": 0.79,
"learning_rate": 1.1224626721209141e-06,
"loss": 1.188,
"step": 689
},
{
"epoch": 0.79,
"learning_rate": 1.1107671644216305e-06,
"loss": 1.2702,
"step": 690
},
{
"epoch": 0.79,
"learning_rate": 1.0991252885335651e-06,
"loss": 1.2203,
"step": 691
},
{
"epoch": 0.79,
"learning_rate": 1.0875372049960697e-06,
"loss": 1.235,
"step": 692
},
{
"epoch": 0.79,
"learning_rate": 1.0760030736066952e-06,
"loss": 1.2553,
"step": 693
},
{
"epoch": 0.79,
"learning_rate": 1.064523053419015e-06,
"loss": 1.1923,
"step": 694
},
{
"epoch": 0.8,
"learning_rate": 1.0530973027404073e-06,
"loss": 1.2705,
"step": 695
},
{
"epoch": 0.8,
"learning_rate": 1.041725979129894e-06,
"loss": 1.2714,
"step": 696
},
{
"epoch": 0.8,
"learning_rate": 1.0304092393959513e-06,
"loss": 1.2139,
"step": 697
},
{
"epoch": 0.8,
"learning_rate": 1.0191472395943552e-06,
"loss": 1.2619,
"step": 698
},
{
"epoch": 0.8,
"learning_rate": 1.0079401350260288e-06,
"loss": 1.1787,
"step": 699
},
{
"epoch": 0.8,
"learning_rate": 9.967880802348989e-07,
"loss": 1.19,
"step": 700
},
{
"epoch": 0.8,
"learning_rate": 9.856912290057668e-07,
"loss": 1.2921,
"step": 701
},
{
"epoch": 0.8,
"learning_rate": 9.746497343621857e-07,
"loss": 1.1421,
"step": 702
},
{
"epoch": 0.81,
"learning_rate": 9.63663748564353e-07,
"loss": 1.24,
"step": 703
},
{
"epoch": 0.81,
"learning_rate": 9.527334231070084e-07,
"loss": 1.2213,
"step": 704
},
{
"epoch": 0.81,
"learning_rate": 9.418589087173441e-07,
"loss": 1.2802,
"step": 705
},
{
"epoch": 0.81,
"learning_rate": 9.310403553529335e-07,
"loss": 1.1891,
"step": 706
},
{
"epoch": 0.81,
"learning_rate": 9.20277912199648e-07,
"loss": 1.218,
"step": 707
},
{
"epoch": 0.81,
"learning_rate": 9.095717276696214e-07,
"loss": 1.2472,
"step": 708
},
{
"epoch": 0.81,
"learning_rate": 8.989219493991791e-07,
"loss": 1.1592,
"step": 709
},
{
"epoch": 0.81,
"learning_rate": 8.883287242468242e-07,
"loss": 1.2343,
"step": 710
},
{
"epoch": 0.81,
"learning_rate": 8.777921982911996e-07,
"loss": 1.2461,
"step": 711
},
{
"epoch": 0.82,
"learning_rate": 8.673125168290713e-07,
"loss": 1.226,
"step": 712
},
{
"epoch": 0.82,
"learning_rate": 8.568898243733398e-07,
"loss": 1.2083,
"step": 713
},
{
"epoch": 0.82,
"learning_rate": 8.46524264651028e-07,
"loss": 1.2606,
"step": 714
},
{
"epoch": 0.82,
"learning_rate": 8.362159806013176e-07,
"loss": 1.2723,
"step": 715
},
{
"epoch": 0.82,
"learning_rate": 8.259651143735603e-07,
"loss": 1.1759,
"step": 716
},
{
"epoch": 0.82,
"learning_rate": 8.157718073253351e-07,
"loss": 1.3235,
"step": 717
},
{
"epoch": 0.82,
"learning_rate": 8.056362000204848e-07,
"loss": 1.1881,
"step": 718
},
{
"epoch": 0.82,
"learning_rate": 7.955584322271853e-07,
"loss": 1.2148,
"step": 719
},
{
"epoch": 0.82,
"learning_rate": 7.85538642916015e-07,
"loss": 1.2402,
"step": 720
},
{
"epoch": 0.83,
"learning_rate": 7.755769702580412e-07,
"loss": 1.2338,
"step": 721
},
{
"epoch": 0.83,
"learning_rate": 7.656735516229125e-07,
"loss": 1.2621,
"step": 722
},
{
"epoch": 0.83,
"learning_rate": 7.558285235769647e-07,
"loss": 1.1569,
"step": 723
},
{
"epoch": 0.83,
"learning_rate": 7.46042021881338e-07,
"loss": 1.2019,
"step": 724
},
{
"epoch": 0.83,
"learning_rate": 7.363141814901054e-07,
"loss": 1.2361,
"step": 725
},
{
"epoch": 0.83,
"learning_rate": 7.266451365484106e-07,
"loss": 1.2144,
"step": 726
},
{
"epoch": 0.83,
"learning_rate": 7.170350203906218e-07,
"loss": 1.2816,
"step": 727
},
{
"epoch": 0.83,
"learning_rate": 7.074839655384835e-07,
"loss": 1.1859,
"step": 728
},
{
"epoch": 0.84,
"learning_rate": 6.979921036993042e-07,
"loss": 1.304,
"step": 729
},
{
"epoch": 0.84,
"learning_rate": 6.885595657641214e-07,
"loss": 1.201,
"step": 730
},
{
"epoch": 0.84,
"learning_rate": 6.791864818059179e-07,
"loss": 1.1713,
"step": 731
},
{
"epoch": 0.84,
"learning_rate": 6.698729810778065e-07,
"loss": 1.2106,
"step": 732
},
{
"epoch": 0.84,
"learning_rate": 6.606191920112664e-07,
"loss": 1.2678,
"step": 733
},
{
"epoch": 0.84,
"learning_rate": 6.514252422143591e-07,
"loss": 1.3211,
"step": 734
},
{
"epoch": 0.84,
"learning_rate": 6.422912584699753e-07,
"loss": 1.2193,
"step": 735
},
{
"epoch": 0.84,
"learning_rate": 6.332173667340841e-07,
"loss": 1.204,
"step": 736
},
{
"epoch": 0.84,
"learning_rate": 6.242036921339973e-07,
"loss": 1.2639,
"step": 737
},
{
"epoch": 0.85,
"learning_rate": 6.152503589666426e-07,
"loss": 1.2609,
"step": 738
},
{
"epoch": 0.85,
"learning_rate": 6.063574906968511e-07,
"loss": 1.2043,
"step": 739
},
{
"epoch": 0.85,
"learning_rate": 5.975252099556544e-07,
"loss": 1.2423,
"step": 740
},
{
"epoch": 0.85,
"learning_rate": 5.887536385385917e-07,
"loss": 1.2067,
"step": 741
},
{
"epoch": 0.85,
"learning_rate": 5.800428974040311e-07,
"loss": 1.2295,
"step": 742
},
{
"epoch": 0.85,
"learning_rate": 5.713931066715078e-07,
"loss": 1.1883,
"step": 743
},
{
"epoch": 0.85,
"learning_rate": 5.628043856200543e-07,
"loss": 1.196,
"step": 744
},
{
"epoch": 0.85,
"learning_rate": 5.542768526865678e-07,
"loss": 1.2536,
"step": 745
},
{
"epoch": 0.85,
"learning_rate": 5.458106254641715e-07,
"loss": 1.2407,
"step": 746
},
{
"epoch": 0.86,
"learning_rate": 5.374058207005945e-07,
"loss": 1.2345,
"step": 747
},
{
"epoch": 0.86,
"learning_rate": 5.290625542965611e-07,
"loss": 1.1855,
"step": 748
},
{
"epoch": 0.86,
"learning_rate": 5.207809413041914e-07,
"loss": 1.2697,
"step": 749
},
{
"epoch": 0.86,
"learning_rate": 5.125610959254213e-07,
"loss": 1.2528,
"step": 750
},
{
"epoch": 0.86,
"learning_rate": 5.044031315104136e-07,
"loss": 1.2428,
"step": 751
},
{
"epoch": 0.86,
"learning_rate": 4.963071605560144e-07,
"loss": 1.2263,
"step": 752
},
{
"epoch": 0.86,
"learning_rate": 4.882732947041818e-07,
"loss": 1.2617,
"step": 753
},
{
"epoch": 0.86,
"learning_rate": 4.803016447404629e-07,
"loss": 1.1969,
"step": 754
},
{
"epoch": 0.86,
"learning_rate": 4.723923205924558e-07,
"loss": 1.1646,
"step": 755
},
{
"epoch": 0.87,
"learning_rate": 4.6454543132829653e-07,
"loss": 1.2267,
"step": 756
},
{
"epoch": 0.87,
"learning_rate": 4.5676108515515684e-07,
"loss": 1.2072,
"step": 757
},
{
"epoch": 0.87,
"learning_rate": 4.4903938941775084e-07,
"loss": 1.1877,
"step": 758
},
{
"epoch": 0.87,
"learning_rate": 4.413804505968533e-07,
"loss": 1.2038,
"step": 759
},
{
"epoch": 0.87,
"learning_rate": 4.3378437430783294e-07,
"loss": 1.2549,
"step": 760
},
{
"epoch": 0.87,
"learning_rate": 4.262512652991968e-07,
"loss": 1.268,
"step": 761
},
{
"epoch": 0.87,
"learning_rate": 4.187812274511427e-07,
"loss": 1.2731,
"step": 762
},
{
"epoch": 0.87,
"learning_rate": 4.113743637741296e-07,
"loss": 1.3252,
"step": 763
},
{
"epoch": 0.88,
"learning_rate": 4.040307764074586e-07,
"loss": 1.1899,
"step": 764
},
{
"epoch": 0.88,
"learning_rate": 3.9675056661785563e-07,
"loss": 1.2422,
"step": 765
},
{
"epoch": 0.88,
"learning_rate": 3.895338347980898e-07,
"loss": 1.2417,
"step": 766
},
{
"epoch": 0.88,
"learning_rate": 3.8238068046557276e-07,
"loss": 1.1714,
"step": 767
},
{
"epoch": 0.88,
"learning_rate": 3.752912022610006e-07,
"loss": 1.243,
"step": 768
},
{
"epoch": 0.88,
"learning_rate": 3.6826549794698074e-07,
"loss": 1.2715,
"step": 769
},
{
"epoch": 0.88,
"learning_rate": 3.6130366440669693e-07,
"loss": 1.2184,
"step": 770
},
{
"epoch": 0.88,
"learning_rate": 3.544057976425619e-07,
"loss": 1.1844,
"step": 771
},
{
"epoch": 0.88,
"learning_rate": 3.4757199277490106e-07,
"loss": 1.1547,
"step": 772
},
{
"epoch": 0.89,
"learning_rate": 3.408023440406355e-07,
"loss": 1.211,
"step": 773
},
{
"epoch": 0.89,
"learning_rate": 3.340969447919873e-07,
"loss": 1.1662,
"step": 774
},
{
"epoch": 0.89,
"learning_rate": 3.2745588749518775e-07,
"loss": 1.3596,
"step": 775
},
{
"epoch": 0.89,
"learning_rate": 3.2087926372920577e-07,
"loss": 1.1916,
"step": 776
},
{
"epoch": 0.89,
"learning_rate": 3.143671641844831e-07,
"loss": 1.2438,
"step": 777
},
{
"epoch": 0.89,
"learning_rate": 3.0791967866168394e-07,
"loss": 1.2936,
"step": 778
},
{
"epoch": 0.89,
"learning_rate": 3.015368960704584e-07,
"loss": 1.2453,
"step": 779
},
{
"epoch": 0.89,
"learning_rate": 2.9521890442821276e-07,
"loss": 1.2201,
"step": 780
},
{
"epoch": 0.89,
"learning_rate": 2.889657908589e-07,
"loss": 1.2094,
"step": 781
},
{
"epoch": 0.9,
"learning_rate": 2.8277764159181484e-07,
"loss": 1.1386,
"step": 782
},
{
"epoch": 0.9,
"learning_rate": 2.7665454196040665e-07,
"loss": 1.1688,
"step": 783
},
{
"epoch": 0.9,
"learning_rate": 2.7059657640110204e-07,
"loss": 1.2446,
"step": 784
},
{
"epoch": 0.9,
"learning_rate": 2.6460382845214125e-07,
"loss": 1.2187,
"step": 785
},
{
"epoch": 0.9,
"learning_rate": 2.5867638075242454e-07,
"loss": 1.2678,
"step": 786
},
{
"epoch": 0.9,
"learning_rate": 2.5281431504037555e-07,
"loss": 1.2246,
"step": 787
},
{
"epoch": 0.9,
"learning_rate": 2.470177121528089e-07,
"loss": 1.268,
"step": 788
},
{
"epoch": 0.9,
"learning_rate": 2.4128665202382327e-07,
"loss": 1.1303,
"step": 789
},
{
"epoch": 0.9,
"learning_rate": 2.356212136836894e-07,
"loss": 1.1405,
"step": 790
},
{
"epoch": 0.91,
"learning_rate": 2.3002147525777118e-07,
"loss": 1.2348,
"step": 791
},
{
"epoch": 0.91,
"learning_rate": 2.2448751396543788e-07,
"loss": 1.2552,
"step": 792
},
{
"epoch": 0.91,
"learning_rate": 2.1901940611900707e-07,
"loss": 1.2936,
"step": 793
},
{
"epoch": 0.91,
"learning_rate": 2.1361722712268772e-07,
"loss": 1.289,
"step": 794
},
{
"epoch": 0.91,
"learning_rate": 2.0828105147154275e-07,
"loss": 1.2411,
"step": 795
},
{
"epoch": 0.91,
"learning_rate": 2.0301095275046145e-07,
"loss": 1.2261,
"step": 796
},
{
"epoch": 0.91,
"learning_rate": 1.9780700363314255e-07,
"loss": 1.1747,
"step": 797
},
{
"epoch": 0.91,
"learning_rate": 1.926692758810955e-07,
"loss": 1.2051,
"step": 798
},
{
"epoch": 0.92,
"learning_rate": 1.8759784034264927e-07,
"loss": 1.2246,
"step": 799
},
{
"epoch": 0.92,
"learning_rate": 1.825927669519728e-07,
"loss": 1.2018,
"step": 800
},
{
"epoch": 0.92,
"learning_rate": 1.776541247281177e-07,
"loss": 1.1812,
"step": 801
},
{
"epoch": 0.92,
"learning_rate": 1.7278198177405614e-07,
"loss": 1.2621,
"step": 802
},
{
"epoch": 0.92,
"learning_rate": 1.679764052757532e-07,
"loss": 1.2021,
"step": 803
},
{
"epoch": 0.92,
"learning_rate": 1.6323746150123e-07,
"loss": 1.2577,
"step": 804
},
{
"epoch": 0.92,
"learning_rate": 1.5856521579965866e-07,
"loss": 1.2777,
"step": 805
},
{
"epoch": 0.92,
"learning_rate": 1.5395973260045273e-07,
"loss": 1.274,
"step": 806
},
{
"epoch": 0.92,
"learning_rate": 1.4942107541238705e-07,
"loss": 1.2606,
"step": 807
},
{
"epoch": 0.93,
"learning_rate": 1.449493068227159e-07,
"loss": 1.1909,
"step": 808
},
{
"epoch": 0.93,
"learning_rate": 1.4054448849631087e-07,
"loss": 1.1955,
"step": 809
},
{
"epoch": 0.93,
"learning_rate": 1.3620668117481471e-07,
"loss": 1.2731,
"step": 810
},
{
"epoch": 0.93,
"learning_rate": 1.319359446757973e-07,
"loss": 1.3342,
"step": 811
},
{
"epoch": 0.93,
"learning_rate": 1.2773233789193816e-07,
"loss": 1.1588,
"step": 812
},
{
"epoch": 0.93,
"learning_rate": 1.2359591879020528e-07,
"loss": 1.2291,
"step": 813
},
{
"epoch": 0.93,
"learning_rate": 1.1952674441106483e-07,
"loss": 1.1973,
"step": 814
},
{
"epoch": 0.93,
"learning_rate": 1.1552487086768871e-07,
"loss": 1.2551,
"step": 815
},
{
"epoch": 0.93,
"learning_rate": 1.1159035334518343e-07,
"loss": 1.2953,
"step": 816
},
{
"epoch": 0.94,
"learning_rate": 1.0772324609982787e-07,
"loss": 1.2416,
"step": 817
},
{
"epoch": 0.94,
"learning_rate": 1.03923602458324e-07,
"loss": 1.2722,
"step": 818
},
{
"epoch": 0.94,
"learning_rate": 1.0019147481706626e-07,
"loss": 1.1502,
"step": 819
},
{
"epoch": 0.94,
"learning_rate": 9.652691464141273e-08,
"loss": 1.2267,
"step": 820
},
{
"epoch": 0.94,
"learning_rate": 9.292997246497959e-08,
"loss": 1.2338,
"step": 821
},
{
"epoch": 0.94,
"learning_rate": 8.940069788894389e-08,
"loss": 1.3175,
"step": 822
},
{
"epoch": 0.94,
"learning_rate": 8.593913958135691e-08,
"loss": 1.2424,
"step": 823
},
{
"epoch": 0.94,
"learning_rate": 8.254534527647851e-08,
"loss": 1.2945,
"step": 824
},
{
"epoch": 0.95,
"learning_rate": 7.921936177411049e-08,
"loss": 1.1953,
"step": 825
},
{
"epoch": 0.95,
"learning_rate": 7.59612349389599e-08,
"loss": 1.226,
"step": 826
},
{
"epoch": 0.95,
"learning_rate": 7.277100970000062e-08,
"loss": 1.2159,
"step": 827
},
{
"epoch": 0.95,
"learning_rate": 6.964873004985717e-08,
"loss": 1.1667,
"step": 828
},
{
"epoch": 0.95,
"learning_rate": 6.659443904419638e-08,
"loss": 1.2288,
"step": 829
},
{
"epoch": 0.95,
"learning_rate": 6.360817880113335e-08,
"loss": 1.2918,
"step": 830
},
{
"epoch": 0.95,
"learning_rate": 6.06899905006525e-08,
"loss": 1.2764,
"step": 831
},
{
"epoch": 0.95,
"learning_rate": 5.783991438403802e-08,
"loss": 1.3111,
"step": 832
},
{
"epoch": 0.95,
"learning_rate": 5.505798975331933e-08,
"loss": 1.2598,
"step": 833
},
{
"epoch": 0.96,
"learning_rate": 5.234425497072981e-08,
"loss": 1.2023,
"step": 834
},
{
"epoch": 0.96,
"learning_rate": 4.9698747458176714e-08,
"loss": 1.2719,
"step": 835
},
{
"epoch": 0.96,
"learning_rate": 4.712150369672652e-08,
"loss": 1.198,
"step": 836
},
{
"epoch": 0.96,
"learning_rate": 4.461255922609986e-08,
"loss": 1.2503,
"step": 837
},
{
"epoch": 0.96,
"learning_rate": 4.217194864418295e-08,
"loss": 1.2348,
"step": 838
},
{
"epoch": 0.96,
"learning_rate": 3.979970560655133e-08,
"loss": 1.3072,
"step": 839
},
{
"epoch": 0.96,
"learning_rate": 3.749586282600359e-08,
"loss": 1.1894,
"step": 840
},
{
"epoch": 0.96,
"learning_rate": 3.526045207211059e-08,
"loss": 1.233,
"step": 841
},
{
"epoch": 0.96,
"learning_rate": 3.309350417077972e-08,
"loss": 1.209,
"step": 842
},
{
"epoch": 0.97,
"learning_rate": 3.0995049003826325e-08,
"loss": 1.1935,
"step": 843
},
{
"epoch": 0.97,
"learning_rate": 2.8965115508564622e-08,
"loss": 1.3144,
"step": 844
},
{
"epoch": 0.97,
"learning_rate": 2.700373167740744e-08,
"loss": 1.2205,
"step": 845
},
{
"epoch": 0.97,
"learning_rate": 2.511092455747932e-08,
"loss": 1.1731,
"step": 846
},
{
"epoch": 0.97,
"learning_rate": 2.3286720250246255e-08,
"loss": 1.087,
"step": 847
},
{
"epoch": 0.97,
"learning_rate": 2.153114391115152e-08,
"loss": 1.2086,
"step": 848
},
{
"epoch": 0.97,
"learning_rate": 1.984421974927375e-08,
"loss": 1.1868,
"step": 849
},
{
"epoch": 0.97,
"learning_rate": 1.8225971026987755e-08,
"loss": 1.2852,
"step": 850
},
{
"epoch": 0.97,
"learning_rate": 1.6676420059649756e-08,
"loss": 1.289,
"step": 851
},
{
"epoch": 0.98,
"learning_rate": 1.5195588215283773e-08,
"loss": 1.2034,
"step": 852
},
{
"epoch": 0.98,
"learning_rate": 1.3783495914291844e-08,
"loss": 1.1606,
"step": 853
},
{
"epoch": 0.98,
"learning_rate": 1.244016262916814e-08,
"loss": 1.1817,
"step": 854
},
{
"epoch": 0.98,
"learning_rate": 1.1165606884234182e-08,
"loss": 1.1963,
"step": 855
},
{
"epoch": 0.98,
"learning_rate": 9.959846255381267e-09,
"loss": 1.23,
"step": 856
},
{
"epoch": 0.98,
"learning_rate": 8.822897369827333e-09,
"loss": 1.1343,
"step": 857
},
{
"epoch": 0.98,
"learning_rate": 7.754775905891576e-09,
"loss": 1.2412,
"step": 858
},
{
"epoch": 0.98,
"learning_rate": 6.755496592773525e-09,
"loss": 1.3113,
"step": 859
},
{
"epoch": 0.99,
"learning_rate": 5.825073210352084e-09,
"loss": 1.2007,
"step": 860
},
{
"epoch": 0.99,
"learning_rate": 4.9635185889967966e-09,
"loss": 1.172,
"step": 861
},
{
"epoch": 0.99,
"learning_rate": 4.170844609387992e-09,
"loss": 1.2584,
"step": 862
},
{
"epoch": 0.99,
"learning_rate": 3.4470622023558e-09,
"loss": 1.2319,
"step": 863
},
{
"epoch": 0.99,
"learning_rate": 2.792181348726941e-09,
"loss": 1.2648,
"step": 864
},
{
"epoch": 0.99,
"learning_rate": 2.20621107918928e-09,
"loss": 1.2215,
"step": 865
},
{
"epoch": 0.99,
"learning_rate": 1.6891594741663686e-09,
"loss": 1.3185,
"step": 866
},
{
"epoch": 0.99,
"learning_rate": 1.2410336637047604e-09,
"loss": 1.2158,
"step": 867
},
{
"epoch": 0.99,
"learning_rate": 8.618398273779749e-10,
"loss": 1.2343,
"step": 868
},
{
"epoch": 1.0,
"learning_rate": 5.515831941993455e-10,
"loss": 1.2476,
"step": 869
},
{
"epoch": 1.0,
"learning_rate": 3.1026804255207544e-10,
"loss": 1.2134,
"step": 870
},
{
"epoch": 1.0,
"learning_rate": 1.378977001276205e-10,
"loss": 1.22,
"step": 871
},
{
"epoch": 1.0,
"learning_rate": 3.447454388127991e-11,
"loss": 1.2135,
"step": 872
},
{
"epoch": 1.0,
"learning_rate": 0.0,
"loss": 1.1774,
"step": 873
},
{
"epoch": 1.0,
"step": 873,
"total_flos": 6.557120667907523e+18,
"train_loss": 1.3115756642777485,
"train_runtime": 2453.1326,
"train_samples_per_second": 91.074,
"train_steps_per_second": 0.356
}
],
"logging_steps": 1.0,
"max_steps": 873,
"num_train_epochs": 1,
"save_steps": 5000,
"total_flos": 6.557120667907523e+18,
"trial_name": null,
"trial_params": null
}
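The JSON above is the tail of a Hugging Face `Trainer` state file (`trainer_state.json`): each entry in `log_history` records the global step, the current learning rate, and the training loss, and the final entry summarizes the whole run (total FLOPs, runtime, throughput). A minimal sketch for inspecting such a log is shown below; the path `output/trainer_state.json` is only an assumed example location, not a file shipped with this repository.
```python
# Minimal sketch (not part of the repository): summarize a Trainer log like the
# one above. Assumes the file was saved to output/trainer_state.json.
import json

with open("output/trainer_state.json") as f:
    state = json.load(f)

# Every per-step record carries "loss"; the final summary record does not.
records = [e for e in state["log_history"] if "loss" in e]
steps = [e["step"] for e in records]
losses = [e["loss"] for e in records]

print(f"logged steps: {len(steps)} (last step {steps[-1]})")
print(f"final per-step loss: {losses[-1]:.4f}")
print(f"mean loss over run:  {sum(losses) / len(losses):.4f}")
```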
---
tags:
- vision
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/cat-dog-music.png
candidate_labels: playing music, playing sports
example_title: Cat & Dog
---
# Model Card: CLIP
Disclaimer: This model card is taken and modified from the official CLIP repository; it can be found [here](https://github.com/openai/CLIP/blob/main/model-card.md).
## Model Details
The CLIP model was developed by researchers at OpenAI to learn about what contributes to robustness in computer vision tasks. The model was also developed to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. It was not developed for general model deployment; to deploy models like CLIP, researchers will first need to carefully study their capabilities in relation to the specific context they’re being deployed within.
### Model Date
January 2021
### Model Type
The base model uses a ViT-L/14 Transformer architecture as an image encoder and uses a masked self-attention Transformer as a text encoder. These encoders are trained to maximize the similarity of (image, text) pairs via a contrastive loss.
The original implementation had two variants: one using a ResNet image encoder and the other using a Vision Transformer. This repository has the variant with the Vision Transformer.
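To make the training objective described above concrete, the following is an illustrative sketch (not the reference implementation) of how a contrastive loss over a batch of (image, text) embedding pairs can be computed; the function name and batch layout are assumptions for the example.
```python
# Illustrative sketch of a CLIP-style contrastive objective (not OpenAI's code).
# Matching (image, text) pairs lie on the diagonal of the similarity matrix and
# are pulled together by a symmetric cross-entropy loss.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          logit_scale: torch.Tensor) -> torch.Tensor:
    # Normalize so the dot product is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarities scaled by the learned temperature.
    logits = logit_scale.exp() * image_emb @ text_emb.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    # Average the image-to-text and text-to-image directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```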
### Documents
- [Blog Post](https://openai.com/blog/clip/)
- [CLIP Paper](https://arxiv.org/abs/2103.00020)
### Use with Transformers
```python
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
```
## Model Use
### Intended Use
The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models; the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis.
#### Primary intended uses
The primary intended users of these models are AI researchers.
We primarily imagine the model will be used by researchers to better understand robustness, generalization, and other capabilities, biases, and constraints of computer vision models.
### Out-of-Scope Use Cases
**Any** deployed use case of the model, whether commercial or not, is currently out of scope. Non-deployed use cases, such as image search in a constrained environment, are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task-specific testing, especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful.
Certain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of performance of the model. This is because the use of artificial intelligence for tasks such as these can be premature currently given the lack of testing norms and checks to ensure its fair use.
Since the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English language use cases.
## Data
The model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly-used pre-existing image datasets such as [YFCC100M](http://projects.dfki.uni-kl.de/yfcc100m/). A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet, which tend to skew towards more developed nations and younger, male users.
### Data Mission Statement
Our goal with building this dataset was to test out robustness and generalizability in computer vision tasks. As a result, the focus was on gathering large quantities of data from different publicly-available internet data sources. The data was gathered in a mostly non-interventionist manner. However, we only crawled websites that had policies against excessively violent and adult images and allowed us to filter out such content. We do not intend for this dataset to be used as the basis for any commercial or deployed model and will not be releasing the dataset.
## Performance and Limitations
### Performance
We have evaluated the performance of CLIP on a wide range of benchmarks across a variety of computer vision datasets, spanning tasks from OCR to texture recognition to fine-grained classification. The paper describes model performance on the following datasets:
- Food101
- CIFAR10
- CIFAR100
- Birdsnap
- SUN397
- Stanford Cars
- FGVC Aircraft
- VOC2007
- DTD
- Oxford-IIIT Pet dataset
- Caltech101
- Flowers102
- MNIST
- SVHN
- IIIT5K
- Hateful Memes
- SST-2
- UCF101
- Kinetics700
- Country211
- CLEVR Counting
- KITTI Distance
- STL-10
- RareAct
- Flickr30
- MSCOCO
- ImageNet
- ImageNet-A
- ImageNet-R
- ImageNet Sketch
- ObjectNet (ImageNet Overlap)
- Youtube-BB
- ImageNet-Vid
## Limitations
CLIP and our analysis of it have a number of limitations. CLIP currently struggles with respect to certain tasks such as fine-grained classification and counting objects. CLIP also poses issues with regards to fairness and bias, which we discuss in the paper and briefly in the next section. Additionally, our approach to testing CLIP has an important limitation: in many cases we have used linear probes to evaluate the performance of CLIP, and there is evidence suggesting that linear probes can underestimate model performance.
### Bias and Fairness
We find that the performance of CLIP - and the specific biases it exhibits - can depend significantly on class design and the choices one makes for categories to include and exclude. We tested the risk of certain kinds of denigration with CLIP by classifying images of people from [Fairface](https://arxiv.org/abs/1908.04913) into crime-related and non-human animal categories. We found significant disparities with respect to race and gender. Additionally, we found that these disparities could shift based on how the classes were constructed. (Details captured in the Broader Impacts Section in the paper).
We also tested the performance of CLIP on gender, race and age classification using the Fairface dataset (we default to using the race categories as they are constructed in the Fairface dataset) in order to assess the quality of performance across different demographics. We found accuracy >96% across all races for gender classification, with ‘Middle Eastern’ having the highest accuracy (98.4%) and ‘White’ having the lowest (96.5%). Additionally, CLIP averaged ~93% for racial classification and ~63% for age classification. Our use of evaluations to test for gender, race and age classification as well as denigration harms is simply to evaluate the performance of the model across people and surface potential risks, not to demonstrate an endorsement of or enthusiasm for such tasks.
## Feedback
### Where to send questions or comments about the model
Please use [this Google Form](https://forms.gle/Uv7afRH5dvY34ZEs9).
{
"_name_or_path": "/Vary/cache/vit-large-patch14/",
"architectures": [
"CLIPModel"
],
"initializer_factor": 1.0,
"logit_scale_init_value": 2.6592,
"model_type": "clip",
"projection_dim": 768,
"text_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0.0,
"bad_words_ids": null,
"bos_token_id": 0,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": 2,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "quick_gelu",
"hidden_size": 768,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 3072,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-05,
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 77,
"min_length": 0,
"model_type": "clip_text_model",
"no_repeat_ngram_size": 0,
"num_attention_heads": 12,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 12,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": 1,
"prefix": null,
"problem_type": null,
"projection_dim" : 768,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"task_specific_params": null,
"temperature": 1.0,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torch_dtype": null,
"torchscript": false,
"transformers_version": "4.16.0.dev0",
"use_bfloat16": false,
"vocab_size": 49408
},
"text_config_dict": {
"hidden_size": 768,
"intermediate_size": 3072,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"projection_dim": 768
},
"torch_dtype": "float32",
"transformers_version": null,
"vision_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0.0,
"bad_words_ids": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "quick_gelu",
"hidden_size": 1024,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"image_size": 224,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 4096,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-05,
"length_penalty": 1.0,
"max_length": 20,
"min_length": 0,
"model_type": "clip_vision_model",
"no_repeat_ngram_size": 0,
"num_attention_heads": 16,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 24,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"patch_size": 14,
"prefix": null,
"problem_type": null,
"projection_dim" : 768,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"task_specific_params": null,
"temperature": 1.0,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torch_dtype": null,
"torchscript": false,
"transformers_version": "4.16.0.dev0",
"use_bfloat16": false
},
"vision_config_dict": {
"hidden_size": 1024,
"intermediate_size": 4096,
"num_attention_heads": 16,
"num_hidden_layers": 24,
"patch_size": 14,
"projection_dim": 768
}
}
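For reference, a configuration file like the one above can be loaded directly with `transformers`; the sketch below assumes the JSON is saved as `config.json` inside a local directory named `./vit-large-patch14/`, which is only an example path.
```python
# Minimal sketch: load the CLIP config shown above and read a few fields.
# "./vit-large-patch14" is an assumed local directory containing this config.json.
from transformers import CLIPConfig

config = CLIPConfig.from_pretrained("./vit-large-patch14")
print(config.vision_config.hidden_size)   # 1024
print(config.vision_config.patch_size)    # 14
print(config.text_config.hidden_size)     # 768
```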