Commit d3f759b3 authored by Geewook Kim

docs: enhance comments

parent 0353dbf8
@@ -41,8 +41,8 @@ Gradio web demos are available! [![Demo](https://img.shields.io/badge/Demo-Gradi
| [DocVQA Task1](https://rrc.cvc.uab.es/?ch=17) (Document VQA) | 0.78 | 67.5 | [donut-base-finetuned-docvqa](https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa/tree/official) | [gradio space web demo](https://huggingface.co/spaces/nielsr/donut-docvqa),<br>[google colab demo](https://colab.research.google.com/drive/1Z4WG8Wunj3HE0CERjt608ALSgSzRC9ig?usp=sharing) |
The links to the pre-trained backbones are here:
- [`donut-base`](https://huggingface.co/naver-clova-ix/donut-base/tree/official): trained with 64 A100 GPUs (~2.5 days), number of layers (encoder: {2,2,14,2}, decoder: 4), input size 2560x1920, swin window size 10, IIT-CDIP (11M) and SynthDoG (English, Chinese, Japanese, Korean, 0.5M x 4).
- [`donut-proto`](https://huggingface.co/naver-clova-ix/donut-proto/tree/official): (preliminary model) trained with 8 V100 GPUs (~5 days), number of layers (encoder: {2,2,18,2}, decoder: 4), input size 2048x1536, swin window size 8, and SynthDoG (English, Japanese, Korean, 0.4M x 3).
Please see [our paper](#how-to-cite) for more details.
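The input size and Swin window size listed for each backbone have to be compatible: after patch embedding and the 2x downsampling between stages, the feature map at every stage should be divisible by the window size. A quick sketch of that check (the patch size of 4 and the four-stage layout are assumptions from standard Swin, not stated in this diff):

```python
def stage_resolutions(input_hw, patch_size=4, num_stages=4):
    """Feature-map size at each Swin stage, assuming 2x downsampling between stages."""
    h, w = input_hw
    h, w = h // patch_size, w // patch_size  # patch embedding
    res = []
    for _ in range(num_stages):
        res.append((h, w))
        h, w = h // 2, w // 2  # patch merging between stages
    return res

# donut-base: input 2560x1920, window size 10
for h, w in stage_resolutions((2560, 1920)):
    assert h % 10 == 0 and w % 10 == 0, (h, w)

# donut-proto: input 2048x1536, window size 8
for h, w in stage_resolutions((2048, 1536)):
    assert h % 8 == 0 and w % 8 == 0, (h, w)
```

Both configurations pass, which is consistent with the input sizes and window sizes quoted above being chosen as a matched pair.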
...
@@ -8,7 +8,8 @@ val_batch_sizes: [4]
input_size: [2560, 1920]
max_length: 128
align_long_axis: False
# num_nodes: 8 # memo: donut-base-finetuned-docvqa was trained with 8 nodes
num_nodes: 1
seed: 2022
lr: 3e-5
warmup_steps: 10000
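Dropping `num_nodes` from 8 to 1 shrinks the global batch size by 8x, so the learning rate above may warrant rescaling if the per-device batch size is kept fixed. A hedged sketch of the usual linear-scaling rule (the per-GPU batch size of 4 and 8 GPUs per node are illustrative assumptions, not values from this config):

```python
def global_batch_size(per_gpu_batch, gpus_per_node, num_nodes):
    # total samples per optimizer step across all devices
    return per_gpu_batch * gpus_per_node * num_nodes

def scaled_lr(base_lr, base_global, new_global):
    # linear scaling rule: lr proportional to global batch size
    return base_lr * new_global / base_global

base = global_batch_size(4, 8, 8)  # hypothetical original 8-node setup -> 256
new = global_batch_size(4, 8, 1)   # single-node setup -> 32
print(scaled_lr(3e-5, base, new))  # 3.75e-06
```

Whether to rescale is a judgment call; the memo comment only records the original node count, not a recommended single-node learning rate.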
...
@@ -8,7 +8,8 @@ val_batch_sizes: [4]
input_size: [2560, 1920]
max_length: 8
align_long_axis: False
# num_nodes: 8 # memo: donut-base-finetuned-rvlcdip was trained with 8 nodes
num_nodes: 1
seed: 2022
lr: 2e-5
warmup_steps: 10000
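The `# num_nodes: 8` memo line is an ordinary YAML comment, so only the active `num_nodes: 1` reaches the trainer. A minimal sketch of that behavior using a hand-rolled comment-stripping parse (this is a simplification for illustration, not the YAML loader the repo actually uses):

```python
def effective_config(lines):
    """Drop comments and blanks; keep flat key: value pairs as strings."""
    out = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # everything after '#' is a comment
        if ":" in line:
            key, value = line.split(":", 1)
            out[key.strip()] = value.strip()
    return out

cfg = effective_config([
    "# num_nodes: 8 # memo: donut-base-finetuned-rvlcdip was trained with 8 nodes",
    "num_nodes: 1",
    "seed: 2022",
    "lr: 2e-5",
])
# cfg == {"num_nodes": "1", "seed": "2022", "lr": "2e-5"}
```

The commented line is dropped entirely, so the memo documents the original training setup without affecting the parsed configuration.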
...