# SynthDoG 🐶: Synthetic Document Generator

SynthDoG is a synthetic document generator for visual document understanding (VDU).

![image](../misc/sample_synthdog.png)

## Prerequisites

- python>=3.6
- [synthtiger](https://github.com/clovaai/synthtiger) (`pip install synthtiger`)

## Usage

```bash
# Set environment variable (for macOS)
$ export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

synthtiger -o ./outputs/SynthDoG_en -c 50 -w 4 -v template.py SynthDoG config_en.yaml

{'config': 'config_en.yaml',
 'count': 50,
 'name': 'SynthDoG',
 'output': './outputs/SynthDoG_en',
 'script': 'template.py',
 'verbose': True,
 'worker': 4}
{'aspect_ratio': [1, 2],
     .
     .
 'quality': [50, 95],
 'short_size': [720, 1024]}
Generated 1 data (task 3)
Generated 2 data (task 0)
Generated 3 data (task 1)
     .
     .
Generated 49 data (task 48)
Generated 50 data (task 49)
46.32 seconds elapsed
```

Some important arguments:

- `-o` : directory path to save data.
- `-c` : number of data to generate.
- `-w` : number of workers.
- `-s` : random seed.
- `-v` : print error messages.

To generate ECJK (English, Chinese, Japanese, Korean) samples:
```bash
# english
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_en.yaml

# chinese
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_zh.yaml

# japanese
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_ja.yaml

# korean
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_ko.yaml
```
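
The four commands above differ only in the config file, so they can be generated from one small helper. The following is a minimal sketch, not part of SynthDoG: `build_command` and `CONFIGS` are hypothetical names, and the `SynthDoG_{lang}` output directory pattern is an assumption extended from the `./outputs/SynthDoG_en` example. It uses only the flags listed in this README and prints the commands instead of running them.

```python
import shlex

# Config files named in this README, keyed by language code.
CONFIGS = {
    "en": "config_en.yaml",
    "zh": "config_zh.yaml",
    "ja": "config_ja.yaml",
    "ko": "config_ko.yaml",
}


def build_command(lang, output_root="./outputs", count=50, workers=4, seed=None):
    """Return the synthtiger argument list for one language (hypothetical helper)."""
    argv = [
        "synthtiger",
        "-o", "{}/SynthDoG_{}".format(output_root, lang),  # assumed directory naming
        "-c", str(count),
        "-w", str(workers),
    ]
    if seed is not None:
        # -s sets the random seed, as listed in the arguments above.
        argv += ["-s", str(seed)]
    argv += ["-v", "template.py", "SynthDoG", CONFIGS[lang]]
    return argv


if __name__ == "__main__":
    # Print one shell-quoted command per language; nothing is executed here.
    for lang in CONFIGS:
        print(" ".join(shlex.quote(arg) for arg in build_command(lang)))
```

To actually launch a run, the returned list could be passed to `subprocess.run(argv, check=True)` from the directory that contains `template.py` and the config files.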