# SynthDoG 🐶: Synthetic Document Generator

SynthDoG is a synthetic document generator for visual document understanding (VDU).

![image](../misc/sample_synthdog.png)

## Prerequisites

- python>=3.6
- [synthtiger](https://github.com/clovaai/synthtiger) (`pip install synthtiger`)

## Usage

```bash
# Set environment variable (for macOS)
$ export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

synthtiger -o ./outputs/SynthDoG_en -c 50 -w 4 -v template.py SynthDoG config_en.yaml

{'config': 'config_en.yaml',
 'count': 50,
 'name': 'SynthDoG',
 'output': './outputs/SynthDoG_en',
 'script': 'template.py',
 'verbose': True,
 'worker': 4}
{'aspect_ratio': [1, 2],
     .
     .
 'quality': [50, 95],
 'short_size': [720, 1024]}
Generated 1 data
Generated 2 data
Generated 3 data
     .
     .
Generated 49 data
Generated 50 data
46.32 seconds elapsed
```

Some important arguments:

- `-o` : directory path to save data.
- `-c` : number of data to generate.
- `-w` : number of workers.
- `-v` : print error messages.

To generate ECJK (English, Chinese, Japanese, Korean) samples:
```bash
# english
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_en.yaml

# chinese
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_zh.yaml

# japanese
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_ja.yaml

# korean
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_ko.yaml
```
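
The four commands above differ only in the language code, so they can be wrapped in a loop. The following is a minimal sketch, not part of SynthDoG or synthtiger: the function name `generate_ecjk` and the variables `DATASET_ROOT`, `NUM_DATA`, `NUM_WORKERS`, and `DRY_RUN` are hypothetical, and the `SynthDoG_{lang}` output layout is an assumption based on the English example. By default it only prints the commands it would run.

```bash
# Hypothetical settings; override them from the environment as needed.
DATASET_ROOT="${DATASET_ROOT:-./outputs}"   # parent directory for all outputs
NUM_DATA="${NUM_DATA:-50}"                  # passed to -c
NUM_WORKERS="${NUM_WORKERS:-4}"             # passed to -w
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands only, 0 = run them

generate_ecjk() {
  for code in en zh ja ko; do
    # Same invocation as above; only the output directory and config file vary.
    set -- synthtiger -o "${DATASET_ROOT}/SynthDoG_${code}" \
      -c "${NUM_DATA}" -w "${NUM_WORKERS}" -v \
      template.py SynthDoG "config_${code}.yaml"
    if [ "${DRY_RUN}" = "1" ]; then
      echo "$@"             # show the command without running it
    else
      "$@" || return 1      # stop at the first failing language
    fi
  done
}

generate_ecjk
```

Set `DRY_RUN=0` to actually invoke `synthtiger`; this assumes it is installed and that the script is run from the directory containing `template.py` and the config files.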