TextMonkey was trained on 8 A800 GPUs with a dataset of 400k samples, which took approximately 1 day and 6 hours. Inference can run on a 3090 GPU.