> 1. A tokenizer is required for data processing. The system will use the tokenizer from model_args, or fall back to the tokenizer associated with the pretrained model name.
> 2. The default maximum sequence length is 4096. To compute metrics at additional maximum sequence lengths, specify them via the metadata parameter:
`--metadata=max_seq_lengths=4096,8192,16384,32768,65536,131072`. The metadata parameter can also be passed to the TaskManager (metadata: dict[str, tuple]).
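A value in the form shown above can be converted into the `dict[str, tuple]` shape that the TaskManager accepts. This is a minimal sketch; `parse_metadata` is a hypothetical helper written for illustration, not part of the harness:

```python
def parse_metadata(arg: str) -> dict[str, tuple[int, ...]]:
    """Parse a value like 'max_seq_lengths=4096,8192' into a
    dict[str, tuple] suitable for the TaskManager's metadata parameter."""
    # Split on the first '=' only; everything after it is a comma list.
    key, _, values = arg.partition("=")
    return {key: tuple(int(v) for v in values.split(","))}

metadata = parse_metadata("max_seq_lengths=4096,8192,16384,32768,65536,131072")
# metadata == {"max_seq_lengths": (4096, 8192, 16384, 32768, 65536, 131072)}
```

The resulting dict can then be passed as the TaskManager's `metadata` argument instead of the CLI flag.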
> 3. To prevent truncation of longer sequences, we recommend setting the max_length parameter in model_args: