datasets
nltk
numpy
parameterized
pybind11
regex
six
sentencepiece
tensorboard
transformers
ninja
mpi4py
einops