* add hf tokenizer
* format
* fix for comment
* don't skip special tokens
* check-in dockerfile