Commit 01a42322 authored by wxj

Upload New File

parent cdf234ac
Pipeline #2546 failed with stages in 0 seconds
python tools/preprocess_data.py \
--input /public/home/wangxj/Downloads/datasets/oscar-1GB-head/oscar-1GB_head.jsonl \
--output-prefix /public/home/wangxj/Downloads/datasets/oscar-1GB-head/oscar-1GB_head-qwen \
--vocab-file /public/home/wangxj/Downloads/model_weights/qwen1.5_14b/vocab.json \
--tokenizer-type QwenTokenizer \
--merge-file /public/home/wangxj/Downloads/model_weights/qwen1.5_14b/merges.txt \
--append-eod \
--workers 8
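The command above tokenizes a JSONL corpus with the Qwen1.5-14B vocabulary and merge files. The sketch below is a pre-flight check and an output-name helper. It assumes this fork's `tools/preprocess_data.py` behaves like upstream Megatron-LM in two ways. First, each input line is a JSON object whose text sits under the key given by `--json-keys`, which defaults to `text`. Second, the outputs are written as `<output-prefix>_<key>_document.bin` and `.idx` when `--split-sentences` is not used. The helpers `check_jsonl` and `expected_outputs` are hypothetical and are not part of the script.

```python
import json
from pathlib import Path


def check_jsonl(path: str, key: str = "text") -> int:
    """Count records in a JSONL file, assuming each line is {"<key>": "<string>", ...}.

    Blank lines are skipped. Malformed JSON raises json.JSONDecodeError, which is a
    ValueError subclass. A line without a string field `key` raises ValueError.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            if not isinstance(record, dict) or not isinstance(record.get(key), str):
                raise ValueError(f"line {lineno}: missing string field {key!r}")
            count += 1
    return count


def expected_outputs(output_prefix: str, key: str = "text", level: str = "document") -> list[str]:
    """Return the .bin/.idx paths, assuming upstream Megatron-LM's naming scheme."""
    base = f"{output_prefix}_{key}_{level}"
    return [base + ".bin", base + ".idx"]
```

Under these assumptions, the `--output-prefix` used above would produce `oscar-1GB_head-qwen_text_document.bin` and `oscar-1GB_head-qwen_text_document.idx` in the same directory. Running `check_jsonl` on the input first lets you catch a wrong field name before starting the 8 worker processes.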