OpenDAS / Megatron-LM
Commit 01a42322, authored Mar 17, 2025 by wxj
Upload New File
Parent: cdf234ac
Pipeline #2546 failed with stages in 0 seconds
Showing 1 changed file with 8 additions and 0 deletions
qwen_data_process.sh 0 → 100644
python tools/preprocess_data.py \
    --input /public/home/wangxj/Downloads/datasets/oscar-1GB-head/oscar-1GB_head.jsonl \
    --output-prefix /public/home/wangxj/Downloads/datasets/oscar-1GB-head/oscar-1GB_head-qwen \
    --vocab-file /public/home/wangxj/Downloads/model_weights/qwen1.5_14b/vocab.json \
    --tokenizer-type QwenTokenizer \
    --merge-file /public/home/wangxj/Downloads/model_weights/qwen1.5_14b/merges.txt \
    --append-eod \
    --workers 8
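
The committed script hard-codes one user's absolute paths, so it only runs unchanged on that machine. A minimal sketch of a parameterized variant follows, assuming the same flags and file names as above. The names `DATA_DIR`, `MODEL_DIR`, `WORKERS`, `DRY_RUN`, and `check_inputs` are hypothetical and are not part of the commit; the defaults simply reproduce the committed values.

```shell
#!/bin/sh
# Sketch only: a parameterized variant of qwen_data_process.sh.
# DATA_DIR, MODEL_DIR, WORKERS, DRY_RUN and check_inputs are hypothetical
# names introduced for illustration; defaults mirror the committed paths.
set -eu

DATA_DIR="${DATA_DIR:-/public/home/wangxj/Downloads/datasets/oscar-1GB-head}"
MODEL_DIR="${MODEL_DIR:-/public/home/wangxj/Downloads/model_weights/qwen1.5_14b}"
WORKERS="${WORKERS:-8}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, anything else = run it

# Return non-zero and name the first input file that does not exist.
check_inputs() {
    for f in "${DATA_DIR}/oscar-1GB_head.jsonl" \
             "${MODEL_DIR}/vocab.json" \
             "${MODEL_DIR}/merges.txt"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
    done
}

# Store the command in the positional parameters so every argument
# stays a separate word, even if a path contains spaces.
set -- python tools/preprocess_data.py \
    --input "${DATA_DIR}/oscar-1GB_head.jsonl" \
    --output-prefix "${DATA_DIR}/oscar-1GB_head-qwen" \
    --vocab-file "${MODEL_DIR}/vocab.json" \
    --tokenizer-type QwenTokenizer \
    --merge-file "${MODEL_DIR}/merges.txt" \
    --append-eod \
    --workers "${WORKERS}"

if [ "${DRY_RUN}" = "1" ]; then
    echo "$*"
else
    check_inputs
    "$@"
fi
```

With the defaults, the sketch only prints the assembled command. Setting `DRY_RUN=0` (and, on another machine, `DATA_DIR` and `MODEL_DIR`) checks that the three input files exist and then runs the preprocessing from the repository root, where `tools/preprocess_data.py` is expected to live.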