OpenDAS / Megatron-LM
Commit a9738f63
authored Dec 05, 2021 by zihanl
update running scripts
parent 3a2d1e30
Showing 3 changed files with 15 additions and 9 deletions
tasks/knwl_dialo/scripts/data_processing.sh (+12 -6)
tasks/knwl_dialo/scripts/eval_generation.sh (+2 -2)
tasks/knwl_dialo/scripts/prep_resp_gen.sh (+1 -1)
tasks/knwl_dialo/scripts/data_processing.sh @ a9738f63
@@ -6,16 +6,22 @@
 # WoI: https://parl.ai/projects/sea/
 DIR=`pwd`
+mkdir ${DIR}/tasks/knwl_dialo/data
+mkdir ${DIR}/tasks/knwl_dialo/data/wizard_of_wikipedia
+mkdir ${DIR}/tasks/knwl_dialo/data/wizard_of_internet
+# Before running the preprocessing, please download the datasets and put them into the corresponding created data folder.
-# We provide the following script to process the raw data from Wizard of Wikipedia
-python ${DIR}/tasks/knwl_dialo/preprocessing.py --func process_wow_dataset --raw_file <PATH_OF_THE_INPUT_DATA> --processed_file <PATH_OF_THE_OUTPUT_DATA> --knwl_ref_file <PATH_OF_THE_KNOWLEDGE_REFERENCE_OUTPUT_DATA> --resp_ref_file <PATH_OF_THE_RESPONSE_REFERENCE_OUTPUT_DATA>
+# We provide examples for processing the raw data from Wizard of Wikipedia
+python ${DIR}/tasks/knwl_dialo/preprocessing.py --func process_wow_dataset --raw_file ${DIR}/tasks/knwl_dialo/data/wizard_of_wikipedia/train.json --processed_file <PATH_OF_THE_PROCESSED_WOW_TRAIN_DATA>
+python ${DIR}/tasks/knwl_dialo/preprocessing.py --func process_wow_dataset --raw_file ${DIR}/tasks/knwl_dialo/data/wizard_of_wikipedia/test_random_split.json --processed_file <PATH_OF_THE_PROCESSED_TEST_SEEN_DATA> --knwl_ref_file <PATH_OF_THE_TEST_SEEN_KNOWLEDGE_REFERENCE_OUTPUT_DATA> --resp_ref_file <PATH_OF_THE_TEST_SEEN_RESPONSE_REFERENCE_OUTPUT_DATA>
+python ${DIR}/tasks/knwl_dialo/preprocessing.py --func process_wow_dataset --raw_file ${DIR}/tasks/knwl_dialo/data/wizard_of_wikipedia/test_topic_split.json --processed_file <PATH_OF_THE_PROCESSED_TEST_UNSEEN_DATA> --knwl_ref_file <PATH_OF_THE_TEST_UNSEEN_KNOWLEDGE_REFERENCE_OUTPUT_DATA> --resp_ref_file <PATH_OF_THE_TEST_UNSEEN_RESPONSE_REFERENCE_OUTPUT_DATA>
 # We provide the following script to process the raw data from Wizard of Internet
-python ${DIR}/tasks/knwl_dialo/preprocessing.py --func process_woi_dataset --raw_file <PATH_OF_THE_INPUT_DATA> --processed_file <PATH_OF_THE_OUTPUT_DATA> --knwl_ref_file <PATH_OF_THE_KNOWLEDGE_REFERENCE_OUTPUT_DATA> --resp_ref_file <PATH_OF_THE_RESPONSE_REFERENCE_OUTPUT_DATA>
+python ${DIR}/tasks/knwl_dialo/preprocessing.py --func process_woi_dataset --raw_file ${DIR}/tasks/knwl_dialo/data/wizard_of_internet/test.jsonl --processed_file <PATH_OF_THE_PROCESSED_TEST_DATA> --knwl_ref_file <PATH_OF_THE_TEST_KNOWLEDGE_REFERENCE_OUTPUT_DATA> --resp_ref_file <PATH_OF_THE_TEST_RESPONSE_REFERENCE_OUTPUT_DATA>
-# Obtain the knowledge generation prompts
-python ${DIR}/tasks/knwl_dialo/preprocessing.py --func get_knwl_gen_prompts --test_file <PATH_OF_THE_PROCESSED_TEST_DATA> --train_file <PATH_OF_THE_PROCESSED_TRAIN_DATA> --model_file <PATH_OF_THE_DPR_MODEL> --processed_file <PATH_OF_THE_OUTPUT_FILE> --data_type <DATA_TYPE_OF_THE_INPUT_FILE>
+# Obtain the knowledge generation prompts for each test dataset (Wizard of Wikipedia test seen/unseen and Wizard of Internet test)
+python ${DIR}/tasks/knwl_dialo/preprocessing.py --func get_knwl_gen_prompts --test_file <PATH_OF_THE_PROCESSED_TEST_DATA> --train_file <PATH_OF_THE_PROCESSED_WOW_TRAIN_DATA> --model_file <PATH_OF_THE_DPR_MODEL> --processed_file <PATH_OF_THE_OUTPUT_PROMPT_FILE> --data_type <DATA_TYPE_OF_THE_INPUT_FILE>
 # Obtain the response generation prompts
-python ${DIR}/tasks/knwl_dialo/preprocessing.py --func get_resp_gen_prompts --train_file <PATH_OF_THE_PROCESSED_TRAIN_DATA> --processed_file <PATH_OF_THE_OUTPUT_FILE>
+python ${DIR}/tasks/knwl_dialo/preprocessing.py --func get_resp_gen_prompts --train_file <PATH_OF_THE_PROCESSED_WOW_TRAIN_DATA> --processed_file <PATH_OF_THE_OUTPUT_PROMPT_FILE>
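Note on data_processing.sh: the three Wizard of Wikipedia commands differ only in the split name and in whether reference files are written, and plain `mkdir` fails when a folder already exists. The sketch below shows one way this could be written with `mkdir -p` and a loop. It is not part of the commit. `OUT_DIR`, the output file names, and the `RUN` dry-run switch are hypothetical, since the commit leaves every output path as a `<PATH_OF_...>` placeholder. The flags and raw file names are the ones used in the script.

```shell
#!/bin/bash
# Sketch only. OUT_DIR, the output file names, and RUN are hypothetical;
# the commit leaves every output path as a <PATH_OF_...> placeholder.
DIR=${DIR:-$(pwd)}
DATA_DIR="${DIR}/tasks/knwl_dialo/data"
OUT_DIR=${OUT_DIR:-"${DATA_DIR}/processed"}
RUN=${RUN-echo}   # dry run by default: prints each command; use RUN= to execute

# mkdir -p creates missing parents and succeeds if the folder already exists.
mkdir -p "${DATA_DIR}/wizard_of_wikipedia" "${DATA_DIR}/wizard_of_internet" "${OUT_DIR}"

PREPROCESS="${DIR}/tasks/knwl_dialo/preprocessing.py"

# Train split: only the processed file is written, as in the commit.
${RUN} python "${PREPROCESS}" --func process_wow_dataset \
    --raw_file "${DATA_DIR}/wizard_of_wikipedia/train.json" \
    --processed_file "${OUT_DIR}/wow_train.txt"

# Test splits: also write the knowledge and response reference files.
for split in test_random_split test_topic_split; do
    ${RUN} python "${PREPROCESS}" --func process_wow_dataset \
        --raw_file "${DATA_DIR}/wizard_of_wikipedia/${split}.json" \
        --processed_file "${OUT_DIR}/wow_${split}.txt" \
        --knwl_ref_file "${OUT_DIR}/wow_${split}_knwl_ref.txt" \
        --resp_ref_file "${OUT_DIR}/wow_${split}_resp_ref.txt"
done
```

Run as is, the sketch only creates the folders and prints the three commands. Running it with `RUN=` set to the empty string executes them, which requires the raw datasets to be downloaded into the data folders first.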
tasks/knwl_dialo/scripts/eval_generation.sh @ a9738f63
@@ -10,8 +10,8 @@ DISTRIBUTED_ARGS="--nproc_per_node $WORLD_SIZE \
                   --master_addr localhost \
                   --master_port 6000"
-OUTPUT_PATH=<Speicifc path for the output generation>
-GROUND_TRUTH_PATH=<Speicifc path for the ground truth>
+OUTPUT_PATH=<SPECIFIC_PATH_FOR_THE_OUTPUT_GENERATION>
+GROUND_TRUTH_PATH=<SPECIFIC_PATH_FOR_THE_GROUND_TRUTH>
 python -m torch.distributed.launch $DISTRIBUTED_ARGS ./tasks/main.py \
         --num-layers 24 \
 ...
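Note on eval_generation.sh: `OUTPUT_PATH` and `GROUND_TRUTH_PATH` are placeholders that must be replaced before the script is run, because the shell treats `<` and `>` as redirection operators and not as part of a value. A small guard placed before the launch command could catch a missing or mistyped path early. The helper below is a hypothetical sketch and not part of the commit. The name `require_file` is invented here, and it assumes both variables are expected to point to existing files.

```shell
#!/bin/bash
# Hypothetical helper (not in the commit): fail early when a path variable
# is empty or does not point to an existing regular file.
require_file() {
    # $1: variable name, used only in the message; $2: the variable's value
    if [ -z "$2" ] || [ ! -f "$2" ]; then
        echo "error: $1 is unset or not a file: '$2'" >&2
        return 1
    fi
}

# Possible use before the torch.distributed.launch command:
#   require_file OUTPUT_PATH "${OUTPUT_PATH}" || exit 1
#   require_file GROUND_TRUTH_PATH "${GROUND_TRUTH_PATH}" || exit 1
```

The check runs in well under a second, so a wrong path is reported before the distributed job starts.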
tasks/knwl_dialo/scripts/prep_resp_gen.sh @ a9738f63
@@ -3,4 +3,4 @@
 # Preparing the input file for the response generation (second-stage prompting)
 DIR=`pwd`
-python ${DIR}/tasks/knwl_dialo/preprocessing.py --func prepare_input --test_file <PATH_OF_THE_PROCESSED_TEST_DATA> --knowledge_gen_file <PATH_OF_THE_GENERATED_KNOWLEDGE_DATA> --processed_file <PATH_OF_THE_OUTPUT_FILE>
+python ${DIR}/tasks/knwl_dialo/preprocessing.py --func prepare_input --test_file <PATH_OF_THE_PROCESSED_TEST_DATA> --knowledge_gen_file <PATH_OF_THE_GENERATED_KNOWLEDGE_DATA> --processed_file <PATH_OF_THE_INPUT_FILE_FOR_RESPONSE_GENERATION>
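Note on prep_resp_gen.sh: the renamed placeholder makes it clearer that the file written here is the input to the second prompting stage. As an alternative to editing placeholders in the script, the three paths could be passed as arguments. The wrapper below is a hypothetical sketch and not part of the commit. The function name and the `RUN` dry-run switch are invented here. The `--func prepare_input` call and its three flags are taken from the script.

```shell
#!/bin/bash
# Hypothetical wrapper (not in the commit) around the prepare_input call.
DIR=${DIR:-$(pwd)}

prepare_resp_gen_input() {
    # Arguments: processed test file, generated knowledge file, output file.
    if [ "$#" -ne 3 ]; then
        echo "usage: prepare_resp_gen_input TEST_FILE KNOWLEDGE_GEN_FILE OUTPUT_FILE" >&2
        return 2
    fi
    # ${RUN-echo}: prints the command unless RUN is set; call with RUN= to execute.
    ${RUN-echo} python "${DIR}/tasks/knwl_dialo/preprocessing.py" --func prepare_input \
        --test_file "$1" --knowledge_gen_file "$2" --processed_file "$3"
}
```

For example, `prepare_resp_gen_input test.txt knwl.txt resp_input.txt` prints the full command with those three file names, and `RUN= prepare_resp_gen_input test.txt knwl.txt resp_input.txt` runs it.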