Commit 45f23f66 authored by Myle Ott, committed by Facebook Github Bot

Add more details for bulk BPE encoding

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/793

Differential Revision: D16603930

Pulled By: myleott

fbshipit-source-id: b302db3743db4f36c14fb0dc7f3456fe8a0079dd
parent 430905d7
@@ -17,8 +17,11 @@ from fairseq.data.encoders.gpt2_bpe import get_encoder
 def main():
     """
-    Helper script to encode raw text
-    with the GPT-2 BPE using multiple processes.
+    Helper script to encode raw text with the GPT-2 BPE using multiple processes.
+    The encoder.json and vocab.bpe files can be obtained here:
+    - https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
+    - https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
     """
     parser = argparse.ArgumentParser()
     parser.add_argument(
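
For readers unfamiliar with the pattern the docstring describes, here is a minimal sketch of encoding lines of raw text across multiple processes. It is not the script from this commit: `encode_line` is a hypothetical stand-in that emits UTF-8 byte values instead of real GPT-2 BPE ids, and `encode_lines`, `workers`, and `chunksize` are illustrative names. The actual script presumably builds its encoder with `get_encoder` (imported in the hunk header above) from the downloaded encoder.json and vocab.bpe files.

```python
from multiprocessing import Pool


def encode_line(line):
    """Hypothetical stand-in for a BPE encoder (assumption, not fairseq code).

    Returns the UTF-8 byte values of the line as space-separated integers,
    mimicking the "one line in, one line of token ids out" shape.
    """
    return " ".join(str(b) for b in line.rstrip("\n").encode("utf-8"))


def encode_lines(lines, workers=2, chunksize=100):
    """Encode every line, preserving input order.

    With workers <= 1 the work runs serially in the current process;
    otherwise it is spread over a pool of worker processes.
    """
    lines = list(lines)
    if workers <= 1:
        return [encode_line(line) for line in lines]
    with Pool(processes=workers) as pool:
        # Pool.map returns results in the same order as the inputs.
        return pool.map(encode_line, lines, chunksize=chunksize)


if __name__ == "__main__":
    for encoded in encode_lines(["hello world\n", "bulk BPE encoding\n"]):
        print(encoded)
```

Keeping the per-line function at module level matters here, since worker processes need to be able to import it. Swapping the stand-in for a real encoder would only change the body of `encode_line`.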