Commit 45f23f66 authored by Myle Ott, committed by Facebook Github Bot

Add more details for bulk BPE encoding

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/793

Differential Revision: D16603930

Pulled By: myleott

fbshipit-source-id: b302db3743db4f36c14fb0dc7f3456fe8a0079dd
parent 430905d7
......@@ -17,8 +17,11 @@ from fairseq.data.encoders.gpt2_bpe import get_encoder
 def main():
     """
-    Helper script to encode raw text
-    with the GPT-2 BPE using multiple processes.
+    Helper script to encode raw text with the GPT-2 BPE using multiple processes.
+    The encoder.json and vocab.bpe files can be obtained here:
+    - https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
+    - https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
     """
     parser = argparse.ArgumentParser()
     parser.add_argument(
......
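
The docstring describes two steps: obtaining encoder.json and vocab.bpe, then encoding raw text across several processes. The following is a minimal sketch of that workflow, not the script changed in this commit. The two URLs are quoted from the docstring; everything else (`local_name`, `fetch_bpe_files`, `toy_encode`, `encode_lines`, the `gpt2_bpe` directory name) is hypothetical and introduced only for illustration. In particular, `toy_encode` is a stand-in that is not GPT-2 BPE, used so the sketch runs without fairseq or the downloaded files.

```python
import os
import urllib.request
from multiprocessing import Pool

# URLs quoted from the docstring added in this commit.
GPT2_BPE_URLS = [
    "https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json",
    "https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe",
]


def local_name(url):
    """Return the final path component of a URL (hypothetical helper)."""
    return url.rsplit("/", 1)[-1]


def fetch_bpe_files(dest_dir="gpt2_bpe"):
    """Download both files into dest_dir, skipping any already present.

    ASSUMPTION: dest_dir is an arbitrary name chosen for this sketch.
    """
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in GPT2_BPE_URLS:
        path = os.path.join(dest_dir, local_name(url))
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths


def toy_encode(line):
    """Stand-in encoder: emit the length of each whitespace-separated word.

    ASSUMPTION: this is NOT GPT-2 BPE. A real run would load the two
    downloaded files into an actual BPE encoder instead.
    """
    return " ".join(str(len(word)) for word in line.split())


def encode_lines(lines, workers=2):
    """Encode lines in worker processes; Pool.imap keeps input order."""
    with Pool(processes=workers) as pool:
        return list(pool.imap(toy_encode, lines, chunksize=100))
```

When trying this out, call `encode_lines` from under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module on platforms that do not fork. Using `imap` rather than `imap_unordered` keeps output lines aligned with input lines, which matters when the encoded file must match the raw file line for line.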