Commit 45f23f66 authored by Myle Ott, committed by Facebook Github Bot

Add more details for bulk BPE encoding

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/793

Differential Revision: D16603930

Pulled By: myleott

fbshipit-source-id: b302db3743db4f36c14fb0dc7f3456fe8a0079dd
parent 430905d7
@@ -17,8 +17,11 @@ from fairseq.data.encoders.gpt2_bpe import get_encoder
 def main():
     """
-    Helper script to encode raw text
-    with the GPT-2 BPE using multiple processes.
+    Helper script to encode raw text with the GPT-2 BPE using multiple processes.
+
+    The encoder.json and vocab.bpe files can be obtained here:
+    - https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
+    - https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
     """
     parser = argparse.ArgumentParser()
     parser.add_argument(
...