    Conv lm implementation · 4c2ef2de
    alexeib authored
    This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf (Dauphin et al., "Language Modeling with Gated Convolutional Networks")
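The core building block of that paper is the gated linear unit (GLU): a causal convolution whose linear output is gated elementwise by a sigmoid of a second convolution, h(X) = (X*W + b) ⊗ σ(X*V + c). A minimal pure-Python sketch (the function name and list-based layout are illustrative, not the actual fairseq fconv code, which uses PyTorch):

```python
import math

def glu_causal_conv(x, W, V, b, c):
    """Sketch of one GLU conv layer.
    x: seq_len x in_dim list of input vectors.
    W, V: kernel x in_dim x out_dim filter weights for the linear and gate paths.
    b, c: out_dim biases. Returns seq_len x out_dim outputs.
    The input is left-padded with zeros so position t never sees tokens > t
    (the causal convolution required for language modeling)."""
    k, in_dim, out_dim = len(W), len(W[0]), len(W[0][0])
    pad = [[0.0] * in_dim for _ in range(k - 1)]
    xp = pad + x
    out = []
    for t in range(len(x)):
        a = list(b)  # linear path
        g = list(c)  # gate path
        for i in range(k):
            for d in range(in_dim):
                v = xp[t + i][d]
                for o in range(out_dim):
                    a[o] += v * W[i][d][o]
                    g[o] += v * V[i][d][o]
        # a * sigmoid(g), elementwise
        out.append([a[o] / (1.0 + math.exp(-g[o])) for o in range(out_dim)])
    return out
```

Unlike recurrent models, stacking such layers gives a fixed, finite context window, but the gating lets gradients flow through the linear path without vanishing.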
    
    There are 3 modes for constructing batches:
    
    - token block: fill each sample with a fixed number of tokens, ignoring sentence delimiters - this is the mode used for training in the paper
    - complete: fill each sample with up to the specified number of tokens, but only with complete sentences (i.e. if the next sentence would exceed the token limit, it moves to the next sample) - this is the mode used for evaluation in the paper
    - eos: one sentence per sample (blank lines are skipped)
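The three modes can be sketched in a few lines of pure Python; these function names are illustrative only and do not correspond to the actual fairseq dataset classes:

```python
def batch_token_block(sentences, block_size):
    """'token block' mode: concatenate every token into one stream and cut
    fixed-size samples, ignoring sentence boundaries (training mode)."""
    stream = [tok for sent in sentences for tok in sent]
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]

def batch_complete(sentences, block_size):
    """'complete' mode: pack whole sentences into a sample up to block_size;
    a sentence that would overflow starts the next sample (evaluation mode).
    A single sentence longer than block_size becomes its own oversized sample."""
    samples, cur = [], []
    for sent in sentences:
        if cur and len(cur) + len(sent) > block_size:
            samples.append(cur)
            cur = []
        cur.extend(sent)
    if cur:
        samples.append(cur)
    return samples

def batch_eos(sentences):
    """'eos' mode: one sentence per sample, skipping blank lines."""
    return [sent for sent in sentences if sent]
```

For example, with sentences [["a","b"], [], ["c","d","e"], ["f"]] and a block size of 4, token-block mode yields [["a","b","c","d"], ["e","f"]] while complete mode yields [["a","b"], ["c","d","e","f"]].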
    
    Some results (perplexity):
    
    GCNN-13 - GBW - 37.46
    GCNN-14B - GBW - 33.88
    GCNN-8 - Wiki103 - 43.76
    GCNN-14 - Wiki103 - 35.66
    
    train:
    
    python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(...