"llama/llama.cpp/tools/vscode:/vscode.git/clone" did not exist on "1c094038bcfe0ca40c90273cb7228f8ad34b7417"
  1. 25 Sep, 2018 5 commits
  2. 03 Sep, 2018 9 commits
  3. 25 Jul, 2018 1 commit
  4. 25 Jun, 2018 2 commits
  5. 24 Jun, 2018 1 commit
  6. 21 Jun, 2018 2 commits
  7. 15 Jun, 2018 11 commits
    • Fix bidirectional LSTM · bfcc6ec7
      Myle Ott authored
    • Updates for latest PyTorch · e89329d6
      Myle Ott authored
    • Add FairseqTask · ff68a9ef
      Myle Ott authored
      A Task defines the data format, stores shared state (e.g., dictionaries) and provides helpers for building the model/criterion and calculating the loss.
      
      Changes:
      - Add TranslationTask and LanguageModelingTask. New tasks can be registered with the @register_task decorator (see the sketch below).
      - Add EpochBatchIterator to encapsulate batching and saving/restoring dataloader position
      - Remove LEFT_PAD_* constants and make them configurable per task
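      A minimal sketch of the task API described above (the imports and method names follow fairseq's public task interface; the task name and `data` argument are hypothetical):

        from fairseq.tasks import FairseqTask, register_task

        @register_task('my_translation')  # hypothetical task name
        class MyTranslationTask(FairseqTask):
            @staticmethod
            def add_args(parser):
                # Task-specific command-line arguments.
                parser.add_argument('data', help='path to the data directory')

            @classmethod
            def setup_task(cls, args, **kwargs):
                # Build shared state (e.g. dictionaries) once, then construct the task.
                return cls(args)

            def load_dataset(self, split, **kwargs):
                # Populate self.datasets[split] with a dataset for this split.
                raise NotImplementedError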
    • 16a72b4d
      Myle Ott authored
    • Suppress stdout in test_train · 736fbee2
      Myle Ott authored
    • Nits · cf1c64a5
      Myle Ott authored
    • Record end_of_epoch in checkpoint · 7d560402
      alexeib authored
    • Conv LM implementation · 4c2ef2de
      alexeib authored
      This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf; a sketch of its gated convolutional block follows this entry.
      
      There are three modes for constructing batches (sketched below):
      
      - token block: fill each sample with a specified number of tokens without regard for sentence delimiters - this is what was used for training in the paper
      - complete: fill each sample with a specified number of tokens but make sure it contains only complete sentences (i.e., if the next sentence would go over the token-block limit, move it to the next sample) - this was used for evaluation in the paper
      - eos: one sentence per sample (skip blank lines)
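      A toy sketch of the three modes (illustrative only; `tokens` is a flat list of token ids, `sentences` a list of per-sentence token-id lists, and the function names are mine, not the implementation's):

        def token_block(tokens, block_size):
            # Fill each sample with exactly block_size tokens, ignoring
            # sentence boundaries (the mode used for training in the paper).
            return [tokens[i:i + block_size]
                    for i in range(0, len(tokens), block_size)]

        def complete(sentences, block_size):
            # Pack whole sentences into a sample; if the next sentence would
            # exceed block_size, start a new sample (used for evaluation).
            samples, current = [], []
            for sent in sentences:
                if current and len(current) + len(sent) > block_size:
                    samples.append(current)
                    current = []
                current.extend(sent)
            if current:
                samples.append(current)
            return samples

        def eos(sentences):
            # One sentence per sample, skipping blank lines.
            return [sent for sent in sentences if sent]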
      
      Some results:

      Model      Dataset   Perplexity
      GCNN-13    GBW       37.46
      GCNN-14B   GBW       33.88
      GCNN-8     Wiki103   43.76
      GCNN-14    Wiki103   35.66
      
      Train:
      
      python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(...
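      The building block of this model is the gated linear unit from the paper: a causal 1-D convolution whose output channels are split into a value half and a gate half. A minimal PyTorch sketch (illustrative, not fairseq's fconv_lm code):

        import torch.nn as nn
        import torch.nn.functional as F

        class GLUConvBlock(nn.Module):
            def __init__(self, dim, kernel_size):
                super().__init__()
                # 2*dim output channels so the GLU can split them in half.
                self.conv = nn.Conv1d(dim, 2 * dim, kernel_size)
                self.kernel_size = kernel_size

            def forward(self, x):
                # x: (batch, dim, time). Left-pad so position t never sees
                # tokens to its right (causal convolution).
                x = F.pad(x, (self.kernel_size - 1, 0))
                # glu computes a * sigmoid(b) over the two channel halves.
                return F.glu(self.conv(x), dim=1)

      Stacking such blocks, with residual connections, yields the GCNN-n architectures in the table above.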
    • Fix tests · ae2585d9
      Myle Ott authored
    • Fix tests · 8afb7761
      Myle Ott authored
  8. 24 May, 2018 1 commit
  9. 02 Apr, 2018 1 commit
    • Merge internal changes (#136) · d3795d6c
      Myle Ott authored
      Changes:
      - 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search
      - c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model (see the sketch below)
      - 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
      - Small bug fixes for distributed training, LSTM, and the inverse square root LR scheduler
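      Checkpoint averaging is just an element-wise mean of the saved model parameters. A minimal sketch in the spirit of scripts/average_checkpoints.py (assumes fairseq-style checkpoints that store parameters under a 'model' key; not the script's actual code):

        import torch

        def average_checkpoints(paths):
            avg, n = None, len(paths)
            for path in paths:
                # The 'model' key is an assumption about the checkpoint layout.
                state = torch.load(path, map_location='cpu')['model']
                if avg is None:
                    avg = {k: v.clone().float() for k, v in state.items()}
                else:
                    for k, v in state.items():
                        avg[k] += v.float()
            # Average, preserving the original key set.
            return {k: v / n for k, v in avg.items()}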
  10. 05 Mar, 2018 1 commit
  11. 01 Mar, 2018 1 commit
  12. 27 Feb, 2018 4 commits
  13. 08 Nov, 2017 1 commit