1. 18 Mar, 2020 2 commits
      Internal change · b86ffb12
      Hongkun Yu authored
      PiperOrigin-RevId: 301639338
      Some improvements and bug fixes to Controller: · febaae9a
      Ruoxin Sang authored
      1. Fix a bug where a checkpoint was saved after every training loop.
      2. Only create the training and eval summary writers if the corresponding `train_fn` and `eval_fn` are passed.
      3. Flush the summary writers after training and eval finish.
      4. Add a Controller test.
      
      Also ensure that no evaluation runs in the ResNet CTL example when `skip_eval=True`.
      
      PiperOrigin-RevId: 301489305
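The Controller changes above (checkpoint only at intervals, create summary writers only for phases that run, flush them when a phase finishes) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the real Controller API: `MiniController`, `RecordingWriter`, `checkpoint_interval`, and `saved_at` are hypothetical names, and the forced checkpoint at the end of training is an assumption of this sketch.

```python
from typing import Callable, List, Optional


class RecordingWriter:
    """Hypothetical stand-in for a summary writer; it only counts flushes."""

    def __init__(self, name: str):
        self.name = name
        self.flush_count = 0

    def flush(self) -> None:
        self.flush_count += 1


class MiniController:
    """Hypothetical sketch of the fixed behaviour, not the real Controller."""

    def __init__(self,
                 train_fn: Optional[Callable[[int], None]] = None,
                 eval_fn: Optional[Callable[[], None]] = None,
                 steps_per_loop: int = 10,
                 checkpoint_interval: int = 50):
        self.train_fn = train_fn
        self.eval_fn = eval_fn
        self.steps_per_loop = steps_per_loop
        self.checkpoint_interval = checkpoint_interval
        self.global_step = 0
        self.last_checkpoint_step = 0
        self.saved_at: List[int] = []  # steps at which a checkpoint was "saved"
        # Create writers only for the phases that can actually run.
        self.train_writer = RecordingWriter("train") if train_fn is not None else None
        self.eval_writer = RecordingWriter("eval") if eval_fn is not None else None

    def _maybe_save_checkpoint(self, force: bool = False) -> None:
        # Save only once a full interval has elapsed (or when forced),
        # instead of after every training loop.
        elapsed = self.global_step - self.last_checkpoint_step
        if force or elapsed >= self.checkpoint_interval:
            self.saved_at.append(self.global_step)
            self.last_checkpoint_step = self.global_step

    def train(self, total_steps: int) -> None:
        if self.train_fn is None:
            raise ValueError("train_fn is required for training")
        while self.global_step < total_steps:
            steps = min(self.steps_per_loop, total_steps - self.global_step)
            self.train_fn(steps)
            self.global_step += steps
            self._maybe_save_checkpoint()
        # Assumption: save the final state if the last loop ended off-interval.
        if self.last_checkpoint_step != self.global_step:
            self._maybe_save_checkpoint(force=True)
        self.train_writer.flush()  # flush once training finishes

    def evaluate(self) -> bool:
        if self.eval_fn is None:
            return False  # no eval_fn, so no evaluation and no eval writer
        self.eval_fn()
        self.eval_writer.flush()  # flush once evaluation finishes
        return True
```

For example, with `steps_per_loop=10` and `checkpoint_interval=50`, training for 120 steps runs 12 loops but saves only at steps 50, 100, and the final step 120; a controller built without `eval_fn` never creates an eval writer.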
  2. 17 Mar, 2020 6 commits
  3. 16 Mar, 2020 3 commits
  4. 15 Mar, 2020 3 commits
  5. 14 Mar, 2020 5 commits
  6. 13 Mar, 2020 5 commits
  7. 12 Mar, 2020 7 commits
  8. 11 Mar, 2020 7 commits
  9. 10 Mar, 2020 2 commits
      Internal change · 08f45dc4
      A. Unique TensorFlower authored
      PiperOrigin-RevId: 300203487
      Save to tmp directory on non-chief workers in model_training_utils · 682d36b5
      Ran Chen authored
      In a multi-worker setup, saving is done on every worker. If all workers save to the same location (e.g. GCS), their writes conflict. With this change, non-chief workers save to a temporary directory instead.
      
      Note that saving may involve synchronization that requires all workers to participate, so we cannot save on only one worker.
      
      PiperOrigin-RevId: 300141152
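The save-location rule described in the last commit can be sketched as follows: every worker still performs the save (so any synchronized step sees all participants), but only the chief writes into the shared model directory. This is a minimal illustration, not the code in `model_training_utils`; `save_from_worker` and `write_fn` are hypothetical names, whether a worker is chief is passed in as a plain flag, and deleting the non-chief temporary copy afterwards is an assumption of this sketch.

```python
import os
import shutil
import tempfile
from typing import Callable


def save_from_worker(model_dir: str, is_chief: bool, worker_id: int,
                     write_fn: Callable[[str], None]) -> str:
    """Hypothetical helper: run the save on every worker, keep only the chief's.

    `write_fn(directory)` stands in for whatever actually writes the
    checkpoint or SavedModel into `directory`.
    """
    if is_chief:
        # The chief writes to the shared location (e.g. a GCS bucket).
        target = model_dir
        os.makedirs(target, exist_ok=True)
    else:
        # Non-chief workers write to a private temporary directory so
        # they never clobber the chief's files in the shared location.
        target = tempfile.mkdtemp(prefix=f"worker_{worker_id}_")
    write_fn(target)
    if not is_chief:
        # Assumption: the non-chief copy is not needed after the save.
        shutil.rmtree(target, ignore_errors=True)
    return target
```

Writing to a throwaway directory rather than skipping the save entirely keeps every worker on the same code path, which matters when the save itself contains collective operations.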