- 12 Jul, 2022 1 commit
-
-
Nikhila Ravi authored
Summary: ## Changes: - Added Accelerate Library and refactored experiment.py to use it - Needed to move `init_optimizer` and `ExperimentConfig` to a separate file to be compatible with submitit/hydra - Needed to make some modifications to data loaders etc to work well with the accelerate ddp wrappers - Loading/saving checkpoints incorporates an unwrapping step so remove the ddp wrapped model ## Tests Tested with both `torchrun` and `submitit/hydra` on two gpus locally. Here are the commands: **Torchrun** Modules loaded: ```sh 1) anaconda3/2021.05 2) cuda/11.3 3) NCCL/2.9.8-3-cuda.11.3 4) gcc/5.2.0. (but unload gcc when using submit) ``` ```sh torchrun --nnodes=1 --nproc_per_node=2 experiment.py --config-path ./configs --config-name repro_singleseq_nerf_test ``` **Submitit/Hydra Local test** ```sh ~/pytorch3d/projects/implicitron_trainer$ HYDRA_FULL_ERROR=1 python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs hydra/launcher=submitit_local hydra.launcher.gpus_per_node=2 hydra.launcher.tasks_per_node=2 hydra.launcher.nodes=1 ``` **Submitit/Hydra distributed test** ```sh ~/implicitron/pytorch3d$ python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs hydra/launcher=submitit_slurm hydra.launcher.gpus_per_node=8 hydra.launcher.tasks_per_node=8 hydra.launcher.nodes=1 hydra.launcher.partition=learnlab hydra.launcher.timeout_min=4320 ``` ## TODOS: - Fix distributed evaluation: currently this doesn't work as the input format to the evaluation function is not suitable for gathering across gpus (needs to be nested list/tuple/dicts of objects that satisfy `is_torch_tensor`) and currently `frame_data` contains `Cameras` type. - Refactor the `accelerator` object to be accessible by all functions instead of needing to pass it around everywhere? Maybe have a `Trainer` class and add it as a method? - Update readme with installation instructions for accelerate and also commands for running jobs with torchrun and submitit/hydra X-link: https://github.com/fairinternal/pytorch3d/pull/37 Reviewed By: davnov134, kjchalup Differential Revision: D37543870 Pulled By: bottler fbshipit-source-id: be9eb4e91244d4fe3740d87dafec622ae1e0cf76
-
- 06 Jul, 2022 1 commit
-
-
Jeremy Reizenstein authored
Summary: Add facilities for dataloading non-sequential scenes. Reviewed By: shapovalov Differential Revision: D37291277 fbshipit-source-id: 0a33e3727b44c4f0cba3a2abe9b12f40d2a20447
-
- 10 Jun, 2022 3 commits
-
-
Jeremy Reizenstein authored
Summary: Add test that the yaml files deserialize. Reviewed By: davnov134 Differential Revision: D36830673 fbshipit-source-id: b785d8db97b676686036760bfa2dd3fa638bda57
-
Jeremy Reizenstein authored
Summary: Make dataset type and args configurable on JsonIndexDatasetMapProvider. Reviewed By: davnov134 Differential Revision: D36666705 fbshipit-source-id: 4d0a3781d9a956504f51f1c7134c04edf1eb2846
-
Jeremy Reizenstein authored
Summary: Allow access to manifold internally by default. Reviewed By: davnov134 Differential Revision: D36760481 fbshipit-source-id: 2a16bd40e81ef526085ac1b3f4606b63c1841428
-
- 26 May, 2022 1 commit
-
-
Jeremy Reizenstein authored
Summary: Add simple interactive testrunner for experiment.py Reviewed By: shapovalov Differential Revision: D35316221 fbshipit-source-id: d424bcba632eef89eefb56e18e536edb58ec6f85
-