- 12 Jul, 2022 1 commit
-
-
Nikhila Ravi authored
Summary: ## Changes: - Added Accelerate Library and refactored experiment.py to use it - Needed to move `init_optimizer` and `ExperimentConfig` to a separate file to be compatible with submitit/hydra - Needed to make some modifications to data loaders etc to work well with the accelerate ddp wrappers - Loading/saving checkpoints incorporates an unwrapping step so remove the ddp wrapped model ## Tests Tested with both `torchrun` and `submitit/hydra` on two gpus locally. Here are the commands: **Torchrun** Modules loaded: ```sh 1) anaconda3/2021.05 2) cuda/11.3 3) NCCL/2.9.8-3-cuda.11.3 4) gcc/5.2.0. (but unload gcc when using submit) ``` ```sh torchrun --nnodes=1 --nproc_per_node=2 experiment.py --config-path ./configs --config-name repro_singleseq_nerf_test ``` **Submitit/Hydra Local test** ```sh ~/pytorch3d/projects/implicitron_trainer$ HYDRA_FULL_ERROR=1 python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs hydra/launcher=submitit_local hydra.launcher.gpus_per_node=2 hydra.launcher.tasks_per_node=2 hydra.launcher.nodes=1 ``` **Submitit/Hydra distributed test** ```sh ~/implicitron/pytorch3d$ python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs hydra/launcher=submitit_slurm hydra.launcher.gpus_per_node=8 hydra.launcher.tasks_per_node=8 hydra.launcher.nodes=1 hydra.launcher.partition=learnlab hydra.launcher.timeout_min=4320 ``` ## TODOS: - Fix distributed evaluation: currently this doesn't work as the input format to the evaluation function is not suitable for gathering across gpus (needs to be nested list/tuple/dicts of objects that satisfy `is_torch_tensor`) and currently `frame_data` contains `Cameras` type. - Refactor the `accelerator` object to be accessible by all functions instead of needing to pass it around everywhere? Maybe have a `Trainer` class and add it as a method? - Update readme with installation instructions for accelerate and also commands for running jobs with torchrun and submitit/hydra X-link: https://github.com/fairinternal/pytorch3d/pull/37 Reviewed By: davnov134, kjchalup Differential Revision: D37543870 Pulled By: bottler fbshipit-source-id: be9eb4e91244d4fe3740d87dafec622ae1e0cf76
-
- 06 Jul, 2022 3 commits
-
-
Jeremy Reizenstein authored
Summary: As part of removing Task, move camera difficulty bin breaks from hard code to the top level. Reviewed By: davnov134 Differential Revision: D37491040 fbshipit-source-id: f2d6775ebc490f6f75020d13f37f6b588cc07a0b
-
Jeremy Reizenstein authored
Summary: Enable pyre checking of the trainer code. Reviewed By: shapovalov Differential Revision: D36545438 fbshipit-source-id: db1ea8d1ade2da79a2956964eb0c7ba302fa40d1
-
Jeremy Reizenstein authored
Summary: As part of removing Task, make the dataset code generate the source cameras for itself. There's a small optimization available here, in that the JsonIndexDataset could avoid loading images. Reviewed By: shapovalov Differential Revision: D37313423 fbshipit-source-id: 3e5e0b2aabbf9cc51f10547a3523e98c72ad8755
-
- 16 Jun, 2022 1 commit
-
-
Jeremy Reizenstein authored
Summary: Copy code from NeRF for loading LLFF data and blender synthetic data, and create dataset objects for them Reviewed By: shapovalov Differential Revision: D35581039 fbshipit-source-id: af7a6f3e9a42499700693381b5b147c991f57e5d
-
- 10 Jun, 2022 2 commits
-
-
Jeremy Reizenstein authored
Summary: Add test that the yaml files deserialize. Reviewed By: davnov134 Differential Revision: D36830673 fbshipit-source-id: b785d8db97b676686036760bfa2dd3fa638bda57
-
Jeremy Reizenstein authored
Summary: Preparing for pluggables in experiment.py Reviewed By: davnov134 Differential Revision: D36830674 fbshipit-source-id: eab499d1bc19c690798fbf7da547544df7e88fa5
-
- 26 May, 2022 1 commit
-
-
Jeremy Reizenstein authored
Summary: Add simple interactive testrunner for experiment.py Reviewed By: shapovalov Differential Revision: D35316221 fbshipit-source-id: d424bcba632eef89eefb56e18e536edb58ec6f85
-
- 25 May, 2022 1 commit
-
-
Jeremy Reizenstein authored
Summary: The ImplicitronDataset class corresponds to JsonIndexDatasetMapProvider Reviewed By: shapovalov Differential Revision: D36661396 fbshipit-source-id: 80ca2ff81ef9ecc2e3d1f4e1cd14b6f66a7ec34d
-
- 20 May, 2022 4 commits
-
-
Jeremy Reizenstein authored
Summary: replace dataloader_zoo with a pluggable DataLoaderMapProvider. Reviewed By: shapovalov Differential Revision: D36475441 fbshipit-source-id: d16abb190d876940434329928f2e3f2794a25416
-
Jeremy Reizenstein authored
Summary: replace dataset_zoo with a pluggable DatasetMapProvider. The logic is now in annotated_file_dataset_map_provider. Reviewed By: shapovalov Differential Revision: D36443965 fbshipit-source-id: 9087649802810055e150b2fbfcc3c197a761f28a
-
Jeremy Reizenstein authored
Summary: Separate ImplicitronDatasetBase and FrameData (to be used by all data sources) from ImplicitronDataset (which is specific). Reviewed By: shapovalov Differential Revision: D36413111 fbshipit-source-id: 3725744cde2e08baa11aff4048237ba10c7efbc6
-
Jeremy Reizenstein authored
Summary: Move dataset_args and dataloader_args from ExperimentConfig into a new member called datasource so that it can contain replaceables. Also add enum Task for task type. Reviewed By: shapovalov Differential Revision: D36201719 fbshipit-source-id: 47d6967bfea3b7b146b6bbd1572e0457c9365871
-
- 13 May, 2022 1 commit
-
-
Jeremy Reizenstein authored
Summary: Stronger typing for these functions Reviewed By: shapovalov Differential Revision: D36170489 fbshipit-source-id: a2104b29dbbbcfcf91ae1d076cd6b0e3d2030c0b
-
- 09 May, 2022 1 commit
-
-
Roman Shapovalov authored
Summary: To avoid model_zoo, we need to make GenericModel pluggable. I also align creation APIs for convenience. Reviewed By: bottler, davnov134 Differential Revision: D35933093 fbshipit-source-id: 8228926528eb41a795fbfbe32304b8019197e2b1
-
- 06 Apr, 2022 1 commit
-
-
Jeremy Reizenstein authored
Summary: Try again to solve https://github.com/facebookresearch/pytorch3d/issues/1144 pickling problem. D35258561 (https://github.com/facebookresearch/pytorch3d/commit/24260130ce382640a89e978cd671298488ec6451) didn't work. When writing a function or vanilla class C which you want people to be able to call get_default_args on, you must add the line enable_get_default_args(C) to it. This causes autogeneration of a hidden dataclass in the module. Reviewed By: davnov134 Differential Revision: D35364410 fbshipit-source-id: 53f6e6fff43e7142ae18ca3b06de7d0c849ef965
-
- 31 Mar, 2022 1 commit
-
-
Jeremy Reizenstein authored
Summary: ListConfig and DictConfig members of get_default_args(X) when X is a callable will contain references to a temporary dataclass and therefore be unpicklable. Avoid this in a few cases. Fixes https://github.com/facebookresearch/pytorch3d/issues/1144 Reviewed By: shapovalov Differential Revision: D35258561 fbshipit-source-id: e52186825f52accee9a899e466967a4ff71b3d25
-
- 25 Mar, 2022 1 commit
-
-
Roman Shapovalov authored
Summary: Before the fix, running get_default_args(C: Callable) returns an unstructured DictConfig which causes Enums to be handled incorrectly. This is a fix. WIP update: Currently tests still fail whenever a function signature contains an untyped argument: This needs to be somehow fixed. Reviewed By: bottler Differential Revision: D34932124 fbshipit-source-id: ecdc45c738633cfea5caa7480ba4f790ece931e8
-
- 21 Mar, 2022 1 commit
-
-
Jeremy Reizenstein authored
Co-authored-by:Jeremy Francis Reizenstein <bottler@users.noreply.github.com>
-