- 01 Nov, 2024 1 commit
-
-
Nikita Shulga authored
Summary: X-link: https://github.com/facebookexternal/vizard/pull/5 X-link: https://github.com/fairinternal/egohowto/pull/72 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/680 Replace `tensor.type().scalarType()` with `tensor.scalar_type()` (this makes it possible to get rid of the cast function in https://github.com/pytorch/pytorch/pull/139358). Remove extraneous braces around lambdas. Reviewed By: huydhn, r-barnes Differential Revision: D65308547 fbshipit-source-id: d04c62cfa7361c0f69a2eaf1fd331befa9df4395
-
- 04 Oct, 2024 2 commits
-
-
Shangdi Yu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/679 as title, add versioning to avoid breaking d2go CI Reviewed By: wat3rBro Differential Revision: D63907037 fbshipit-source-id: baf94c71c68ab017ed21b4c12eaf2fa69219db68
-
Shangdi Yu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/678 capture_pre_autograd_graph is being deprecated. Migrate to the new API. Reviewed By: navsud, tugsbayasgalan Differential Revision: D63859679 fbshipit-source-id: f14def6bc622cc451020d0edcc312330fa626943
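For context, a minimal sketch of the migration, assuming PyTorch 2.5+ where `torch.export.export_for_training` is the documented replacement for `capture_pre_autograd_graph`; the toy module and inputs below are placeholders, not d2go code:

```python
import torch
from torch.export import export_for_training


class ToyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)


example_inputs = (torch.randn(1, 3, 8, 8),)

# Old (deprecated): gm = capture_pre_autograd_graph(ToyModel(), example_inputs)
# New: export a training-friendly graph, then take the GraphModule for PT2E passes.
exported = export_for_training(ToyModel(), example_inputs)
gm = exported.module()
```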
-
- 26 Sep, 2024 1 commit
-
-
Victor Bourgin authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/677 Previously, cfg.SOLVER.DETERMINISTIC was not taken into account for the lightning `Trainer` in d2go: - Nested checks such as `hasattr(cfg, "SOLVER.DETERMINISTIC")` do not work as expected - If SOLVER.DETERMINISTIC exists, we should check that it is set to `True` Reviewed By: ayushidalmia, rbasch Differential Revision: D63426319 fbshipit-source-id: 8caf0af53e7b97a49392df09153e26ee3628231f
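For illustration, a hedged sketch of the check pattern the fix implies; `cfg` here is a generic detectron2-style CfgNode, and the actual d2go call site is not reproduced:

```python
from detectron2.config import CfgNode

cfg = CfgNode({"SOLVER": CfgNode({"DETERMINISTIC": True})})

# hasattr(cfg, "SOLVER.DETERMINISTIC") is always False: attribute lookup does not
# follow the dotted path into the nested node. Check each level instead.
deterministic = bool(
    hasattr(cfg, "SOLVER")
    and hasattr(cfg.SOLVER, "DETERMINISTIC")
    and cfg.SOLVER.DETERMINISTIC
)
trainer_kwargs = {"deterministic": deterministic}
```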
-
- 13 Aug, 2024 1 commit
-
-
Josh Fromm authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/675 This diff extends several targets to be hip compatible and fixes a few silly hipification issues with those targets. After these changes, all dependencies needed for the face enhancer can compile with AMD. A few silly issues I had to hack around; maybe we could improve hipification to avoid similar issues in the future: * Some of the dependencies used sources in `src/cuda/**.cu`. Hipification tried to rename "cuda" to "hip" and broke the paths. I'm not sure where that rename happens, so I just changed the directory from "cuda" to "gpu" to avoid the issue. * One header import called `THCAtomics.cuh` was incorrectly being renamed to `THHAtomics.cuh`, which doesn't exist. Fortunately an equivalent import that doesn't have name issues was available. We also might want to consider graduating the cpp_library_hip bazel helper out of fbgemm since it seems pretty generally useful. For some of the targets, we needed to build a python cpp extension, which as far as I can tell we didn't have good hipification for yet. I added a new buck rule very similar to our standard cpp_library_hip rule that creates an extension instead. It's a little copy-pasted, so if there are cleaner ways to work around this requirement, let me know. Reviewed By: houseroad Differential Revision: D61080247 fbshipit-source-id: dc6f101eb3eadfd43ef5610c651b1639e4c78ae6
-
- 30 Jul, 2024 2 commits
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/672 Nightly is less stable than d2go itself (e.g. issues like https://github.com/facebookresearch/d2go/actions/runs/10135977974); just use the latest build. Differential Revision: D60458684 fbshipit-source-id: 2cce9a0eaabdeba2908703753d67dcd4bb24c378
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/671 > AttributeError: type object 'GeneralizedRCNNTask' has no attribute 'cleanup' EZ fix Reviewed By: rbasch Differential Revision: D60400187 fbshipit-source-id: 25872f4928cf8851ff63e96311c12086e272d619
-
- 11 Jul, 2024 1 commit
-
-
Yichao Lu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/670 Registered a new arch without trunk3, since our trunk2 model has better quality. Reviewed By: huiyujie Differential Revision: D59613942 fbshipit-source-id: 605e8925bfcd91d8a966303d9c0a3b4f56a9a0c7
-
- 01 Jul, 2024 1 commit
-
-
Francisc Bungiu authored
Summary: Use a different signpost so they don't get deduplicated in minion. Pull Request resolved: https://github.com/facebookresearch/d2go/pull/669 Reviewed By: wat3rBro Differential Revision: D59226344 fbshipit-source-id: c1356feadbc1b63220a1abdd8cc079723b230e42
-
- 22 Jun, 2024 1 commit
-
-
Ahmed Gheith authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/668 Lazy imports change `Python` import semantics, specifically when it comes to initialization of packages/modules: https://www.internalfb.com/intern/wiki/Python/Cinder/Onboarding/Tutorial/Lazy_Imports/Troubleshooting/ For example, this pattern is not guaranteed to work: ``` import torch.optim ... torch.optim._multi_tensor.Adam # may fail to resolve _multi_tensor ``` And this is guaranteed to work: ``` import torch.optim._multi_tensor ... torch.optim._multi_tensor.Adam # will always work ``` A recent change to `PyTorch` changed module initialization logic in a way that exposed this issue. But the code has been working for years? That is the nature of undefined behavior: any change in the environment (in this case the `PyTorch` code base) can make it fail. Reviewed By: wat3rBro Differential Revision: D58876582 fbshipit-source-id: c8f3f53605822517d646e57ddbf4359af54dba0d
-
- 19 Jun, 2024 1 commit
-
-
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/666 While debugging elevated preemption wastage in d2go, I came across a few long-running Pinocchio jobs in d2go that do not checkpoint on preemption and also do not have checkpointing instrumented. This diff addresses both of these issues. Reviewed By: wat3rBro Differential Revision: D58669254 fbshipit-source-id: 9d1c5ff9e61a4a83d284a45154aa54d2d41178cf
-
- 11 Jun, 2024 1 commit
-
-
Naveen Suda authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/664 Generate qconfig based on the config Reviewed By: wat3rBro Differential Revision: D58210321 fbshipit-source-id: 7a86f8b6e9d112302c978080c2bd5721e3c7dbff
-
- 08 May, 2024 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/662 - d2go: the AWS model zoo is not available anymore; disable the test in OSS. - d2: scripts running at midnight hit a rate-limit issue; change the schedule to a random time. Differential Revision: D57085427 fbshipit-source-id: 8dc24b2a7996c8ae5ed8808c3301af2851c15a14
-
- 02 May, 2024 1 commit
-
-
Ayushi Dalmia authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/661 X-link: https://github.com/fairinternal/detectron2/pull/603 X-link: https://github.com/facebookresearch/detectron2/pull/5273 In this diff we make changes to ensure we can control reproducibility in d2go: - update setup.py to enforce deterministic behavior if set via config - set lightning parameters if deterministic is passed: ``` { "sync_batchnorm": True, "deterministic": True, "replace_sampler_ddp": False, } ``` - allow passing prefetch_factor, pin_memory, persistent_memory as args to the batch dataloader. - minor fix in the training sampler Differential Revision: D55767128 fbshipit-source-id: eeab50c95969a91c58f1773473b6fc666494cc16
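These parameters map onto the Lightning `Trainer` roughly as in the sketch below, assuming a PyTorch Lightning 1.x release where `replace_sampler_ddp` is still a `Trainer` argument (later versions renamed it); this is an illustration, not the actual d2go wiring:

```python
import pytorch_lightning as pl

# Parameters from the diff summary, applied when the deterministic config is set.
trainer = pl.Trainer(
    sync_batchnorm=True,        # keep batch-norm statistics identical across ranks
    deterministic=True,         # ask Lightning/PyTorch for deterministic kernels
    replace_sampler_ddp=False,  # keep d2go's own (seeded) training sampler
)
```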
-
- 24 Apr, 2024 1 commit
-
-
Naveen Suda authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/660 To enable w8a16 sigmoid in d2go, we need to use a custom prepare function. Reviewed By: ayushidalmia, jiaxuzhu92 Differential Revision: D56275899 fbshipit-source-id: 654900011a1393e81289e8c9412b5886831765e2
-
- 03 Apr, 2024 1 commit
-
-
Debojeet Chatterjee authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/659 FBNetV3_B_large uses the FBNetV3_B backbone with large box and kpts heads for better multitask KP+Box detection while staying in a similar range of #flops and #params. Reviewed By: ashishvshenoy Differential Revision: D55645349 fbshipit-source-id: 3fe84f566b3eeaddf84a94ef708557944fffcd22
-
- 02 Apr, 2024 1 commit
-
-
Debojeet Chatterjee authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/658 D2Go's INPUT.CROP=ENABLED silently fails when the annotations dict is present but empty, and ends up not using the dataset at all. Add an additional check to circumvent this failure. Reviewed By: ayushidalmia Differential Revision: D55640142 fbshipit-source-id: b733b841edb17c16d69332795c89c32b008cb6e5
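A hypothetical sketch of the kind of guard this implies; the actual d2go dataset-mapper change is not reproduced here:

```python
dataset_dicts = [
    {"file_name": "a.jpg", "annotations": [{"bbox": [0, 0, 10, 10]}]},
    {"file_name": "b.jpg", "annotations": []},  # key present but empty: nothing to crop around
]


def has_usable_annotations(d):
    anns = d.get("annotations")
    return anns is not None and len(anns) > 0


# Filter such records up front instead of silently dropping the whole dataset later.
dataset_dicts = [d for d in dataset_dicts if has_usable_annotations(d)]
```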
-
- 27 Mar, 2024 1 commit
-
-
Fanyi Xiao authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/657 Without `requires_grad` properly set for params and buffers, FSDP training hangs. This becomes an issue e.g. when training with LoRA. Reviewed By: wat3rBro Differential Revision: D55220828 fbshipit-source-id: 1e33aa540c84c4de62a3a37c48a322aa26c98292
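A hedged sketch of the setup this targets; the `lora_` naming convention and the stand-in model are assumptions, not d2go code:

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for a LoRA-augmented model

# Freeze the base weights and train only LoRA parameters. If requires_grad is not
# set consistently on every rank before FSDP flattens the parameters, ranks can
# disagree about which parameters join gradient reduction and the collective hangs.
for name, param in model.named_parameters():
    param.requires_grad = "lora_" in name
for buf in model.buffers():
    buf.requires_grad = False  # buffers should never require grad
```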
-
- 19 Mar, 2024 1 commit
-
-
Geet Sethi authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/656 Enable distributed FSDP model initialization. This iteratively moves and shards the model on GPU to allow training models that are larger than a single GPU's HBM capacity and that cannot be instantiated multiple times on a single host. The flow is as follows: 1. Rank 0 inits the whole model on CPU using existing code paths, while all other ranks init an 'empty' model using fake tensors. 2. Once this is complete and initialization moves to FSDP, distributed init traverses the model 'bottom-up', transferring all params/buffers from rank 0 to all other ranks, while simultaneously wrapping modules in FSDP whenever possible (based on the specified config). Thus modules are sharded (and memory usage distributed) at the first possible instance using the existing FSDP api/implementation. Reviewed By: XiaoliangDai Differential Revision: D54287718 fbshipit-source-id: 16d63d78065d1fca0c6baf7a385f666a4e1b2a5f
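A minimal sketch of the general pattern (rank 0 builds the real model, other ranks build on the meta device, and FSDP broadcasts rank 0's weights while wrapping), using the public FSDP arguments and assuming a recent PyTorch; this is an illustration under those assumptions, not the D54287718 implementation:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def build_fsdp_model(build_fn):
    rank = dist.get_rank()
    if rank == 0:
        model = build_fn()        # full init on CPU via the existing code path
    else:
        with torch.device("meta"):
            model = build_fn()    # 'empty' model: parameters have no real storage

    param_init_fn = None
    if rank != 0:
        def param_init_fn(module):
            # Materialize meta tensors with empty storage; real values arrive via broadcast.
            module.to_empty(device=torch.cuda.current_device(), recurse=False)

    return FSDP(
        model,
        device_id=torch.cuda.current_device(),
        sync_module_states=True,  # broadcast rank 0's params/buffers to all ranks
        param_init_fn=param_init_fn,
    )
```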
-
- 14 Mar, 2024 1 commit
-
-
Naveen Suda authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/654 The example_inputs and is_qat args are needed for some models during the prepare_pt2e step. Reviewed By: chakriu, tarun292 Differential Revision: D54873270 fbshipit-source-id: 67df457aca82fd9da77969133ecf390cdc80fb85
-
- 10 Mar, 2024 1 commit
-
-
Zhicheng Yan authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/653 # Changes In Mask2Former RC4 training, we need to use a particular weighted category training sampler where `DATALOADER.SAMPLER_TRAIN = "WeightedCategoryTrainingSampler"`. Also, multiple datasets are used, and their sets of categories are not exactly identical: some datasets have more categories (e.g. Exo-body) than other datasets that do not have exobody annotations. We also use category filtering by setting `D2GO_DATA.DATASETS.TRAIN_CATEGORIES` to a subset of the full categories. In this setup, D2GO currently complains that metadata.thing_classes is NOT consistent across datasets (https://fburl.com/code/k8xbvyfd). The reason is that when category filtering is used, D2GO writes a temporary dataset json file (https://fburl.com/code/slb5z6mc), and this tmp json file is loaded when we get the dataset dicts from DatasetCatalog (https://fburl.com/code/5k4ynyhc). Meanwhile, the metadata in MetadataCatalog for this category-filtered dataset is also updated based on the categories stored in this tmp file. Therefore, we must ensure the categories stored in the tmp file are consistent across the category-filtered datasets. In this diff, we update the logic of writing such tmp dataset json files. # Github CI test Note: **CI / python-unittest-cpu** is shown as failed with the error below, but I do not think it is related to the changes in this diff, since the error is about an observer in QAT model training while the changes in this diff are about dataset preparation. ``` Traceback (most recent call last): File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 155, in train self.run_step() File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 310, in run_step loss_dict = self.model(data) File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl return forward_call(*args, **kwargs) File "/home/runner/work/d2go/d2go/tests/runner/test_runner_default_runner.py", line 44, in forward ret = self.conv(images.tensor) File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1590, in _call_impl hook_result = hook(self, args, result) File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/ao/quantization/quantize.py", line 131, in _observer_forward_hook return self.activation_post_process(output) File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl return forward_call(*args, **kwargs) File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/ao/quantization/fake_quantize.py", line 199, in forward _scale, _zero_point = self.calculate_qparams() File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/ao/quantization/fake_quantize.py", line 194,
in calculate_qparams return self.activation_post_process.calculate_qparams() File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/ao/quantization/observer.py", line 529, in calculate_qparams return self._calculate_qparams(self.min_val, self.max_val) File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/ao/quantization/observer.py", line 328, in _calculate_qparams if not check_min_max_valid(min_val, max_val): File "/usr/share/miniconda/envs/__setup_conda/lib/python3.8/site-packages/torch/ao/quantization/utils.py", line 346, in check_min_max_valid assert min_val <= max_val, f"min {min_val} should be less than max {max_val}" AssertionError: min 3.8139522075653076e-05 should be less than max -3.8139522075653076e-05 ``` Reviewed By: ayushidalmia Differential Revision: D54665936 Privacy Context Container: L1243674 fbshipit-source-id: 322ab4a84a710b03fa39b39fa81117752d369ba5
-
- 03 Mar, 2024 1 commit
-
-
Amethyst Reese authored
Summary: Formats the covered files with pyfmt. paintitblack Reviewed By: aleivag Differential Revision: D54447732 fbshipit-source-id: e21fbbe27882c8af183d021f4ac27029cbe93e8e
-
- 23 Feb, 2024 1 commit
-
-
Naveen Suda authored
Summary: Add pt2e quantization support in D2Go. Reviewed By: chakriu Differential Revision: D54132092 fbshipit-source-id: 34a9ba79a5eb49ed27a3f33454078b0df37cf2f0
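For reference, a hedged sketch of the standard PT2E flow that such support builds on, shown with a toy module; the capture step here uses `torch.export.export_for_training`, the replacement for the `capture_pre_autograd_graph` API that was current at the time of this commit (see the Oct 2024 entry above), and for QAT `prepare_qat_pt2e` would be used in place of `prepare_pt2e`:

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)


class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)


example_inputs = (torch.randn(1, 4),)
gm = torch.export.export_for_training(ToyModel(), example_inputs).module()

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
gm = prepare_pt2e(gm, quantizer)  # insert observers
gm(*example_inputs)               # calibrate
gm = convert_pt2e(gm)             # produce the quantized graph
```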
-
- 17 Feb, 2024 1 commit
-
-
Mo Mo authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/631 omm -> oom Reviewed By: EugenHotaj, wat3rBro Differential Revision: D50860125 fbshipit-source-id: 553220106aed1c8c752347a7a5c01b525ec25588
-
- 08 Feb, 2024 1 commit
-
-
Jiaxu Zhu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/651 As title, so that dataloading is deterministic when `cfg.SEED` is set. Reviewed By: navsud Differential Revision: D53547772 fbshipit-source-id: 73cfd2b351e81b370fb721a4f7b7c2a6313470bd
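A generic sketch of how a seed is usually threaded through PyTorch dataloading for determinism; the actual d2go wiring lives in the diff and is not reproduced here, and `SEED` stands in for `cfg.SEED`:

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

SEED = 42  # stand-in for cfg.SEED


def seed_worker(worker_id):
    # Derive per-worker seeds from the torch seed so augmentations are repeatable.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


generator = torch.Generator()
generator.manual_seed(SEED)

loader = DataLoader(
    TensorDataset(torch.arange(16).float().unsqueeze(1)),
    batch_size=4,
    shuffle=True,
    num_workers=2,
    worker_init_fn=seed_worker,
    generator=generator,  # controls the shuffling order
)
```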
-
- 04 Feb, 2024 1 commit
-
-
Xiaoliang Dai authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/649 Support bfloat16 training. Reviewed By: chihyaoma, Sekunde Differential Revision: D53029989 fbshipit-source-id: 2e1d8f2112d238441e3f6801db3092383147fdbd
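A minimal sketch of bfloat16 mixed-precision training with autocast (requires a CUDA device); the d2go config plumbing that enables it is not shown:

```python
import torch

model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 16, device="cuda")
y = torch.randn(8, 4, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()  # unlike float16, bfloat16 needs no GradScaler
optimizer.step()
```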
-
- 17 Jan, 2024 1 commit
-
-
Zhicheng Yan authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/647 Major changes - The **example_input** argument in **prepare_fake_quant_model()** is useful in certain cases. For example, in the Argos model's **custom_prepare_fx()** method under the FX graph + QAT setup (D52760682), it is used to prepare example inputs for individual sub-modules by running one forward pass and recording the inputs to those sub-modules. Therefore, we expose the **example_input** argument in the **setup_qat_model()** function. - For the QAT model, we currently assert that the # of state dict keys (excluding observers) equals the # of state dict keys in the original model. However, when the assertion fails, it does not log useful information for debugging. We make changes to report the keys unique to each state dict. Reviewed By: navsud Differential Revision: D52760688 fbshipit-source-id: 27535a0324ebe6513f198acb839918a0346720d0
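A hedged sketch of the kind of debugging report the second change describes; the function name and the observer marker here are illustrative, not the d2go implementation:

```python
def report_state_dict_mismatch(original_sd, qat_sd, observer_marker="activation_post_process"):
    # Compare non-observer keys and print whatever is unique to each side.
    def non_observer_keys(sd):
        return {k for k in sd if observer_marker not in k}

    orig_keys, qat_keys = non_observer_keys(original_sd), non_observer_keys(qat_sd)
    only_in_original = sorted(orig_keys - qat_keys)
    only_in_qat = sorted(qat_keys - orig_keys)
    if only_in_original or only_in_qat:
        print(f"Keys only in the original model: {only_in_original}")
        print(f"Keys only in the QAT model: {only_in_qat}")
```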
-
- 16 Jan, 2024 1 commit
-
-
generatedunixname2443911735787003 authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/648 Differential Revision: D52792992 fbshipit-source-id: d3a64f3b306ea024ec072eaa6327446a84e2d83c
-
- 12 Jan, 2024 1 commit
-
-
Kapil Krishnakumar authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/644 This diff consolidates deterministic settings in D2Go. In the `default_runner.py` file, the `torch.set_float32_matmul_precision("highest")` call is added to set the precision for matrix multiplication to the highest possible value. In the `setup.py` file, the `torch.backends.cudnn.deterministic` setting is set to `True` and the `torch.backends.cudnn.allow_tf32` setting is set to `False` to avoid non-deterministic PyTorch and CUDA algorithms during training. The `torch.backends.cuda.matmul.allow_tf32` setting is also set to `False` to avoid non-deterministic matrix multiplication algorithms. Additionally, the `seed` function is used to set the seed for reproducibility. Reviewed By: wat3rBro Differential Revision: D51796739 fbshipit-source-id: 50e44ea50b0311b56a885db9f633491ac3002bd4
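Collected into one snippet, the settings named in the summary look roughly like this (the split across `default_runner.py` and `setup.py` is described above and not reproduced; the seed value is a placeholder):

```python
import torch

torch.set_float32_matmul_precision("highest")  # no TF32 shortcuts in float32 matmul
torch.backends.cudnn.deterministic = True      # force deterministic cuDNN kernels
torch.backends.cudnn.allow_tf32 = False
torch.backends.cuda.matmul.allow_tf32 = False

seed = 42  # placeholder for the configured seed
torch.manual_seed(seed)
```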
-
- 08 Jan, 2024 1 commit
-
-
generatedunixname89002005287564 authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/646 Reviewed By: wat3rBro Differential Revision: D52536555 fbshipit-source-id: e57dc5b2774771f0739118c5244014171732c151
-
- 04 Jan, 2024 1 commit
-
-
generatedunixname89002005287564 authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/645 Reviewed By: zsol Differential Revision: D52536030 fbshipit-source-id: e6d0004c5bea81b5dab0ff69a1e9f6df4929b952
-
- 15 Dec, 2023 2 commits
-
-
Zhicheng Yan authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/642 When we build a QAT model using the FX graph mode APIs **prepare_qat_fx** and **convert_fx**, they run symbolic tracing following **module.forward()**. In certain cases, such as a module taking a constant tensor input, symbolic tracing adds new tensor attributes with the name prefix **_tensor_constant** (https://fburl.com/code/msc4ch4o), which become new keys in the QAT model state dict. The current implementation of **_setup_non_qat_to_qat_state_dict_map** asserts that the # of keys in the state dicts of the original and QAT models is the same. Thus, we extend the **qat_state_dict_keys_to_ignore** method with an argument that allows ignoring specified state dict keys in the QAT model. Reviewed By: wat3rBro Differential Revision: D52152706 fbshipit-source-id: 92219feae43bf8841b0a3a71adfbfcb84d8e8f95
-
Zhicheng Yan authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/643 A QAT model contains observers. After QAT training, those observers contain updated statistics, such as min_val and max_val. When we want to export the FP32 QAT model for a sanity check, calling **fuse_utils.fuse_model()** again (it is often already called when building the QAT model before QAT training) would remove the statistics in the observers. Reviewed By: wat3rBro Differential Revision: D52152688 fbshipit-source-id: 08aa16f2aa72b3809e0ba2d346f1b806c0e6ede7
-
- 07 Dec, 2023 2 commits
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/640 Reviewed By: tglik Differential Revision: D51908239 fbshipit-source-id: 7bcbad1fc7065b736cf4e38d155eed5d734758f7
-
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/639 Expose ability to add a preemption checkpointing hook running in a separate process group. Reviewed By: wat3rBro, ynonaolga Differential Revision: D51115437 fbshipit-source-id: c843802bc59da9f57c09c8d9a20f3d72d5b98edf
-
- 30 Nov, 2023 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/637 Reviewed By: tglik Differential Revision: D51540498 fbshipit-source-id: f246559963c5187140db7b8113765f66a964ae1b
-
- 17 Nov, 2023 1 commit
-
-
Wei Sun authored
Summary: Similar to D48210543. Update the training_hooks to use the Unitrace memory snapshot APIs. This allows us to maintain a single path for memory snapshot APIs, and also to collect important details such as the snapshot location for Zoomer. Pulled By: HugeEngine Pull Request resolved: https://github.com/facebookresearch/d2go/pull/636 Reviewed By: frabu6, aaronenyeshi, jackiexu1992, mengluy0125 Differential Revision: D48368150 fbshipit-source-id: b279adfa29d390e615d2c32a7ab9e05d95b4f164
-
- 10 Nov, 2023 1 commit
-
-
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/634 Reviewed By: yzhao30 Differential Revision: D51208655 fbshipit-source-id: 3280bde8807b623ec56841cc6d0ffc87a1e02e83
-
- 09 Nov, 2023 1 commit
-
-
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/633 transformer_auto_wrap_policy is buggy and causes issues when wrapping an already-wrapped module. Migrate to ModuleWrapPolicy. Reviewed By: tglik Differential Revision: D51124721 fbshipit-source-id: 61c4f5f810ead3c3776a7310926b2181121162ac
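A minimal sketch of the replacement, assuming the block classes to wrap are known; `TransformerBlock` is a placeholder, not a d2go class, and the snippet expects an initialized distributed process group:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import ModuleWrapPolicy


class TransformerBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.proj(x)


model = torch.nn.Sequential(TransformerBlock(), TransformerBlock())

# Instead of functools.partial(transformer_auto_wrap_policy, transformer_layer_cls={...}),
# pass a ModuleWrapPolicy instance listing the module classes to wrap.
fsdp_model = FSDP(model, auto_wrap_policy=ModuleWrapPolicy({TransformerBlock}))
```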
-
- 05 Nov, 2023 1 commit
-
-
Zhicheng Yan authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/630 Currently, in the runner's **build_model()** method, when **eval_only=True**, we always try to load model weights. This is quite restrictive in some cases. For example, we may just want to build a model in eval mode to profile its efficiency before we have trained the model or generated its weights in a checkpoint file. Thus, this diff adds an argument **skip_model_weights** to allow users to skip the loading of model weights. Note: this diff is entirely backward-compatible and is NOT expected to break existing implementations. Reviewed By: navsud, wat3rBro Differential Revision: D50623772 fbshipit-source-id: 282dc6f19e17a4dd9eb0048e068c5299bb3d47c2
-