- 30 Mar, 2023 1 commit
Mircea Cimpoi authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/518 Enable profiling for the eval step only, not on every eval (which can be called during training). Reviewed By: frabu6 Differential Revision: D44535915 fbshipit-source-id: 4497a3f74f5d751277df9ed41bc9bf21056341c4
- 11 Mar, 2023 1 commit
Peizhao Zhang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/501 X-link: https://github.com/facebookresearch/detectron2/pull/4851 Print the grad scaler as part of the metrics, controlled by the flag `SOLVER.AMP.LOG_GRAD_SCALER`. Reviewed By: tax313 Differential Revision: D43585363 fbshipit-source-id: 495b37ff524c47e515cea0b3c677ee81b34ad4ca
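A minimal sketch of turning the flag on via a d2go yacs config; the flag name comes from the commit, while the runner class used here is only an example and other details are assumptions:

```python
from d2go.runner import GeneralizedRCNNRunner  # example runner; any d2go runner's default cfg works

cfg = GeneralizedRCNNRunner.get_default_cfg()
cfg.SOLVER.AMP.ENABLED = True          # AMP must be on for a grad scaler to exist
cfg.SOLVER.AMP.LOG_GRAD_SCALER = True  # new flag: log the grad scaler value with the metrics
```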
- 09 Mar, 2023 1 commit
Mircea Cimpoi authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/499 Add `prepare_fb_model_for_eval` override; no-op. Reviewed By: frabu6 Differential Revision: D43906444 fbshipit-source-id: 97e06f1de8f3ba07808a0493d3d216031ff011d0
- 25 Feb, 2023 1 commit
Naveen Suda authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/483 Reviewed By: YXIE14 Differential Revision: D42733542 fbshipit-source-id: 0dc936c536554b5beead462eaf74bc007758c12e
- 23 Feb, 2023 1 commit
Matthew Yu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/479 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/467 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/466 This allows an internal solution to be plugged in generically, rather than relying on specific training patterns (FSDP or not). Reviewed By: wat3rBro Differential Revision: D42983444 fbshipit-source-id: a70bf0d25737d9cbbf22e3368363d3fdec57b8b5
- 16 Feb, 2023 2 commits
Anthony Chen authored
Summary: X-link: https://github.com/fairinternal/detectron2/pull/591 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/469 X-link: https://github.com/facebookresearch/detectron2/pull/4785 Add an option to specify the period of metric gathering and writing in the Trainer. This is needed to optimize training speed for large-scale training jobs such as generative AI, where the all_gather call in metric writing at every iteration is time-consuming when hundreds of GPUs are used and takes ~10% of the total training time. With this feature we can set the metric writing period to match cfg.WRITER_PERIOD=20, reducing training time while keeping metric logging the same for users. Reviewed By: miqueljubert, wat3rBro Differential Revision: D43098985 Privacy Context Container: 2011691122555468 fbshipit-source-id: 63c93a7331aa63badce5125e5240d2d5f7e61b74
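A sketch of the intended configuration. The commit only names cfg.WRITER_PERIOD; the gathering-period key below is a placeholder, not the actual key name added by the diff:

```python
cfg.WRITER_PERIOD = 20          # existing: how often metric writers flush
cfg.GATHER_METRIC_PERIOD = 20   # placeholder name: all_gather metrics at the same period instead of every iteration
```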
Tao Xu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/473 As shown in the attached image and TensorBoard visualization, some of our jobs fail to save their results to TensorBoard; there should be messages between the circled lines of the screenshot if the images are added to TensorBoard. One possible reason is that the TensorBoard visualization evaluator is only added for the rank 0 GPU, so it may fail to fetch any data during evaluation of a diffusion model that only runs 1 batch of inference during validation. To resolve this, we add the visualization evaluator to all GPU ranks, gather their results, and finally add the results with the largest batch size to TensorBoard for visualization. The screenshot is from f410204704 (https://www.internalfb.com/manifold/explorer/mobile_vision_workflows/tree/workflows/xutao/20230211/latest_train/dalle2_decoder.SIULDLpgix/e2e_train/log.txt) Refactored default_runner.py to add a new function _create_evaluators that creates all evaluators, so we no longer need to override the whole _do_test function in the runner in order to add the visualization evaluator on all ranks. (Note: this ignores all push blocking failures!) Reviewed By: YanjunChen329 Differential Revision: D43263543 fbshipit-source-id: eca2259277584819dcc5400d47fa4fb142f2ed9b
- 14 Feb, 2023 1 commit
Fei Sun authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/472 Add NUMA binding to d2go. It distributes the GPUs equally across the CPU sockets so that CPU traffic and GPU-to-CPU traffic are balanced. It helps diffusion model training, but it is a general technique that can be applied to all models. We still want to enable it manually in each case, though, until we are confident that it gives better performance and can make it the default. NUMA binding is based on jspark1105's work D42827082; full credit goes to him. This diff does not enable the feature. Reviewed By: newstzpz Differential Revision: D43036817 fbshipit-source-id: fe67fd656ed3980f04bc81909cae7ba2527346fd
- 13 Jan, 2023 1 commit
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/440 Move FSDP wrapping to runner.build_model by rewriting it as a modeling hook.

**Motivation** When a model is too large to run inference on a single GPU, it requires using FSDP with local checkpointing mode to save peak GPU memory. However, in the eval_pytorch workflow (train_net with eval-only), models are evaluated without being wrapped by FSDP, which may cause OOM errors for the reasons above. Thus, it is better practice to wrap the model with FSDP during `runner.build_model(cfg)`, so evaluation can run in the same FSDP setting as training. This diff moves FSDP wrapping to `runner.build_model(cfg)` by rewriting it as a modeling hook.

**API changes**
* Users need to append `"FSDPModelingHook"` to `MODEL.MODELING_HOOKS` to enable FSDP.
* `FSDP.ALGORITHM` can only be `full` or `grad_optim`.

**Note** It's not possible to unwrap an FSDP model back to the normal model, so FSDPModelingHook.unapply() can't be implemented.

Reviewed By: wat3rBro Differential Revision: D41416917 fbshipit-source-id: f3fc72d574cc6ccbe0d238e48c575926ba5b4d06
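A minimal config sketch following the API notes above; only the two keys named in the commit come from the source, everything else is illustrative:

```python
cfg.MODEL.MODELING_HOOKS = ["FSDPModelingHook"]  # enable FSDP wrapping inside runner.build_model(cfg)
cfg.FSDP.ALGORITHM = "grad_optim"                # or "full" for params + gradient + optimizer sharding
```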
- 05 Jan, 2023 1 commit
Anthony Chen authored
Summary: X-link: https://github.com/facebookresearch/detectron2/pull/4667 X-link: https://github.com/fairinternal/detectron2/pull/578 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/411 Add config option `cfg.LOAD_CKPT_TO_GPU` to load checkpoints to the worker's current GPU. Previously, D2Go mapped checkpoints to CPU before loading them into the model. In large-scale distributed training, many GPU processes may be used to train a model, which means each process loads the model checkpoint to a single CPU, causing the same checkpoint to be loaded many times. This can cause a CPU OOM issue when the checkpoint is large. There are two solutions to this problem: load checkpoints to GPU, or use shared memory for the checkpoint between different GPU processes. This diff implements the first solution, which supports cases where model size + model checkpoint size is smaller than the total GPU memory. The second solution may be revisited for large models that need to offload checkpoints to CPU. Reference diff: D40789062 Reviewed By: mcimpoi Differential Revision: D41063306 fbshipit-source-id: edcfd390a25582fffb2f1a6a7fc22917874ee2fc
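A one-line sketch of the new option; the config key is taken from the commit, and how the checkpointer consumes it is not shown here:

```python
cfg.LOAD_CKPT_TO_GPU = True  # map the checkpoint to the worker's current GPU instead of CPU
```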
- 09 Dec, 2022 1 commit
Mircea Cimpoi authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/436 Rename `model_ema.py` to `ema.py` (as `modeling` is already in the folder name). Fix dependencies after the rename. Reviewed By: wat3rBro Differential Revision: D41685115 fbshipit-source-id: 006999a020a901ea8be4b71e072d688bd36cdce2
- 28 Nov, 2022 1 commit
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/427 Re-try previously reverted diff D41350485 (https://github.com/facebookresearch/d2go/commit/0ea6bc1b61ab736ccf1840c58c2b19ed2e9a1282). The problem was essentially that `DefaultTask` is not a subclass of `Runner`, so when we call `Runner`'s class methods from `DefaultTask`, it won't work if the `Runner` method also calls other methods that exist on `Runner` but not `DefaultTask`. The solution is to simply split the data-related APIs out into a separate class (mixin) and let `DefaultTask` and `Runner` both subclass it. Reviewed By: tglik Differential Revision: D41507448 fbshipit-source-id: 8b26c129811436c0bd35e1c6b0705e7035d7e823
- 17 Nov, 2022 1 commit
Anthony Chen authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/396 Integrate PyTorch FSDP, which supports two sharding modes: 1. gradient + optimizer sharding; 2. full model sharding (params + gradient + optimizer). This feature is enabled in the train_net.py code path.

Sources
* Integration follows this tutorial: https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html

API changes
* Add new config keys to support the new feature. Refer to mobile-vision/d2go/d2go/trainer/fsdp.py for the full list of config options.
* Add `FSDPCheckpointer` as a subclass of `QATCheckpointer` to support the special loading/saving logic for FSDP models.

Reviewed By: wat3rBro Differential Revision: D39228316 fbshipit-source-id: 342ecb3bcbce748453c3fba2d6e1b7b7e478473c
- 11 Nov, 2022 1 commit
Anthony Chen authored
Summary: X-link: https://github.com/facebookresearch/detectron2/pull/4654 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/412 Support a custom precision dtype [float16, bfloat16] for AMP training on the D2 backend. The old config key `SOLVER.AMP.PRECISION` only worked on the lightning backend; this diff enables it on the D2 backend (train_net binary) as well. Reviewed By: tax313, wat3rBro Differential Revision: D40811604 fbshipit-source-id: 58da17ae1519a54243b5295eb4253c297e4d9296
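A minimal sketch of the config described above; key names come from the commit, values are illustrative:

```python
cfg.SOLVER.AMP.ENABLED = True
cfg.SOLVER.AMP.PRECISION = "bfloat16"  # or "float16"; now honored by the D2 backend too
```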
- 03 Nov, 2022 1 commit
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/405
- Use the non-hacky way (added in D40818736, https://github.com/facebookresearch/detectron2/pull/4626) to customize the offloaded backend for DatasetFromList.
- In D2Go, switch to `SharedList` (added in D40789062, https://github.com/facebookresearch/mobile-vision/pull/120) by default to save RAM, and optionally use `DiskCachedList` to further save RAM.

Local benchmarking results (using a ~2.4 GiB dataset) in dev mode:

| RAM usage (RES, SHR) | No-dataset | Naive | NumpySerializedList | SharedList | DiskCachedList |
| -- | -- | -- | -- | -- | -- |
| Master GPU worker | 8.0g, 2.8g | 21.4g, 2.8g | 11.6g, 2.8g | 11.5g, 5.2g | -- |
| Non-master GPU worker | 7.5g, 2.8g | 21.0g, 2.8g | 11.5g, 2.8g | 8.0g, 2.8g | -- |
| Per data loader worker | 2.0g, 1.0g | 14.0g, 1.0g | 4.4g, 1.0g | 2.1g, 1.0g | -- |

- The memory usage (RES, SHR) is taken from the `top` command: `RES` is the total memory used per process; `SHR` shows how much of `RES` can be shared.
- Experiments use 2 GPUs and 2 data loader workers per GPU, so there are 6 processes in total; the **numbers are per-process**.
- `No-dataset`: the same job run with a tiny dataset (only 4.47 MiB after serialization); since its RAM usage is negligible, it shows the floor RAM usage.
- The other experiments use a dataset of **2413.57 MiB** after serialization.
- `Naive`: the vanilla version where the dataset is not offloaded to other storage.
- `NumpySerializedList`: this optimization was added a long time ago in D19896490. I recall that the RAM was indeed shared across data loader workers, but it seems there was a regression; now basically every process has its own copy of the data.
- `SharedList`: enabled in this diff. Only the master GPU needs extra RAM. It's interesting that it uses 3.5 GB more RAM than the other ranks while the data itself is 2.4 GB; I'm not sure whether that's overhead of the storage itself or of sharing it with other processes, but since a non-master GPU using `NumpySerializedList` also uses 11.5g of RAM, we probably don't need to worry too much about it.
- `DiskCachedList`: not benchmarked; should have no extra RAM usage.

Using the numbers above for a typical 8-GPU, 4-worker training, and assuming the OS and other programs take 20-30 GB of RAM, the current training uses `11.6g * 8 + 4.4g * 8*4 = 233.6g` of RAM, on the edge of causing OOM on a 256 GB machine. This matches our experience that it supports a ~2 GB dataset. After the change, training uses only `(11.5g * 7 + 8.0g) + 2.1g * 8*4 = 155.7g` of RAM, which gives much more headroom, so we can train with a much larger dataset (e.g. 20 GB) or use more DL workers (e.g. 8 workers). Reviewed By: sstsai-adl Differential Revision: D40819959 fbshipit-source-id: fbdc9d2d1d440e14ae8496be65979a09f3ed3638
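A quick back-of-the-envelope check of the two RAM totals quoted above (numbers in GiB; this is just the arithmetic, not part of the diff):

```python
gpus, dl_workers = 8, 4

before = 11.6 * gpus + 4.4 * gpus * dl_workers                # NumpySerializedList path
after = (11.5 * (gpus - 1) + 8.0) + 2.1 * gpus * dl_workers   # SharedList path
print(round(before, 1), round(after, 1))  # 233.6 155.7
```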
- 31 Oct, 2022 1 commit
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/403 `cfg.SOLVER.AMP.ENABLED` enables mixed precision, but this only works for V100 GPUs. For A100s, the equivalent is to enable TF32. Reviewed By: tglik Differential Revision: D40675242 fbshipit-source-id: 5cc3d12cd3d7ec76665e0907ecc87fc5f64d73f0
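For reference, the underlying PyTorch switches for TF32 look like the sketch below; the commit does not show which d2go config key toggles them, so that part is left out:

```python
import torch

torch.backends.cuda.matmul.allow_tf32 = True  # allow TF32 for matmuls on Ampere (A100) GPUs
torch.backends.cudnn.allow_tf32 = True        # allow TF32 inside cuDNN convolutions
```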
- 26 Oct, 2022 1 commit
Matthew Yu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/399 Freezing the model before running quantization causes an issue when loading a saved checkpoint, because fusing does not support FrozenBatchNorm2d (the checkpoint could have a fused weight conv.bn.weight whereas the model would have an unfused weight bn.weight). The longer-term solution is to add FrozenBatchNorm2d to the fusing support, but there are some subtle issues that will take time to fix:
* FrozenBatchNorm2d needs to move out of D2 and into the mobile_cv lib.
* The current fuser has options to add new BN ops (e.g., FrozenBatchNorm2d), which we use with ops like SyncBN, but this is currently only tested with inference, so we need to write additional checks for training.

The swap makes freezing compatible with QAT and should still work with standard models. One subtle potential issue is that the current BN swap assumes that BN is a leaf node. If a user runs QAT without fusing BN, the BN will no longer be a leaf node, as it gains an activation_post_process module to record its output; the result is that BN will not be frozen in that specific case. This should not occur in practice, since BN is usually fused. A small adjustment to the BN swap would be to swap the BN regardless of whether it is a leaf node (but we would have to check whether the activation_post_process module is retained). Another long-term consideration is moving both freezing and quant to modeling hooks so the user can decide the order. Reviewed By: wat3rBro Differential Revision: D40496052 fbshipit-source-id: 0d7e467b833821f7952cd2fce459ae1f76e1fa3b
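For context, the kind of BN freezing/swap discussed above is similar to detectron2's FrozenBatchNorm2d conversion helper; a small illustrative sketch, not the code from this diff:

```python
import torch.nn as nn
from detectron2.layers import FrozenBatchNorm2d

# Toy model: the helper walks the module tree and replaces BatchNorm layers
# with FrozenBatchNorm2d, which is the "freezing" step referred to above.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model = FrozenBatchNorm2d.convert_frozen_batchnorm(model)
```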
- 23 Oct, 2022 1 commit
Tsahi Glik authored
Summary: X-link: https://github.com/facebookresearch/mobile-vision/pull/116 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/398 D2Go doesn't have a per-node initialization API, only per-worker initialization that happens in each subprocess. Some projects (like IOBT) need a way to do shared initialization before spawning all the workers and to pass the initialized shared context to the workers. This diff adds an API to create a shared context object before launching workers; the runners then use this shared context inside the workers after launch. Reviewed By: wat3rBro Differential Revision: D40001329 fbshipit-source-id: 231a4e7e4da7b5db50849176c58b104c4565306a
- 03 Oct, 2022 1 commit
Francisc Bungiu authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/378 Some hooks need access to cfg to be initialized correctly, so pass cfg down to the hook registration method. Reviewed By: ertrue, miqueljubert Differential Revision: D39303862 fbshipit-source-id: 931c356c7045f95fc0af5b20c7782ea4d1aff138
- 29 Sep, 2022 1 commit
Peizhao Zhang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/377 Only automatically rescale the LR for SGD optimizers. Since only SGD seems to need LR scaling, we no longer scale the LR automatically by default for other optimizers; this works better for newly added optimizers (like Adam). Reviewed By: itomatik, lg-zhang Differential Revision: D39899434 fbshipit-source-id: d6eebc5b07d4489b401c1fc3cea00f5a060fe19d
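The "automatic rescaling" here refers to the usual linear LR scaling rule; a sketch of the behavior after this change, with the reference batch size and names being illustrative rather than d2go's exact implementation:

```python
def maybe_scale_lr(optimizer_name: str, base_lr: float, ims_per_batch: int, reference_batch_size: int = 16) -> float:
    """Scale the LR by the batch-size ratio, but only for SGD."""
    if optimizer_name.lower() != "sgd":
        return base_lr  # Adam and other optimizers keep the configured LR
    return base_lr * ims_per_batch / reference_batch_size

scaled = maybe_scale_lr("sgd", base_lr=0.02, ims_per_batch=64)      # 0.02 * 64 / 16 = 0.08
unscaled = maybe_scale_lr("adamw", base_lr=0.02, ims_per_batch=64)  # stays 0.02
```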
- 31 Aug, 2022 1 commit
Peizhao Zhang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/355 Switch to using `inference_on_dataset_with_checkpointing` in the default runner. Reviewed By: HarounH Differential Revision: D37215292 fbshipit-source-id: c006784ce0b31700bcbb1f79c303fd791f1561ff
- 20 Aug, 2022 1 commit
Xiaofang Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/358 Avoid calling scheduler.step() after the last training iteration is done Reviewed By: wat3rBro Differential Revision: D38605135 fbshipit-source-id: 87a55309bf6d1f7e598b567cc2372b00b8885c7c
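A schematic loop showing the fix (not the actual d2go trainer code): the scheduler is no longer stepped once the final iteration has completed, so the LR schedule isn't advanced past max_iter.

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

max_iter = 100
for it in range(max_iter):
    optimizer.zero_grad()
    model(torch.randn(8, 4)).sum().backward()
    optimizer.step()
    if it + 1 < max_iter:   # skip the scheduler step after the last iteration
        scheduler.step()
```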
- 27 Jul, 2022 2 commits
Mircea Cimpoi authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/344 We need access to the modeling hooks in EMA, e.g. when building the trainer. Reviewed By: wat3rBro Differential Revision: D37997773 fbshipit-source-id: bf4372cd310605fa35aa70f0604b084b047001d8
Kevin Chih-Yao Ma authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/342 Add a cfg option to control the frequency of the writers. Currently, the default writers include:
```
writers = [
    CommonMetricPrinter(max_iter),
    JSONWriter(os.path.join(cfg.OUTPUT_DIR, "metrics.json")),
    tbx_writer,
]
```
Reviewed By: wat3rBro Differential Revision: D38065583 fbshipit-source-id: ebdc20aab71e03b4e18772af78b410f17ba4216d
- 29 Jun, 2022 1 commit
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/318 Reviewed By: mcimpoi Differential Revision: D37501246 fbshipit-source-id: 6dbe5dcbaf7454f451d4a3bb3fa2d856cc87d5cc
- 24 Jun, 2022 1 commit
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/313 It's not natural to put the runner's default config functions under `d2go/utils/`, so move them to `d2go/runner/config_defaults.py` and clean things up. This also reduces the inter-sub-package dependencies. Reviewed By: mattcyu1 Differential Revision: D37407078 fbshipit-source-id: 432644bee4f12306a14bac3dba76ced08b3683aa
- 20 Jun, 2022 1 commit
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/305 One benefit of having separate registries for D2's and D2Go's meta-archs is that there's no need to patch the original D2 meta-archs, because we can just register new meta-archs in D2Go directly. This diff removes `patch_d2_meta_arch` and makes things simpler. Reviewed By: mcimpoi Differential Revision: D37246483 fbshipit-source-id: c8b7adef1fa7a5ff2f89c376c7e3b39bec8f19ee
- 14 Jun, 2022 1 commit
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/293 In order to pass the runner through the workflow using a "runner name" instead of a runner instance, we need to make sure `get_default_cfg` is not an instance method. It can be either a staticmethod or a classmethod; I chose classmethod for better inheritance. The codemod was done using the following script:
```
#!/usr/bin/env python3

import json
import os
import subprocess

result = subprocess.check_output("fbgs --json 'def get_default_cfg('", shell=True)
fbgs = json.loads(result)
fbsource_root = os.path.expanduser("~")


def _indent(s):
    return len(s) - len(s.lstrip())


def resolve_instance_method(content):
    lines = content.split("\n")
    for idx, line in enumerate(lines):
        if "def get_default_cfg(self" in line:
            indent = _indent(line)
            # find the class
            for j in range(idx, 0, -1):
                if lines[j].startswith(" " * (indent - 4) + "class "):
                    class_line = lines[j]
                    break
            else:
                raise RuntimeError("Can't find class")
            print("class_line: ", class_line)
            if "Runner" in class_line:
                # check that self is not used
                for j in range(idx + 1, len(lines)):
                    if _indent(lines[j]) < indent:
                        break
                    assert "self" not in lines[j], (j, lines[j])
                # update the content
                assert "def get_default_cfg(self)" in line
                lines[idx] = lines[idx].replace(
                    "def get_default_cfg(self)", "def get_default_cfg(cls)"
                )
                lines.insert(idx, " " * indent + "@classmethod")
                return "\n".join(lines)
    return content


def resolve_static_method(content):
    lines = content.split("\n")
    for idx, line in enumerate(lines):
        if "def get_default_cfg()" in line:
            indent = _indent(line)
            # find the class
            for j in range(idx, 0, -1):
                if "class " in lines[j]:
                    class_line = lines[j]
                    break
            else:
                print("[WARNING] Can't find class!!!")
                continue
            if "Runner" in class_line:
                # check staticmethod is used
                for j in range(idx, 0, -1):
                    if lines[j] == " " * indent + "@staticmethod":
                        staticmethod_line_idx = j
                        break
                else:
                    raise RuntimeError("Can't find staticmethod")
                # update the content
                lines[idx] = lines[idx].replace(
                    "def get_default_cfg()", "def get_default_cfg(cls)"
                )
                lines[staticmethod_line_idx] = " " * indent + "@classmethod"
                return "\n".join(lines)
    return content


for result in fbgs["results"]:
    filename = os.path.join(fbsource_root, result["file_name"])
    print(f"processing: {filename}")
    with open(filename) as f:
        content = f.read()
    orig_content = content
    while True:
        old_content = content
        content = resolve_instance_method(content)
        content = resolve_static_method(content)
        if content == old_content:
            break
    if content != orig_content:
        print("Updating ...")
        with open(filename, "w") as f:
            f.write(content)
```
Reviewed By: tglik Differential Revision: D37059264 fbshipit-source-id: b09d5518f4232de95d8313621468905cf10a731c
- 26 May, 2022 1 commit
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/262 Synchronization should be handled inside data loader creation; otherwise we would need to add synchronization everywhere besides the training loop, which is impossible. Reviewed By: sstsai-adl Differential Revision: D36683362 fbshipit-source-id: 0bb7c9b50656fece5df6a007c37ec5888ee172bc
- 25 May, 2022 2 commits
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/261 X-link: https://github.com/facebookresearch/mobile-vision/pull/71 `is_oss` and `fb_overwritable` are also needed in `mobile_cv`, so move them from d2go. Reviewed By: zhanghang1989 Differential Revision: D36655821 fbshipit-source-id: 421c4d22d4c4620678908fe13d6e47ab39604ae7
Alan Li authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/260 Training + evaluation for OCR detection based on D2GO. Purpose: to restart training of OCR detection after abandonment. Updates filepaths and some configs, and addresses a race condition when creating temp json files for annotation read/write within D2GO. Training can be restarted from scratch, or from a previous model by setting the weights_pth variable. The test plan covers only 1 particular dataset. Reviewed By: sstsai-adl Differential Revision: D36632783 fbshipit-source-id: fe8677a57d660c495458d083c3fb70b95128b260
- 21 May, 2022 1 commit
Jerry Zhang authored
Summary: X-link: https://github.com/pytorch/pytorch/pull/77608 X-link: https://github.com/pytorch/fx2trt/pull/76 Pull Request resolved: https://github.com/facebookresearch/d2go/pull/249 X-link: https://github.com/fairinternal/ClassyVision/pull/104 X-link: https://github.com/pytorch/benchmark/pull/916 X-link: https://github.com/facebookresearch/ClassyVision/pull/791 X-link: https://github.com/facebookresearch/mobile-vision/pull/68 FX Graph Mode Quantization needs to know whether an fx node is a floating point Tensor before it can decide whether to insert an observer/fake_quantize module, since we only insert observer/fake_quantize modules for floating point Tensors. Currently we support this with some hacks, defining rules like NON_OBSERVABLE_ARG_DICT (https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/fx/utils.py#L496), but this approach is fragile and we do not plan to maintain it long term in the pytorch code base. As discussed in the design review, we need to ask users to provide sample args and sample keyword args so that we can infer the type in a more robust way. This PR starts by changing the prepare_fx and prepare_qat_fx APIs to require the user to provide example arguments through example_inputs. Note that this API doesn't support kwargs. Kwargs could make https://github.com/pytorch/pytorch/pull/76496#discussion_r861230047 (comment) simpler, but they will be rare, and even then we can still work around them with positional arguments; also, torch.jit.trace (https://pytorch.org/docs/stable/generated/torch.jit.trace.html) and ShapeProp (https://github.com/pytorch/pytorch/blob/master/torch/fx/passes/shape_prop.py#L140) take only positional args, so we'll just use a single example_inputs argument for now. If needed, we can extend the API with an optional example_kwargs, e.g. when there are many arguments to forward and it makes more sense to pass them by keyword.

BC-breaking Note:

Before:
```python
m = resnet18(...)
m = prepare_fx(m, qconfig_dict)
# or
m = prepare_qat_fx(m, qconfig_dict)
```
After:
```python
m = resnet18(...)
m = prepare_fx(m, qconfig_dict, example_inputs=(torch.randn(1, 3, 224, 224),))
# or
m = prepare_qat_fx(m, qconfig_dict, example_inputs=(torch.randn(1, 3, 224, 224),))
```
Reviewed By: vkuzo, andrewor14 Differential Revision: D35984526 fbshipit-source-id: 706c8df71722c9aa5082a6491734f0144f0dd670
- 20 May, 2022 1 commit
Miquel Jubert Hermoso authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/245 At the moment D2Go's runner still uses the OSS pattern 1 (see wiki), where the files get remapped. This does not work with D2Go, and makes it necessary to use some renaming tricks. This diff refactors the runner setup, to reduce the number of classes, and rely on fb_overwrite to add the correct fields to the config. Reviewed By: wat3rBro Differential Revision: D36316955 fbshipit-source-id: 4aaaece121b8df802f9395648c97a647fa7db857
- 17 May, 2022 1 commit
Peizhao Zhang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/206 Use d2go's build_model in default_runner instead of the version in d2. Differential Revision: D35535568 fbshipit-source-id: e5e2de63787d21a3c51ed3d689f9058c5f4518b3
- 15 May, 2022 1 commit
John Reese authored
Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort will make a best effort to move associated comments to match merged elements, but there are known limitations due to the dynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: lisroach Differential Revision: D36402205 fbshipit-source-id: a4efc688d02da80c6e96685aa8eb00411615a366
- 26 Apr, 2022 1 commit
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/221 Reviewed By: tglik Differential Revision: D35855051 fbshipit-source-id: f742dfbc91bb7a20f632a508743fa93e3a7e9aa9
- 12 Apr, 2022 1 commit
Pavel Pidlypenskyi authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/204 Support for additional train hooks is a nice feature to have, especially when one wants to add training metrics via hooks. Reviewed By: tglik Differential Revision: D35377418 fbshipit-source-id: ca8e00a3c64f992fe9f6975689e50a8b846a1a37
- 05 Apr, 2022 1 commit
Yanghan Wang authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/200 Currently, when exporting the RCNN model, we call it with `self.model.inference(inputs, do_postprocess=False)[0]`, so the output of the exported model is not post-processed, e.g. the mask is in the squared shape. This diff adds the option to include the post-process in the exported model. Worth noting: since the input is a single tensor, the post-process doesn't resize the output to the original resolution, and we can't apply the post-process twice to further resize it in the Predictor's PostProcessFunc, so an assertion is added to raise an error in that case. This is fine for most production use cases where the input is not resized. Set `RCNN_EXPORT.INCLUDE_POSTPROCESS` to `True` to enable this. Reviewed By: tglik Differential Revision: D34904058 fbshipit-source-id: 65f120eadc9747e9918d26ce0bd7dd265931cfb5
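A one-line config sketch for the option named above; the key comes from the commit, and how it is registered on cfg is not shown:

```python
cfg.RCNN_EXPORT.INCLUDE_POSTPROCESS = True  # bake RCNN post-processing into the exported model
```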
- 31 Mar, 2022 1 commit
Michael Snower authored
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/201 Adds a profiler registry. Reviewed By: Maninae, wat3rBro Differential Revision: D34725664 fbshipit-source-id: 52cb99b618e5ba5f9bd8d272d4dcaa770d66983a
- 01 Mar, 2022 1 commit
Tong Xiao authored
Summary: `Detectron2GoRunner` defaults to triggering an evaluation right after the last iteration in the `runner.do_train` method. This can be unnecessary, because there is a `runner.do_test` at the end of training anyway, and it can also lead to side effects: for example, it causes the training and test data loaders to be present at the same time, which led to an OOM issue in our use case. In this diff, we add an option `eval_after_train` to the `EvalHook` to allow users to disable the evaluation after the last training iteration. Reviewed By: wat3rBro Differential Revision: D34295685 fbshipit-source-id: 3612eb649bb50145346c56c072ae9ca91cb199f5
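A hedged sketch of how the new option might be used when building the hook list; the parameter name comes from the commit, but the exact EvalHook import path and signature used by d2go are assumptions here:

```python
from detectron2.engine.hooks import EvalHook  # d2go's EvalHook extends/wraps detectron2's; path assumed

def eval_fn():
    return {"bbox/AP": 0.0}  # placeholder evaluation function

# Disable the evaluation that used to run right after the final training
# iteration; runner.do_test still evaluates at the end of training.
hook = EvalHook(5000, eval_fn, eval_after_train=False)
```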