1. 25 May, 2022 1 commit
  2. 17 May, 2022 3 commits
  3. 15 May, 2022 1 commit
    • apply import merging for fbcode (7 of 11) · b3a9204c
      John Reese authored
      Summary:
      Applies new import merging and sorting from µsort v1.0.
      
      When merging imports, µsort will make a best-effort to move associated
      comments to match merged elements, but there are known limitations due to
the dynamic nature of Python and developer tooling. These changes should
      not produce any dangerous runtime changes, but may require touch-ups to
      satisfy linters and other tooling.
      
      Note that µsort uses case-insensitive, lexicographical sorting, which
      results in a different ordering compared to isort. This provides a more
      consistent sorting order, matching the case-insensitive order used when
      sorting import statements by module name, and ensures that "frog", "FROG",
      and "Frog" always sort next to each other.
      
      For details on µsort's sorting and merging semantics, see the user guide:
      https://usort.readthedocs.io/en/stable/guide.html#sorting
      
      Reviewed By: lisroach
      
      Differential Revision: D36402205
      
      fbshipit-source-id: a4efc688d02da80c6e96685aa8eb00411615a366
  4. 19 Apr, 2022 1 commit
  5. 15 Apr, 2022 1 commit
    • enable moving traced model between devices · 2235f180
      Yanghan Wang authored
      Summary:
      X-link: https://github.com/facebookresearch/detectron2/pull/4132
      
      X-link: https://github.com/fairinternal/detectron2/pull/568
      
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/203
      
      For full discussion: https://fb.workplace.com/groups/1405155842844877/posts/5744470455580039
      
Tracing the `.to(device)` call causes problems when moving the traced torchscript to another device (e.g. from cpu to gpu, or even from `cuda:0` to `cuda:1`). The reason is that `device` is not a `torch.Tensor`, so the tracer simply hardcodes the value during tracing. The solution is to script the casting operation.
      
      Here's the code snippet illustrating this:
      ```
# define MyModel similar to GeneralizedRCNN, which casts the input to the model's device
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
          def __init__(self):
              super().__init__()
      
              self.conv1 = nn.Conv2d(3, 20, 5)
              self.conv2 = nn.Conv2d(20, 20, 5)
      
          def forward(self, x):
        # Cast the input to the same device as this model; this makes it
        # possible to take a cpu tensor as input when the model is on GPU.
              x = x.to(self.conv1.weight.device)
      
              x = F.relu(self.conv1(x))
              return F.relu(self.conv2(x))
      
      # export the model by tracing
      model = MyModel()
      x = torch.zeros([1, 3, 32, 32])
      ts = torch.jit.trace(model, x)
      print(ts.graph)
      
      # =====================================================
      graph(%self.1 : __torch__.MyModel,
            %x : Float(1, 3, 32, 32, strides=[3072, 1024, 32, 1], requires_grad=0, device=cpu)):
        %conv2 : __torch__.torch.nn.modules.conv.___torch_mangle_0.Conv2d = prim::GetAttr[name="conv2"](%self.1)
        %conv1 : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv1"](%self.1)
        %14 : int = prim::Constant[value=6]() # <ipython-input-2-5abde0efc36f>:11:0
        %15 : int = prim::Constant[value=0]() # <ipython-input-2-5abde0efc36f>:11:0
        %16 : Device = prim::Constant[value="cpu"]() # <ipython-input-2-5abde0efc36f>:11:0
        %17 : NoneType = prim::Constant()
        %18 : bool = prim::Constant[value=0]() # <ipython-input-2-5abde0efc36f>:11:0
        %19 : bool = prim::Constant[value=0]() # <ipython-input-2-5abde0efc36f>:11:0
        %20 : NoneType = prim::Constant()
        %input.1 : Float(1, 3, 32, 32, strides=[3072, 1024, 32, 1], requires_grad=0, device=cpu) = aten::to(%x, %14, %15, %16, %17, %18, %19, %20) # <ipython-input-2-5abde0efc36f>:11:0
        %72 : Tensor = prim::CallMethod[name="forward"](%conv1, %input.1)
        %input.5 : Float(1, 20, 28, 28, strides=[15680, 784, 28, 1], requires_grad=1, device=cpu) = aten::relu(%72) # /mnt/xarfuse/uid-20293/a90d1698-seed-nspid4026533681_cgpid21128615-ns-4026533618/torch/nn/functional.py:1406:0
        %73 : Tensor = prim::CallMethod[name="forward"](%conv2, %input.5)
        %61 : Float(1, 20, 24, 24, strides=[11520, 576, 24, 1], requires_grad=1, device=cpu) = aten::relu(%73) # /mnt/xarfuse/uid-20293/a90d1698-seed-nspid4026533681_cgpid21128615-ns-4026533618/torch/nn/functional.py:1406:0
        return (%61)
      # =====================================================
      
      # PyTorch cuda works
      model = copy.deepcopy(model)
      model.to("cuda")
      y = model(x)
      # torchscript cpu works
      y = ts(x)
      # torchscript cuda doesn't work
      ts = ts.to("cuda")
      y = ts(x)
      
      # =====================================================
      RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
      ---------------------------------------------------------------------------
      RuntimeError                              Traceback (most recent call last)
      <ipython-input-4-2aece3ad6c9a> in <module>
            7 # torchscript cuda doesn't work
            8 ts = ts.to("cuda")
      ----> 9 y = ts(x)
      /mnt/xarfuse/uid-20293/a90d1698-seed-nspid4026533681_cgpid21128615-ns-4026533618/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
         1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
         1109                 or _global_forward_hooks or _global_forward_pre_hooks):
      -> 1110             return forward_call(*input, **kwargs)
         1111         # Do not call functions when jit is used
         1112         full_backward_hooks, non_full_backward_hooks = [], []
      RuntimeError: The following operation failed in the TorchScript interpreter.
      # =====================================================
      
# One solution is scripting the casting instead of tracing it; the following
# code demonstrates how to do this with mixed scripting/tracing.
@torch.jit.script_if_tracing
def cast_device_like(src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
    return src.to(dst.device)
      
      class MyModel2(nn.Module):
          def __init__(self):
              super().__init__()
      
              self.conv1 = nn.Conv2d(3, 20, 5)
              self.conv2 = nn.Conv2d(20, 20, 5)
      
          def forward(self, x):
        # Cast the input to the same device as this model; this makes it
        # possible to take a cpu tensor as input when the model is on GPU.
              x = cast_device_like(x, self.conv1.weight)
      
              x = F.relu(self.conv1(x))
              return F.relu(self.conv2(x))
      
      # export the model by tracing
      model = MyModel2()
      x = torch.zeros([1, 3, 32, 32])
      ts = torch.jit.trace(model, x)
      print(ts.graph)
      
      # =====================================================
      graph(%self.1 : __torch__.MyModel2,
            %x : Float(1, 3, 32, 32, strides=[3072, 1024, 32, 1], requires_grad=0, device=cpu)):
        %conv2 : __torch__.torch.nn.modules.conv.___torch_mangle_5.Conv2d = prim::GetAttr[name="conv2"](%self.1)
        %conv1 : __torch__.torch.nn.modules.conv.___torch_mangle_4.Conv2d = prim::GetAttr[name="conv1"](%self.1)
        %conv1.1 : __torch__.torch.nn.modules.conv.___torch_mangle_4.Conv2d = prim::GetAttr[name="conv1"](%self.1)
        %weight.5 : Tensor = prim::GetAttr[name="weight"](%conv1.1)
        %14 : Function = prim::Constant[name="cast_device_like"]()
        %input.1 : Tensor = prim::CallFunction(%14, %x, %weight.5)
        %68 : Tensor = prim::CallMethod[name="forward"](%conv1, %input.1)
        %input.5 : Float(1, 20, 28, 28, strides=[15680, 784, 28, 1], requires_grad=1, device=cpu) = aten::relu(%68) # /mnt/xarfuse/uid-20293/a90d1698-seed-nspid4026533681_cgpid21128615-ns-4026533618/torch/nn/functional.py:1406:0
        %69 : Tensor = prim::CallMethod[name="forward"](%conv2, %input.5)
        %55 : Float(1, 20, 24, 24, strides=[11520, 576, 24, 1], requires_grad=1, device=cpu) = aten::relu(%69) # /mnt/xarfuse/uid-20293/a90d1698-seed-nspid4026533681_cgpid21128615-ns-4026533618/torch/nn/functional.py:1406:0
        return (%55)
      # =====================================================
      
      # PyTorch cuda works
      model = copy.deepcopy(model)
      model.to("cuda")
      y = model(x)
      # torchscript cpu works
      y = ts(x)
      # Note that now torchscript cuda works
      ts = ts.to("cuda")
      y = ts(x)
      print(y.device)
      
      # =====================================================
      cuda:0
      # =====================================================
      ```
      
For D2, this diff creates a `move_tensor_device_same_as_another(A, B)` function to replace `A.to(B.device)`. This diff updates `rcnn.py` and all its utils.
      
For D2Go, since the exported model becomes device-agnostic, we can remove the "_gpu" from the predictor type.
      
      Update (April 11):
      Add test to cover tracing on one device and move traced model to another device for inference. When GPU is available, it'll trace on `cuda:0` and run inference on `cpu`, `cuda:0` (and `cuda:N-1` if available).
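
As a rough sketch of what such a test checks, reusing `MyModel2` from the snippet above (names and devices here are assumptions; the real test exercises the exported RCNN models):
```
import torch

def test_traced_model_moves_across_devices():
    # Assumes a CUDA device is available (the diff falls back to CPU otherwise).
    model = MyModel2().to("cuda:0")
    example = torch.zeros([1, 3, 32, 32], device="cuda:0")
    ts = torch.jit.trace(model, example)
    for device in ["cpu", "cuda:0"]:
        ts = ts.to(device)
        out = ts(torch.zeros([1, 3, 32, 32]))  # cpu input; the model casts it
        assert out.device == torch.device(device)
```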
      
Summary of the device-related patterns:
- The usage of `.to(dtype=another_dtype)` won't affect the device.
- Explicit device casting like `.to(device)` can generally be replaced by `move_device_like` (see the sketch after this list).
- For creating variables directly on a device (e.g. `torch.zeros`, `torch.arange`), we can replace them with a ScriptModule to avoid first creating on CPU and then moving to the new device.
    - Creating things on the tracing device and then moving to the new device is dangerous, because the tracing device (e.g. `cuda:0`) might not be available (e.g. when running on a CPU-only machine).
    - It's hard to write `image_list.py` in this pattern because the size behaves differently during tracing (int vs. scalar tensor); in this diff we still create on CPU first and then move to the target device.
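
For reference, a minimal sketch of such a helper, following the same `script_if_tracing` pattern as `cast_device_like` above (the exact name and signature used in D2 are per the diff, not shown here):
```
import torch

@torch.jit.script_if_tracing
def move_device_like(src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
    # Scripted rather than traced, so the device is read at runtime
    # instead of being baked into the graph as a constant.
    return src.to(dst.device)
```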
      
      Reviewed By: tglik
      
      Differential Revision: D35367772
      
      fbshipit-source-id: 02d07e3d96da85f4cfbeb996e3c14c2a6f619beb
  6. 05 Apr, 2022 2 commits
    • support do_postprocess when tracing rcnn model in D2 style · 647a3fdf
      Yanghan Wang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/200
      
Currently when exporting the RCNN model, we call it with `self.model.inference(inputs, do_postprocess=False)[0]`, so the output of the exported model is not post-processed, e.g. the mask is in square shape. This diff adds the option to include the postprocess in the exported model.
      
Worth noting: since the input is a single tensor, the postprocess doesn't resize the output to the original resolution, and we can't apply the postprocess twice to further resize it in the Predictor's PostProcessFunc, so an assertion is added to raise an error in this case. But this is fine for most production use cases where the input is not resized.
      
      Set `RCNN_EXPORT.INCLUDE_POSTPROCESS` to `True` to enable this.
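
For example (a config sketch; the exact config plumbing depends on the repo's setup, only the key itself is from this summary):
```
# Enable postprocess in the exported RCNN model via the new config key.
cfg.RCNN_EXPORT.INCLUDE_POSTPROCESS = True
```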
      
      Reviewed By: tglik
      
      Differential Revision: D34904058
      
      fbshipit-source-id: 65f120eadc9747e9918d26ce0bd7dd265931cfb5
    • refactor create_fake_detection_data_loader · 312c6b62
      Yanghan Wang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/199
      
- `create_fake_detection_data_loader` currently doesn't take `cfg` as input; sometimes we need to test augmentations that need a more complicated cfg.
- The name is a bit bad; rename it to `create_detection_data_loader_on_toy_dataset` (a signature sketch follows this list).
- width/height were previously the resized size; we want to change them to the size of the data source (image files) and use `cfg` to control the resized size.
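
A hypothetical sketch of the renamed helper's shape, based on the bullets above (the real signature in the repo may differ):
```
def create_detection_data_loader_on_toy_dataset(cfg, height, width, is_train=False):
    # height/width now describe the toy data source (the generated image files);
    # the resized size is controlled by cfg instead of by these arguments.
    ...
```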
      
      Update V3:
In V2 there were some test failures. The reason is that V2 built the data loader (via the GeneralizedRCNN runner) using the actual test config, instead of the default config used before this diff, plus the dataset name change. In V3 we use the test's runner instead of the default runner for consistency. This revealed some real bugs that we didn't test before.
      
      Reviewed By: omkar-fb
      
      Differential Revision: D35238890
      
      fbshipit-source-id: 28a6037374e74f452f91b494bd455b38d3a48433
  7. 24 Mar, 2022 1 commit
  8. 12 Jan, 2022 1 commit
  9. 30 Dec, 2021 1 commit
  10. 29 Dec, 2021 1 commit
  11. 08 Nov, 2021 1 commit
    • rename @legacy to @c2_ops · 95ab768e
      Yanghan Wang authored
      Reviewed By: sstsai-adl
      
      Differential Revision: D32216605
      
      fbshipit-source-id: bebee1edae85e940c7dcc6a64dbe341a2fde36a2
  12. 22 Oct, 2021 1 commit
  13. 15 Oct, 2021 2 commits
    • Supported specifying customized parameter groups from model. · 87ce583c
      Peizhao Zhang authored
      Summary:
Supported specifying customized parameter groups from the model.
* Allow the model to specify customized parameter groups by implementing a function `model.get_optimizer_param_groups(cfg)` (a sketch follows this list).
* Supported models wrapped in DDP.
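
A minimal sketch of the hook (the module names and LR scaling below are illustrative assumptions, not from the diff):
```
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.head = nn.Linear(8, 2)

    def get_optimizer_param_groups(self, cfg):
        # Customized grouping: e.g. train the backbone with a 10x smaller LR.
        return [
            {"params": list(self.backbone.parameters()), "lr": cfg.SOLVER.BASE_LR * 0.1},
            {"params": list(self.head.parameters())},
        ]
```
For DDP-wrapped models, the optimizer builder presumably unwraps `model.module` before looking up this method.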
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D31289315
      
      fbshipit-source-id: c91ba8014508e9fd5f172601b9c1c83c188338fd
    • Refactor for get_optimizer_param_groups. · 2dc3bc02
      Peizhao Zhang authored
      Summary:
      Refactor for get_optimizer_param_groups.
      * Split `get_default_optimizer_params()` into multiple functions:
        * `get_optimizer_param_groups_default()`
        * `get_optimizer_param_groups_lr()`
        * `get_optimizer_param_groups_weight_decay()`
* Regroup the parameters to create the minimal number of groups (a sketch of the idea follows this list).
* Print all parameter groups when the optimizer is created:
          Param group 0: {amsgrad: False, betas: (0.9, 0.999), eps: 1e-08, lr: 10.0, params: 1, weight_decay: 1.0}
          Param group 1: {amsgrad: False, betas: (0.9, 0.999), eps: 1e-08, lr: 1.0, params: 1, weight_decay: 1.0}
          Param group 2: {amsgrad: False, betas: (0.9, 0.999), eps: 1e-08, lr: 1.0, params: 2, weight_decay: 0.0}
      * Add some unit tests.
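
A minimal sketch of the regrouping step (an illustration of the technique, assuming hyperparameter values are hashable; not the diff's code):
```
def regroup_param_groups(param_groups):
    # Merge groups whose hyperparameters (everything except "params") are
    # identical, so the optimizer ends up with the minimal number of groups.
    merged = {}
    for group in param_groups:
        key = tuple(sorted((k, v) for k, v in group.items() if k != "params"))
        if key not in merged:
            merged[key] = {k: v for k, v in group.items() if k != "params"}
            merged[key]["params"] = []
        merged[key]["params"].extend(group["params"])
    return list(merged.values())
```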
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D31287783
      
      fbshipit-source-id: e87df0ae0e67343bb2130db945d8faced44d7411
  14. 24 Sep, 2021 2 commits
  15. 15 Sep, 2021 1 commit
  16. 09 Sep, 2021 1 commit
  17. 31 Aug, 2021 1 commit
    • enable (fake) inference for bolt exported model · e62c0e4c
      Yanghan Wang authored
      Summary:
Enable inference for boltnn (via running torchscript).
      - merge rcnn's boltnn test with other export types.
      - misc fixes.
      
      Differential Revision: D30610386
      
      fbshipit-source-id: 7b78136f8ca640b5fc179cb47e3218e709418d71
  18. 18 Aug, 2021 2 commits
    • torch batch boundary CE loss · 7ae35eec
      Siddharth Shah authored
      Summary:
A batched torch version allows us to avoid the CPU <--> GPU copy, which
saves ~200ms per iteration. This new version of generating the boundary
weight mask produces identical masks.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D30176412
      
      fbshipit-source-id: 877f4c6337e7870d3bafd8eb9157ac166ddd588a
    • Add multi-tensor optimizer version for SGD · 918abe42
      Valentin Andrei authored
      Summary:
Added a multi-tensor optimizer implementation for SGD, from `torch.optim._multi_tensor`. It can potentially provide a ~5% QPS improvement by using the `foreach` API to speed up the optimizer step.

Using it is optional: specify `SGD_MT` in the `SOLVER.OPTIMIZER` setting of the configuration file (see the sketch below).
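
A sketch of what the optional selection might look like (the builder and config plumbing here are assumptions based on this summary; `torch.optim._multi_tensor.SGD` is the upstream module the commit references, available in PyTorch releases of that era):
```
import torch

def build_sgd(cfg, params):
    kwargs = dict(lr=cfg.SOLVER.BASE_LR, momentum=cfg.SOLVER.MOMENTUM)
    if cfg.SOLVER.OPTIMIZER == "SGD_MT":
        # Multi-tensor variant: batches the per-parameter update loop via
        # the foreach API, potentially ~5% QPS improvement.
        return torch.optim._multi_tensor.SGD(params, **kwargs)
    return torch.optim.SGD(params, **kwargs)
```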
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D30377761
      
      fbshipit-source-id: 06107f1b91e9807c1db5d1b0ca6be09fcbb13e67
  19. 17 Aug, 2021 1 commit
  20. 14 Jun, 2021 1 commit
  21. 25 May, 2021 1 commit
    • update RCNN model test base · 0ab6d3f1
      Yanghan Wang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/75
      
Refactor the base test case:
- make test_dir valid throughout the test (rather than only within a local context), so individual tests can load back the exported model
- refactor `custom_setup_test` for easier overriding
- move parameterized into the base class to avoid copying the naming function
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D28651067
      
      fbshipit-source-id: c59a311564f6114039e20ed3a23e5dd9c84f4ae4
  22. 04 May, 2021 1 commit
  23. 15 Apr, 2021 1 commit
  24. 30 Mar, 2021 1 commit
    • reorganize unit tests · a0658c4a
      Sam Tsai authored
Summary: Separate unit tests into individual folders based on functionality.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D27132567
      
      fbshipit-source-id: 9a8200be530ca14c7ef42191d59795b05b9800cc