  1. 03 Nov, 2022 1 commit
    • use SharedList as offload backend of DatasetFromList by default · 01c351bc
      Yanghan Wang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/405
      
      - Use the non-hacky mechanism (added in D40818736, https://github.com/facebookresearch/detectron2/pull/4626) to customize the offload backend of `DatasetFromList`.
      - In D2Go, switch to using `SharedList` (added in D40789062, https://github.com/facebookresearch/mobile-vision/pull/120) by default to save RAM, and optionally use `DiskCachedList` to save even more RAM.
      
      Local benchmarking results in dev mode (using a ~2.4 GiB dataset):
      | RAM usage (RES, SHR) | No-dataset | Naive | NumpySerializedList | SharedList | DiskCachedList |
      | -- | -- | -- | -- | -- | -- |
      | Master GPU worker | 8.0g, 2.8g | 21.4g, 2.8g | 11.6g, 2.8g | 11.5g, 5.2g | -- |
      | Non-master GPU worker | 7.5g, 2.8g | 21.0g, 2.8g | 11.5g, 2.8g | 8.0g, 2.8g | -- |
      | Per data loader worker     | 2.0g, 1.0g | 14.0g, 1.0g | 4.4g, 1.0g | 2.1g, 1.0g | -- |
      
      - Memory usage (RES, SHR) is taken from the `top` command. `RES` is the total resident memory used by the process; `SHR` is the part of `RES` that can be shared with other processes.
      - Experiments use 2 GPUs and 2 data loader workers per GPU, i.e. 6 processes in total; the **numbers are per-process**.
      - `No-dataset`: the same job run with a tiny dataset (only 4.47 MiB after serialization). Since the dataset's RAM usage is negligible here, this shows the floor RAM usage.
      - The other experiments use a dataset of **2413.57 MiB** after serialization.
        - `Naive`: the vanilla version, where the dataset is not offloaded to other storage.
        - `NumpySerializedList`: this optimization was added a long time ago in D19896490. I recall that the RAM was indeed shared across data loader workers, but there seems to have been a regression: now basically every process holds its own copy of the data.
        - `SharedList`: enabled in this diff. Only the master GPU worker needs extra RAM. Interestingly, it uses 3.5 GB more RAM than the other rank, while the data itself is only 2.4 GB. I'm not sure whether this is overhead of the storage itself or overhead from sharing it with other processes; since a non-master GPU worker using `NumpySerializedList` also uses 11.5g of RAM, we probably don't need to worry too much about it.
        - `DiskCachedList`: not benchmarked; it should have no extra RAM usage.
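The idea behind `SharedList` as described here (one serialized copy of the data that other processes on the machine attach to, instead of each holding their own) can be illustrated with the standard library: pickle every item into one contiguous buffer with an offset table, similar in spirit to the `NumpySerializedList` trick, and place that buffer in `multiprocessing.shared_memory`. This is a minimal sketch under those assumptions; `SharedSerializedList` is a hypothetical name and this is not the mobile-vision `SharedList` implementation.

```python
import pickle
from array import array
from itertools import accumulate
from multiprocessing import shared_memory
from typing import Any, List, Optional

_ITEMSIZE = array("q").itemsize  # bytes per offset entry (8 on common platforms)


class SharedSerializedList:
    """Hypothetical sketch of a list stored once in shared memory.

    Block layout: [n][end offset of item 0 .. n-1][pickled items, back to back].
    The creating process passes `items`; other processes attach by `name`.
    """

    def __init__(self, items: Optional[List[Any]] = None, name: Optional[str] = None):
        if (items is None) == (name is None):
            raise ValueError("pass exactly one of `items` (create) or `name` (attach)")
        if items is not None:
            # Serialize once, in the creating process only.
            blobs = [pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL) for x in items]
            self._ends = array("q", accumulate(len(b) for b in blobs))
            header = (array("q", [len(blobs)]) + self._ends).tobytes()
            size = len(header) + (self._ends[-1] if blobs else 0)
            self._shm = shared_memory.SharedMemory(create=True, size=size)
            self._shm.buf[: len(header)] = header
            pos = len(header)
            for b in blobs:
                self._shm.buf[pos : pos + len(b)] = b
                pos += len(b)
        else:
            # Attach: copy only the small offset table, not the data.
            self._shm = shared_memory.SharedMemory(name=name)
            count = array("q")
            count.frombytes(bytes(self._shm.buf[:_ITEMSIZE]))
            self._ends = array("q")
            self._ends.frombytes(bytes(self._shm.buf[_ITEMSIZE : _ITEMSIZE * (count[0] + 1)]))
        self._base = _ITEMSIZE * (len(self._ends) + 1)  # where the pickled payload starts

    def __len__(self) -> int:
        return len(self._ends)

    def __getitem__(self, idx: int) -> Any:
        if not 0 <= idx < len(self._ends):
            raise IndexError(idx)
        start = self._ends[idx - 1] if idx > 0 else 0
        end = self._ends[idx]
        # Copy just this item's bytes out of shared memory, then unpickle them.
        return pickle.loads(bytes(self._shm.buf[self._base + start : self._base + end]))

    @property
    def name(self) -> str:
        return self._shm.name

    def close(self) -> None:
        self._shm.close()

    def unlink(self) -> None:
        """Free the block; call once, from the creating process."""
        self._shm.unlink()


owner = SharedSerializedList(items=[{"file_name": "a.jpg"}, {"file_name": "b.jpg"}])
reader = SharedSerializedList(name=owner.name)  # normally done in another process
print(reader[1])  # {'file_name': 'b.jpg'}
reader.close()
owner.close()
owner.unlink()
```

In this design each attached process keeps only the offset table (8 bytes per item) in private memory, and the payload pages it reads are shared.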
      
      Using the numbers above for a typical 8-GPU, 4-worker training job, and assuming the OS and other programs take 20-30GB RAM, the current training uses `11.6g * 8 + 4.4g * 8*4 = 233.6g` RAM, on the edge of causing OOM on a 256GB machine. This aligns with our experience that it supports a ~2GB dataset. After the change, the training uses only `(11.5g + 8.0g * 7) + 2.1g * 8*4 = 134.7g` RAM (one master GPU worker and seven non-master GPU workers), which leaves much more headroom, so we can train with a much larger dataset (e.g. 20GB) or use more data loader workers (e.g. 8 workers).
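The RAM estimate can be recomputed from the per-process table values with a small helper. This sketch assumes RES values simply add up across processes, which overcounts pages that are actually shared, so it is a conservative bound; `estimate_total_ram_gb` is a hypothetical name.

```python
def estimate_total_ram_gb(
    master_gpu_gb: float,
    non_master_gpu_gb: float,
    loader_worker_gb: float,
    num_gpus: int = 8,
    workers_per_gpu: int = 4,
) -> float:
    """Sum per-process RES (in GB) over one machine.

    Assumes one master GPU worker, (num_gpus - 1) non-master GPU workers and
    num_gpus * workers_per_gpu data loader workers.
    """
    gpu_workers = master_gpu_gb + non_master_gpu_gb * (num_gpus - 1)
    loader_workers = loader_worker_gb * num_gpus * workers_per_gpu
    return gpu_workers + loader_workers


# NumpySerializedList, using the master value for every GPU worker (upper bound).
print(round(estimate_total_ram_gb(11.6, 11.6, 4.4), 1))  # prints 233.6
# SharedList: one master GPU worker at 11.5g, seven non-master ones at 8.0g.
print(round(estimate_total_ram_gb(11.5, 8.0, 2.1), 1))  # prints 134.7
```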
      
      Reviewed By: sstsai-adl
      
      Differential Revision: D40819959
      
      fbshipit-source-id: fbdc9d2d1d440e14ae8496be65979a09f3ed3638
  2. 02 Jun, 2022 1 commit
  3. 05 Apr, 2022 1 commit
    • refactor create_fake_detection_data_loader · 312c6b62
      Yanghan Wang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/199
      
      - `create_fake_detection_data_loader` currently doesn't take `cfg` as input, but sometimes we need to test augmentations that require a more complicated, different cfg.
      - The name is not very descriptive; rename it to `create_detection_data_loader_on_toy_dataset`.
      - `width`/`height` previously specified the resized size; change them to specify the size of the data source (image files) and use `cfg` to control the resized size.
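As a sketch of the split in the last bullet: the toy image files keep the given `height`/`width`, and the resized size is derived from `cfg`. This assumes detectron2-style test-time shortest-edge resizing driven by `cfg.INPUT.MIN_SIZE_TEST` / `cfg.INPUT.MAX_SIZE_TEST`; `ToyInputCfg` and `resized_shape` are hypothetical stand-ins, not part of D2Go.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToyInputCfg:
    # Hypothetical stand-in for cfg.INPUT.MIN_SIZE_TEST / cfg.INPUT.MAX_SIZE_TEST.
    min_size: int = 800
    max_size: int = 1333


def resized_shape(src_height: int, src_width: int, cfg: ToyInputCfg) -> Tuple[int, int]:
    """Scale the short edge to cfg.min_size, capping the long edge at cfg.max_size."""
    scale = cfg.min_size / min(src_height, src_width)
    new_h, new_w = src_height * scale, src_width * scale
    longest = max(new_h, new_w)
    if longest > cfg.max_size:
        # Shrink uniformly so the long edge fits within max_size.
        shrink = cfg.max_size / longest
        new_h, new_w = new_h * shrink, new_w * shrink
    return int(new_h + 0.5), int(new_w + 0.5)


# Toy image files are 480x640 (source size); cfg alone decides the resized size.
print(resized_shape(480, 640, ToyInputCfg()))  # (800, 1067)
```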
      
      Update V3:
      In V2 there were some test failures because V2 builds the data loader (via the GeneralizedRCNN runner) using the actual test config instead of the default config used before this diff, combined with the dataset name change. In V3 we use the test's runner instead of the default runner for consistency. This revealed some real bugs that we hadn't tested for before.
      
      Reviewed By: omkar-fb
      
      Differential Revision: D35238890
      
      fbshipit-source-id: 28a6037374e74f452f91b494bd455b38d3a48433
  4. 04 Mar, 2022 1 commit
    • delay import for discache · d3115faf
      Yanghan Wang authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/185
      
      `DiskCachedDatasetFromList` was originally defined in `d2go/data/utils.py`, so the class was always declared whenever that module was imported. As a result, the clean-up call (https://fburl.com/code/cu7hswhx) always ran, even when the feature was not enabled. This diff moves the class to a new module and delays its import, so the clean-up no longer runs unless the feature is used.
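The fix follows a general pattern: a side effect that lives at module level (here, setting up clean-up) runs on every import, so moving it behind a lazy import confines it to runs that actually use the feature. A self-contained sketch, assuming a plain list `CLEANUP_HOOKS` as a stand-in for the real clean-up registration and a cached loader function in place of the new module; all names are hypothetical.

```python
import functools
from typing import Callable, List

# Stand-in for a real clean-up registry (e.g. hooks run at process exit).
CLEANUP_HOOKS: List[Callable[[], None]] = []


def _cleanup_disk_cache() -> None:
    """Placeholder for deleting the on-disk cache."""


@functools.lru_cache(maxsize=None)
def load_disk_cached_backend() -> type:
    """Plays the role of the delayed import: the body runs once, on first use."""
    # Side effect that previously ran on every import of the utils module.
    CLEANUP_HOOKS.append(_cleanup_disk_cache)

    class DiskCachedListSketch(list):
        """Placeholder for a list backed by on-disk storage."""

    return DiskCachedListSketch


def build_storage(items: list, use_disk_cache: bool = False) -> list:
    if use_disk_cache:
        # Only this branch triggers the "import" and its clean-up registration.
        return load_disk_cached_backend()(items)
    return list(items)
```

A real `import` statement inside a function behaves similarly, since Python caches imported modules in `sys.modules` and runs module-level code only once.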
      
      Reviewed By: tglik
      
      Differential Revision: D34601363
      
      fbshipit-source-id: 734bb9b2c7957d7437ad40c4bfe60a441ec2f23a
  5. 25 Feb, 2022 1 commit
  6. 15 Apr, 2021 1 commit
  7. 30 Mar, 2021 1 commit
    • reorganize unit tests · a0658c4a
      Sam Tsai authored
      Summary: Separate unit tests into individual folders based on functionality.
      
      Reviewed By: wat3rBro
      
      Differential Revision: D27132567
      
      fbshipit-source-id: 9a8200be530ca14c7ef42191d59795b05b9800cc
  8. 20 Mar, 2021 1 commit
    • move test utils to core library · 9d238344
      Yanghan Wang authored
      Summary: Note that `d2go.tests` is not a library for OSS, so move the test utils code to `d2go.utils.testing`.
      
      Reviewed By: zhanghang1989
      
      Differential Revision: D26706933
      
      fbshipit-source-id: 85767b66bbb6c67db05e11823beb4840220b2aa3
  9. 03 Mar, 2021 1 commit