init

Commit 754fbc04 authored Jul 16, 2024 by bailuo
20 changed files
--- a/LICENSE.txt
+++ b/LICENSE.txt
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   APPENDIX: How to apply the Apache License to your work.
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+   Copyright [yyyy] [name of copyright owner]
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/README.md
+++ b/README.md
+# OmniMotion
+A method for dense, long-range motion estimation in video sequences that tracks moving objects pixel by pixel.
+## Paper
+`Tracking Everything Everywhere All at Once`
+- https://arxiv.org/abs/2306.05422
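+## Model Structure
+<!-- Briefly describe the model structure in one sentence here -->
+The sequence is first represented as a quasi-3D canonical volume; a bijection is then defined, so that the complete motion can be described through a single quasi-3D space.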
+<div align=center>
+    <img src="./doc/基本原理.png"/>
+</div>
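+## Algorithm Principle
+OmniMotion retains information about all scene points that project onto each pixel, together with their relative depth ordering, which allows points in the frame to be tracked even while they are temporarily occluded. It takes an entire video sequence as input, along with noisy motion estimates (for example, optical flow), and solves for a complete, global motion trajectory. An optimization process is then added so that the representation can be queried with any pixel in any frame to produce smooth, accurate motion trajectories across the whole video.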
+<!-- <div align=center>
+    <img src="./doc/基本原理.png"/>
+</div> -->
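+## Environment Setup
+```
+mv omnimotion_pytorch omnimotion # drop the framework-name suffix
+# adjust the -v paths, docker_name and imageID to your actual setup
+```
+### Docker (Method 1)
+The address for pulling the Docker image from [光源](https://www.sourcefind.cn/#/service-details) and the usage steps are given below.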
+```
+docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-ubuntu20.04-dtk23.10-py38 # the imageID of this image is 0a56ef1842a7
+docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=16G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
+cd /your_code_path/omnimotion
+pip install -r requirements.txt
+```
+### Dockerfile (Method 2)
+The Dockerfile can be used as follows:
+```
+cd /your_code_path/omnimotion/docker
+docker build --no-cache -t codestral:latest .
+docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=16G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
+cd /your_code_path/omnimotion
+pip install -r requirements.txt
+```
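+### Anaconda (Method 3)
+<!-- Provide detailed steps for local configuration and compilation here, for example: -->
+The special deep learning libraries that this project requires for DCU cards can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
+```
+DTK driver: dtk23.10
+python: python3.8
+pytorch: 1.13.1
+```
+`Tips: the versions of the DCU-related tools listed above (DTK driver, python, pytorch, etc.) must correspond to one another exactly`
+Install the other, non-deep-learning libraries according to requirements.txt: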
+```
+pip install -r requirements.txt
+```
+## Dataset
+`DAVIS`
+- https://davischallenge.org/index.html
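+<!-- - Fill in the company-internal download address of the public dataset here (the dataset repository is [SCNet AIDatasets](http://113.200.138.88:18080/aidatasets); give the specific address for each public dataset the model uses). Very small weight files can be packaged into the project. -->
+<!-- - Fill in the official download address of the public dataset here (optional). -->
+The data download and preprocessing scripts are used as follows:
+```
+cd /your_code_path/omnimotion/
+python get_davis.py # download the DAVIS-2017-trainval-480p dataset
+python main_processing.py # preprocess the dataset
+```
+The training data directory structure is shown below; prepare the complete dataset for normal training according to this structure: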
+```
+├──DAVIS
+    ├──sequence_1/
+        ├──color/
+        ├──mask/ (optional; only used for visualization purposes)
+        ├──count_maps/
+        ├──features/
+        ├──raft_exhaustive/
+        ├──raft_masks/
+        ├──flow_stats.json
+    ├──sequence_2/
+    ├──...
+```
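+## Training
+<!-- In general, a ModelZoo project only needs to provide a single-node training launch method; provide at least one of the single-node multi-card and single-node single-card methods. -->
+### Single Node, Multiple Cards
+```
+python train.py --config configs/default.txt # check the expname and data_dir parameters in the config file
+```
+<!-- ### Single Node, Single Card
+```
+sh xxx.sh or python xxx.py
+``` -->
+## Inference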
+```
+python viz.py --config configs/default.txt
+```
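+## Results
+<!-- Insert test images of the algorithm's results here (including input and output) -->
+Training loss for the `dogs-jump` video sequence; green is GPU, orange is DCU.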
+<div align=center>
+    <img src="./doc/loss.png"/>
+</div>
+Visualization results
+- GPU 
+<video src="./doc/GPU-dogs-jump_corr_foreground_100000.mp4" controls="controls" width="700" height="200"></video>
+- DCU 
+<video src="./doc/DCU-dogs-jump_corr_foreground_100000.mp4" controls="controls" width="700" height="200"></video>
+### Accuracy
+None
+<!-- Test data: [test data](link), accelerator card used: xxx.
+Fill in the table according to the test results:
+| xxx | xxx | xxx | xxx | xxx |
+| :------: | :------: | :------: | :------: |:------: |
+| xxx | xxx | xxx | xxx | xxx  |
+| xxx | xx | xxx | xxx | xxx | -->
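+## Application Scenarios
+### Algorithm Category
+<!-- Follow this classification (remove the reference image when uploading); keep it consistent with the icon category and do not invent names: -->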
+<!-- <div align=center>
+    <img src="./doc/icon.png"/>
+</div> -->
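+<!-- For categories outside the classification above, the category names at this URL may also be used: https://huggingface.co/ \ -->
+`Object Tracking`
+### Key Application Industries
+<!-- The application industries should be filled in after thorough research, so as to give users professional and comprehensive recommendations; except for special algorithms, at least 3 are usually recommended. -->
+`Manufacturing, E-commerce, Healthcare, Education`
+<!-- ## Pretrained Weights -->
+<!-- - Fill in the company-internal download address of the pretrained weights here (the pretrained weight repository is [SCNet AIModels](http://113.200.138.88:18080/aimodels); give the specific address for each set of pretrained weights the model uses). Very small weight files can be packaged into the project.
+- Fill in the official download address of the public pretrained weights here (optional). -->
+## Source Repository and Issue Feedback
+<!-- - Fill in this project's gitlab address here -->
+- https://developer.hpccube.com/codes/bailuo/omnimotion_pytorch
+## References
+<!-- - Fill in the source github address here (so that users can view the original github issues)
+- Fill in the URLs of reference projects or tutorials here -->
+- https://github.com/qianqianwang68/omnimotion
+<!-- `For other information such as model.properties (required), LICENSE (required), CONTRIBUTORS, and the model icon (required), refer to: `[`ModelZooStd.md`](./ModelZooStd.md)
+`Each model must keep the original project's README.md, renamed to README_origin.md.` -->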
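Note on the dataset layout in README.md: the directory tree lists the entries each sequence folder is expected to contain. As a rough aid, and not part of this commit, the following minimal sketch reports which of those entries are missing from one sequence directory. The helper name `missing_entries` and the two constants are hypothetical; the entry names are taken from the tree, and `mask/` is left out of the check because the tree marks it as optional.

```python
from pathlib import Path

# Entries listed in the README's directory tree for each sequence.
# "mask" is omitted because the tree marks it as optional.
REQUIRED_DIRS = ["color", "count_maps", "features", "raft_exhaustive", "raft_masks"]
REQUIRED_FILES = ["flow_stats.json"]


def missing_entries(sequence_dir):
    """Return the names of required entries that are absent from sequence_dir."""
    root = Path(sequence_dir)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```

The function returns an empty list when every listed entry is present, so it could be run on a sequence folder before pointing `data_dir` at it.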
--- a/README_o.md
+++ b/README_o.md
+# Tracking Everything Everywhere All at Once
+PyTorch Implementation for paper [Tracking Everything Everywhere All at Once](https://omnimotion.github.io/), ICCV 2023.
+[Qianqian Wang](https://www.cs.cornell.edu/~qqw/) <sup>1,2</sup>,
+[Yen-Yu Chang](https://yuyuchang.github.io/) <sup>1</sup>,
+[Ruojin Cai](https://www.cs.cornell.edu/~ruojin/) <sup>1</sup>,
+[Zhengqi Li](https://zhengqili.github.io/) <sup>2</sup>,
+[Bharath Hariharan](https://www.cs.cornell.edu/~bharathh/) <sup>1</sup>,
+[Aleksander Holynski](https://holynski.org/) <sup>2,3</sup>,
+[Noah Snavely](https://www.cs.cornell.edu/~snavely/) <sup>1,2</sup>
+<br>
+<sup>1</sup>Cornell University,  <sup>2</sup>Google Research,  <sup>3</sup>UC Berkeley
+#### [Project Page](https://omnimotion.github.io/) | [Paper](https://arxiv.org/pdf/2306.05422.pdf) | [Video](https://www.youtube.com/watch?v=KHoAG3gA024)
+## Installation
+The code is tested with `python=3.8` and `torch=1.10.0+cu111` on an A100 GPU.
+```
+git clone --recurse-submodules https://github.com/qianqianwang68/omnimotion/
+cd omnimotion/
+conda create -n omnimotion python=3.8
+conda activate omnimotion
+pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
+pip install matplotlib tensorboard scipy opencv-python tqdm tensorboardX configargparse ipdb kornia imageio[ffmpeg]
+```
+## Training
+1. Please refer to the [preprocessing instructions](preprocessing/README.md) for preparing input data 
+   for training OmniMotion. We also provide some processed [data](https://omnimotion.cs.cornell.edu/dataset/)
+   that you can download, unzip and directly train on. (Note that depending on the network speed, 
+   it may be faster to run the processing script locally than downloading the processed data).
+2.  With processed input data, run the following command to start training:
+    ```
+    python train.py --config configs/default.txt --data_dir {sequence_directory}
+    ```
+    You can view visualizations on tensorboard by running `tensorboard --logdir logs/`. 
+    By default, the script trains for 100k iterations, which takes 8-9h on an A100 GPU and 12-13h on an RTX 4090.
+If you want to skip the optimization and see what the results/formats look like, we provide the weights
+for a few sequences [here](https://drive.google.com/drive/folders/16ekLy-4LTkYAavYrWaKk2qUpJ9TyMXlO?usp=sharing).
+You can use `viz.py` to visualize the correspondences produced by the models. Please refer to the next section for more details.
+## Visualization
+The training pipeline generates visualizations (correspondences, pseudo-depth maps, etc) every certain number of steps (saved in `args.out_dir/vis`). 
+You can also visualize grid points / trails after training by running: 
+```
+python viz.py --config configs/default.txt --data_dir {sequence_directory}
+```
+Make sure `expname` and `data_dir` are correctly specified, so that the
+model and data can be loaded. By specifying `expname`, the latest checkpoints that match that `expname` 
+will be loaded. Alternatively, you can specify `ckpt_path` to select a particular checkpoint.
+To generate the motion trail visualization, a foreground/background segmentation mask is required. 
+For DAVIS videos one can just use the mask annotations provided by the dataset. For custom videos that don't come with
+foreground segmentation masks, you can use [remove.bg](https://www.remove.bg/) to remove the background 
+for the query frame, download the masked image and set `foreground_mask_path` to its path. 
+[Here](https://omnimotion.cs.cornell.edu/dataset/mask_0.png) is an example of the masked image for the first frame
+of the `butterfly` sequence. 
+```
+python viz.py --config configs/default.txt --data_dir {sequence_directory} --foreground_mask_path {mask_file_path}
+```
+If you download the provided model weights for a sequence from [here](https://drive.google.com/drive/folders/16ekLy-4LTkYAavYrWaKk2qUpJ9TyMXlO?usp=sharing),
+you can visualize the correspondences by running the `viz.py` script and 
+setting `data_dir` to the unzipped directory, `ckpt_path` to the path for
+`model_100000.pth` in the directory, and optionally 
+`foreground_mask_path` as the path to `mask_0.png` 
+(only required for non-DAVIS sequences `butterfly`, `kangaroo`, and `swing_tire` if you want to visualize their motion trails).
+## Troubleshooting
+- The training code utilizes approximately 22GB of CUDA memory. If you encounter CUDA out of memory errors, 
+  you may consider reducing the number of sampled points `num_pts` and the chunk size `chunk_size`.
+- Due to the highly non-convex nature of the underlying optimization problem, we observe that the optimization process 
+  can be sensitive to initialization for certain difficult videos. If you notice significant inaccuracies in surface
+  orderings (by examining the pseudo depth maps) persist after 40k steps, 
+  it is very likely that training won't recover from that. You may consider restarting the training with a 
+  different `loader_seed` to change the initialization. 
+  If surfaces are incorrectly put at the nearest depth planes (which are not supposed to be the closest), 
+  we found using `mask_near` to disable near samples in the beginning of the training could help in some cases.  
+- Another common failure we noticed is that instead of creating a single object in the canonical space with
+  correct motion, the method creates duplicated objects in the canonical space with short-ranged motion for each.
+  This happens both because the input correspondences on the object are sparse and short-ranged 
+  and because the optimization gets stuck at local minima. This issue may be alleviated with better and longer-range input correspondences 
+  such as from [TAPIR](https://deepmind-tapir.github.io/) and [CoTracker](https://co-tracker.github.io/). 
+  Alternatively, you may consider adjusting `loader_seed` or the learning rates.
+## Citation
+```
+@article{wang2023omnimotion,
+    title   = {Tracking Everything Everywhere All at Once},
+    author  = {Wang, Qianqian and Chang, Yen-Yu and Cai, Ruojin and Li, Zhengqi and Hariharan, Bharath and Holynski, Aleksander and Snavely, Noah},
+    journal = {ICCV},
+    year    = {2023}
+}
+```
--- a/config.py
+++ b/config.py
+import configargparse
+def config_parser():
+    parser = configargparse.ArgumentParser()
+    parser.add_argument('--config', is_config_file=True, help='config file path')
+    # general
+    parser.add_argument('--data_dir', type=str, help='the directory for the video sequence')
+    parser.add_argument('--expname', type=str, default='', help='experiment name')
+    parser.add_argument('--local_rank', type=int, default=0, help='rank for distributed training')
+    parser.add_argument('--save_dir', type=str, default='out/', help='output dir')
+    parser.add_argument('--ckpt_path', type=str, default='', help='checkpoint path')
+    parser.add_argument('--no_reload', action='store_true', help='do not reload the weights')
+    parser.add_argument('--distributed', type=int, default=0, help='if use distributed training')
+    parser.add_argument('--num_iters', type=int, default=100000, help='number of iterations')
+    parser.add_argument('--num_workers', type=int, default=4, help='number of workers')
+    parser.add_argument('--load_opt', type=int, default=1, help='if loading optimizers')
+    parser.add_argument('--load_scheduler', type=int, default=1, help='if loading schedulers')
+    parser.add_argument('--loader_seed', type=int, default=12,
+                        help='the random seed used for DataLoader')
+    # data
+    parser.add_argument('--dataset_types', type=str, default='flow', help='only flow is included in the current version')
+    parser.add_argument('--dataset_weights', nargs='+', type=float, default=[1.], help='the weight for each dataset')
+    parser.add_argument('--num_imgs', type=int, default=250, help='max number of images to train')
+    parser.add_argument('--num_pairs', type=int, default=8, help='# image pairs to sample in each batch')
+    parser.add_argument('--num_pts', type=int, default=256, help='# pts to sample from each pair of images')
+    # lr
+    parser.add_argument('--lr_feature', type=float, default=1e-3, help='learning rate for feature mlp')
+    parser.add_argument('--lr_deform', type=float, default=1e-4, help='learning rate for deform mlp')
+    parser.add_argument('--lr_color', type=float, default=3e-4, help='learning rate for color mlp')
+    parser.add_argument("--lrate_decay_steps", type=int, default=20000,
+                        help='the number of steps after which the learning rate is decayed by lrate_decay_factor')
+    parser.add_argument("--lrate_decay_factor", type=float, default=0.5,
+                        help='the factor by which the learning rate is decayed every lrate_decay_steps steps')
+    parser.add_argument("--grad_clip", type=float, default=0, help='clip the gradient to avoid training instability')
+    # network training
+    parser.add_argument('--use_error_map', action='store_true', help='use error map')
+    parser.add_argument('--use_count_map', action='store_true', help='use count map')
+    parser.add_argument('--use_affine', action='store_true',
+                        help='if using additional 2D affine transformation layers for x, y in the invertible network')
+    parser.add_argument('--mask_near', action='store_true',
+                        help='if masking out the nearest samples in the beginning of the optimization; '
+                             'may be helpful to avoid bad initialization associated with wrong surface ordering, '
+                             'e.g., a surface is initialized at very small depth but should instead be farther away')
+    parser.add_argument('--num_samples_ray', type=int, default=32, help='number of samples per ray')
+    parser.add_argument('--pe_freq', type=int, default=4, help='the freq for pe used in the affine coupling layers')
+    parser.add_argument('--min_depth', type=float, default=0, help='the minimum depth value')
+    parser.add_argument('--max_depth', type=float, default=2, help='the maximum depth value')
+    parser.add_argument('--start_interval', type=int, default=20, help='the starting interval')
+    parser.add_argument('--max_padding', type=float, default=0,
+                        help='if predicted pixel locs exceed this padding, mask them out for training')
+    # inference
+    parser.add_argument('--chunk_size', type=int, default=1000, help='chunk size for rendering depth and rgb')
+    parser.add_argument('--use_max_loc', action='store_true',
+                        help='during inference, if using only the sample with maximum blending weight on the ray '
+                             'to compute correspondence. If set to False, the correspondences will be computed '
+                             'the same way as training, i.e., compositing all samples along the ray.')
+    parser.add_argument('--query_frame_id', type=int, default=0, help='the id of the query frame')
+    parser.add_argument('--vis_occlusion', action='store_true',
+                        help='if marking occluded pixels as crosses for visualization')
+    parser.add_argument('--occlusion_th', type=float, default=0.99,
+                        help='to determine if a mapped 3d location in the target frame is occluded or not, '
+                             'we look at the fraction of light absorbed by samples in front of this location '
+                             'on the ray in the target frame (i.e., 1 - transmittance); '
+                             'if that value is higher than this threshold, the mapped point is considered as occluded')
+    parser.add_argument('--foreground_mask_path', type=str, default='',
+                        help='providing the path for foreground mask file for generating trails')
+    # log
+    parser.add_argument('--i_print', type=int, default=100, help='frequency for printing losses')
+    parser.add_argument('--i_img', type=int, default=500, help='frequency for writing visualizations to tensorboard')
+    parser.add_argument('--i_weight', type=int, default=20000, help='frequency for saving ckpts')
+    parser.add_argument('--i_cache', type=int, default=20000, help='frequency for caching current flow predictions')
+    parser.add_argument("-f", "--fff", help="a dummy argument to fool ipython", default="1")
+    args = parser.parse_args()
+    return args
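Note on `--occlusion_th` in config.py: the help text describes the occlusion test in words. A minimal pure-Python sketch of that rule follows. It is not code from this commit, and it assumes that transmittance is accumulated as the product of `1 - alpha` over the samples in front of the mapped location, as in standard volume rendering; the diff does not show how the repository computes it. `is_occluded` and `alphas_in_front` are hypothetical names.

```python
def is_occluded(alphas_in_front, occlusion_th=0.99):
    """Return True if the samples in front absorb more than occlusion_th of the light.

    alphas_in_front: opacities in [0, 1] of the ray samples lying in front of
    the mapped 3D location (assumed representation, see the note above).
    """
    transmittance = 1.0
    for alpha in alphas_in_front:
        transmittance *= 1.0 - alpha  # light surviving each sample in turn
    absorbed = 1.0 - transmittance    # fraction of light absorbed in front
    return absorbed > occlusion_th
```

With the default threshold of 0.99, a point counts as occluded only when almost all light is absorbed before reaching it, so two half-opaque samples in front (0.75 absorbed) would not mark it as occluded.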
--- a/configs/default.txt
+++ b/configs/default.txt
+expname = DCU-xxx
+data_dir = /your_code_path/omnimotion/DAVIS_data_path/xxx # specify a single video sequence
+# training
+num_pairs = 8
+num_pts = 256
+use_affine = True
+use_error_map = True
+use_count_map = True
+# inference
+use_max_loc = True
+vis_occlusion = True
\ No newline at end of file
--- a/criterion.py
+++ b/criterion.py
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torchvision import models
+import ssl
+ssl._create_default_https_context = ssl._create_unverified_context
+def cauchy_loss(pred, gt, c=1, mask=None, normalize=True):
+    loss = torch.log(1 + ((pred - gt) / c)**2)
+    if mask is not None:
+        if normalize:
+            return (loss * mask).mean() / (mask.mean() + 1e-8)
+        else:
+            return (loss * mask).mean()
+    else:
+        return loss.mean()
+def masked_mse_loss(pred, gt, mask=None, normalize=True):
+    if mask is None:
+        return F.mse_loss(pred, gt)
+    else:
+        sum_loss = F.mse_loss(pred, gt, reduction='none')
+        ndim = sum_loss.shape[-1]
+        if normalize:
+            return torch.sum(sum_loss * mask) / (ndim * torch.sum(mask) + 1e-8)
+        else:
+            return torch.mean(sum_loss * mask)
+def masked_l1_loss(pred, gt, mask=None, normalize=True, quantile=1):
+    if mask is None:
+        return trimmed_l1_loss(pred, gt, quantile)
+    else:
+        sum_loss = F.l1_loss(pred, gt, reduction='none').mean(dim=-1, keepdim=True)
+        loss_at_quantile = torch.quantile(sum_loss, quantile)
+        quantile_mask = (sum_loss < loss_at_quantile).squeeze(-1)
+        ndim = sum_loss.shape[-1]
+        if normalize:
+            return torch.sum((sum_loss * mask)[quantile_mask]) / (ndim * torch.sum(mask[quantile_mask]) + 1e-8)
+        else:
+            return torch.mean((sum_loss * mask)[quantile_mask])
+def masked_huber_loss(pred, gt, delta, mask=None, normalize=True):
+    if mask is None:
+        return F.huber_loss(pred, gt, delta=delta)
+    else:
+        sum_loss = F.huber_loss(pred, gt, delta=delta, reduction='none')
+        ndim = sum_loss.shape[-1]
+        if normalize:
+            return torch.sum(sum_loss * mask) / (ndim * torch.sum(mask) + 1e-8)
+        else:
+            return torch.mean(sum_loss * mask)
+def trimmed_l1_loss(pred, gt, quantile=0.9):
+    loss = F.l1_loss(pred, gt, reduction='none').mean(dim=-1)
+    loss_at_quantile = torch.quantile(loss, quantile)
+    trimmed_loss = loss[loss < loss_at_quantile].mean()
+    return trimmed_loss
+def trimmed_std_normed_l1_loss(pred, gt, quantile=0.9):
+    loss = F.l1_loss(pred, gt, reduction='none')  # [..., d]
+    mask = loss.mean(dim=-1) < torch.quantile(loss.mean(dim=-1), quantile)  # [...]
+    pred_std = torch.std(pred[mask], dim=0)  # [d]
+    gt_std = torch.std(gt[mask], dim=0)  # [d]
+    std = 0.5 * (pred_std + gt_std)
+    trimmed_std_normed_loss = (loss / std).mean()
+    return trimmed_std_normed_loss
+def trimmed_mse_loss(pred, gt, mask=None, quantile=0.9):
+    loss = F.mse_loss(pred, gt, reduction='none').mean(dim=-1)
+    loss_at_quantile = torch.quantile(loss, quantile)
+    trimmed_loss = loss[loss < loss_at_quantile]
+    if mask is not None:
+        mask = mask[loss < loss_at_quantile]
+        loss = torch.mean(mask * trimmed_loss) / torch.mean(mask)
+    else:
+        loss = torch.mean(trimmed_loss)
+    return loss
+def trimmed_var_normed_mse_loss(pred, gt, quantile=0.9):
+    loss = F.mse_loss(pred, gt, reduction='none')  # [..., d]
+    mask = loss.mean(dim=-1) < torch.quantile(loss.mean(dim=-1), quantile)  # [...]
+    pred_var = torch.var(pred[mask], dim=0)  # [d]
+    gt_var = torch.var(gt[mask], dim=0)  # [d]
+    var = 0.5 * (pred_var + gt_var)
+    trimmed_var_normed_loss = (loss / var).mean()
+    return trimmed_var_normed_loss
+def compute_depth_range_loss(depth, min_th=0, max_th=2):
+    '''
+    Penalize mapped 3D locations whose depth falls outside the [min_th, max_th] near/far range.
+    '''
+    loss_lower = ((depth[depth < min_th] - min_th)**2).sum() / depth.numel()
+    loss_upper = ((depth[depth > max_th] - max_th)**2).sum() / depth.numel()
+    return loss_upper + loss_lower
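Review note: a small plain-Python sketch of the same penalty on a flat list of depths, to show that in-range values contribute nothing while the normalization still uses the total element count. `depth_range_penalty` is a hypothetical name used only for this illustration.

```python
def depth_range_penalty(depths, min_th=0.0, max_th=2.0):
    """Squared distance to the nearest bound for out-of-range depths, averaged over all depths."""
    lower = sum((d - min_th) ** 2 for d in depths if d < min_th)
    upper = sum((d - max_th) ** 2 for d in depths if d > max_th)
    return (lower + upper) / len(depths)


# -1.0 is one unit below the range, 3.0 is one unit above, 0.5 is inside.
print(depth_range_penalty([-1.0, 0.5, 3.0]))
```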
+def lossfun_distortion(t, w):
+    """Compute iint w[i] w[j] |t[i] - t[j]| di dj."""
+    # The loss incurred between all pairs of intervals.
+    ut = (t[..., 1:] + t[..., :-1]) / 2
+    dut = torch.abs(ut[..., :, None] - ut[..., None, :])
+    loss_inter = torch.sum(w * torch.sum(w[..., None, :] * dut, dim=-1), dim=-1)
+    # The loss incurred within each individual interval with itself.
+    loss_intra = torch.sum(w**2 * (t[..., 1:] - t[..., :-1]), dim=-1) / 3
+    return (loss_inter + loss_intra).mean()
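Review note: the two terms of `lossfun_distortion` can be checked by hand on a single ray. The sketch below assumes `t` holds the interval endpoints (one more entry than `w`) and uses a hypothetical function name; it mirrors the midpoint and interval-width arithmetic above without any batching.

```python
def distortion_loss_single_ray(t, w):
    """Distortion loss for one ray: pairwise midpoint term plus per-interval width term."""
    assert len(t) == len(w) + 1, "t holds interval endpoints, w one weight per interval"
    mids = [(a + b) / 2 for a, b in zip(t[:-1], t[1:])]
    # Loss between all pairs of intervals, weighted by both weights.
    inter = sum(wi * wj * abs(mi - mj)
                for wi, mi in zip(w, mids)
                for wj, mj in zip(w, mids))
    # Loss of each interval with itself, proportional to its width.
    intra = sum(wi ** 2 * (b - a) for wi, a, b in zip(w, t[:-1], t[1:])) / 3
    return inter + intra


print(distortion_loss_single_ray([0.0, 1.0], [1.0]))            # single interval: only the width term
print(distortion_loss_single_ray([0.0, 1.0, 2.0], [0.5, 0.5]))  # weight spread over two intervals
```

Spreading the same total weight over two intervals gives a larger value than concentrating it in one, which is the behaviour the regularizer is meant to discourage.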
+def median_scale_shift(x):
+    '''
+    :param x: [batch, h, w]
+    :return: median scaled and shifted x
+    '''
+    batch_size = len(x)
+    median_x = torch.median(x.reshape(batch_size, -1), dim=1).values[:, None, None]
+    s_x = torch.mean(torch.abs(x - median_x), dim=(1, 2), keepdim=True)
+    return (x - median_x) / s_x
+def scale_shift_invariant_loss(pred, gt):
+    pred_ = median_scale_shift(pred)
+    gt_ = median_scale_shift(gt)
+    return torch.mean(torch.abs(pred_ - gt_))
+def trimmed_scale_shift_invariant_loss(pred, gt, percentile=0.8):
+    pred_ = median_scale_shift(pred)
+    gt_ = median_scale_shift(gt)
+    error = torch.abs(pred_ - gt_).flatten()
+    cut_value = torch.quantile(error, percentile)
+    return error[error < cut_value].mean()
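Review note: the invariance that `median_scale_shift` provides can be illustrated on a 1D list in plain Python. The helper names are hypothetical, and `statistics.median_low` is used on the assumption that it matches `torch.median`, which returns the lower of the two middle values for even-length inputs.

```python
import statistics


def median_scale_shift_1d(x):
    """Subtract the median, then divide by the mean absolute deviation from it."""
    med = statistics.median_low(x)
    scale = sum(abs(v - med) for v in x) / len(x)
    return [(v - med) / scale for v in x]


def scale_shift_invariant_l1(pred, gt):
    """Mean absolute difference after normalizing both inputs."""
    p = median_scale_shift_1d(pred)
    g = median_scale_shift_1d(gt)
    return sum(abs(a - b) for a, b in zip(p, g)) / len(p)


gt = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [2.0 * v + 10.0 for v in gt]  # same signal, rescaled and shifted
print(scale_shift_invariant_l1(pred, gt))  # close to zero
```

The invariance holds for positive scale factors only; a negative factor flips the sign of the normalized values. A constant input gives a zero scale and therefore a division by zero, in the tensor version as well.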
+class GANLoss(nn.Module):
+    def __init__(self, gan_mode, target_real_label=1.0, target_fake_label=0.0,
+                 tensor=torch.FloatTensor, opt=None):
+        super(GANLoss, self).__init__()
+        self.real_label = target_real_label
+        self.fake_label = target_fake_label
+        self.real_label_tensor = None
+        self.fake_label_tensor = None
+        self.zero_tensor = None
+        self.Tensor = tensor
+        self.gan_mode = gan_mode
+        self.opt = opt
+        if gan_mode not in ('ls', 'original', 'w', 'hinge'):
+            raise ValueError('Unexpected gan_mode {}'.format(gan_mode))
+    def get_target_tensor(self, input, target_is_real):
+        if target_is_real:
+            if self.real_label_tensor is None:
+                self.real_label_tensor = self.Tensor(1).fill_(self.real_label)
+                self.real_label_tensor.requires_grad_(False)
+            return self.real_label_tensor.expand_as(input)
+        else:
+            if self.fake_label_tensor is None:
+                self.fake_label_tensor = self.Tensor(1).fill_(self.fake_label)
+                self.fake_label_tensor.requires_grad_(False)
+            return self.fake_label_tensor.expand_as(input)
+    def get_zero_tensor(self, input):
+        if self.zero_tensor is None:
+            self.zero_tensor = self.Tensor(1).fill_(0)
+            self.zero_tensor.requires_grad_(False)
+        return self.zero_tensor.expand_as(input)
+    def loss(self, input, target_is_real, for_discriminator=True):
+        if self.gan_mode == 'original':  # cross entropy loss
+            target_tensor = self.get_target_tensor(input, target_is_real)
+            loss = F.binary_cross_entropy_with_logits(input, target_tensor)
+            return loss
+        elif self.gan_mode == 'ls':
+            target_tensor = self.get_target_tensor(input, target_is_real)
+            return F.mse_loss(input, target_tensor)
+        elif self.gan_mode == 'hinge':
+            if for_discriminator:
+                if target_is_real:
+                    minval = torch.min(input - 1, self.get_zero_tensor(input))
+                    loss = -torch.mean(minval)
+                else:
+                    minval = torch.min(-input - 1, self.get_zero_tensor(input))
+                    loss = -torch.mean(minval)
+            else:
+                assert target_is_real, "The generator's hinge loss must be aiming for real"
+                loss = -torch.mean(input)
+            return loss
+        else:
+            # wgan
+            if target_is_real:
+                return -input.mean()
+            else:
+                return input.mean()
+    def __call__(self, input, target_is_real, for_discriminator=True):
+        # computing loss is a bit complicated because |input| may not be
+        # a tensor, but list of tensors in case of multiscale discriminator
+        if isinstance(input, list):
+            loss = 0
+            for pred_i in input:
+                if isinstance(pred_i, list):
+                    pred_i = pred_i[-1]
+                loss_tensor = self.loss(pred_i, target_is_real, for_discriminator)
+                bs = 1 if len(loss_tensor.size()) == 0 else loss_tensor.size(0)
+                new_loss = torch.mean(loss_tensor.view(bs, -1), dim=1)
+                loss += new_loss
+            return loss / len(input)
+        else:
+            return self.loss(input, target_is_real, for_discriminator)
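Review note: the `'hinge'` branch of `GANLoss.loss` reduces to a few lines of arithmetic. The sketch below restates it for a flat list of discriminator logits in plain Python; the function names are hypothetical and only illustrate the three cases (discriminator on real, discriminator on fake, generator).

```python
def hinge_d_loss(logits, target_is_real):
    """Discriminator hinge loss: -mean(min(x - 1, 0)) for real, -mean(min(-x - 1, 0)) for fake."""
    sign = 1.0 if target_is_real else -1.0
    return -sum(min(sign * x - 1.0, 0.0) for x in logits) / len(logits)


def hinge_g_loss(logits):
    """Generator hinge loss: push the discriminator output up on generated samples."""
    return -sum(logits) / len(logits)


print(hinge_d_loss([2.0, 0.0], target_is_real=True))   # only the logit below the margin is penalized
print(hinge_d_loss([2.0, 0.0], target_is_real=False))  # both logits violate the fake margin
print(hinge_g_loss([1.0, 3.0]))
```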
+class Vgg16(nn.Module):
+    def __init__(self):
+        super(Vgg16, self).__init__()
+        features = models.vgg16(pretrained=True).features
+        self.to_relu_1_2 = nn.Sequential()
+        self.to_relu_2_2 = nn.Sequential()
+        self.to_relu_3_3 = nn.Sequential()
+        self.to_relu_4_3 = nn.Sequential()
+        for x in range(4):
+            self.to_relu_1_2.add_module(str(x), features[x])
+        for x in range(4, 9):
+            self.to_relu_2_2.add_module(str(x), features[x])
+        for x in range(9, 16):
+            self.to_relu_3_3.add_module(str(x), features[x])
+        for x in range(16, 23):
+            self.to_relu_4_3.add_module(str(x), features[x])
+        # don't need the gradients, just want the features
+        for param in self.parameters():
+            param.requires_grad = False
+    def forward(self, x):
+        h = self.to_relu_1_2(x)
+        h_relu_1_2 = h
+        h = self.to_relu_2_2(h)
+        h_relu_2_2 = h
+        h = self.to_relu_3_3(h)
+        h_relu_3_3 = h
+        h = self.to_relu_4_3(h)
+        h_relu_4_3 = h
+        out = [h_relu_1_2, h_relu_2_2, h_relu_3_3, h_relu_4_3]
+        return out
+class Vgg19(nn.Module):
+    def __init__(self, requires_grad=False):
+        super(Vgg19, self).__init__()
+        vgg_pretrained_features = models.vgg19(pretrained=True).features
+        self.slice1 = torch.nn.Sequential()
+        self.slice2 = torch.nn.Sequential()
+        self.slice3 = torch.nn.Sequential()
+        self.slice4 = torch.nn.Sequential()
+        self.slice5 = torch.nn.Sequential()
+        for x in range(2):
+            self.slice1.add_module(str(x), vgg_pretrained_features[x])
+        for x in range(2, 7):
+            self.slice2.add_module(str(x), vgg_pretrained_features[x])
+        for x in range(7, 12):
+            self.slice3.add_module(str(x), vgg_pretrained_features[x])
+        for x in range(12, 21):
+            self.slice4.add_module(str(x), vgg_pretrained_features[x])
+        for x in range(21, 30):
+            self.slice5.add_module(str(x), vgg_pretrained_features[x])
+        if not requires_grad:
+            for param in self.parameters():
+                param.requires_grad = False
+    def forward(self, x):
+        h_relu1 = self.slice1(x)
+        h_relu2 = self.slice2(h_relu1)
+        h_relu3 = self.slice3(h_relu2)
+        h_relu4 = self.slice4(h_relu3)
+        h_relu5 = self.slice5(h_relu4)
+        out = [h_relu1, h_relu2, h_relu3, h_relu4, h_relu5]
+        return out
+class VGGLoss(nn.Module):
+    def __init__(self, model='vgg19', device='cuda'):
+        super().__init__()
+        if model == 'vgg16':
+            self.vgg = Vgg16().to(device)
+            self.weights = [1.0/16, 1.0/8, 1.0/4, 1.0]
+        elif model == 'vgg19':
+            self.vgg = Vgg19().to(device)
+            self.weights = [1.0/32, 1.0/16, 1.0/8, 1.0/4, 1.0]
+            # self.weights = [1/2.6, 1/4.8, 1/3.7, 1/5.6, 10/1.5]
+            # self.weights = [1/2.6, 1/4.8, 1/3.7, 1/5.6, 2/1.5]
+        else:
+            raise ValueError('Unexpected model {}'.format(model))
+        # self.criterion = nn.L1Loss()
+        self.loss_func = masked_l1_loss
+    @staticmethod
+    def preprocess(x, size=224):
+        # B, C, H, W
+        min_in_size = min(x.shape[-2:])
+        device = x.device
+        mean = torch.tensor([0.485, 0.456, 0.406]).to(device)
+        std = torch.tensor([0.229, 0.224, 0.225]).to(device)
+        x = (x - mean.reshape(1, 3, 1, 1)) / std.reshape(1, 3, 1, 1)
+        # if min_in_size <= size:
+        #     mode = 'bilinear'
+        #     align_corners = True
+        # else:
+        #     mode = 'area'
+        #     align_corners = None
+        # x = F.interpolate(x, size=size, mode=mode, align_corners=align_corners)
+        return x
+    def forward(self, x, y, mask=None, size=224):
+        x = self.preprocess(x, size=size)    # assume x, y are inside (0, 1)
+        y = self.preprocess(y, size=size)
+        if mask is not None:
+            if min(mask.shape[-2:]) <= size:
+                mode = 'bilinear'
+                align_corners = True
+            else:
+                mode = 'area'
+                align_corners = None
+            mask = F.interpolate(mask, size=size, mode=mode, align_corners=align_corners)
+        x_vgg, y_vgg = self.vgg(x), self.vgg(y)
+        # loss = 0
+        loss = self.loss_func(x, y, mask)
+        for i in range(len(x_vgg)):
+            loss += self.weights[i] * self.loss_func(x_vgg[i], y_vgg[i], mask)
+        return loss
+def normalize_minus_one_to_one(x):
+    x_min = x.min()
+    x_max = x.max()
+    return 2. * (x - x_min) / (x_max - x_min) - 1.
+def get_flow_smoothness_loss(flow, alpha):
+    # Finite differences between adjacent entries along each of the last two axes.
+    flow_gradient_x = flow[:, :, :, 1:, :] - flow[:, :, :, :-1, :]
+    flow_gradient_y = flow[:, :, :, :, 1:] - flow[:, :, :, :, :-1]
+    cost_x = (alpha[:, :, :, 1:, :] * torch.norm(flow_gradient_x, dim=2, keepdim=True)).sum()
+    cost_y = (alpha[:, :, :, :, 1:] * torch.norm(flow_gradient_y, dim=2, keepdim=True)).sum()
+    avg_cost = (cost_x + cost_y) / (2 * alpha.sum() + 1e-6)
+    return avg_cost
--- a/dataset/Davis.txt
+++ b/dataset/Davis.txt
+DAVIS dataset; see preprocessing/README.md
\ No newline at end of file
--- a/doc/DCU-dogs-jump_corr_foreground_100000.mp4
+++ b/doc/DCU-dogs-jump_corr_foreground_100000.mp4
--- a/doc/GPU-dogs-jump_corr_foreground_100000.mp4
+++ b/doc/GPU-dogs-jump_corr_foreground_100000.mp4
--- a/doc/loss.png
+++ b/doc/loss.png
--- a/doc/基本原理.png
+++ b/doc/基本原理.png
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
+FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
+ENV DEBIAN_FRONTEND=noninteractive
+# COPY requirements.txt requirements.txt
+# RUN pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
--- a/get_davis.py
+++ b/get_davis.py
+# run: python get_davis.py <OUT_DIR>
+# this file converts the DAVIS dataset into our format.
+import os
+import shutil
+import sys
+import subprocess
+# check=True stops the script if the download or extraction fails.
+subprocess.run(['wget', 'https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip'], check=True)
+subprocess.run(['unzip', 'DAVIS-2017-trainval-480p.zip'], check=True)
+img_src_root = 'DAVIS/JPEGImages/480p/'
+seq_names = os.listdir(img_src_root)
+out_dir = sys.argv[1]
+os.makedirs(out_dir, exist_ok=True)
+for seq_name in seq_names:
+    img_src_dir = os.path.join(img_src_root, seq_name)
+    img_dst_dir = os.path.join(out_dir, seq_name, 'color')
+    shutil.copytree(img_src_dir, img_dst_dir)
+    # mask is used only for visualization purposes
+    mask_src_root = 'DAVIS/Annotations/480p/'
+    mask_src_dir = os.path.join(mask_src_root, seq_name)
+    mask_dst_dir = os.path.join(out_dir, seq_name, 'mask')
+    shutil.copytree(mask_src_dir, mask_dst_dir)
+print('DAVIS data is saved to: {}'.format(os.path.abspath(out_dir)))
--- a/loaders/__init__.py
+++ b/loaders/__init__.py
--- a/loaders/__pycache__/__init__.cpython-310.pyc
+++ b/loaders/__pycache__/__init__.cpython-310.pyc
--- a/loaders/__pycache__/__init__.cpython-38.pyc
+++ b/loaders/__pycache__/__init__.cpython-38.pyc
--- a/loaders/__pycache__/create_training_dataset.cpython-310.pyc
+++ b/loaders/__pycache__/create_training_dataset.cpython-310.pyc
--- a/loaders/__pycache__/create_training_dataset.cpython-38.pyc
+++ b/loaders/__pycache__/create_training_dataset.cpython-38.pyc
--- a/loaders/__pycache__/raft.cpython-310.pyc
+++ b/loaders/__pycache__/raft.cpython-310.pyc
--- a/loaders/__pycache__/raft.cpython-38.pyc
+++ b/loaders/__pycache__/raft.cpython-38.pyc