"tools/vscode:/vscode.git/clone" did not exist on "3a5a2010f01d9ad71fcfc9a8456de502dd66dc4e"
Commit 524a1b6e authored by mashun
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
datasets/*
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu22.04-dtk24.04.3-py3.10
MIT License
Copyright (c) 2022 Huilin Qu, Congqiao Li, Sitian Qian
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# particle_transformer
## Paper
`Particle Transformer for Jet Tagging`
* https://arxiv.org/abs/2202.03772
## Model Architecture
The model is a Transformer-based architecture, enhanced with pairwise particle interaction features that are incorporated into the multi-head attention as a bias before the softmax.
<img src="readme_imgs/arch.png" style="zoom:70%">
## Algorithm
The algorithm processes particle-cloud data with a Transformer architecture. The self-attention mechanism captures the complex relations between particles and learns global features of the jet.
<img src="readme_imgs/alg.png" style="zoom:100%">
## Environment Setup
### Docker (Option 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu22.04-dtk24.04.3-py3.10
docker run --shm-size 50g --network=host --name=pt --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to this project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
pip install -r requirements.txt
```
### Dockerfile (Option 2)
```bash
docker build -t <IMAGE_NAME>:<TAG> .
docker run --shm-size 50g --network=host --name=pt --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to this project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
pip install -r requirements.txt
```
### Anaconda (Option 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded from the HPC developer community: https://developer.hpccube.com/tool/
   - DTK driver: dtk24.04.3
   - python: python3.10
   - torch: 2.1.0
   - torchvision: 0.16.0

   Tip: the DTK driver, python, and torch versions above must match each other exactly.
2. Install the remaining (non-DCU-specific) libraries from requirements.txt:
```bash
pip install -r requirements.txt
```
## Datasets
The datasets can be downloaded with the provided script, or via the SCNet high-speed mirrors:
```bash
./get_datasets.py [JetClass|QuarkGluon|TopLandscape] [-d DATA_DIR]
```
SCNet high-speed mirrors:
[JetClass](http://113.200.138.88:18080/aidatasets/project-dependency/jetclass)
| [QuarkGluon](http://113.200.138.88:18080/aidatasets/project-dependency/quarkgluon)
| [TopLandscape](http://113.200.138.88:18080/aidatasets/project-dependency/toplandscape)
```
datasets/
├── JetClass
│   └── Pythia
│       ├── test_20M
│       ├── train_100M
│       └── val_5M
├── QuarkGluon
│   ├── test_file_*.parquet
│   └── train_file_*.parquet
└── TopLandscape
    ├── test_file.parquet
    ├── train_file.parquet
    └── val_file.parquet
```
## Training
```bash
pip install 'weaver-core>=0.4'
```
```bash
# train on the JetClass dataset
./train_JetClass.sh [ParT|PN|PFN|PCNN] [kin|kinpid|full] ...
# train on the QuarkGluon dataset
./train_QuarkGluon.sh [ParT|ParT-FineTune|PN|PN-FineTune|PFN|PCNN] [kin|kinpid|kinpidplus] ...
# train on the TopLandscape dataset
./train_TopLandscape.sh [ParT|ParT-FineTune|PN|PN-FineTune|PFN|PCNN] [kin] ...
```
The first argument selects the network:
- ParT: Particle Transformer
- PN: ParticleNet
- PFN: Particle Flow Network
- PCNN: P-CNN
- xxx-FineTune: fine-tune from a pre-trained model

The second argument selects the input feature set:
- kin: only kinematic inputs
- kinpid: kinematic inputs + particle identification
- full: kinematic inputs + particle identification + trajectory displacement
### Multi-GPU Training
```bash
# DP - pytorch
./train_JetClass.sh ParT full --gpus 0,1,2,3 --batch-size [total_batch_size] ...
# DDP - pytorch
DDP_NGPUS=4 ./train_JetClass.sh ParT full --batch-size [batch_size_per_gpu] ...
```
## Inference
```bash
bash test_QuarkGluon_demo.sh
```
Note: this inference script is for reference only; see [weaver-core](https://github.com/hqucms/weaver-core) for details.
## Results
![result](readme_imgs/result.png)
### Accuracy
All runs use ParT. The results below only illustrate the difference in training accuracy between accelerators under identical configurations; they do not represent the best achievable accuracy. Only AvgAcc is recorded.

|Accelerator|JetClass|QuarkGluon(kinpidplus)|TopLandscape(kin)|
|:---:|:---:|:---:|:---:|
|K100_AI|0.6219|0.8495|0.93975|
|GPU|0.620|0.84921|0.93987|
## Application Scenarios
### Algorithm Category
`AI for science`
### Target Industries
`high-energy physics, healthcare, finance`
## Source Repository & Feedback
* https://developer.sourcefind.cn/codes/modelzoo/particle_transformer_pytorch
## References
* https://github.com/jet-universe/particle_transformer
* https://github.com/hqucms/weaver-core
# Particle Transformer
This repo is the official implementation of "[Particle Transformer for Jet Tagging](https://arxiv.org/abs/2202.03772)". It includes the code, pre-trained models, and the JetClass dataset.
![jet-tagging](figures/jet-tagging.png)
## Updates
### 2023/07/06
We added a [helper function](dataloader.py) to read the JetClass dataset into regular numpy arrays. To use it, simply download the file [dataloader.py](dataloader.py) and do:
```python
from dataloader import read_file
x_particles, x_jets, y = read_file(filepath)
```
The return values are:
- `x_particles`: a zero-padded numpy array of particle-level features in the shape `(num_jets, num_particle_features, max_num_particles)`.
- `x_jets`: a numpy array of jet-level features in the shape `(num_jets, num_jet_features)`.
- `y`: a one-hot encoded numpy array of the truth labels in the shape `(num_jets, num_classes)`.
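If integer class indices are needed instead of the one-hot encoding, `argmax` along the class axis recovers them (a small illustrative snippet, using a toy 3-class array in place of a real JetClass file):

```python
import numpy as np

# toy stand-in for the one-hot `y` returned by read_file
y = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])
class_idx = y.argmax(axis=1)  # -> array([1, 0, 2])
```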
## Introduction
### JetClass dataset
**[JetClass](https://zenodo.org/record/6619768)** is a new large-scale jet tagging dataset proposed in "[Particle Transformer for Jet Tagging](https://arxiv.org/abs/2202.03772)". It consists of 100M jets for training, 5M for validation and 20M for testing. The dataset contains 10 classes of jets, simulated with [MadGraph](https://launchpad.net/mg5amcnlo) + [Pythia](https://pythia.org/) + [Delphes](https://cp3.irmp.ucl.ac.be/projects/delphes):
![dataset](figures/dataset.png)
### Particle Transformer (ParT)
The **Particle Transformer (ParT)** architecture is described in "[Particle Transformer for Jet Tagging](https://arxiv.org/abs/2202.03772)", which can serve as a general-purpose backbone for jet tagging and similar tasks in particle physics. It is a Transformer-based architecture, enhanced with pairwise particle interaction features that are incorporated in the multi-head attention as a bias before softmax. The ParT architecture outperforms the previous state-of-the-art, ParticleNet, by a large margin on various jet tagging benchmarks.
![arch](figures/arch.png)
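The bias mechanism can be sketched as a minimal single-head attention in numpy (illustrative names and a deliberate simplification; the actual multi-head implementation lives in the ParT model code): the pairwise interaction matrix `u` is added to the scaled dot-product logits before the softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention(q, k, v, u):
    """Single-head attention over n particles; `u` is an (n, n) bias
    derived from pairwise interaction features."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + u  # bias added before softmax
    return softmax(logits) @ v
```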
## Getting started
### Download the datasets
To download the JetClass/QuarkGluon/TopLandscape datasets:
```
./get_datasets.py [JetClass|QuarkGluon|TopLandscape] [-d DATA_DIR]
```
After download, the dataset paths will be updated in the `env.sh` file.
### Training
The ParT models are implemented in PyTorch and the training is based on the [weaver](https://github.com/hqucms/weaver-core) framework for dataset loading and transformation. To install `weaver`, run:
```
pip install 'weaver-core>=0.4'
```
**To run the training on the JetClass dataset:**
```
./train_JetClass.sh [ParT|PN|PFN|PCNN] [kin|kinpid|full] ...
```
where the first argument is the model:
- ParT: [Particle Transformer](https://arxiv.org/abs/2202.03772)
- PN: [ParticleNet](https://arxiv.org/abs/1902.08570)
- PFN: [Particle Flow Network](https://arxiv.org/abs/1810.05165)
- PCNN: [P-CNN](https://arxiv.org/abs/1902.09914)
and the second argument is the input feature set:
- [kin](data/JetClass/JetClass_kin.yaml): only kinematic inputs
- [kinpid](data/JetClass/JetClass_kinpid.yaml): kinematic inputs + particle identification
- [full](data/JetClass/JetClass_full.yaml) (_default_): kinematic inputs + particle identification + trajectory displacement
Additional arguments will be passed directly to the `weaver` command, such as `--batch-size`, `--start-lr`, `--gpus`, etc., and will override existing arguments in `train_JetClass.sh`.
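For reference, the bracketed numbers in the feature-set YAML files (e.g. `[part_pt_log, 1.7, 0.7]`) are per-variable standardization parameters. A sketch of the transform they appear to describe, assuming shift by `subtract_by`, scale by `multiply_by`, then clip; the authoritative behavior is in `weaver-core`:

```python
import numpy as np

def standardize(x, subtract_by=0.0, multiply_by=1.0, clip_min=-5.0, clip_max=5.0):
    # shift, scale, then clip to the configured range
    return np.clip((x - subtract_by) * multiply_by, clip_min, clip_max)

# e.g. the `[part_pt_log, 1.7, 0.7]` entry:
standardize(np.array([0.0, 1.7, 10.0]), 1.7, 0.7)
# -> [-1.19, 0.0, 5.0]  (the last value, 5.81, is clipped to 5)
```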
**Multi-gpu support:**
- using PyTorch's DataParallel multi-gpu training:
```
./train_JetClass.sh ParT full --gpus 0,1,2,3 --batch-size [total_batch_size] ...
```
- using PyTorch's DistributedDataParallel:
```
DDP_NGPUS=4 ./train_JetClass.sh ParT full --batch-size [batch_size_per_gpu] ...
```
**To run the training on the QuarkGluon dataset:**
```
./train_QuarkGluon.sh [ParT|ParT-FineTune|PN|PN-FineTune|PFN|PCNN] [kin|kinpid|kinpidplus] ...
```
**To run the training on the TopLandscape dataset:**
```
./train_TopLandscape.sh [ParT|ParT-FineTune|PN|PN-FineTune|PFN|PCNN] [kin] ...
```
The argument `ParT-FineTune` or `PN-FineTune` will run the fine-tuning using [models pre-trained on the JetClass dataset](models/).
## Citations
If you use the Particle Transformer code and/or the JetClass dataset, please cite:
```
@InProceedings{Qu:2022mxj,
author = "Qu, Huilin and Li, Congqiao and Qian, Sitian",
title = "{Particle Transformer} for Jet Tagging",
booktitle = "{Proceedings of the 39th International Conference on Machine Learning}",
pages = "18281--18292",
year = "2022",
eprint = "2202.03772",
archivePrefix = "arXiv",
primaryClass = "hep-ph"
}
@dataset{JetClass,
author = "Qu, Huilin and Li, Congqiao and Qian, Sitian",
title = "{JetClass}: A Large-Scale Dataset for Deep Learning in Jet Physics",
month = "jun",
year = "2022",
publisher = "Zenodo",
version = "1.0.0",
doi = "10.5281/zenodo.6619768",
url = "https://doi.org/10.5281/zenodo.6619768"
}
```
Additionally, if you use the ParticleNet model, please cite:
```
@article{Qu:2019gqs,
author = "Qu, Huilin and Gouskos, Loukas",
title = "{ParticleNet: Jet Tagging via Particle Clouds}",
eprint = "1902.08570",
archivePrefix = "arXiv",
primaryClass = "hep-ph",
doi = "10.1103/PhysRevD.101.056019",
journal = "Phys. Rev. D",
volume = "101",
number = "5",
pages = "056019",
year = "2020"
}
```
For the QuarkGluon dataset, please cite:
```
@article{Komiske:2018cqr,
author = "Komiske, Patrick T. and Metodiev, Eric M. and Thaler, Jesse",
title = "{Energy Flow Networks: Deep Sets for Particle Jets}",
eprint = "1810.05165",
archivePrefix = "arXiv",
primaryClass = "hep-ph",
reportNumber = "MIT-CTP 5064",
doi = "10.1007/JHEP01(2019)121",
journal = "JHEP",
volume = "01",
pages = "121",
year = "2019"
}
@dataset{komiske_patrick_2019_3164691,
author = {Komiske, Patrick and
Metodiev, Eric and
Thaler, Jesse},
title = {Pythia8 Quark and Gluon Jets for Energy Flow},
month = may,
year = 2019,
publisher = {Zenodo},
version = {v1},
doi = {10.5281/zenodo.3164691},
url = {https://doi.org/10.5281/zenodo.3164691}
}
```
For the TopLandscape dataset, please cite:
```
@article{Kasieczka:2019dbj,
author = "Butter, Anja and others",
editor = "Kasieczka, Gregor and Plehn, Tilman",
title = "{The Machine Learning landscape of top taggers}",
eprint = "1902.09914",
archivePrefix = "arXiv",
primaryClass = "hep-ph",
doi = "10.21468/SciPostPhys.7.1.014",
journal = "SciPost Phys.",
volume = "7",
pages = "014",
year = "2019"
}
@dataset{kasieczka_gregor_2019_2603256,
author = {Kasieczka, Gregor and
Plehn, Tilman and
Thompson, Jennifer and
Russel, Michael},
title = {Top Quark Tagging Reference Dataset},
month = mar,
year = 2019,
publisher = {Zenodo},
version = {v0 (2018\_03\_27)},
doi = {10.5281/zenodo.2603256},
url = {https://doi.org/10.5281/zenodo.2603256}
}
```
selection:
   ### use `&`, `|`, `~` for logical operations on numpy arrays
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression

new_variables:
   ### [format] name: formula
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression
   part_mask: ak.ones_like(part_energy)
   part_pt: np.hypot(part_px, part_py)
   part_pt_log: np.log(part_pt)
   part_e_log: np.log(part_energy)
   part_logptrel: np.log(part_pt/jet_pt)
   part_logerel: np.log(part_energy/jet_energy)
   part_deltaR: np.hypot(part_deta, part_dphi)
   part_d0: np.tanh(part_d0val)
   part_dz: np.tanh(part_dzval)

preprocess:
   ### method: [manual, auto] - whether to use manually specified parameters for variable standardization
   method: manual
   ### data_fraction: fraction of events to use when calculating the mean/scale for the standardization
   data_fraction: 0.5

inputs:
   pf_points:
      length: 128
      pad_mode: wrap
      vars:
         - [part_deta, null]
         - [part_dphi, null]
   pf_features:
      length: 128
      pad_mode: wrap
      vars:
         ### [format 1]: var_name (no transformation)
         ### [format 2]: [var_name,
         ###              subtract_by(optional, default=None, no transf. if preprocess.method=manual, auto transf. if preprocess.method=auto),
         ###              multiply_by(optional, default=1),
         ###              clip_min(optional, default=-5),
         ###              clip_max(optional, default=5),
         ###              pad_value(optional, default=0)]
         - [part_pt_log, 1.7, 0.7]
         - [part_e_log, 2.0, 0.7]
         - [part_logptrel, -4.7, 0.7]
         - [part_logerel, -4.7, 0.7]
         - [part_deltaR, 0.2, 4.0]
         - [part_charge, null]
         - [part_isChargedHadron, null]
         - [part_isNeutralHadron, null]
         - [part_isPhoton, null]
         - [part_isElectron, null]
         - [part_isMuon, null]
         - [part_d0, null]
         - [part_d0err, 0, 1, 0, 1]
         - [part_dz, null]
         - [part_dzerr, 0, 1, 0, 1]
         - [part_deta, null]
         - [part_dphi, null]
   pf_vectors:
      length: 128
      pad_mode: wrap
      vars:
         - [part_px, null]
         - [part_py, null]
         - [part_pz, null]
         - [part_energy, null]
   pf_mask:
      length: 128
      pad_mode: constant
      vars:
         - [part_mask, null]

labels:
   ### type can be `simple`, `custom`
   ### [option 1] use `simple` for binary/multi-class classification, then `value` is a list of 0-1 labels
   type: simple
   value: [label_QCD, label_Hbb, label_Hcc, label_Hgg, label_H4q, label_Hqql, label_Zqq, label_Wqq, label_Tbqq, label_Tbl]
   ### [option 2] otherwise use `custom` to define the label, then `value` is a map
   # type: custom
   # value:
   #    truth_label: label.argmax(1)

observers:
   - jet_pt
   - jet_eta
   - jet_phi
   - jet_energy
   - jet_nparticles
   - jet_sdmass
   - jet_tau1
   - jet_tau2
   - jet_tau3
   - jet_tau4

weights:
selection:
   ### use `&`, `|`, `~` for logical operations on numpy arrays
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression

new_variables:
   ### [format] name: formula
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression
   part_mask: ak.ones_like(part_energy)
   part_pt: np.hypot(part_px, part_py)
   part_pt_log: np.log(part_pt)
   part_e_log: np.log(part_energy)
   part_logptrel: np.log(part_pt/jet_pt)
   part_logerel: np.log(part_energy/jet_energy)
   part_deltaR: np.hypot(part_deta, part_dphi)

preprocess:
   ### method: [manual, auto] - whether to use manually specified parameters for variable standardization
   method: manual
   ### data_fraction: fraction of events to use when calculating the mean/scale for the standardization
   data_fraction: 0.5

inputs:
   pf_points:
      length: 128
      pad_mode: wrap
      vars:
         - [part_deta, null]
         - [part_dphi, null]
   pf_features:
      length: 128
      pad_mode: wrap
      vars:
         ### [format 1]: var_name (no transformation)
         ### [format 2]: [var_name,
         ###              subtract_by(optional, default=None, no transf. if preprocess.method=manual, auto transf. if preprocess.method=auto),
         ###              multiply_by(optional, default=1),
         ###              clip_min(optional, default=-5),
         ###              clip_max(optional, default=5),
         ###              pad_value(optional, default=0)]
         - [part_pt_log, 1.7, 0.7]
         - [part_e_log, 2.0, 0.7]
         - [part_logptrel, -4.7, 0.7]
         - [part_logerel, -4.7, 0.7]
         - [part_deltaR, 0.2, 4.0]
         - [part_deta, null]
         - [part_dphi, null]
   pf_vectors:
      length: 128
      pad_mode: wrap
      vars:
         - [part_px, null]
         - [part_py, null]
         - [part_pz, null]
         - [part_energy, null]
   pf_mask:
      length: 128
      pad_mode: constant
      vars:
         - [part_mask, null]

labels:
   ### type can be `simple`, `custom`
   ### [option 1] use `simple` for binary/multi-class classification, then `value` is a list of 0-1 labels
   type: simple
   value: [label_QCD, label_Hbb, label_Hcc, label_Hgg, label_H4q, label_Hqql, label_Zqq, label_Wqq, label_Tbqq, label_Tbl]
   ### [option 2] otherwise use `custom` to define the label, then `value` is a map
   # type: custom
   # value:
   #    truth_label: label.argmax(1)

observers:
   - jet_pt
   - jet_eta
   - jet_phi
   - jet_energy
   - jet_nparticles
   - jet_sdmass
   - jet_tau1
   - jet_tau2
   - jet_tau3
   - jet_tau4

weights:
selection:
   ### use `&`, `|`, `~` for logical operations on numpy arrays
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression

new_variables:
   ### [format] name: formula
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression
   part_mask: ak.ones_like(part_energy)
   part_pt: np.hypot(part_px, part_py)
   part_pt_log: np.log(part_pt)
   part_e_log: np.log(part_energy)
   part_logptrel: np.log(part_pt/jet_pt)
   part_logerel: np.log(part_energy/jet_energy)
   part_deltaR: np.hypot(part_deta, part_dphi)

preprocess:
   ### method: [manual, auto] - whether to use manually specified parameters for variable standardization
   method: manual
   ### data_fraction: fraction of events to use when calculating the mean/scale for the standardization
   data_fraction: 0.5

inputs:
   pf_points:
      length: 128
      pad_mode: wrap
      vars:
         - [part_deta, null]
         - [part_dphi, null]
   pf_features:
      length: 128
      pad_mode: wrap
      vars:
         ### [format 1]: var_name (no transformation)
         ### [format 2]: [var_name,
         ###              subtract_by(optional, default=None, no transf. if preprocess.method=manual, auto transf. if preprocess.method=auto),
         ###              multiply_by(optional, default=1),
         ###              clip_min(optional, default=-5),
         ###              clip_max(optional, default=5),
         ###              pad_value(optional, default=0)]
         - [part_pt_log, 1.7, 0.7]
         - [part_e_log, 2.0, 0.7]
         - [part_logptrel, -4.7, 0.7]
         - [part_logerel, -4.7, 0.7]
         - [part_deltaR, 0.2, 4.0]
         - [part_charge, null]
         - [part_isChargedHadron, null]
         - [part_isNeutralHadron, null]
         - [part_isPhoton, null]
         - [part_isElectron, null]
         - [part_isMuon, null]
         - [part_deta, null]
         - [part_dphi, null]
   pf_vectors:
      length: 128
      pad_mode: wrap
      vars:
         - [part_px, null]
         - [part_py, null]
         - [part_pz, null]
         - [part_energy, null]
   pf_mask:
      length: 128
      pad_mode: constant
      vars:
         - [part_mask, null]

labels:
   ### type can be `simple`, `custom`
   ### [option 1] use `simple` for binary/multi-class classification, then `value` is a list of 0-1 labels
   type: simple
   value: [label_QCD, label_Hbb, label_Hcc, label_Hgg, label_H4q, label_Hqql, label_Zqq, label_Wqq, label_Tbqq, label_Tbl]
   ### [option 2] otherwise use `custom` to define the label, then `value` is a map
   # type: custom
   # value:
   #    truth_label: label.argmax(1)

observers:
   - jet_pt
   - jet_eta
   - jet_phi
   - jet_energy
   - jet_nparticles
   - jet_sdmass
   - jet_tau1
   - jet_tau2
   - jet_tau3
   - jet_tau4

weights:
selection:
   ### use `&`, `|`, `~` for logical operations on numpy arrays
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression

new_variables:
   ### [format] name: formula
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression
   part_mask: ak.ones_like(part_deta)
   part_pt: np.hypot(part_px, part_py)
   part_pt_log: np.log(part_pt)
   part_e_log: np.log(part_energy)
   part_logptrel: np.log(part_pt/jet_pt)
   part_logerel: np.log(part_energy/jet_energy)
   part_deltaR: np.hypot(part_deta, part_dphi)
   jet_isQ: label
   jet_isG: 1-label

preprocess:
   ### method: [manual, auto] - whether to use manually specified parameters for variable standardization
   method: manual
   ### data_fraction: fraction of events to use when calculating the mean/scale for the standardization
   data_fraction: 0.5

inputs:
   pf_points:
      length: 128
      pad_mode: wrap
      vars:
         - [part_deta, null]
         - [part_dphi, null]
   pf_features:
      length: 128
      pad_mode: wrap
      vars:
         ### [format 1]: var_name (no transformation)
         ### [format 2]: [var_name,
         ###              subtract_by(optional, default=None, no transf. if preprocess.method=manual, auto transf. if preprocess.method=auto),
         ###              multiply_by(optional, default=1),
         ###              clip_min(optional, default=-5),
         ###              clip_max(optional, default=5),
         ###              pad_value(optional, default=0)]
         - [part_pt_log, 1.7, 0.7]
         - [part_e_log, 2.0, 0.7]
         - [part_logptrel, -4.7, 0.7]
         - [part_logerel, -4.7, 0.7]
         - [part_deltaR, 0.2, 4.0]
         - [part_deta, null]
         - [part_dphi, null]
   pf_vectors:
      length: 128
      pad_mode: wrap
      vars:
         - [part_px, null]
         - [part_py, null]
         - [part_pz, null]
         - [part_energy, null]
   pf_mask:
      length: 128
      pad_mode: constant
      vars:
         - [part_mask, null]

labels:
   ### type can be `simple`, `custom`
   ### [option 1] use `simple` for binary/multi-class classification, then `value` is a list of 0-1 labels
   type: simple
   value: [jet_isQ, jet_isG]
   ### [option 2] otherwise use `custom` to define the label, then `value` is a map
   # type: custom
   # value:
   #    truth_label: label.argmax(1)

observers:
   - jet_pt
   - jet_eta

weights:
selection:
   ### use `&`, `|`, `~` for logical operations on numpy arrays
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression

new_variables:
   ### [format] name: formula
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression
   part_mask: ak.ones_like(part_deta)
   part_pt: np.hypot(part_px, part_py)
   part_pt_log: np.log(part_pt)
   part_e_log: np.log(part_energy)
   part_logptrel: np.log(part_pt/jet_pt)
   part_logerel: np.log(part_energy/jet_energy)
   part_deltaR: np.hypot(part_deta, part_dphi)
   jet_isQ: label
   jet_isG: 1-label

preprocess:
   ### method: [manual, auto] - whether to use manually specified parameters for variable standardization
   method: manual
   ### data_fraction: fraction of events to use when calculating the mean/scale for the standardization
   data_fraction: 0.5

inputs:
   pf_points:
      length: 128
      pad_mode: wrap
      vars:
         - [part_deta, null]
         - [part_dphi, null]
   pf_features:
      length: 128
      pad_mode: wrap
      vars:
         ### [format 1]: var_name (no transformation)
         ### [format 2]: [var_name,
         ###              subtract_by(optional, default=None, no transf. if preprocess.method=manual, auto transf. if preprocess.method=auto),
         ###              multiply_by(optional, default=1),
         ###              clip_min(optional, default=-5),
         ###              clip_max(optional, default=5),
         ###              pad_value(optional, default=0)]
         - [part_pt_log, 1.7, 0.7]
         - [part_e_log, 2.0, 0.7]
         - [part_logptrel, -4.7, 0.7]
         - [part_logerel, -4.7, 0.7]
         - [part_deltaR, 0.2, 4.0]
         - [part_charge, null]
         - [part_isChargedHadron, null]
         - [part_isNeutralHadron, null]
         - [part_isPhoton, null]
         - [part_isElectron, null]
         - [part_isMuon, null]
         - [part_deta, null]
         - [part_dphi, null]
   pf_vectors:
      length: 128
      pad_mode: wrap
      vars:
         - [part_px, null]
         - [part_py, null]
         - [part_pz, null]
         - [part_energy, null]
   pf_mask:
      length: 128
      pad_mode: constant
      vars:
         - [part_mask, null]

labels:
   ### type can be `simple`, `custom`
   ### [option 1] use `simple` for binary/multi-class classification, then `value` is a list of 0-1 labels
   type: simple
   value: [jet_isQ, jet_isG]
   ### [option 2] otherwise use `custom` to define the label, then `value` is a map
   # type: custom
   # value:
   #    truth_label: label.argmax(1)

observers:
   - jet_pt
   - jet_eta

weights:
selection:
   ### use `&`, `|`, `~` for logical operations on numpy arrays
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression

new_variables:
   ### [format] name: formula
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression
   part_mask: ak.ones_like(part_deta)
   part_pt: np.hypot(part_px, part_py)
   part_pt_log: np.log(part_pt)
   part_e_log: np.log(part_energy)
   part_logptrel: np.log(part_pt/jet_pt)
   part_logerel: np.log(part_energy/jet_energy)
   part_deltaR: np.hypot(part_deta, part_dphi)
   part_isCHad: (np.abs(part_pid)==211) + (np.abs(part_pid)==321)*0.5 + (np.abs(part_pid)==2212)*0.2
   part_isNHad: (np.abs(part_pid)==130) + (np.abs(part_pid)==2112)*0.2
   jet_isQ: label
   jet_isG: 1-label

preprocess:
   ### method: [manual, auto] - whether to use manually specified parameters for variable standardization
   method: manual
   ### data_fraction: fraction of events to use when calculating the mean/scale for the standardization
   data_fraction: 0.5

inputs:
   pf_points:
      length: 128
      pad_mode: wrap
      vars:
         - [part_deta, null]
         - [part_dphi, null]
   pf_features:
      length: 128
      pad_mode: wrap
      vars:
         ### [format 1]: var_name (no transformation)
         ### [format 2]: [var_name,
         ###              subtract_by(optional, default=None, no transf. if preprocess.method=manual, auto transf. if preprocess.method=auto),
         ###              multiply_by(optional, default=1),
         ###              clip_min(optional, default=-5),
         ###              clip_max(optional, default=5),
         ###              pad_value(optional, default=0)]
         - [part_pt_log, 1.7, 0.7]
         - [part_e_log, 2.0, 0.7]
         - [part_logptrel, -4.7, 0.7]
         - [part_logerel, -4.7, 0.7]
         - [part_deltaR, 0.2, 4.0]
         - [part_charge, null]
         - [part_isCHad, null]
         - [part_isNHad, null]
         - [part_isPhoton, null]
         - [part_isElectron, null]
         - [part_isMuon, null]
         - [part_deta, null]
         - [part_dphi, null]
   pf_vectors:
      length: 128
      pad_mode: wrap
      vars:
         - [part_px, null]
         - [part_py, null]
         - [part_pz, null]
         - [part_energy, null]
   pf_mask:
      length: 128
      pad_mode: constant
      vars:
         - [part_mask, null]

labels:
   ### type can be `simple`, `custom`
   ### [option 1] use `simple` for binary/multi-class classification, then `value` is a list of 0-1 labels
   type: simple
   value: [jet_isQ, jet_isG]
   ### [option 2] otherwise use `custom` to define the label, then `value` is a map
   # type: custom
   # value:
   #    truth_label: label.argmax(1)

observers:
   - jet_pt
   - jet_eta

weights:
selection:
   ### use `&`, `|`, `~` for logical operations on numpy arrays
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression

new_variables:
   ### [format] name: formula
   ### can use functions from `math`, `np` (numpy), and `awkward` in the expression
   part_mask: ak.ones_like(part_deta)
   part_pt: np.hypot(part_px, part_py)
   part_pt_log: np.log(part_pt)
   part_e_log: np.log(part_energy)
   part_logptrel: np.log(part_pt/jet_pt)
   part_logerel: np.log(part_energy/jet_energy)
   part_deltaR: np.hypot(part_deta, part_dphi)
   jet_isTop: label
   jet_isQCD: 1-label

preprocess:
   ### method: [manual, auto] - whether to use manually specified parameters for variable standardization
   method: manual
   ### data_fraction: fraction of events to use when calculating the mean/scale for the standardization
   data_fraction: 0.5

inputs:
   pf_points:
      length: 128
      pad_mode: wrap
      vars:
         - [part_deta, null]
         - [part_dphi, null]
   pf_features:
      length: 128
      pad_mode: wrap
      vars:
         ### [format 1]: var_name (no transformation)
         ### [format 2]: [var_name,
         ###              subtract_by(optional, default=None, no transf. if preprocess.method=manual, auto transf. if preprocess.method=auto),
         ###              multiply_by(optional, default=1),
         ###              clip_min(optional, default=-5),
         ###              clip_max(optional, default=5),
         ###              pad_value(optional, default=0)]
         - [part_pt_log, 1.7, 0.7]
         - [part_e_log, 2.0, 0.7]
         - [part_logptrel, -4.7, 0.7]
         - [part_logerel, -4.7, 0.7]
         - [part_deltaR, 0.2, 4.0]
         - [part_deta, null]
         - [part_dphi, null]
   pf_vectors:
      length: 128
      pad_mode: wrap
      vars:
         - [part_px, null]
         - [part_py, null]
         - [part_pz, null]
         - [part_energy, null]
   pf_mask:
      length: 128
      pad_mode: constant
      vars:
         - [part_mask, null]

labels:
   ### type can be `simple`, `custom`
   ### [option 1] use `simple` for binary/multi-class classification, then `value` is a list of 0-1 labels
   type: simple
   value: [jet_isTop, jet_isQCD]
   ### [option 2] otherwise use `custom` to define the label, then `value` is a map
   # type: custom
   # value:
   #    truth_label: label.argmax(1)

observers:
   - jet_pt
   - jet_eta

weights:
import numpy as np
import awkward as ak
import uproot
import vector

vector.register_awkward()


def read_file(
        filepath,
        max_num_particles=128,
        particle_features=['part_pt', 'part_eta', 'part_phi', 'part_energy'],
        jet_features=['jet_pt', 'jet_eta', 'jet_phi', 'jet_energy'],
        labels=['label_QCD', 'label_Hbb', 'label_Hcc', 'label_Hgg', 'label_H4q',
                'label_Hqql', 'label_Zqq', 'label_Wqq', 'label_Tbqq', 'label_Tbl']):
    """Loads a single file from the JetClass dataset.

    **Arguments**

    - **filepath** : _str_
        - Path to the ROOT data file.
    - **max_num_particles** : _int_
        - The maximum number of particles to load for each jet.
          Jets with fewer particles will be zero-padded,
          and jets with more particles will be truncated.
    - **particle_features** : _List[str]_
        - A list of particle-level features to be loaded.
          The available particle-level features are:
            - part_px
            - part_py
            - part_pz
            - part_energy
            - part_pt
            - part_eta
            - part_phi
            - part_deta: np.where(jet_eta>0, part_eta-jet_eta, -(part_eta-jet_eta))
            - part_dphi: delta_phi(part_phi, jet_phi)
            - part_d0val
            - part_d0err
            - part_dzval
            - part_dzerr
            - part_charge
            - part_isChargedHadron
            - part_isNeutralHadron
            - part_isPhoton
            - part_isElectron
            - part_isMuon
    - **jet_features** : _List[str]_
        - A list of jet-level features to be loaded.
          The available jet-level features are:
            - jet_pt
            - jet_eta
            - jet_phi
            - jet_energy
            - jet_nparticles
            - jet_sdmass
            - jet_tau1
            - jet_tau2
            - jet_tau3
            - jet_tau4
    - **labels** : _List[str]_
        - A list of truth labels to be loaded.
          The available label names are:
            - label_QCD
            - label_Hbb
            - label_Hcc
            - label_Hgg
            - label_H4q
            - label_Hqql
            - label_Zqq
            - label_Wqq
            - label_Tbqq
            - label_Tbl

    **Returns**

    - x_particles (_3-d numpy.ndarray_), x_jets (_2-d numpy.ndarray_), y (_2-d numpy.ndarray_)
        - `x_particles`: a zero-padded numpy array of particle-level features
          in the shape `(num_jets, num_particle_features, max_num_particles)`.
        - `x_jets`: a numpy array of jet-level features
          in the shape `(num_jets, num_jet_features)`.
        - `y`: a one-hot encoded numpy array of the truth labels
          in the shape `(num_jets, num_classes)`.
    """

    def _pad(a, maxlen, value=0, dtype='float32'):
        if isinstance(a, np.ndarray) and a.ndim >= 2 and a.shape[1] == maxlen:
            return a
        elif isinstance(a, ak.Array):
            if a.ndim == 1:
                a = ak.unflatten(a, 1)
            a = ak.fill_none(ak.pad_none(a, maxlen, clip=True), value)
            return ak.values_astype(a, dtype)
        else:
            x = (np.ones((len(a), maxlen)) * value).astype(dtype)
            for idx, s in enumerate(a):
                if not len(s):
                    continue
                trunc = s[:maxlen].astype(dtype)
                x[idx, :len(trunc)] = trunc
            return x

    table = uproot.open(filepath)['tree'].arrays()
    p4 = vector.zip({'px': table['part_px'],
                     'py': table['part_py'],
                     'pz': table['part_pz'],
                     'energy': table['part_energy']})
    table['part_pt'] = p4.pt
    table['part_eta'] = p4.eta
    table['part_phi'] = p4.phi
    x_particles = np.stack([ak.to_numpy(_pad(table[n], maxlen=max_num_particles)) for n in particle_features], axis=1)
    x_jets = np.stack([ak.to_numpy(table[n]).astype('float32') for n in jet_features], axis=1)
    y = np.stack([ak.to_numpy(table[n]).astype('int') for n in labels], axis=1)
    return x_particles, x_jets, y
#!/bin/bash
export DATADIR_JetClass=
export DATADIR_TopLandscape=
export DATADIR_QuarkGluon=
#!/usr/bin/env python3
import argparse
import os
import shutil
from utils.dataset_utils import get_file, extract_archive
datasets = {
    'JetClass': {
        'Pythia/train_100M': [
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_train_100M_part0.tar', 'de4fd2dca2e68ab3c85d5cfd3bcc65c3'),
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_train_100M_part1.tar', '9722a359c5ef697bea0fbf79bf50f003'),
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_train_100M_part2.tar', '1e9f66cd1f915f9d10e90ae1d7761720'),
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_train_100M_part3.tar', '47348fc8985319fa4806da87500482fa'),
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_train_100M_part4.tar', '6b0ce16bd93b442a8d51914466990279'),
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_train_100M_part5.tar', '416e347512e716de51d392bee327b8e9'),
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_train_100M_part6.tar', 'e9b9c1557b1b39bf0a16e4ab631ae451'),
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_train_100M_part7.tar', '5bfc6cb285ccb7680cefa9ac82ad1a2e'),
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_train_100M_part8.tar', '540c1a0d66dfad78d2b363c5740ccf86'),
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_train_100M_part9.tar', '668f40b3275167ff7104c48317c0ae2a'),
        ],
        'Pythia/': [
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_val_5M.tar', '7235ccb577ed85023ea3ab4d5e6160cf'),
            ('https://zenodo.org/record/6619768/files/JetClass_Pythia_test_20M.tar', '64e5156d26d101adeb43b8388207d767'),
        ],
    },
    'TopLandscape': {
        # converted from https://zenodo.org/record/2603256
        '../': [
            ('https://hqu.web.cern.ch/datasets/TopLandscape/TopLandscape.tar', '4fca2e47afbf321b0f201da6b804c404'),
        ],
    },
    'QuarkGluon': {
        # converted from https://zenodo.org/record/3164691
        '../': [
            ('https://hqu.web.cern.ch/datasets/QuarkGluon/QuarkGluon.tar', 'd8dd7f71a7aaaf9f1d2ee3cddef998f9'),
        ],
    },
}


def download_dataset(dataset, basedir, envfile, force_download):
    info = datasets[dataset]
    datadir = os.path.join(basedir, dataset)
    if force_download:
        if os.path.exists(datadir):
            print(f'Removing existing dir {datadir}')
            shutil.rmtree(datadir)
    for subdir, flist in info.items():
        for url, md5 in flist:
            fpath, download = get_file(url, datadir=datadir, file_hash=md5, force_download=force_download)
            if download:
                extract_archive(fpath, path=os.path.join(datadir, subdir))
    datapath = f'DATADIR_{dataset}={datadir}'
    with open(envfile) as f:
        lines = f.readlines()
    with open(envfile, 'w') as f:
        for l in lines:
            if f'DATADIR_{dataset}' in l:
                l = f'export {datapath}\n'
            f.write(l)
    print(f'Updated dataset path in {envfile} to "{datapath}".')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('dataset', choices=datasets.keys(), help='datasets to download')
    parser.add_argument('-d', '--basedir', default='datasets', help='base directory for the datasets')
    parser.add_argument('-e', '--envfile', default='env.sh', help='env file with the dataset paths')
    parser.add_argument('-f', '--force', action='store_true', help='force to re-download dataset')
    args = parser.parse_args()
    download_dataset(args.dataset, args.basedir, args.envfile, args.force)
# Unique model identifier
modelCode=1096
# Model name
modelName=particle_transformer_pytorch
# Model description
modelDescription=Deep-learning-based jet tagging
# Application scenarios
appScenario=training, inference, AI for science, high-energy physics, healthcare, finance
# Framework type
frameType=pytorch