Commit cc6e6b7d authored by Wang, Leping

- Add config.sh with all pipeline parameters organized by category
  (molecular, crystal structure, compute, run mode, path)
- Refactor search_gen_proc.sh to source config.sh instead of
  hardcoding parameters, with optional config path argument
- Refactor structure_generate.py to load config.sh via exec(),
  replacing hardcoded values with config-driven parameters
- Remove mace-bench (the relaxation part; it will be replaced by the updated, separate mace-bench project)
parent 61ec3ad9
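Loading config.sh with exec() in structure_generate.py works because every non-comment line of config.sh is a plain `NAME=value` assignment that is also valid Python, and `#` (including the `#!/bin/bash` shebang) starts a comment in both languages. A minimal sketch of that idea, assuming a hypothetical helper `load_config` and an inline sample instead of the real file; it is not the code from the commit:

```python
from pathlib import Path

# Inline sample mirroring a subset of config.sh (hypothetical, for illustration).
SAMPLE = '''#!/bin/bash
SMILES="C1CC2=COC=C12"
GENERATE_CONFORMERS=10
USE_CONFORMERS=4
MOLECULE_NUM_IN_CELL=1
SPACE_GROUP_LIST="14,61"
MODE="all"
'''


def load_config(text: str) -> dict:
    """Run shell-style NAME=value lines as Python and return the assigned names.

    Only works while the config stays Python-compatible: bash-only syntax
    such as `export` or `$VAR` expansion raises SyntaxError.
    """
    namespace: dict = {}
    # Empty builtins: an unquoted value such as MODE=all fails loudly with
    # NameError instead of silently binding Python's built-in all().
    exec(text, {"__builtins__": {}}, namespace)
    return namespace


def load_config_file(path: str = "config.sh") -> dict:
    """Hypothetical file wrapper; the default path mirrors an optional argument."""
    return load_config(Path(path).read_text())


cfg = load_config(SAMPLE)
print(cfg["SMILES"], cfg["SPACE_GROUP_LIST"], type(cfg["MOLECULE_NUM_IN_CELL"]).__name__)
# prints: C1CC2=COC=C12 14,61 int
```

Note that exec() types values as Python sees them: `MOLECULE_NUM_IN_CELL=1` becomes the int `1`, while `"1,1"` stays a string, so downstream code should normalise with `str()` before splitting. Because exec() runs whatever the file contains, this approach is only appropriate for trusted config files.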
# =============================================================================
# Python
# =============================================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# IPython / Jupyter Notebook
.ipynb_checkpoints
*.ipynb_checkpoints/
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
Pipfile.lock
# poetry
poetry.lock
# PEP 582
__pypackages__/
# Celery
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
.conda/
conda-env/
# Spyder project settings
.spyderproject.db
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype
.pytype/
# Cython debug symbols
cython_debug/
# =============================================================================
# PyTorch / Deep Learning
# =============================================================================
# Model checkpoints & weights
*.pt
*.pth
*.ckpt
*.bin
*.safetensors
checkpoints/
saved_models/
weights/
pretrained/
# TensorBoard logs
runs/
logs/
tb_logs/
tensorboard_logs/
lightning_logs/
# MLflow
mlruns/
mlflow/
# Weights & Biases
wandb/
# Hydra outputs
outputs/
multirun/
.hydra/
# ONNX models
*.onnx
# TorchScript
*.torchscript
# CUDA profiling
*.nvvp
*.nvprof
*.qdrep
# Data directories (customize as needed)
data/raw/
data/processed/
data/interim/
datasets/
# Large result files
results/
predictions/
evaluations/
# Experiment configs with secrets
secrets.yaml
secrets.yml
.secrets
# =============================================================================
# IDE / Editor
# =============================================================================
# VSCode
.vscode/
*.code-workspace
# PyCharm / JetBrains
.idea/
*.iml
*.iws
*.ipr
# Vim
*.swp
*.swo
*~
# Emacs
\#*\#
.\#*
# Sublime Text
*.sublime-project
*.sublime-workspace
# =============================================================================
# OS
# =============================================================================
# macOS
.DS_Store
.AppleDouble
.LSOverride
._*
.Spotlight-V100
.Trashes
# Windows
Thumbs.db
ehthumbs.db
Desktop.ini
$RECYCLE.BIN/
*.lnk
# Linux
*~
# =============================================================================
# Misc
# =============================================================================
# Compressed files
*.zip
*.tar.gz
*.tar.bz2
*.rar
*.7z
# Temporary files
*.tmp
*.temp
*.bak
*.orig
# Secrets & credentials
.env.local
.env.*.local
*.pem
*.key
config/secrets.*
# =============================================================================
# Project specific
# =============================================================================
csp_results/
\ No newline at end of file
#!/bin/bash
# =============================================================================
# BOMLIP-CSP Configuration File
# =============================================================================
# This file contains all configurable parameters for the crystal structure
# search and generation pipeline. Modify the values below to customize
# your run.
# =============================================================================
# -----------------------------------------------------------------------------
# [Molecular Parameters]
# SMILES string of the input molecule(s).
# Use '.' (dot) to separate multiple molecules for co-crystal generation.
# Example single molecule: "C1CC2=COC=C12"
# Example co-crystal: "C1CC2=COC=C12.CCO"
SMILES="C1CC2=COC=C12"
# Number of conformers to generate during the conformer search step.
# Higher values explore more conformational space but take longer.
# Set to 0 to skip generation and only load existing conformers.
GENERATE_CONFORMERS=10
# Number of conformers to actually use for crystal structure generation.
# Must be <= the number of generated conformers.
# Set to 0 to skip structure generation.
USE_CONFORMERS=4
# Number of molecules in the asymmetric unit (Z').
# Use comma-separated values for multiple molecule types (co-crystal),
# e.g. "1,1" means 1 copy of each molecule in the asymmetric unit.
# Use space-separated values for multiple packings, e.g. "1 2".
MOLECULE_NUM_IN_CELL=1
# -----------------------------------------------------------------------------
# [Crystal Structure Parameters]
# Space group numbers for structure generation.
# Use comma-separated values within a packing, and space-separated
# values for multiple packings.
# Example: "14,61" means search space groups P21/c and Pbca in one packing.
# Example: "14 61" means P21/c in packing 1, Pbca in packing 2.
SPACE_GROUP_LIST="14,61"
# Prefix name added to the output CIF files.
# Use space-separated values for multiple packings.
ADD_NAME="XULDUD"
# Number of crystal structures to generate per (space group, conformer) combination.
# Use space-separated values for multiple packings.
NUM_GENERATION=100
# -----------------------------------------------------------------------------
# [Compute Parameters]
# Maximum number of parallel workers for structure generation.
# Should not exceed the number of available CPU cores.
MAX_WORKERS=16
# -----------------------------------------------------------------------------
# [Run Mode]
# Execution mode controlling which steps are performed.
# Available options:
# all - Run conformer search followed by structure generation (default)
# conformer_only - Only perform conformer search
# structure_only - Skip conformer search, use existing conformers to generate structures
MODE="all"
# -----------------------------------------------------------------------------
# [Path Parameters]
# Directory for storing intermediate conformers and output CIF structures.
# Relative to the project root directory.
OUTPUT_DIR="csp_results"
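The comments in config.sh define a two-level list convention (spaces separate packings, commas separate entries within one packing) plus a few cross-parameter rules. A minimal sketch of parsing and checking those rules in Python, assuming the values were already loaded into a dict (for example via exec(), as the commit describes). `parse_packings` and `validate` are hypothetical names, and the rule that a per-packing parameter may hold either one packing or as many as `SPACE_GROUP_LIST` is an assumption, not something config.sh states:

```python
VALID_MODES = {"all", "conformer_only", "structure_only"}


def parse_packings(value) -> list[list[str]]:
    """Split "a,b c" into [["a", "b"], ["c"]].

    Spaces separate packings; commas separate entries inside one packing.
    str() handles values that exec() turned into ints (e.g. MOLECULE_NUM_IN_CELL=1).
    """
    return [packing.split(",") for packing in str(value).split()]


def validate(cfg: dict) -> list[str]:
    """Return a list of problems found in a loaded config (empty if none)."""
    errors = []
    if cfg["MODE"] not in VALID_MODES:
        errors.append(f"unknown MODE {cfg['MODE']!r}")

    gen, use = int(cfg["GENERATE_CONFORMERS"]), int(cfg["USE_CONFORMERS"])
    # GENERATE_CONFORMERS=0 means "load existing conformers", so the
    # USE_CONFORMERS <= generated bound can only be checked when gen > 0.
    if gen > 0 and use > gen:
        errors.append(f"USE_CONFORMERS={use} exceeds GENERATE_CONFORMERS={gen}")

    # One comma-separated count per molecule type in the '.'-separated SMILES.
    n_species = len(cfg["SMILES"].split("."))
    for packing in parse_packings(cfg["MOLECULE_NUM_IN_CELL"]):
        if len(packing) != n_species:
            errors.append(
                f"MOLECULE_NUM_IN_CELL entry {','.join(packing)!r} "
                f"does not match {n_species} molecule type(s) in SMILES"
            )

    # Assumption: each per-packing parameter has one packing (shared by all)
    # or exactly as many packings as SPACE_GROUP_LIST.
    n_packings = len(parse_packings(cfg["SPACE_GROUP_LIST"]))
    for key in ("MOLECULE_NUM_IN_CELL", "ADD_NAME", "NUM_GENERATION"):
        n = len(parse_packings(cfg[key]))
        if n not in (1, n_packings):
            errors.append(f"{key} has {n} packing(s), SPACE_GROUP_LIST has {n_packings}")
    return errors


cfg = {
    "SMILES": "C1CC2=COC=C12", "GENERATE_CONFORMERS": 10, "USE_CONFORMERS": 4,
    "MOLECULE_NUM_IN_CELL": 1, "SPACE_GROUP_LIST": "14,61", "ADD_NAME": "XULDUD",
    "NUM_GENERATION": 100, "MODE": "all",
}
space_groups = [[int(sg) for sg in packing] for packing in parse_packings(cfg["SPACE_GROUP_LIST"])]
print(space_groups, validate(cfg))  # prints: [[14, 61]] []
```

Returning a list of messages instead of raising on the first problem lets a driver script report every misconfiguration in one pass, before any expensive conformer search starts.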
# Changelog
All notable changes to this project will be documented in this file.
## [0.11.1]
From this release on, the version of the 'main' branch carries a 'devX' suffix once it diverges from the latest stable version.
The CLI changed in a backward-compatible manner: `sevenn` now has subcommands for
inference, train, etc.
### Added
- subcommands, some with aliases
- strict e3nn version requirement from __init__.py
### Changed
- pre-commit uses python3.11
- cuEquivariance optional libraries
- some gitignores
### Fixed
- Circular import in sevenn.checkpoint (dev0)
- Fix typing issues
## [0.11.0]
Multi-fidelity learning implemented & New pretrained-models
### Added
- Build multi-fidelity model, SevenNet-MF, based on given modality in the yaml
- Modality support for sevenn_inference, sevenn_get_modal, and SevenNetCalculator
- sevenn_cp tool for checkpoint summary, input generation, multi-modal routines
- Modality append / assign using sevenn_cp
- Loss weighting for energy, force and stress for corresponding data label
- Ignore unlabelled data when calculating loss. (e.g. stress data for non-pbc structure)
- Dict style dataset input for multi-modal and data-weight
- (experimental) cuEquivariance support
- Downloading large checkpoints from url (7net-MF-ompa, 7net-omat)
- D3 wB97M param
### Changed
- Sort instructions of tensor product in convolution (+ fix flipped w3j coeff of old model)
- Lazy initialization for `IrrepsLinear` and `SelfConnection*`
- Checkpoint things using `sevenn/checkpoint.py`
- e3nn >= 0.5.0, to ensure changed CG coeff later on
- pandas as dependency
- old v1 presets are removed, liquid electrolyte fine-tune yaml is added
### Fixed
- More refactor for shift scale things + few bug fixes
- Correctly shuffle training set when distributed training is enabled
- D3 calculator system swap memory error fixed
- D3 compile uses $HOME/.cache if package directory is not writable
## [0.10.4]
### Added
- feats: D3 calculator
### Fixed
- bug: info dict (and therefore energy and stress) shared between structures when structure_list is used
- torch >= 2.5.0 works
- numpy >= 2.0 works (need more testing)
### Changed
- sevennet_calculator.py => calculator
- fine-tuning preset uses the original loss function (Huber) and loss weights
## [0.10.3]
### Added
- SevenNet-l3i5, checkpoint, preset. (keywords: 7net-l3i5, sevennet-l3i5)
- SevenNet-l3i5 test
### Changed
- `--help` no longer loads unnecessary imports (fast!)
- README
## [0.10.2]
### Added
- Accelerated graph build routine if matscipy is installed @hexagonerose
- matscipy vs. ase neighborlist unit test
- If a validation set is not given but data_divide_ratio is, the validation set is created by a random split (shift, scale, and conv_denominator still use statistics from the whole original dataset)
### Changed
- matscipy is included as dependency
- data_divide_ratio defaults to 0.0 (not used)
### Fixed
- For torch >= 2.4.0, loading a graph dataset no longer raises warnings.
- Raise error when unknown element is found (SevenNetCalculator)
## [0.10.1]
### Added
- experimental `SevenNetAtomsDataset`, which is memory efficient and can be enabled with `dataset_type='atoms'`
- Save metadata & statistics when `SevenNetGraphDataset` saves its data.
### Changed
- Save checkpoint_0.pth (model before any training)
- `SevenNetGraphDataset._file_to_graph_list` -> `SevenNetGraphDataset.file_to_graph_list`
- Refactoring `SevenNetGraphDataset`, skips computing statistics if it is loaded, more detailed logging
- Prefer `.get` when accessing the config dict
### Fixed
- Fix error when loading `SevenNetGraphDataset` with other types of data (ex: extxyz) in one dataset
## [0.10.0]
SevenNet now has CI workflows using pytest, with 78% coverage!
Substantial changes in cli apps and some outputs.
### Added
- [train_v2]: train_v2, with lots of refactoring + support `load_testset_path`. Original routine is accessible: `sevenn -m train_v1`.
- [train_v2]: `SevenNetGraphDataset` replaces the old `AtomGraphDataset` and extends `InMemoryDataset` of PyG.
- [train_v2]: `sevenn_graph_build` for SevenNetGraphDataset. Previous .sevenn_data is accessible with --legacy option
- [train_v2]: Any number of additional datasets will be evaluated and recorded if it is given as 'load_{NAME}set_path' key (input.yaml).
- 'Univ' keyword for 'chemical_species'
- energy_key, force_key, stress_key options for `sevenn_graph_build`, @thangckt
- OpenMPI distributed training @thangckt
### Changed
- Read EFS of atoms from y_* keys of the .info or .arrays dict, instead of calculator results
- `type_map` and requires_grad are now handled inside `AtomGraphSequential`, so users no longer need to care about them.
- `log.sevenn` and `lc.csv` automatically find a safe filename (log0.sevenn, log1.sevenn, ...) to avoid overwriting.
- [train_v2]: train_v2 loads its training set via `load_trainset_path`, rather than previous `load_dataset_path`.
- [train_v2]: log.csv -> lc.csv; columns no longer carry units (easier to post-process), but units are still shown in `log.sevenn`.
### Fixed
- [e3gnn_serial]: can continue a simulation even when atom tags become non-consecutive (atoms removed dynamically), @gasplant64
- [e3gnn_parallel]: undefined behavior when there are no atoms to send/recv (for non-PBC systems)
- [e3gnn_parallel]: incorrect force/stress in some edge cases (too small simulation cell & 2 process)
- [e3gnn_parallel]: revert commit 14851ef, now e3gnn_parallel is sane.
- [e3gnn_*]: += instead of = when saving virial stress and forces @gasplant64
- Now Logger correctly closes a file.
- ... and lots of small bugs I found during writing `pytest`.
## [0.9.5]
### Note
This version is not stable, but I tag it as v0.9.5 before making further changes.
LAMMPS `pair_e3gnn_parallel.*` should be re-compiled for the below changes regarding LAMMPS parallel.
This is the first changelog and may not reflect all the changes.
### Added
- Stress compute for LAMMPS sevennet parallel
- `sevenn_inference` now takes .extxyz input
- `sevenn_inference` gives MAE error
- Experimental `sevenn_inference` on the fly graph build option
### Changed
- **[Breaking]** Parallel LAMMPS model changed, old deployed parallel models will not work
- **[Breaking]** Parallel LAMMPS takes the directory of potentials as input. Accordingly, `sevenn_get_model -p` creates a folder with potentials.
- **[Breaking]** Except for serial LAMMPS models, force and stress are computed from gradients of edge vectors, not positions.
- Separate interaction block from model build
- Add typing for most functions
- Remove clang pre-commit hook as it breaks lammps pair files
- `torch.load` with `weights_only=False`
- Line length limit 80 -> 85
- Refactor
### Fixed
- Correct batch size for SevenNet-0(11July2024)
## [0.9.4] - 2024-08-26
### Added
- D3 correction (contributed by dambi3613) for LAMMPS serial