Unverified commit 753da9ff authored by zcxzcx1, committed by GitHub

Merge pull request #2 from hjhk258/main

add comments for more clear
parents 273418ee 2603f510
......@@ -4,78 +4,70 @@ An open-source Python framework that integrates machine learning interatomic
potentials (MLIPs) with a tailored batched optimization strategy, enabling rapid,
unbiased structure prediction across the full density range.
## Perform a complete CSP process
## Install BOMLIP-CSP
```sh
git clone https://github.com/pic-ai-robotic-chemistry/BOMLIP-CSP.git --recursive && cd BOMLIP-CSP
conda create -n BOMLIP_CSP python=3.10 -y && conda activate BOMLIP_CSP
git clone https://github.com/pic-ai-robotic-chemistry/BOMLIP-CSP.git --recursive
cd BOMLIP-CSP
conda create -n BOMLIP_CSP python=3.10 -y
conda activate BOMLIP_CSP
cd mace-bench
./reproduce/init_mace.sh && source util/env.sh
./reproduce/init_mace.sh
source util/env.sh
cd ..
```
## Perform a complete CSP process
If you have administrator privileges, start the exclusive mode to further accelerate the process.
```sh
sudo ./util/mps_start.sh
```
cd ..
Run the main program of the CSP process.
BATCHED GEOMETRY OPTIMIZATION REQUIRES A GPU!
PLEASE CONFIRM THE GPU AND WORKER SETTINGS IN THE SHELL SCRIPT (--gpu_offset, --n_gpus, --num_workers) BEFORE RUNNING THIS COMMAND!
```sh
./csp.sh
```
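A quick sanity check of the GPU settings before launching can catch typos such as an offset that points past the last GPU. The helper below is a hypothetical sketch and is not part of BOMLIP-CSP. It assumes that --gpu_offset is the index of the first GPU and that --n_gpus consecutive GPUs are used. The rule that --num_workers should be at least --n_gpus is also an assumption.

```python
def planned_gpu_ids(gpu_offset: int, n_gpus: int, num_workers: int,
                    available_gpus: int) -> list[int]:
    """Return the GPU indices a run would use (hypothetical helper).

    Assumes --gpu_offset is the first GPU index and --n_gpus consecutive
    GPUs are used; this is not confirmed by the BOMLIP-CSP sources.
    """
    if gpu_offset < 0 or n_gpus <= 0:
        raise ValueError("gpu_offset must be >= 0 and n_gpus must be > 0")
    if gpu_offset + n_gpus > available_gpus:
        raise ValueError(
            f"requested GPUs {gpu_offset}..{gpu_offset + n_gpus - 1}, "
            f"but only {available_gpus} GPU(s) are visible"
        )
    # Assumption: every GPU should receive at least one worker process.
    if num_workers < n_gpus:
        raise ValueError("num_workers should be at least n_gpus")
    return list(range(gpu_offset, gpu_offset + n_gpus))


# Values from the MACE optimization command in csp.sh: 8 GPUs from index 0, 80 workers.
print(planned_gpu_ids(gpu_offset=0, n_gpus=8, num_workers=80, available_gpus=8))
# → [0, 1, 2, 3, 4, 5, 6, 7]
```

If PyTorch is installed, torch.cuda.device_count() returns the number of visible GPUs, and you can pass that value as available_gpus.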
End the exclusive mode after running.
```sh
sudo ./util/mps_clean.sh
```
## Perform conformer search / structure generation / structure optimization separately
### conformer search
In csp.sh, the --mode argument selects which jobs to run.
Use conformer_only to run the conformer search only.
```sh
python "${TOP_DIR}/main.py" --path ${TAR_DIR} --smiles "OC(=O)c1cc(O)c(O)c(O)c1.O" \
--molecule_num_in_cell 1,1 --space_group_list 13,14 --add_name KONTIQ --max_workers 16 \
--num_generation 100 --generate_conformers 20 --use_conformers 4 --mode conformer_only > generate.log 2>&1
python "${TOP_DIR}/main.py" --path ${TAR_DIR} --smiles "C1CC2=COC=C12" \
--num_generation 100 --generate_conformers 10 --mode conformer_only > generate_conformer.log 2>&1
```
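The --mode and --smiles rules can be summarized in a few lines of argparse. The sketch below mirrors the handling visible in main.py in this commit: the default mode is 'all', the three modes are the allowed choices, and SMILES is required unless the mode is structure_only. It is a simplified stand-in and not the full main.py parser. Two details are assumptions of this sketch. It reports the missing SMILES through parser.error, which exits with status 2. It also stores an empty smiles_list when no SMILES is given.

```python
import argparse


def parse_cli(argv: list[str]) -> argparse.Namespace:
    """Sketch of the --smiles/--mode handling in main.py (not the full parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--smiles", type=str, default="None",
                        help="SMILES of the molecules, split by '.' for multiple molecules")
    parser.add_argument("--mode", type=str, default="all",
                        choices=["all", "conformer_only", "structure_only"],
                        help="choose the jobs to do")
    args = parser.parse_args(argv)
    # Conformer search needs SMILES; structure_only only reads existing conformers.
    if args.smiles == "None" and args.mode != "structure_only":
        parser.error("--smiles is required unless --mode is structure_only")
    # One SMILES per molecule (main.py also splits on '.').
    args.smiles_list = [] if args.smiles == "None" else args.smiles.split(".")
    return args


args = parse_cli(["--smiles", "OC(=O)c1cc(O)c(O)c(O)c1.O", "--mode", "conformer_only"])
print(args.mode, args.smiles_list)
```

For the KONTIQ example this yields two molecules: 'OC(=O)c1cc(O)c(O)c(O)c1' and 'O'.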
### structure generation
Or use structure_only to run the structure generation only.
In this mode, the conformers must be placed in ${TAR_DIR}/molecule_${i}/conformers as conformer_${j}.xyz files. They can be generated by this program or supplied as .xyz files from other methods.
Here, i starts from 1 and j starts from 0.
```sh
python "${TOP_DIR}/main.py" --path ${TAR_DIR} --smiles "OC(=O)c1cc(O)c(O)c(O)c1.O" \
--molecule_num_in_cell 1,1 --space_group_list 13,14 --add_name KONTIQ --max_workers 16 \
--num_generation 100 --generate_conformers 20 --use_conformers 4 --mode structure_only > generate.log 2>&1
python "${TOP_DIR}/main.py" --path ${TAR_DIR} --molecule_num_in_cell 1 \
--space_group_list 14,61 --add_name XULDUD --max_workers 16 --num_generation 100 \
--use_conformers 4 --mode structure_only > generate_structure.log 2>&1
```
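Before a structure_only run, you can check the folder layout. The helpers below are a hypothetical sketch and are not BOMLIP-CSP code. count_molecule_folders mirrors the molecule_1, molecule_2, ... loop that main.py uses in structure_only mode. conformer_paths and read_xyz are illustrative stand-ins and not the project's format_parser.read_xyz_file. They assume standard single-frame XYZ files (atom count, comment line, then one "element x y z" line per atom) and stop at the first missing conformer index.

```python
import os


def count_molecule_folders(target_folder: str) -> int:
    """Count consecutive molecule_1, molecule_2, ... folders (mirrors main.py)."""
    n = 0
    while os.path.isdir(os.path.join(target_folder, f"molecule_{n + 1}")):
        n += 1
    return n


def conformer_paths(target_folder: str, molecule_index: int) -> list[str]:
    """List conformer_0.xyz, conformer_1.xyz, ... until the first missing index."""
    folder = os.path.join(target_folder, f"molecule_{molecule_index}", "conformers")
    paths = []
    j = 0
    while os.path.isfile(os.path.join(folder, f"conformer_{j}.xyz")):
        paths.append(os.path.join(folder, f"conformer_{j}.xyz"))
        j += 1
    return paths


def read_xyz(path: str) -> tuple[list[str], list[tuple[float, float, float]]]:
    """Parse a single-frame XYZ file into element symbols and coordinates."""
    with open(path) as fh:
        lines = fh.read().splitlines()
    n_atoms = int(lines[0].split()[0])
    symbols, coords = [], []
    # Skip the count line and the comment line, then read n_atoms atom lines.
    for line in lines[2:2 + n_atoms]:
        parts = line.split()
        symbols.append(parts[0])
        coords.append((float(parts[1]), float(parts[2]), float(parts[3])))
    return symbols, coords
```

For example, count_molecule_folders("${TAR_DIR}") should equal the number of molecules you expect main.py to pick up in structure_only mode, with TAR_DIR replaced by the actual path.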
Conformer search and structure generation can also be done in a Python script for more flexibility, for example higher Z', higher-order co-crystals, or control over the number of trial structures for each space group. See structure_generate.py.
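main.py reads the packing arguments with nargs='+'. Within one packing, commas separate the molecule counts or the space groups. Spaces separate packings. The number of packings is the length of the longest list.

The sketch below reproduces that parsing for illustration. This diff does not show how main.py pairs lists of different lengths. Reusing a single entry for every packing is therefore an assumption, as is including --add_name in the length check.

```python
def parse_packings(molecule_num_in_cell: list[str], space_group_list: list[str],
                   add_name: list[str]) -> list[dict]:
    """Turn space-separated packing arguments into one dict per packing (sketch)."""
    # "1,1" -> [1, 1]; each space-separated item is one packing.
    counts = [list(map(int, item.split(","))) for item in molecule_num_in_cell]
    groups = [list(map(int, item.split(","))) for item in space_group_list]
    num_packings = max(len(counts), len(groups), len(add_name))

    for label, values in (("--molecule_num_in_cell", counts),
                          ("--space_group_list", groups),
                          ("--add_name", add_name)):
        if len(values) not in (1, num_packings):
            raise ValueError(f"{label} has {len(values)} entries; expected 1 or {num_packings}")

    def pick(values: list, k: int):
        # Assumption: a single entry applies to every packing.
        return values[0] if len(values) == 1 else values[k]

    return [
        {"molecule_num_in_cell": pick(counts, k),
         "space_groups": pick(groups, k),
         "add_name": pick(add_name, k)}
        for k in range(num_packings)
    ]


# The KONTIQ example: one packing with two molecules in space groups 13 and 14.
print(parse_packings(["1,1"], ["13,14"], ["KONTIQ"]))
```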
### structure optimization
Structure optimization is done by a separate command:
```sh
python "${TOP_DIR}/mace-bench/scripts/mace_opt_batch.py" ...
python "${TOP_DIR}/mace-bench/scripts/opt_batch.py" ...
```
Comment out this command in csp.sh if you want to skip structure optimization.
## Reproduce the MACE batched optimization speedup
```sh
#!/bin/bash
git clone https://github.com/pic-ai-robotic-chemistry/BOMLIP-CSP.git --recursive && cd BOMLIP-CSP
conda create -n BOMLIP_CSP python=3.10 -y && conda activate BOMLIP_CSP
cd mace-bench
# Explanations for all arguments are provided in main.py and mace-bench/scripts/opt_batch.py.
# initialize mace env.
./reproduce/init_mace.sh && source util/env.sh
sudo ./util/mps_start.sh
cd reproduce
# run baseline sub-test
./subtest_baseline.sh
# run baseline mixed test
cd perf_v2_base
./run_mace.sh
# run BOMLIP_CSP sub-test
cd ../
./subtest.sh
# run BOMLIP_CSP mixed test
cd perf_v2_batch
./opt.sh
# stop MPS (util/ is in mace-bench, two levels up from perf_v2_batch)
cd ../..
sudo ./util/mps_clean.sh
```
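To turn the two mixed tests into a speedup figure, compare their wall times. One way to measure them is to wrap run_mace.sh and opt.sh with the shell's time keyword.

The sketch below assumes both runs optimize the same structure set; both commands in this commit target ../../data/perf_v2. The numbers in the usage line are placeholders, not measured results.

```python
def speedup(baseline_seconds: float, batched_seconds: float) -> float:
    """Wall-time speedup of the batched run over the baseline run."""
    if baseline_seconds <= 0 or batched_seconds <= 0:
        raise ValueError("wall times must be positive")
    return baseline_seconds / batched_seconds


def throughput(n_structures: int, seconds: float) -> float:
    """Optimized structures per hour."""
    if seconds <= 0:
        raise ValueError("wall time must be positive")
    return n_structures * 3600.0 / seconds


# Placeholder wall times in seconds, not benchmark results.
print(speedup(3600.0, 900.0), throughput(100, 1800.0))  # → 4.0 200.0
```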
## If you want to configure the 7net environment.
### If you want to configure the 7net environment.
```sh
#!/bin/bash
......@@ -83,7 +75,7 @@ conda create -n 7net-cueq python=3.10 -y && conda activate 7net-cueq
./reproduce/init_7net.sh && source util/env.sh
# Use a fixed batch size for structural optimization
python ../../scripts/mace_opt_batch.py --target_folder "../../data/perf_v2" \
python ../../scripts/opt_batch.py --target_folder "../../data/perf_v2" \
--molecule_single 46 --gpu_offset 0 --n_gpus 4 --num_workers 4 \
--batch_size 2 --max_steps 3000 --filter1 UnitCellFilter \
--filter2 UnitCellFilter --optimizer1 BFGSFusedLS --optimizer2 BFGS \
......
......@@ -7,15 +7,22 @@ cd ${TAR_DIR}
# conformer search and structure generation
# change --mode to conformer_only or structure_only to separate the process.
python "${TOP_DIR}/main.py" --path ${TAR_DIR} --smiles "OC(=O)c1cc(O)c(O)c(O)c1.O" \
--molecule_num_in_cell 1,1 --space_group_list 13,14 --add_name KONTIQ --max_workers 16 \
--num_generation 100 --generate_conformers 20 --use_conformers 4 --mode all > generate.log 2>&1
python "${TOP_DIR}/main.py" --path ${TAR_DIR} --smiles "C1CC2=COC=C12" \
--molecule_num_in_cell 1 --space_group_list 14,61 --add_name XULDUD --max_workers 16 \
--num_generation 100 --generate_conformers 10 --use_conformers 4 --mode all > generate.log 2>&1
# python "${TOP_DIR}/main.py" --path ${TAR_DIR} --smiles "C1CC2=COC=C12" \
# --num_generation 100 --generate_conformers 10 --mode conformer_only > generate_conformer.log 2>&1
# python "${TOP_DIR}/main.py" --path ${TAR_DIR} --molecule_num_in_cell 1 \
# --space_group_list 14,61 --add_name XULDUD --max_workers 16 --num_generation 100 \
# --use_conformers 4 --mode structure_only > generate_structure.log 2>&1
# optimize structures using MACE; --batch_size 0 selects the batch size automatically (MACE only)
mkdir -p "${TAR_DIR}/mace_opt"
cd "${TAR_DIR}/mace_opt"
python "${TOP_DIR}/mace-bench/scripts/mace_opt_batch.py" --target_folder "${TAR_DIR}/structures" \
--molecule_single 21 --gpu_offset 0 --n_gpus 8 --num_workers 80 --batch_size 0 \
python "${TOP_DIR}/mace-bench/scripts/opt_batch.py" --target_folder "${TAR_DIR}/structures" \
--molecule_single 13 --gpu_offset 0 --n_gpus 8 --num_workers 80 --batch_size 0 \
--max_steps 3000 --filter1 UnitCellFilter --filter2 UnitCellFilter \
--optimizer1 BFGSFusedLS --optimizer2 BFGS --num_threads 1 --cueq true \
--use_ordered_files true --model mace > opt.log 2>&1
......@@ -23,8 +30,8 @@ python "${TOP_DIR}/mace-bench/scripts/mace_opt_batch.py" --target_folder "${TAR_
# opt structures using 7net
# mkdir -p "${TAR_DIR}/7net_opt"
# cd "${TAR_DIR}/7net_opt"
# python "${TOP_DIR}/mace-bench/scripts/mace_opt_batch.py" --target_folder "${TAR_DIR}/structures" \
# --molecule_single 21 --gpu_offset 0 --n_gpus 8 --num_workers 48 --batch_size 2 \
# python "${TOP_DIR}/mace-bench/scripts/opt_batch.py" --target_folder "${TAR_DIR}/structures" \
# --molecule_single 13 --gpu_offset 0 --n_gpus 8 --num_workers 48 --batch_size 2 \
# --max_steps 3000 --filter1 UnitCellFilter --filter2 UnitCellFilter \
# --optimizer1 BFGSFusedLS --optimizer2 BFGS --num_threads 2 --cueq true \
# --use_ordered_files true --model sevennet > opt.log 2>&1
......
rm -r *_result_*
python ../../scripts/mace_opt_batch.py --target_folder "../../data/perf_v2" --molecule_single 46 --gpu_offset 0 --n_gpus 4 --num_workers 40 --batch_size 0 \
python ../../scripts/opt_batch.py --target_folder "../../data/perf_v2" --molecule_single 46 --gpu_offset 0 --n_gpus 4 --num_workers 40 --batch_size 0 \
--max_steps 6000 --filter1 UnitCellFilter --filter2 UnitCellFilter --optimizer1 BFGSFusedLS --optimizer2 BFGS --num_threads 2 --cueq true --use_ordered_files true
\ No newline at end of file
......@@ -16,7 +16,7 @@ for config in "${natoms_nw_bs[@]}"; do
cd "$dir" || continue
pwd
python ../../scripts/mace_opt_batch.py \
python ../../scripts/opt_batch.py \
--target_folder "../../data/perf_v2_sorted/perf_v2_${natoms}" \
--molecule_single 46 --gpu_offset 0 --n_gpus 4 --num_workers ${nw} --batch_size ${bs} \
--max_steps 6000 --filter1 UnitCellFilter --filter2 UnitCellFilter \
......
......@@ -5,6 +5,7 @@ import time
import argparse
import os
import itertools
import sys
if __name__ == '__main__':
......@@ -15,7 +16,7 @@ if __name__ == '__main__':
##############################################################################################
parser = argparse.ArgumentParser()
parser.add_argument('--path', type=str, default="./", help='Path to process')
parser.add_argument('--smiles', type=str, required=True, help='SMILES string of the molecules, split by . if multiple molecules are used')
parser.add_argument('--smiles', type=str, default="None", help='SMILES string of the molecules, split by . if multiple molecules are used')
parser.add_argument('--generate_conformers', type=int, default=20, help='Number of conformers to generate. When it is <=0, only load existing conformers to generate structures')
parser.add_argument('--use_conformers', type=int, default=4, help='Number of conformers used to generate structure. When it is <=0, no structure generation would be done')
parser.add_argument('--molecule_num_in_cell', type=str, nargs='+', default=['1'], help='number of molecules in a unit cell, split by comma for multiple molecules, and split by space for multiple packings')
......@@ -23,9 +24,14 @@ if __name__ == '__main__':
parser.add_argument('--space_group_list', type=str, nargs='+', default=["2,14"], help='Space group list for structure generation, split by comma to add multiple groups, split by space for multiple packings')
parser.add_argument('--add_name', type=str, nargs='+', default=["CRYSTAL"], help='Add name for the generated structures, split by space for multiple packings')
parser.add_argument('--max_workers', type=int, default=8, help='Maximum number of workers for parallel processing')
parser.add_argument('--mode', type=str, default=8, choices=['all', 'conformer_only', 'structure_only'], help='choose the jobs to do')
parser.add_argument('--mode', type=str, default='all', choices=['all', 'conformer_only', 'structure_only'], help='choose the jobs to do')
args = parser.parse_args()
mode = args.mode
if args.smiles == "None" and mode != "structure_only":
sys.exit("SMILES is required for conformer search!")
target_folder = args.path
smiles_list = args.smiles.split('.')
generate_conformers = args.generate_conformers
......@@ -35,9 +41,17 @@ if __name__ == '__main__':
space_group_list = [list(map(int, group.split(','))) for group in args.space_group_list]
add_name = args.add_name
max_workers = args.max_workers
mode = args.mode
num_molecules = len(smiles_list)
if mode == "structure_only":
num_molecules = 0
while True:
molecule_folder = os.path.join(target_folder, f"molecule_{num_molecules+1}")
if os.path.exists(molecule_folder) and os.path.isdir(molecule_folder):
num_molecules += 1
else:
break
num_packings = max(len(molecule_num_in_cell), len(space_group_list))
for i in range(len(molecule_num_in_cell)):
......
from basic_function import format_parser
from basic_function import packaged_function
from basic_function import conformer_search
import time
if __name__ == '__main__':
    time_start = time.time()
    # ##############################################################################################
    # # conformer search
    # conformer_search.conformer_search("C1CC2=COC=C12", "./test/molecule_1", num_conformers=10, max_attempts=10000, rms_thresh=0.1)
    # ##############################################################################################
    # ##############################################################################################
    # single-component crystal structure generation with Z'=1
    molecule1 = format_parser.read_xyz_file("./test/molecule_1/conformers/conformer_0.xyz")
    packaged_function.CSP_generater_parallel([molecule1], "./test", need_structure=100, space_group_list=[14, 61], add_name="XULDUD_C1", max_workers=16, start_seed=1)
    # ##############################################################################################
    # ##############################################################################################
    # single-component crystal structure generation with Z'=2
    molecule1 = format_parser.read_xyz_file("./test/molecule_1/conformers/conformer_0.xyz")
    packaged_function.CSP_generater_parallel([molecule1, molecule1], "./test", need_structure=100, space_group_list=[14, 61], add_name="XULDUD_C1", max_workers=16, start_seed=1)
    # ##############################################################################################
    # ##############################################################################################
    # co-crystal structure generation
    molecule1 = format_parser.read_xyz_file("./test/molecule_1/conformers/conformer_0.xyz")
    molecule2 = format_parser.read_xyz_file("./test/molecule_2/conformers/conformer_0.xyz")
    packaged_function.CSP_generater_parallel([molecule1, molecule2], "./test", need_structure=100, space_group_list=[14, 61], add_name="XULDUD_C1", max_workers=16, start_seed=1)
    # ##############################################################################################
    time_end = time.time()
    print('time cost', time_end - time_start, 's')