deepks scf share/scf_input.yaml -m iter.09/01.train/model.pth -s systems/test.n6 -F e_tot f_tot conv rdm -d test_results -G
*/iter.*
*/share
*/log.*
*/err.*
*/RECORD
*/PID
# Example for water
This is an example of how to use the `deepks` library to train an energy functional for water molecules. The sub-folders are grouped as follows:
- `systems` contains all data that has been prepared in `deepks` format.
- `init` contains input files used to train a (perturbative) energy model (DeePHF).
- `iter` contains input files used to train a self consistent model iteratively (DeePKS).
- `withdens` contains input files used to train an SCF model with density labels.
## Prepare data
To prepare data, please first note that `deepks` uses atomic units by default, but switches to Angstrom (Å) as the length unit when systems are given as xyz files.
Property | Unit
--- | :---:
Length | Bohr (Å if from xyz)
Energy | $E_h$ (Hartree)
Force | $E_h$/Bohr ($E_h$/Å if from xyz)
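If you prepare labels yourself from data in other units, convert them before saving. Below is a minimal sketch assuming hypothetical raw labels stored in eV and eV/Å (the file names are made up for illustration; the conversion factors are standard CODATA values, not part of `deepks`):
```
import numpy as np

# standard CODATA conversion factors (for illustration only)
ANGSTROM_PER_BOHR = 0.529177210903
EV_PER_HARTREE = 27.211386245988

# hypothetical raw labels in eV and eV/Angstrom
energy_ev = np.load("raw/energy_ev.npy")        # shape: (n_frames,)
force_ev_ang = np.load("raw/force_ev_ang.npy")  # shape: (n_frames, n_atoms, 3)

# eV -> Hartree; eV/Angstrom -> Hartree/Bohr
np.save("group.00/energy.npy", energy_ev / EV_PER_HARTREE)
np.save("group.00/force.npy", force_ev_ang * ANGSTROM_PER_BOHR / EV_PER_HARTREE)
```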
`deepks` accepts data in three formats.
- **single `xyz` files** with properties saved as separate files sharing the same base name.
  e.g. for `0000.xyz`, its energy can be saved as `0000.energy.npy`, its forces as `0000.force.npy`, and its density matrix as `0000.dm.npy`, all in the same folder.
- **grouped into folders** in which all frames have the same number of atoms.
  Each such folder should contain an `atom.npy` of shape `n_frames x n_atoms x 4`, where the four elements correspond to the nuclear charge of the atom and its three spatial coordinates.
  Other properties can be provided as separate files like `energy.npy` and `force.npy`.
- **grouped with an explicit `type.raw` file** in which all frames have the same elements.
  This is similar to the above, except that `atom.npy` is replaced by a `coord.npy` containing pure spatial coordinates and a `type.raw` containing the element types shared by all frames of the system. This format is very similar to the one used in DeePMD-Kit, but here `type.raw` must contain real element types.
Note that the property files are optional. They are not needed for a pure SCF calculation, but they are required as labels when training a model.
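For concreteness, a grouped-folder system could be written out as follows. This is a minimal sketch with made-up coordinates; only `atom.npy` is mandatory, and the label shapes are assumed to follow the frames/atoms layout described above:
```
import os
import numpy as np

n_frames, n_atoms = 2, 3  # two frames of a single water molecule

# atom.npy: (n_frames, n_atoms, 4); columns are [Z, x, y, z], coordinates in Bohr
atom = np.zeros((n_frames, n_atoms, 4))
atom[:, 0, 0] = 8.0   # oxygen
atom[:, 1:, 0] = 1.0  # two hydrogens
atom[0, :, 1:] = [[0.0, 0.0, 0.0], [1.8, 0.0, 0.0], [-0.45, 1.75, 0.0]]
atom[1, :, 1:] = atom[0, :, 1:] * 1.01  # slightly stretched copy

os.makedirs("group.00", exist_ok=True)
np.save("group.00/atom.npy", atom)
# optional label files
np.save("group.00/energy.npy", np.array([-76.20, -76.19]))       # Hartree
np.save("group.00/force.npy", np.zeros((n_frames, n_atoms, 3)))  # Hartree/Bohr
```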
The two grouped data formats can be converted from the xyz format using [this script](../../scripts/convert_xyz.py). As an example, the data in the `systems` folder was created with the following command.
```
python ../../scripts/convert_xyz.py some/path/to/all/*.xyz -d systems -G 300 -P group
```
## Train an energy model
Training a perturbative energy model is a pure machine-learning task. Please see the [DeePHF paper](https://arxiv.org/pdf/2005.00169.pdf) for a detailed explanation of how the descriptors are constructed. Here we provide two sub-commands: `deepks scf` runs the Hartree-Fock calculation and automatically saves the descriptors (`dm_eig`) as well as the labels (`l_e_delta` for energy and `l_f_delta` for force); `deepks train` uses the dumped descriptors and labels to train a neural network model.
To further simplify the procedure, we can combine the two steps and use `deepks iterate` to run them sequentially. The required input files and execution scripts can be found in the `init` folder. There, `machines.yaml` specifies the resources needed for the calculations, `params.yaml` the parameters for the Hartree-Fock calculation and the neural network training, and `systems.yaml` the data used for training and testing. The folder is named `init` because it also serves as the initialization step of the self-consistent training described below. For the same reason, the `n_iter` attribute in `params.yaml` is set to 0 to avoid iterative training.
As shown in `run.sh`, the input files can be loaded and run by
```
deepks iterate machines.yaml params.yaml systems.yaml
```
where `deepks` is a shortcut for `python -m deepks`. Alternatively, one can directly use `./run.sh` to run it in the background. Make sure you are in the `init` folder before you run the command.
## Train a self-consistent model
To train a self-consistent model we follow the iterative approach described in the [DeePKS paper](https://arxiv.org/pdf/2008.00167.pdf). We provide `deepks iterate` as a tool to run the iteration automatically. As above, the example input file and execution scripts can be found in the `iter` folder. Note that here, instead of splitting the input into three files, we combine all settings in a single `args.yaml` file, to show that `deepks iterate` accepts a variable number of input files. The file provided last has the highest priority.
Each iteration consists of four steps, using four corresponding tools provided by `deepks`. Each step corresponds to a row of three numbers in the `RECORD` file, which indicates which steps have finished: the first number is the iteration index, the second the sub-folder within the iteration, and the third the step within that sub-folder (an illustrative snippet is shown after the list below).
- `deepks scf` (`X 0 0`): run the SCF calculation with the given model and save the results
- `deepks stats` (`X 0 1`): check the SCF results and print convergence and accuracy
- `deepks train` (`X 1 0`): train a new model using the old one as the starting point
- `deepks test` (`X 1 1`): test the model on all data to see the pure fitting error
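For illustration, after iteration 0 has finished completely and the SCF step of iteration 1 has just completed, `RECORD` would contain rows like the following (hypothetical content, constructed from the numbering scheme above):
```
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
```
Because finished steps are recorded here, rerunning `deepks iterate` can skip them and resume from the first unfinished step.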
To run the iteration, again, use `./run.sh` or the following command
```
deepks iterate args.yaml
```
Make sure you are in the `iter` folder before you run the command.
One can check `iter.*/00.scf/log.data` for the statistics of the SCF results, `iter.*/01.train/log.train` for the training curve, and `iter.*/01.train/log.test` for the model's prediction of $E_\delta$ (`e_delta`).
## Train a self-consistent model with density labels
The `withdens` folder provides a set of inputs that use density labels during the iterative training (as additional penalty terms in the Hamiltonian). We again follow the [DeePKS paper](https://arxiv.org/pdf/2008.00167.pdf): first add a randomized penalty using the Coulomb loss for 5 iterations, then remove it and relax for another 5 iterations.
Most of the inputs are the same as in the normal iterative training case described in the last section; we put them in `base.yaml`. We then override `scf_input` in `penalty.yaml` to add the penalties, and change the number of iterations `n_iter` in both `penalty.yaml` and `relax.yaml`.
`pipe.sh` shows how we combine the different inputs together. A simplified version is as follows:
```
deepks iterate base.yaml penalty.yaml && deepks iterate base.yaml relax.yaml
```
The `iterate` command can take multiple input files; settings in later files overwrite those in earlier ones.
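To make the merging rule concrete, here is a minimal sketch with hypothetical contents (not the actual files in this folder). For any key defined in both files, the value from `penalty.yaml`, given last, takes effect:
```
# base.yaml
n_iter: 10
scf_input:
  basis: ccpvdz

# penalty.yaml -- given last, so these keys win
n_iter: 5
scf_input:
  basis: ccpvdz
  # ...penalty-specific settings go here...
```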
Again, running `./run.sh` in the `withdens` folder runs the commands in the background. You can check the results in the `iter.*` folders as above.
# this is only part of input settings.
# should be used together with systems.yaml and params.yaml
scf_machine:
  # every system will be run as a separate command (a task)
  sub_size: 1
  # 4 tasks will be gathered into one group and submitted together as a shell script
  group_size: 4
  dispatcher:
    context: local
    batch: shell # set to shell to run on the local machine; you can also use `slurm`
    remote_profile: null # not needed in the local case
  # resources are no longer needed, except that envs can still be set here
  resources:
    envs:
      PYSCF_MAX_MEMORY: 8000 # increase from 4G to 8G
  python: "python" # use the python in PATH
train_machine:
  dispatcher:
    context: local
    batch: shell # same as above, use shell to run on the local machine
    remote_profile: null # use lazy local
  python: "python" # use the python in PATH
  # resources are not needed; the task will use a gpu automatically if there is one
# other settings (these are the defaults and can be omitted)
cleanup: false # whether to delete slurm and err files
strict: true # do not allow undefined machine parameters
# this is only part of input settings.
# should be used together with systems.yaml and machines.yaml
# number of iterations to do, can be set to zero for DeePHF training
n_iter: 0

# directory settings (these are the defaults and can be omitted)
workdir: "."
share_folder: "share" # folder that stores all other settings

# scf settings, set to false when n_iter = 0 to skip checking
scf_input: false
# train settings, set to false when n_iter = 0 to skip checking
train_input: false

# init settings, these are for the DeePHF task
init_model: false # do not use an existing model to restart from
init_scf: # parameters for the SCF calculation
  basis: ccpvdz
  # this is for pure energy training
  dump_fields:
    - e_base # Hartree-Fock energy
    - dm_eig # descriptors
    - conv # whether converged or not
    - l_e_delta # delta energy between e_base and the reference, used as the label
  verbose: 1
  mol_args: # args to be passed to pyscf.gto.Mole.build
    incore_anyway: True
  scf_args: # args to be passed to pyscf.scf.RHF.run
    conv_tol: 1e-8
    conv_check: false # pyscf's conv_check has a bug
init_train: # parameters for nn training
  model_args:
    hidden_sizes: [100, 100, 100] # neurons in hidden layers
    output_scale: 100 # the output will be divided by 100 before comparing with the label
    use_resnet: true # skip connections
    actv_fn: mygelu # same as gelu, supports force calculation
  data_args:
    batch_size: 16
    group_batch: 1 # can collect multiple systems into one batch
  preprocess_args:
    preshift: true # shift the descriptor by its mean
    prescale: false # scale the descriptor by its variance (can cause convergence problems)
    prefit_ridge: 1e1 # do a ridge regression as prefitting
    prefit_trainable: false
  train_args:
    decay_rate: 0.96 # learning rate decay factor
    decay_steps: 500 # decay the learning rate every this many steps
    display_epoch: 100
    n_epoch: 10000
    start_lr: 0.0003
nohup python -u -m deepks iterate machines.yaml params.yaml systems.yaml >> log.iter 2> err.iter &
echo $! > PID
# this is only part of input settings.
# should be used together with params.yaml and machines.yaml
# training and testing systems
systems_train: # can also be files containing system paths
  - ../systems/group.0[0-2] # globs are supported
systems_test: # if empty, use the last system of the training set
  - ../systems/group.03
# all arguments are flattened into this file
# they can also be split into separate files and referenced here
n_iter: 5

# training and testing systems
systems_train: # can also be files containing system paths
  - ../systems/group.0[0-2] # globs are supported
systems_test: # if empty, use the last system of the training set
  - ../systems/group.03

# directory settings
workdir: "."
share_folder: "share" # folder that stores all other settings

# scf settings
scf_input: # can also be specified by a separate file
  basis: ccpvdz
  # this is for force training
  dump_fields: [e_base, e_tot, dm_eig, conv, f_base, f_tot, grad_vx, l_f_delta, l_e_delta]
  verbose: 1
  mol_args:
    incore_anyway: True
  scf_args:
    conv_tol: 1e-6
    conv_tol_grad: 1e-2
    level_shift: 0.1
    diis_space: 20
    conv_check: false # pyscf's conv_check has a bug
scf_machine:
  # every system will be run as a separate command (a task)
  sub_size: 1
  # 4 tasks will be gathered into one group and submitted together as a shell script
  group_size: 4
  dispatcher:
    context: local
    batch: shell # set to shell to run on the local machine
    remote_profile: null # not needed in the local case
  # resources are no longer needed, except that envs can still be set here
  resources:
    envs:
      PYSCF_MAX_MEMORY: 8000 # increase from 4G to 8G
  python: "python" # use the python in PATH

# train settings
train_input:
  # model_args is ignored, since this is used as a restart
  data_args:
    batch_size: 16
    group_batch: 1
    extra_label: true
    conv_filter: true
    conv_name: conv
  preprocess_args:
    preshift: false # the restarting model is already shifted; the shift value will not be recomputed
    prescale: false # same as above
    prefit_ridge: 1e1
    prefit_trainable: false
  train_args:
    decay_rate: 0.5
    decay_steps: 1000
    display_epoch: 100
    force_factor: 1
    n_epoch: 5000
    start_lr: 0.0001
train_machine:
  dispatcher:
    context: local
    batch: shell # same as above, use shell to run on the local machine
    remote_profile: null # use lazy local
  python: "python" # use the python in PATH
  # resources are not needed; the task will use a gpu automatically if there is one

# init settings
init_model: false # do not use the existing model in share_folder/init/model.pth
init_scf:
  basis: ccpvdz
  # this is for pure energy training
  dump_fields: [e_base, e_tot, dm_eig, conv, l_e_delta]
  verbose: 1
  mol_args:
    incore_anyway: True
  scf_args:
    conv_tol: 1e-8
    conv_check: false # pyscf's conv_check has a bug
init_train:
  model_args: # necessary as this is the initial training
    hidden_sizes: [100, 100, 100]
    output_scale: 100
    use_resnet: true
    actv_fn: gelu
  data_args:
    batch_size: 16
    group_batch: 1
  preprocess_args:
    preshift: true
    prescale: false
    prefit_ridge: 1e1
    prefit_trainable: false
  train_args:
    decay_rate: 0.95
    decay_steps: 300
    display_epoch: 100
    n_epoch: 10000
    start_lr: 0.0003

# other settings
cleanup: false
strict: true
nohup python -u -m deepks iterate args.yaml >> log.iter 2> err.iter &
echo $! > PID