Commit e08bb3d5 (unverified), authored by Ignacio Pickering, committed by GitHub

Document data (#483)

* document species_indices

* Improve documentation

* More documentation for data

* More clearly specify behaviour of SpeciesConverter

* improve docs for SpeciesConverter

* flake8

* Add order recommendation to ChemicalSymbolsToInts

* Fix examples where species_to_tensor is used wrong

* Fix argument
parent 1b58c3c7
@@ -9,10 +9,28 @@ To do transformation, just do `it.transformation_name()`.
 Available transformations are listed below:
-- `species_to_indices` converts species from strings to numbers.
-- `subtract_self_energies` subtracts self energies, you can pass.
-a dict of self energies, or an `EnergyShifter` to let it infer
-self energy from dataset and store the result to the given shifter.
+- `species_to_indices` accepts two different kinds of arguments. It converts
+species from elements (e. g. "H", "C", "Cl", etc) into internal torchani
+indices (as returned by :class:`torchani.utils.ChemicalSymbolsToInts` or
+the ``species_to_tensor`` method of a :class:`torchani.models.BuiltinModel`
+and :class:`torchani.neurochem.Constants`), if its argument is an iterable
+of species. By default species_to_indices behaves this way, with an
+argument of ``('H', 'C', 'N', 'O', 'F', 'S', 'Cl')`` However, if its
+argument is the string "periodic_table", then elements are converted into
+atomic numbers ("periodic table indices") instead. This last option is
+meant to be used when training networks that already perform a forward pass
+of :class:`torchani.nn.SpeciesConverter` on their inputs in order to
+convert elements to internal indices, before processing the coordinates.
+- `subtract_self_energies` subtracts self energies from all molecules of the
+dataset. It accepts two different kinds of arguments: You can pass a dict
+of self energies, in which case self energies are directly subtracted
+according to the key-value pairs, or a
+:class:`torchani.utils.EnergyShifter`, in which case the self energies are
+calculated by linear regression and stored inside the class in the order
+specified by species_order. By default the function orders by atomic
+number if no extra argument is provided, but a specific order may be requested.
 - `remove_outliers`
 - `shuffle`
 - `cache` cache the result of previous transformations.
@@ -21,11 +39,15 @@ Available transformations are listed below:
 - `pin_memory` copy the tensor to pinned memory so that later transfer
 to cuda could be faster.
-By default `species_to_indices` and `subtract_self_energies` order atoms by
-atomic number. A special ordering can be used if requested, by calling
-`species_to_indices(species_order)` or `subtract_self_energies(energy_shifter,
-species_order)` however, this is definitely NOT recommended, it is best to
-always order according to atomic number.
+Note that orderings used in :class:`torchani.utils.ChemicalSymbolsToInts` and
+:class:`torchani.nn.SpeciesConverter` should be consistent with orderings used
+in `species_to_indices` and `subtract_self_energies`. To prevent confusion it
+is recommended that arguments to intialize converters and arguments to these
+functions all order elements *by their atomic number* (e. g. if you are working
+with hydrogen, nitrogen and bromine always use ['H', 'N', 'Br'] and never ['N',
+'H', 'Br'] or other variations). It is possible to specify a different custom
+ordering, mainly due to backwards compatibility and to fully custom atom types,
+but doing so is NOT recommended, since it is very error prone.
 you can also use `split` to split the iterable to pieces. use `split` as:
......
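Reviewer note: the two conversion modes of `species_to_indices` and the dict form of `subtract_self_energies` documented above can be sketched in plain Python. This is only an illustration of the documented behaviour, not torchani code: the helper names `sketch_species_to_indices` and `sketch_subtract_self_energies` are hypothetical, the self-energy values are made up, and keying the self-energy dict by element symbol is an assumption.

```python
# Illustrative sketch only; these helpers are hypothetical, not torchani APIs.

# Atomic numbers of the default species tuple quoted in the documentation.
ATOMIC_NUMBERS = {'H': 1, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'S': 16, 'Cl': 17}
DEFAULT_ORDER = ('H', 'C', 'N', 'O', 'F', 'S', 'Cl')


def sketch_species_to_indices(symbols, species_order=DEFAULT_ORDER):
    """Map element symbols to integer labels.

    If species_order is an iterable of symbols, each label is the position
    of the symbol in that iterable (an "internal index"). If it is the
    string "periodic_table", each label is the atomic number instead.
    """
    if species_order == 'periodic_table':
        return [ATOMIC_NUMBERS[s] for s in symbols]
    lookup = {s: i for i, s in enumerate(species_order)}
    return [lookup[s] for s in symbols]


def sketch_subtract_self_energies(energy, symbols, self_energies):
    """Subtract per-atom self energies given as a {symbol: energy} dict.

    Assumption: the dict is keyed by element symbol.
    """
    return energy - sum(self_energies[s] for s in symbols)


methane = ['C', 'H', 'H', 'H', 'H']
internal = sketch_species_to_indices(methane)                    # positions in DEFAULT_ORDER
periodic = sketch_species_to_indices(methane, 'periodic_table')  # atomic numbers

# Made-up numbers, chosen only to show the arithmetic.
fake_self_energies = {'H': -0.5, 'C': -37.8}
shifted = sketch_subtract_self_energies(-40.5, methane, fake_self_energies)
```

With the default order, methane maps to `[1, 0, 0, 0, 0]`; with `"periodic_table"` it maps to `[6, 1, 1, 1, 1]`, which is the form a network that applies `SpeciesConverter` in its forward pass would expect.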
@@ -14,17 +14,14 @@ directly calculate energies or get an ASE calculator. For example:
 _, energies = ani1x((species, coordinates))
 ani1x.ase() # get ASE Calculator using this ensemble
 # convert atom species from string to long tensor
-ani1x.species_to_tensor('CHHHH')
+ani1x.species_to_tensor(['C', 'H', 'H', 'H', 'H'])
 model0 = ani1x[0] # get the first model in the ensemble
 # compute energy using the first model in the ANI-1x model ensemble
 _, energies = model0((species, coordinates))
 model0.ase() # get ASE Calculator using this model
 # convert atom species from string to long tensor
-model0.species_to_tensor('CHHHH')
+model0.species_to_tensor(['C', 'H', 'H', 'H', 'H'])
-Note that the class BuiltinModels can be accessed but it is deprecated and
-shouldn't be used anymore.
 """
 import os
 import torch
......
@@ -108,7 +108,17 @@ class Gaussian(torch.nn.Module):
 class SpeciesConverter(torch.nn.Module):
-    """Convert from element index in the periodic table to 0, 1, 2, 3, ..."""
+    """Converts tensors with species labeled as atomic numbers into tensors
+    labeled with internal torchani indices according to a custom ordering
+    scheme. It takes a custom species ordering as initialization parameter. If
+    the class is initialized with ['H', 'C', 'N', 'O'] for example, it will
+    convert a tensor [1, 1, 6, 7, 1, 8] into a tensor [0, 0, 1, 2, 0, 3]
+    Arguments:
+        species (:class:`collections.abc.Sequence` of :class:`str`):
+            sequence of all supported species, in order (it is recommended to order
+            according to atomic number).
+    """
     def __init__(self, species):
         super().__init__()
......
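Reviewer note: the mapping described in the new `SpeciesConverter` docstring can be reproduced with a small lookup table. The sketch below uses plain Python lists rather than tensors, and `make_converter` is a hypothetical helper, not the class's actual implementation. It also shows why the ordering passed at initialization must match the ordering used elsewhere.

```python
# Illustrative sketch only; make_converter is hypothetical, not torchani code.

ATOMIC_NUMBERS = {'H': 1, 'C': 6, 'N': 7, 'O': 8}


def make_converter(species):
    """Build a function mapping atomic numbers to positions in `species`."""
    table = {ATOMIC_NUMBERS[s]: i for i, s in enumerate(species)}

    def convert(atomic_numbers):
        return [table[z] for z in atomic_numbers]

    return convert


convert = make_converter(['H', 'C', 'N', 'O'])
result = convert([1, 1, 6, 7, 1, 8])  # [0, 0, 1, 2, 0, 3], as in the docstring

# A different ordering silently changes the labels for the same atoms.
swapped = make_converter(['C', 'H', 'N', 'O'])
swapped_result = swapped([1, 1, 6, 7, 1, 8])  # [1, 1, 0, 2, 1, 3]
```

The second converter produces different labels for the same input, which is the kind of mismatch the documentation's recommendation to order by atomic number everywhere is meant to prevent.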
@@ -225,7 +225,8 @@ class ChemicalSymbolsToInts:
     Arguments:
         all_species (:class:`collections.abc.Sequence` of :class:`str`):
-            sequence of all supported species, in order.
+            sequence of all supported species, in order (it is recommended to order
+            according to atomic number).
     """
     def __init__(self, all_species):
@@ -355,7 +356,7 @@ def vibrational_analysis(masses, hessian, mode_type='MDU', unit='cm^-1'):
 def get_atomic_masses(species):
-    r"""Convert a tensor of znumbers into a tensor of atomic masses
+    r"""Convert a tensor of atomic numbers ("periodic table indices") into a tensor of atomic masses
     Atomic masses supported are the first 119 elements, and are taken from:
......