Commit e08bb3d5 authored by Ignacio Pickering, committed by GitHub

Document data (#483)

* document species_indices

* Improve documentation

* More documentation for data

* More clearly specify behaviour of SpeciesConverter

* improve docs for SpeciesConverter

* flake8

* Add order recommendation to ChemicalSymbolsToInts

* Fix examples where species_to_tensor is used wrong

* Fix argument
parent 1b58c3c7
......@@ -9,10 +9,28 @@ To do transformation, just do `it.transformation_name()`.
Available transformations are listed below:
- `species_to_indices` converts species from strings to numbers.
- `subtract_self_energies` subtracts self energies, you can pass.
a dict of self energies, or an `EnergyShifter` to let it infer
self energy from dataset and store the result to the given shifter.
- `species_to_indices` accepts two different kinds of arguments. If its
argument is an iterable of species, it converts species from element symbols
(e.g. "H", "C", "Cl") into internal torchani indices (as returned by
:class:`torchani.utils.ChemicalSymbolsToInts` or the ``species_to_tensor``
method of a :class:`torchani.models.BuiltinModel` and
:class:`torchani.neurochem.Constants`). This is the default behaviour, with
an argument of ``('H', 'C', 'N', 'O', 'F', 'S', 'Cl')``. However, if its
argument is the string "periodic_table", then elements are converted into
atomic numbers ("periodic table indices") instead. This last option is
meant to be used when training networks that already perform a forward pass
of :class:`torchani.nn.SpeciesConverter` on their inputs in order to
convert atomic numbers to internal indices, before processing the coordinates.
- `subtract_self_energies` subtracts self energies from all molecules of the
dataset. It accepts two different kinds of arguments: you can pass a dict
of self energies, in which case self energies are directly subtracted
according to the key-value pairs, or a
:class:`torchani.utils.EnergyShifter`, in which case the self energies are
calculated by linear regression and stored inside the class in the order
specified by `species_order`. If no extra argument is provided, the function
orders by atomic number, but a specific order may be requested.
- `remove_outliers`
- `shuffle`
- `cache` cache the result of previous transformations.
......@@ -21,11 +39,15 @@ Available transformations are listed below:
- `pin_memory` copy the tensor to pinned memory so that later transfer
to cuda could be faster.
By default `species_to_indices` and `subtract_self_energies` order atoms by
atomic number. A special ordering can be used if requested, by calling
`species_to_indices(species_order)` or `subtract_self_energies(energy_shifter,
species_order)` however, this is definitely NOT recommended, it is best to
always order according to atomic number.
Note that orderings used in :class:`torchani.utils.ChemicalSymbolsToInts` and
:class:`torchani.nn.SpeciesConverter` should be consistent with orderings used
in `species_to_indices` and `subtract_self_energies`. To prevent confusion it
is recommended that arguments to initialize converters and arguments to these
functions all order elements *by their atomic number* (e.g. if you are working
with hydrogen, nitrogen and bromine always use ['H', 'N', 'Br'] and never ['N',
'H', 'Br'] or other variations). It is possible to specify a different custom
ordering, mainly for backwards compatibility and to support fully custom atom
types, but doing so is NOT recommended, since it is very error-prone.
You can also use `split` to split the iterable into pieces. Use `split` as:
......
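The two argument modes of `species_to_indices` and the dict form of `subtract_self_energies` described in this hunk can be sketched in plain Python. The helper names, the small symbol table and the self-energy values below are illustrative assumptions, not torchani code or real data; the actual transformations operate on dataset iterables rather than on single molecules.

```python
# Illustrative sketch only: these helpers are hypothetical, not torchani functions.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9, "S": 16, "Cl": 17}
DEFAULT_ORDER = ("H", "C", "N", "O", "F", "S", "Cl")


def symbols_to_indices(symbols, species_order=DEFAULT_ORDER):
    """Map element symbols to integers in one of the two described modes."""
    if species_order == "periodic_table":
        # Mode 2: atomic numbers ("periodic table indices").
        return [ATOMIC_NUMBERS[s] for s in symbols]
    # Mode 1: position of each symbol in the given ordering.
    lookup = {symbol: index for index, symbol in enumerate(species_order)}
    return [lookup[s] for s in symbols]


def subtract_self_energies(energy, symbols, self_energies):
    """Subtract the sum of per-atom self energies from a molecular energy."""
    return energy - sum(self_energies[s] for s in symbols)


methane = ["C", "H", "H", "H", "H"]
internal = symbols_to_indices(methane)
znumbers = symbols_to_indices(methane, "periodic_table")

# Placeholder numbers chosen for easy arithmetic, not real self energies.
made_up_self_energies = {"H": -0.5, "C": -38.0}
shifted = subtract_self_energies(-40.5, methane, made_up_self_energies)

print(internal, znumbers, shifted)
```

With the default ordering, carbon maps to 1 and hydrogen to 0, whereas the "periodic_table" mode yields 6 and 1.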
......@@ -14,17 +14,14 @@ directly calculate energies or get an ASE calculator. For example:
_, energies = ani1x((species, coordinates))
ani1x.ase() # get ASE Calculator using this ensemble
# convert atom species from string to long tensor
ani1x.species_to_tensor('CHHHH')
ani1x.species_to_tensor(['C', 'H', 'H', 'H', 'H'])
model0 = ani1x[0] # get the first model in the ensemble
# compute energy using the first model in the ANI-1x model ensemble
_, energies = model0((species, coordinates))
model0.ase() # get ASE Calculator using this model
# convert atom species from string to long tensor
model0.species_to_tensor('CHHHH')
Note that the class BuiltinModels can be accessed but it is deprecated and
shouldn't be used anymore.
model0.species_to_tensor(['C', 'H', 'H', 'H', 'H'])
"""
import os
import torch
......
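The examples in this hunk switch from passing a string to passing a list of symbols. One plausible reason, which is an assumption here since the commit message only says the old usage was wrong, is that element symbols may be two characters long, so splitting a string character by character is ambiguous. A minimal plain-Python illustration that does not call torchani:

```python
# Splitting a string character by character works only for one-letter symbols.
methane_chars = list("CHHHH")

# A two-letter symbol such as 'Cl' is broken into 'C' and 'l'.
chloromethane_chars = list("CHHHCl")

# An explicit list of symbols keeps every element intact.
chloromethane_symbols = ["C", "H", "H", "H", "Cl"]

print(methane_chars)
print(chloromethane_chars)
print(chloromethane_symbols)
```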
......@@ -108,7 +108,17 @@ class Gaussian(torch.nn.Module):
class SpeciesConverter(torch.nn.Module):
"""Convert from element index in the periodic table to 0, 1, 2, 3, ..."""
"""Converts tensors with species labeled as atomic numbers into tensors
labeled with internal torchani indices according to a custom ordering
scheme. It takes a custom species ordering as initialization parameter. If
the class is initialized with ['H', 'C', 'N', 'O'] for example, it will
convert a tensor [1, 1, 6, 7, 1, 8] into a tensor [0, 0, 1, 2, 0, 3].
Arguments:
species (:class:`collections.abc.Sequence` of :class:`str`):
sequence of all supported species, in order (it is recommended to order
according to atomic number).
"""
def __init__(self, species):
super().__init__()
......
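The mapping described in the new `SpeciesConverter` docstring can be reproduced with a small list-based sketch. This is not the torchani implementation, which is a `torch.nn.Module` acting on tensors; the function name `make_converter` and the four-element symbol table are assumptions made for illustration.

```python
# Hypothetical list-based stand-in for the behaviour described in the docstring.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def make_converter(species):
    """Build a function mapping atomic numbers to positions in `species`."""
    table = {ATOMIC_NUMBERS[symbol]: index for index, symbol in enumerate(species)}

    def convert(znumbers):
        # In this sketch an unsupported atomic number raises KeyError.
        return [table[z] for z in znumbers]

    return convert


convert = make_converter(["H", "C", "N", "O"])
converted = convert([1, 1, 6, 7, 1, 8])
print(converted)
```

The result matches the docstring example: [1, 1, 6, 7, 1, 8] becomes [0, 0, 1, 2, 0, 3].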
......@@ -225,7 +225,8 @@ class ChemicalSymbolsToInts:
Arguments:
all_species (:class:`collections.abc.Sequence` of :class:`str`):
sequence of all supported species, in order.
sequence of all supported species, in order (it is recommended to order
according to atomic number).
"""
def __init__(self, all_species):
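Since the docstrings recommend ordering species by atomic number, one way to avoid ordering mistakes is to sort the symbols before passing them to a converter. The helper below is a hypothetical sketch with a deliberately small symbol table; it is not part of torchani.

```python
# Hypothetical helper: sort element symbols by atomic number before use.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "Br": 35}


def order_by_atomic_number(symbols):
    """Return the symbols sorted by increasing atomic number."""
    return sorted(symbols, key=lambda symbol: ATOMIC_NUMBERS[symbol])


ordered = order_by_atomic_number(["N", "H", "Br"])
print(ordered)
```

For hydrogen, nitrogen and bromine this gives ['H', 'N', 'Br'], the ordering recommended in the documentation above.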
......@@ -355,7 +356,7 @@ def vibrational_analysis(masses, hessian, mode_type='MDU', unit='cm^-1'):
def get_atomic_masses(species):
r"""Convert a tensor of znumbers into a tensor of atomic masses
r"""Convert a tensor of atomic numbers ("periodic table indices") into a tensor of atomic masses
Atomic masses supported are the first 119 elements, and are taken from:
......