Document data (#483)

* document species_indices
* Improve documentation
* More documentation for data
* More clearly specify behaviour of SpeciesConverter
* improve docs for SpeciesConverter
* flake8
* Add order recommendation to ChemicalSymbolsToInts
* Fix examples where species_to_tensor is used wrong
* Fix argument

Commit e08bb3d5 (unverified), authored Jun 09, 2020 by Ignacio Pickering, committed by GitHub Jun 09, 2020
Showing with 47 additions and 17 deletions

torchani/data/__init__.py +31 -9
torchani/models.py +2 -5
torchani/nn.py +11 -1
torchani/utils.py +3 -2

--- a/torchani/data/__init__.py
+++ b/torchani/data/__init__.py
@@ -9,10 +9,28 @@ To do transformation, just do `it.transformation_name()`.

 Available transformations are listed below:

- `species_to_indices` converts species from strings to numbers.
- `subtract_self_energies` subtracts self energies, you can pass.
-    a dict of self energies, or an `EnergyShifter` to let it infer
-    self energy from dataset and store the result to the given shifter.
+- `species_to_indices` accepts two different kinds of arguments. It converts
+    species from elements (e. g. "H", "C", "Cl", etc) into internal torchani
+    indices (as returned by :class:`torchani.utils.ChemicalSymbolsToInts` or
+    the ``species_to_tensor`` method of a :class:`torchani.models.BuiltinModel`
+    and :class:`torchani.neurochem.Constants`), if its argument is an iterable
+    of species. By default species_to_indices behaves this way, with an
+    argument of ``('H', 'C', 'N', 'O', 'F', 'S', 'Cl')``. However, if its
+    argument is the string "periodic_table", then elements are converted into
+    atomic numbers ("periodic table indices") instead. This last option is
+    meant to be used when training networks that already perform a forward pass
+    of :class:`torchani.nn.SpeciesConverter` on their inputs in order to
+    convert elements to internal indices, before processing the coordinates.
+
+- `subtract_self_energies` subtracts self energies from all molecules of the
+    dataset. It accepts two different kinds of arguments: You can pass a dict
+    of self energies, in which case self energies are directly subtracted
+    according to the key-value pairs, or a
+    :class:`torchani.utils.EnergyShifter`, in which case the self energies are
+    calculated by linear regression and stored inside the class in the order
+    specified by species_order. By default the function orders by atomic
+    number if no extra argument is provided, but a specific order may be requested.
+
 - `remove_outliers`
 - `shuffle`
 - `cache` cache the result of previous transformations.
@@ -21,11 +39,15 @@ Available transformations are listed below:
 - `pin_memory` copy the tensor to pinned memory so that later transfer
    to cuda could be faster.

-By default `species_to_indices` and `subtract_self_energies` order atoms by
-atomic number. A special ordering can be used if requested, by calling
-`species_to_indices(species_order)` or `subtract_self_energies(energy_shifter,
-species_order)` however, this is definitely NOT recommended, it is best to
-always order according to atomic number.
+Note that orderings used in :class:`torchani.utils.ChemicalSymbolsToInts` and
+:class:`torchani.nn.SpeciesConverter` should be consistent with orderings used
+in `species_to_indices` and `subtract_self_energies`. To prevent confusion it
+is recommended that arguments to initialize converters and arguments to these
+functions all order elements *by their atomic number* (e. g. if you are working
+with hydrogen, nitrogen and bromine always use ['H', 'N', 'Br'] and never ['N',
+'H', 'Br'] or other variations).  It is possible to specify a different custom
+ordering, mainly due to backwards compatibility and to fully custom atom types,
+but doing so is NOT recommended, since it is very error prone.

 you can also use `split` to split the iterable to pieces. use `split` as:

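The two modes that the new `species_to_indices` docstring describes can be illustrated with a small pure-Python sketch. The helper names and the `ATOMIC_NUMBERS` table below are hypothetical and are not part of torchani; they only mimic the mapping the docstring describes (position in a species ordering versus atomic number).

```python
# Illustrative only: these helpers are hypothetical and are not torchani code.

# Atomic numbers for the default species named in the docstring.
ATOMIC_NUMBERS = {'H': 1, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'S': 16, 'Cl': 17}


def to_internal_indices(symbols, species_order):
    """Map element symbols to their position in ``species_order``."""
    index_of = {symbol: i for i, symbol in enumerate(species_order)}
    return [index_of[s] for s in symbols]


def to_periodic_table_indices(symbols):
    """Map element symbols to atomic numbers."""
    return [ATOMIC_NUMBERS[s] for s in symbols]


methane = ['C', 'H', 'H', 'H', 'H']
default_order = ('H', 'C', 'N', 'O', 'F', 'S', 'Cl')

internal = to_internal_indices(methane, default_order)
periodic = to_periodic_table_indices(methane)
print(internal)  # [1, 0, 0, 0, 0]
print(periodic)  # [6, 1, 1, 1, 1]

# A custom ordering changes the internal indices for the same molecule,
# which is why every converter must be given the same ordering.
swapped = to_internal_indices(methane, ('C', 'H'))
print(swapped)  # [0, 1, 1, 1, 1]
```

The last call shows why the docstring recommends a single ordering by atomic number: the same molecule gets different internal indices under a different ordering, so mixing orderings between components would silently mislabel atoms.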

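The dict form of `subtract_self_energies` described in the docstring amounts to subtracting one per-element value for every atom in a molecule. A minimal sketch of that arithmetic follows, assuming a hypothetical helper `shift_energy` and made-up toy energies chosen only so the result is easy to check by hand; it does not cover the `EnergyShifter` form, where the values are fitted by linear regression.

```python
# Illustrative only: hypothetical helper and made-up toy values,
# not torchani code and not real self energies.
TOY_SELF_ENERGIES = {'H': -1.0, 'C': -10.0}


def shift_energy(symbols, energy, self_energies):
    """Subtract the summed per-atom self energies from a molecular energy."""
    return energy - sum(self_energies[s] for s in symbols)


methane = ['C', 'H', 'H', 'H', 'H']
shifted = shift_energy(methane, -15.0, TOY_SELF_ENERGIES)
print(shifted)  # -1.0, since -15.0 - (-10.0 + 4 * -1.0) = -1.0
```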
--- a/torchani/models.py
+++ b/torchani/models.py
@@ -14,17 +14,14 @@ directly calculate energies or get an ASE calculator. For example:
    _, energies = ani1x((species, coordinates))
    ani1x.ase()  # get ASE Calculator using this ensemble
    # convert atom species from string to long tensor
-    ani1x.species_to_tensor('CHHHH')
+    ani1x.species_to_tensor(['C', 'H', 'H', 'H', 'H'])

    model0 = ani1x[0]  # get the first model in the ensemble
    # compute energy using the first model in the ANI-1x model ensemble
    _, energies = model0((species, coordinates))
    model0.ase()  # get ASE Calculator using this model
    # convert atom species from string to long tensor
-    model0.species_to_tensor('CHHHH')
-
-Note that the class BuiltinModels can be accessed but it is deprecated and
-shouldn't be used anymore.
+    model0.species_to_tensor(['C', 'H', 'H', 'H', 'H'])
 """
 import os
 import torch
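The change from `'CHHHH'` to `['C', 'H', 'H', 'H', 'H']` in the examples above can be motivated with a sketch. The commit does not show how `species_to_tensor` is implemented, so the class below is a hypothetical stand-in that assumes the converter simply iterates over its argument. Under that assumption a plain string is split into single characters, which breaks on two-letter symbols such as `Cl`.

```python
# Hypothetical stand-in for a symbol-to-index converter; not torchani code.
class SymbolsToInts:
    def __init__(self, all_species):
        self.index_of = {s: i for i, s in enumerate(all_species)}

    def __call__(self, species):
        # Iterates over whatever it is given, one item at a time.
        return [self.index_of[s] for s in species]


convert = SymbolsToInts(['H', 'C', 'Cl'])

# A list of symbols is unambiguous.
as_list = convert(['C', 'Cl', 'H'])
print(as_list)  # [1, 2, 0]

# A plain string is iterated character by character, so 'Cl' is seen
# as 'C' followed by 'l', and 'l' is not a known symbol.
try:
    convert('CClH')
    string_failed = False
except KeyError:
    string_failed = True
print(string_failed)  # True
```

In this sketch a string made only of one-letter symbols would still happen to work, which is what makes the string form easy to misuse; passing a list avoids the ambiguity altogether.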

--- a/torchani/nn.py
+++ b/torchani/nn.py
@@ -108,7 +108,17 @@ class Gaussian(torch.nn.Module):


 class SpeciesConverter(torch.nn.Module):
-    """Convert from element index in the periodic table to 0, 1, 2, 3, ..."""
+    """Converts tensors with species labeled as atomic numbers into tensors
+    labeled with internal torchani indices according to a custom ordering
+    scheme. It takes a custom species ordering as initialization parameter. If
+    the class is initialized with ['H', 'C', 'N', 'O'] for example, it will
+    convert a tensor [1, 1, 6, 7, 1, 8] into a tensor [0, 0, 1, 2, 0, 3]
+
+    Arguments:
+        species (:class:`collections.abc.Sequence` of :class:`str`):
+        sequence of all supported species, in order (it is recommended to order
+        according to atomic number).
+    """

    def __init__(self, species):
        super().__init__()
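The conversion described in the new `SpeciesConverter` docstring can be reproduced with a pure-Python sketch. The real class works on tensors and its implementation is not shown in this diff; `make_converter` and the `ATOMIC_NUMBERS` table below are hypothetical and only reproduce the docstring's example.

```python
# Hypothetical pure-Python sketch; the real class operates on tensors.
ATOMIC_NUMBERS = {'H': 1, 'C': 6, 'N': 7, 'O': 8}


def make_converter(species):
    """Build a function mapping atomic numbers to positions in ``species``."""
    lookup = {ATOMIC_NUMBERS[s]: i for i, s in enumerate(species)}

    def convert(atomic_numbers):
        return [lookup[z] for z in atomic_numbers]

    return convert


convert = make_converter(['H', 'C', 'N', 'O'])
converted = convert([1, 1, 6, 7, 1, 8])
print(converted)  # [0, 0, 1, 2, 0, 3]
```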

--- a/torchani/utils.py
+++ b/torchani/utils.py
@@ -225,7 +225,8 @@ class ChemicalSymbolsToInts:

    Arguments:
        all_species (:class:`collections.abc.Sequence` of :class:`str`):
-            sequence of all supported species, in order.
+        sequence of all supported species, in order (it is recommended to order
+        according to atomic number).
    """

    def __init__(self, all_species):
@@ -355,7 +356,7 @@ def vibrational_analysis(masses, hessian, mode_type='MDU', unit='cm^-1'):


 def get_atomic_masses(species):
-    r"""Convert a tensor of znumbers into a tensor of atomic masses
+    r"""Convert a tensor of atomic numbers ("periodic table indices") into a tensor of atomic masses

    Atomic masses supported are the first 119 elements, and are taken from: