Unverified Commit e9440acb authored by Jinjing Zhou's avatar Jinjing Zhou Committed by GitHub

[TF] TF backend fix and new logic to choose backend (#1393)



* TF backend fix and new logic to choose backend

* fix

* fix

* fix

* fix

* fix backend

* fix

* dlpack alignment

* add flag

* flag

* lint

* lint

* remove unused

* several fixes
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
parent 4b4186f8
@@ -152,7 +152,7 @@ DGL should work on
 DGL requires Python 3.5 or later.
-Right now, DGL works on [PyTorch](https://pytorch.org) 1.1.0+, [MXNet](https://mxnet.apache.org) nightly build, and [TensorFlow](https://tensorflow.org) 2.0+.
+Right now, DGL works on [PyTorch](https://pytorch.org) 1.2.0+, [MXNet](https://mxnet.apache.org) 1.5.1+, and [TensorFlow](https://tensorflow.org) 2.1.0+.
 ### Using anaconda
...
@@ -3,7 +3,8 @@ dependencies:
 - python=3.6.9
 - pip
 - pip:
-  - tensorflow==2.1.0rc1
+  - tensorflow==2.2.0rc1
+  # - tf-nightly==2.2.0.dev20200327
   - tfdlpack
   - pytest
   - nose
...
 name: tensorflow-ci
 dependencies:
 - python=3.6.9
 - pip
 - pip:
-  - tensorflow-gpu==2.1.0rc1
+  - tensorflow==2.2.0rc1
+  # - tf-nightly==2.2.0.dev20200327
   - tfdlpack-gpu
   - pytest
   - nose
...
@@ -3,14 +3,21 @@
 Working with different backends
 ===============================

-DGL supports PyTorch, MXNet and Tensorflow backends. To change them, set the ``DGLBACKEND``
-environment variable. The default backend is PyTorch.
+DGL supports PyTorch, MXNet and Tensorflow backends. DGL chooses the backend from the
+following options (from highest to lowest priority):
+
+- The ``DGLBACKEND`` environment variable
+
+  - Use ``DGLBACKEND=[BACKEND] python gcn.py ...`` to specify the backend for a single run
+  - Or ``export DGLBACKEND=[BACKEND]`` to set it globally
+
+- The ``config.json`` file under ``~/.dgl``
+
+  - Use ``python -m dgl.backend.set_default_backend [BACKEND]`` to set the default backend
+
+Currently ``BACKEND`` can be one of ``pytorch``, ``mxnet`` and ``tensorflow``.
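The priority order described above can be sketched as follows (an illustration only, not DGL's actual implementation; `preferred_backend` is a hypothetical helper, and the config path stands in for ``~/.dgl/config.json``):

```python
import json
import os

def preferred_backend(environ, config_path):
    # Priority sketch: the DGLBACKEND environment variable wins,
    # then the config.json file; None means no valid option was found.
    if "DGLBACKEND" in environ:
        return environ["DGLBACKEND"].lower()
    if os.path.exists(config_path):
        with open(config_path) as config_file:
            return json.load(config_file).get("backend", "").lower()
    return None  # DGL would fall back to prompting the user

# The environment variable wins even when a config file exists.
print(preferred_backend({"DGLBACKEND": "TensorFlow"}, "/nonexistent/config.json"))  # → tensorflow
```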
 PyTorch backend
 ---------------

 Export ``DGLBACKEND`` as ``pytorch`` to specify PyTorch backend. The required PyTorch
-version is 0.4.1 or later. See `pytorch.org <https://pytorch.org>`_ for installation instructions.
+version is 1.1.0 or later. See `pytorch.org <https://pytorch.org>`_ for installation instructions.
 MXNet backend
 -------------
@@ -32,18 +39,10 @@ Tensorflow backend
 ------------------

 Export ``DGLBACKEND`` as ``tensorflow`` to specify Tensorflow backend. The required Tensorflow
-version is 2.0 or later. See `tensorflow.org <https://www.tensorflow.org/install>`_ for installation
-instructions. In addition, the Tensorflow backend requires the ``tfdlpack`` package, installed as follows, with ``TF_FORCE_GPU_ALLOW_GROWTH`` set to ``true`` to prevent Tensorflow from taking over the whole GPU memory:
-
-.. code:: bash
-
-   pip install tfdlpack  # when using tensorflow cpu version
-
-or
-
-.. code:: bash
-
-   pip install tfdlpack-gpu  # when using tensorflow gpu version
-   export TF_FORCE_GPU_ALLOW_GROWTH=true  # add this to your .bashrc/.zshrc file if needed
+version is 2.2.0 or later. See `tensorflow.org <https://www.tensorflow.org/install>`_ for installation
+instructions. In addition, DGL will set ``TF_FORCE_GPU_ALLOW_GROWTH`` to ``true`` to prevent Tensorflow from taking over the whole GPU memory:
+
+.. code:: bash
+
+   pip install "tensorflow>=2.2.0rc1"  # when using tensorflow cpu version
@@ -474,8 +474,8 @@ DGL_DLL int DGLArrayFromDLPack(DLManagedTensor* from,
  * \param out The DLManagedTensor handle.
  * \return 0 when success, -1 when failure happens
  */
-DGL_DLL int DGLArrayToDLPack(DGLArrayHandle from,
-                             DLManagedTensor** out);
+DGL_DLL int DGLArrayToDLPack(DGLArrayHandle from, DLManagedTensor** out,
+                             int alignment = 0);
 /*!
  * \brief Delete (free) a DLManagedTensor's data.
...
@@ -5,7 +5,7 @@ import socket
 # Need to ensure that the backend framework is imported before loading dgl libs,
 # otherwise weird cuda problems happen
-from .backend import load_backend
+from .backend import load_backend, backend_name
 from . import function
 from . import contrib
...
@@ -73,15 +73,23 @@ class NDArrayBase(object):
     def _dgl_handle(self):
         return ctypes.cast(self.handle, ctypes.c_void_p).value

-    def to_dlpack(self):
+    def to_dlpack(self, alignment=0):
         """Produce an array from a DLPack Tensor without copying memory

+        Args
+        -------
+        alignment: int, default to be 0
+            Indicates the alignment requirement when converting to dlpack. Will copy to a
+            new tensor if the alignment requirement is not satisfied.
+            0 means no alignment requirement.
+
         Returns
         -------
         dlpack : DLPack tensor view of the array data
         """
         ptr = ctypes.c_void_p()
-        check_call(_LIB.DGLArrayToDLPack(self.handle, ctypes.byref(ptr)))
+        check_call(_LIB.DGLArrayToDLPack(self.handle, ctypes.byref(ptr), alignment))
         return ctypes.pythonapi.PyCapsule_New(ptr, _c_str_dltensor, _c_dlpack_deleter)
...
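The contract of the new ``alignment`` argument can be shown with a small standalone check (a sketch of the rule only; ``needs_copy`` is a hypothetical helper, not part of DGL):

```python
def needs_copy(data_ptr, alignment):
    # Mirrors the contract of to_dlpack(alignment): 0 means "no requirement";
    # otherwise a copy is made when the address is not a multiple of `alignment`.
    return alignment != 0 and data_ptr % alignment != 0

print(needs_copy(0x7f0000000040, 64))  # → False: 0x...40 is 64-byte aligned
print(needs_copy(0x7f0000000044, 64))  # → True: misaligned, so a copy is made
```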
@@ -112,7 +112,8 @@ cdef extern from "dgl/runtime/c_runtime_api.h":
     int DGLArrayFromDLPack(DLManagedTensor* arr_from,
                            DLTensorHandle* out)
     int DGLArrayToDLPack(DLTensorHandle arr_from,
-                         DLManagedTensor** out)
+                         DLManagedTensor** out,
+                         int alignment)
     void DGLDLManagedTensorCallDeleter(DLManagedTensor* dltensor)

 cdef extern from "dgl/runtime/c_object_api.h":
...
@@ -59,9 +59,16 @@ cdef class NDArrayBase:
         if self.c_is_view == 0:
             CALL(DGLArrayFree(self.chandle))

-    def to_dlpack(self):
+    def to_dlpack(self, alignment=0):
         """Produce an array from a DLPack Tensor without copying memory

+        Args
+        -------
+        alignment: int, default to be 0
+            Indicates the alignment requirement when converting to dlpack. Will copy to a
+            new tensor if the alignment requirement is not satisfied.
+            0 means no alignment requirement.
+
         Returns
         -------
         dlpack : DLPack tensor view of the array data
@@ -69,7 +76,7 @@ cdef class NDArrayBase:
         cdef DLManagedTensor* dltensor
         if self.c_is_view != 0:
             raise ValueError("to_dlpack do not work with memory views")
-        CALL(DGLArrayToDLPack(self.chandle, &dltensor))
+        CALL(DGLArrayToDLPack(self.chandle, &dltensor, alignment))
         return pycapsule.PyCapsule_New(dltensor, _c_str_dltensor, _c_dlpack_deleter)
...
 from __future__ import absolute_import

-import sys, os
+import sys
+import os
+import json
 import importlib

 from . import backend
+from .set_default_backend import set_default_backend

 _enabled_apis = set()

 def _gen_missing_api(api, mod_name):
     def _missing_api(*args, **kwargs):
         raise ImportError('API "%s" is not supported by backend "%s".'
@@ -14,6 +18,7 @@ def _gen_missing_api(api, mod_name):
                           ' the DGLBACKEND environment.' % (api, mod_name))
     return _missing_api

 def load_backend(mod_name):
     mod = importlib.import_module('.%s' % mod_name, __name__)
     thismod = sys.modules[__name__]
@@ -45,7 +50,29 @@ def load_backend(mod_name):
         else:
             setattr(thismod, api, _gen_missing_api(api, mod_name))

-load_backend(os.environ.get('DGLBACKEND', 'pytorch').lower())
+def get_preferred_backend():
+    config_path = os.path.join(os.path.expanduser('~'), '.dgl', 'config.json')
+    backend_name = None
+    if "DGLBACKEND" in os.environ:
+        backend_name = os.getenv('DGLBACKEND')
+    elif os.path.exists(config_path):
+        with open(config_path, "r") as config_file:
+            config_dict = json.load(config_file)
+            backend_name = config_dict.get('backend', '').lower()
+
+    if backend_name in ['tensorflow', 'mxnet', 'pytorch']:
+        return backend_name
+    else:
+        while backend_name not in ['tensorflow', 'mxnet', 'pytorch']:
+            print("DGL does not detect a valid backend option. Which backend would you like to work with?")
+            backend_name = input("Backend choice (pytorch, mxnet or tensorflow): ").lower()
+        set_default_backend(backend_name)
+        return backend_name
+
+load_backend(get_preferred_backend())

 def is_enabled(api):
     """Return true if the api is enabled by the current backend.
...
@@ -14,7 +14,7 @@ from ...function.base import TargetCode
 MX_VERSION = LooseVersion(mx.__version__)
 if MX_VERSION.version[0] == 1 and MX_VERSION.version[1] < 5:
-    raise Exception("DGL has to work with MXNet version >= 1.5")
+    raise RuntimeError("DGL requires mxnet >= 1.5")

 # After MXNet 1.5, empty tensors aren't supported by default.
 # After we turn on the numpy compatible flag, MXNet supports empty NDArray.
...
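The version guard above boils down to a major/minor comparison; a simplified standalone sketch (``check_mxnet_version`` is illustrative, and unlike ``LooseVersion`` it ignores nightly-build suffixes):

```python
def check_mxnet_version(version_string):
    # Same intent as the guard above: reject MXNet 1.x releases before 1.5.
    major, minor = (int(p) for p in version_string.split(".")[:2])
    if major == 1 and minor < 5:
        raise RuntimeError("DGL requires mxnet >= 1.5")
    return True

print(check_mxnet_version("1.5.1"))  # → True
```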
@@ -2,6 +2,7 @@ from __future__ import absolute_import
 from distutils.version import LooseVersion

+import scipy  # Weird bug in new pytorch when importing scipy after torch
 import torch as th
 import builtins
 from torch.utils import dlpack
@@ -9,8 +10,11 @@ from torch.utils import dlpack
 from ... import ndarray as nd
 from ... import kernel as K
 from ...function.base import TargetCode
+from ...base import dgl_warning

-TH_VERSION = LooseVersion(th.__version__)
+if LooseVersion(th.__version__) < LooseVersion("1.2.0"):
+    dgl_warning("Detected an old version of PyTorch. Suggest using torch>=1.2.0 "
+                "for the best experience.")

 def data_type_dict():
     return {'float16' : th.float16,
...
+import argparse
+import os
+import json
+
+def set_default_backend(backend_name):
+    default_dir = os.path.join(os.path.expanduser('~'), '.dgl')
+    if not os.path.exists(default_dir):
+        os.makedirs(default_dir)
+    config_path = os.path.join(default_dir, 'config.json')
+    with open(config_path, "w") as config_file:
+        json.dump({'backend': backend_name.lower()}, config_file)
+    print('Set the default backend to "{}". You can change it in the '
+          '~/.dgl/config.json file or export the DGLBACKEND environment variable.'.format(
+              backend_name))
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("backend", nargs=1, type=str, choices=[
+                        'pytorch', 'tensorflow', 'mxnet'], help="Set default backend")
+    args = parser.parse_args()
+    set_default_backend(args.backend[0])
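What ``set_default_backend`` writes is a one-key JSON document; the round-trip can be reproduced against a temporary directory (a sketch only, so the example never touches the real ``~/.dgl``):

```python
import json
import os
import tempfile

# Stand-in for ~/.dgl so the example does not modify real user config.
cfg_dir = tempfile.mkdtemp()
config_path = os.path.join(cfg_dir, "config.json")

# Same shape as set_default_backend: lowercase the name and dump one key.
with open(config_path, "w") as config_file:
    json.dump({"backend": "PyTorch".lower()}, config_file)

with open(config_path) as config_file:
    print(json.load(config_file)["backend"])  # → pytorch
```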
+import os
+os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
 from .tensor import *
"""Tensorflow backend implementation"""
from __future__ import absolute_import from __future__ import absolute_import
from distutils.version import LooseVersion from distutils.version import LooseVersion
...@@ -6,16 +6,41 @@ from distutils.version import LooseVersion ...@@ -6,16 +6,41 @@ from distutils.version import LooseVersion
import tensorflow as tf import tensorflow as tf
from tensorflow.python.eager import context from tensorflow.python.eager import context
import builtins import builtins
import tfdlpack
import numpy as np import numpy as np
from tfdlpack import to_dlpack, from_dlpack import os
from ... import ndarray as nd from ... import ndarray as nd
from ... import kernel as K from ... import kernel as K
from ...function.base import TargetCode from ...function.base import TargetCode
TF_VERSION = LooseVersion(tf.__version__) if os.getenv("USE_OFFICIAL_TFDLPACK", False):
if LooseVersion(tf.__version__) < LooseVersion("2.2.0"):
raise RuntimeError("DGL requires tensorflow>=2.2.0 for the official DLPack support.")
def zerocopy_to_dlpack(input):
return tf.experimental.dlpack.to_dlpack(input)
def zerocopy_from_dlpack(dlpack_tensor):
# TODO(Jinjing): Tensorflow requires memory to be 64-bit aligned. We check the
# alignment and make a copy if needed. The functionality is better in TF's main repo.
aligned = nd.from_dlpack(dlpack_tensor).to_dlpack(64)
return tf.experimental.dlpack.from_dlpack(aligned)
else:
# Use our own DLPack solution
try:
import tfdlpack
except ImportError:
raise ImportError('Cannot find tfdlpack, which is required by the Tensorflow backend. '
'Please follow https://github.com/VoVAllen/tf-dlpack for installation.')
if LooseVersion(tf.__version__) < LooseVersion("2.1.0"):
raise RuntimeError("DGL requires tensorflow>=2.1.0.")
def zerocopy_to_dlpack(input):
return tfdlpack.to_dlpack(input)
def zerocopy_from_dlpack(input):
return tfdlpack.from_dlpack(input)
def data_type_dict(): def data_type_dict():
return {'float16': tf.float16, return {'float16': tf.float16,
...@@ -27,11 +52,9 @@ def data_type_dict(): ...@@ -27,11 +52,9 @@ def data_type_dict():
'int32': tf.int32, 'int32': tf.int32,
'int64': tf.int64} 'int64': tf.int64}
def cpu(): def cpu():
return "/cpu:0" return "/cpu:0"
def tensor(data, dtype=None): def tensor(data, dtype=None):
return tf.convert_to_tensor(data, dtype=dtype) return tf.convert_to_tensor(data, dtype=dtype)
...@@ -355,16 +378,7 @@ def rand_shuffle(arr): ...@@ -355,16 +378,7 @@ def rand_shuffle(arr):
return tf.random.shuffle(arr) return tf.random.shuffle(arr)
def zerocopy_to_dlpack(input):
return tfdlpack.to_dlpack(input)
def zerocopy_from_dlpack(dlpack_tensor):
return tfdlpack.from_dlpack(dlpack_tensor)
def zerocopy_to_numpy(input): def zerocopy_to_numpy(input):
# NOTE: not zerocopy
return np.asarray(memoryview(input)) return np.asarray(memoryview(input))
......
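The dispatch above picks one of two DLPack implementations at import time based on an environment flag; a minimal sketch of the pattern (`pick_dlpack_impl` and the returned labels are for illustration only):

```python
import os

def pick_dlpack_impl(environ=None):
    # Mirrors the branch above: a set USE_OFFICIAL_TFDLPACK selects TF's
    # built-in tf.experimental.dlpack API; otherwise the tfdlpack package
    # is used as a fallback.
    environ = os.environ if environ is None else environ
    if environ.get("USE_OFFICIAL_TFDLPACK"):
        return "tf.experimental.dlpack"
    return "tfdlpack"

print(pick_dlpack_impl({"USE_OFFICIAL_TFDLPACK": "1"}))  # → tf.experimental.dlpack
print(pick_dlpack_impl({}))  # → tfdlpack
```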
@@ -13,12 +13,24 @@
 #include <dgl/graph_interface.h>
 #include <algorithm>
 #include <vector>
+#include <string>

 using dgl::runtime::operator<<;

 /*! \brief Output the string representation of device context.*/
-inline std::ostream& operator << (std::ostream& os, const DLContext& ctx) {
-  return os << ctx.device_type << ":" << ctx.device_id;
+inline std::ostream& operator<<(std::ostream& os, const DLContext& ctx) {
+  std::string device_name;
+  switch (ctx.device_type) {
+    case kDLCPU:
+      device_name = "CPU";
+      break;
+    case kDLGPU:
+      device_name = "GPU";
+      break;
+    default:
+      device_name = "Unknown device";
+  }
+  return os << device_name << ":" << ctx.device_id;
 }

 namespace dgl {
...
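For reference, the mapping implemented by the new ``operator<<`` is a small lookup keyed on the DLPack device-type enum, where ``kDLCPU`` is 1 and ``kDLGPU`` is 2 (a sketch; ``ctx_str`` is a hypothetical helper):

```python
# Device-type codes from the DLPack enum: kDLCPU = 1, kDLGPU = 2.
DEVICE_NAMES = {1: "CPU", 2: "GPU"}

def ctx_str(device_type, device_id):
    # Same formatting as the C++ operator<< above: "<name>:<id>".
    return "{}:{}".format(DEVICE_NAMES.get(device_type, "Unknown device"), device_id)

print(ctx_str(1, 0))  # → CPU:0
print(ctx_str(2, 3))  # → GPU:3
```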