Unverified commit e9440acb authored by Jinjing Zhou, committed by GitHub

[TF] TF backend fix and new logic to choose backend (#1393)



* TF backend fix and new logic to choose backend

* fix

* fix

* fix

* fix

* fix backend

* fix

* dlpack alignment

* add flag

* flag

* lint

* lint

* remove unused

* several fixes
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
parent 4b4186f8
......@@ -152,7 +152,7 @@ DGL should work on
DGL requires Python 3.5 or later.
Right now, DGL works on [PyTorch](https://pytorch.org) 1.1.0+, [MXNet](https://mxnet.apache.org) nightly build, and [TensorFlow](https://tensorflow.org) 2.0+.
Right now, DGL works on [PyTorch](https://pytorch.org) 1.2.0+, [MXNet](https://mxnet.apache.org) 1.5.1+, and [TensorFlow](https://tensorflow.org) 2.1.0+.
### Using anaconda
......
......@@ -5,12 +5,12 @@ dependencies:
- pip:
- mxnet
- pytest
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
......@@ -5,12 +5,12 @@ dependencies:
- pip:
- mxnet-cu101
- pytest
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
......@@ -3,15 +3,16 @@ dependencies:
- python=3.6.9
- pip
- pip:
- tensorflow==2.1.0rc1
- tensorflow==2.2.0rc1
# - tf-nightly==2.2.0.dev20200327
- tfdlpack
- pytest
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
name: tensorflow-ci
dependencies:
- python=3.6.9
- pip
- pip:
- tensorflow-gpu==2.1.0rc1
- tensorflow==2.2.0rc1
# - tf-nightly==2.2.0.dev20200327
- tfdlpack-gpu
- pytest
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
......@@ -6,12 +6,12 @@ dependencies:
- torch
- torchvision
- pytest
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
\ No newline at end of file
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
\ No newline at end of file
......@@ -6,12 +6,12 @@ dependencies:
- torch
- torchvision
- pytest
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
\ No newline at end of file
- nose
- numpy
- cython
- scipy
- networkx
- matplotlib
- nltk
- requests[security]
- tqdm
\ No newline at end of file
......@@ -3,14 +3,21 @@
Working with different backends
===============================
DGL supports PyTorch, MXNet and Tensorflow backends. To change them, set the ``DGLBACKEND``
environment variable. The default backend is PyTorch.
DGL supports PyTorch, MXNet and Tensorflow backends.
DGL chooses the backend from the following options, listed from highest to lowest priority:
- `DGLBACKEND` environment variable
  - Use `DGLBACKEND=[BACKEND] python gcn.py ...` to specify the backend for a single run
  - Or `export DGLBACKEND=[BACKEND]` to set it for the whole shell session
- `config.json` file under "~/.dgl"
  - Use `python -m dgl.backend.set_default_backend [BACKEND]` to set the default backend
Currently BACKEND can be one of mxnet, pytorch, or tensorflow.
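The priority order described above can be sketched as a small pure function. This is only an illustration, not DGL's code: `resolve_backend` and `VALID_BACKENDS` are hypothetical names, the environment and config path are passed in as arguments so the function is easy to test, and lowercasing the environment value is an assumption of this sketch.

```python
import json
import os

# Hypothetical constant for this sketch; the names come from the docs above.
VALID_BACKENDS = ("pytorch", "mxnet", "tensorflow")


def resolve_backend(environ, config_path):
    """Return the backend chosen by priority, or None if nothing valid is set."""
    name = None
    if "DGLBACKEND" in environ:
        # Highest priority: the environment variable. When it is set, the
        # config file is not consulted at all, even if the value is invalid.
        name = environ["DGLBACKEND"].lower()
    elif os.path.exists(config_path):
        # Lower priority: the "backend" key of the JSON config file.
        with open(config_path, "r") as config_file:
            name = json.load(config_file).get("backend", "").lower()
    return name if name in VALID_BACKENDS else None
```

A caller would typically pass `os.environ` and the path of `config.json` under `~/.dgl`, and decide separately what to do when the result is `None`.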
PyTorch backend
---------------
Export ``DGLBACKEND`` as ``pytorch`` to specify PyTorch backend. The required PyTorch
version is 0.4.1 or later. See `pytorch.org <https://pytorch.org>`_ for installation instructions.
version is 1.1.0 or later. See `pytorch.org <https://pytorch.org>`_ for installation instructions.
MXNet backend
-------------
......@@ -32,18 +39,10 @@ Tensorflow backend
------------------
Export ``DGLBACKEND`` as ``tensorflow`` to specify Tensorflow backend. The required Tensorflow
version is 2.0 or later. See `tensorflow.org <https://www.tensorflow.org/install>`_ for installation
instructions. In addition, Tensorflow backend requires ``tfdlpack`` package installed as follows and set ``TF_FORCE_GPU_ALLOW_GROWTH`` to ``true`` to prevent Tensorflow take over the whole GPU memory:
.. code:: bash
pip install tfdlpack # when using tensorflow cpu version
or
version is 2.2.0 or later. See `tensorflow.org <https://www.tensorflow.org/install>`_ for installation
instructions. In addition, DGL will set ``TF_FORCE_GPU_ALLOW_GROWTH`` to ``true`` to prevent Tensorflow from taking over the whole GPU memory:
.. code:: bash
pip install tfdlpack-gpu # when using tensorflow gpu version
export TF_FORCE_GPU_ALLOW_GROWTH=true # and add this to your .bashrc/.zshrc file if needed
pip install "tensorflow>=2.2.0rc1" # when using tensorflow cpu version
......@@ -474,8 +474,8 @@ DGL_DLL int DGLArrayFromDLPack(DLManagedTensor* from,
* \param out The DLManagedTensor handle.
* \return 0 when success, -1 when failure happens
*/
DGL_DLL int DGLArrayToDLPack(DGLArrayHandle from,
DLManagedTensor** out);
DGL_DLL int DGLArrayToDLPack(DGLArrayHandle from, DLManagedTensor** out,
int alignment = 0);
/*!
* \brief Delete (free) a DLManagedTensor's data.
......
......@@ -5,7 +5,7 @@ import socket
# Need to ensure that the backend framework is imported before load dgl libs,
# otherwise weird cuda problem happens
from .backend import load_backend
from .backend import load_backend, backend_name
from . import function
from . import contrib
......
......@@ -73,15 +73,23 @@ class NDArrayBase(object):
def _dgl_handle(self):
return ctypes.cast(self.handle, ctypes.c_void_p).value
def to_dlpack(self):
def to_dlpack(self, alignment=0):
"""Produce a DLPack tensor from the array, copying memory only if the alignment requirement is not met
Args
-------
alignment: int, default to be 0
Indicates the alignment requirement when converting to dlpack. Will copy to a
new tensor if the alignment requirement is not satisfied.
0 means no alignment requirement.
Returns
-------
dlpack : DLPack tensor view of the array data
"""
ptr = ctypes.c_void_p()
check_call(_LIB.DGLArrayToDLPack(self.handle, ctypes.byref(ptr)))
check_call(_LIB.DGLArrayToDLPack(self.handle, ctypes.byref(ptr), alignment))
return ctypes.pythonapi.PyCapsule_New(ptr, _c_str_dltensor, _c_dlpack_deleter)
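To make the "copy if the alignment requirement is not satisfied" behaviour concrete, here is a rough NumPy analogue. It is not DGL's C++ implementation: `is_aligned`, `aligned_copy` and `ensure_aligned` are hypothetical helpers, and the sketch assumes the alignment is expressed in bytes and checked against the address of the first element.

```python
import numpy as np


def is_aligned(arr, alignment):
    """True if the data pointer meets the alignment (0 means no requirement)."""
    return alignment == 0 or arr.ctypes.data % alignment == 0


def aligned_copy(arr, alignment):
    """Copy arr into a new buffer whose first element sits on an aligned address."""
    nbytes = arr.nbytes
    # Over-allocate so an aligned start address is guaranteed to exist inside.
    raw = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-raw.ctypes.data) % alignment
    out = raw[offset:offset + nbytes].view(arr.dtype).reshape(arr.shape)
    out[...] = arr
    return out


def ensure_aligned(arr, alignment=0):
    """Return arr itself when it is already aligned, otherwise an aligned copy."""
    return arr if is_aligned(arr, alignment) else aligned_copy(arr, alignment)
```

With `alignment=0` the input is always returned unchanged, which mirrors the "0 means no alignment requirement" default in the docstring.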
......
......@@ -112,7 +112,8 @@ cdef extern from "dgl/runtime/c_runtime_api.h":
int DGLArrayFromDLPack(DLManagedTensor* arr_from,
DLTensorHandle* out)
int DGLArrayToDLPack(DLTensorHandle arr_from,
DLManagedTensor** out)
DLManagedTensor** out,
int alignment)
void DGLDLManagedTensorCallDeleter(DLManagedTensor* dltensor)
cdef extern from "dgl/runtime/c_object_api.h":
......
......@@ -59,9 +59,16 @@ cdef class NDArrayBase:
if self.c_is_view == 0:
CALL(DGLArrayFree(self.chandle))
def to_dlpack(self):
def to_dlpack(self, alignment=0):
"""Produce a DLPack tensor from the array, copying memory only if the alignment requirement is not met
Args
-------
alignment: int, default to be 0
Indicates the alignment requirement when converting to dlpack. Will copy to a
new tensor if the alignment requirement is not satisfied.
0 means no alignment requirement.
Returns
-------
dlpack : DLPack tensor view of the array data
......@@ -69,7 +76,7 @@ cdef class NDArrayBase:
cdef DLManagedTensor* dltensor
if self.c_is_view != 0:
raise ValueError("to_dlpack do not work with memory views")
CALL(DGLArrayToDLPack(self.chandle, &dltensor))
CALL(DGLArrayToDLPack(self.chandle, &dltensor, alignment))
return pycapsule.PyCapsule_New(dltensor, _c_str_dltensor, _c_dlpack_deleter)
......
from __future__ import absolute_import
import sys, os
import sys
import os
import json
import importlib
from . import backend
from .set_default_backend import set_default_backend
_enabled_apis = set()
def _gen_missing_api(api, mod_name):
def _missing_api(*args, **kwargs):
raise ImportError('API "%s" is not supported by backend "%s".'
......@@ -14,6 +18,7 @@ def _gen_missing_api(api, mod_name):
' the DGLBACKEND environment.' % (api, mod_name))
return _missing_api
def load_backend(mod_name):
mod = importlib.import_module('.%s' % mod_name, __name__)
thismod = sys.modules[__name__]
......@@ -45,7 +50,29 @@ def load_backend(mod_name):
else:
setattr(thismod, api, _gen_missing_api(api, mod_name))
load_backend(os.environ.get('DGLBACKEND', 'pytorch').lower())
def get_preferred_backend():
config_path = os.path.join(os.path.expanduser('~'), '.dgl', 'config.json')
backend_name = None
if "DGLBACKEND" in os.environ:
backend_name = os.getenv('DGLBACKEND')
elif os.path.exists(config_path):
with open(config_path, "r") as config_file:
config_dict = json.load(config_file)
backend_name = config_dict.get('backend', '').lower()
if (backend_name in ['tensorflow', 'mxnet', 'pytorch']):
return backend_name
else:
while not(backend_name in ['tensorflow', 'mxnet', 'pytorch']):
print("DGL does not detect a valid backend option. Which backend would you like to work with?")
backend_name = input("Backend choice (pytorch, mxnet or tensorflow): ").lower()
set_default_backend(backend_name)
return backend_name
load_backend(get_preferred_backend())
def is_enabled(api):
"""Return true if the api is enabled by the current backend.
......
......@@ -14,7 +14,7 @@ from ...function.base import TargetCode
MX_VERSION = LooseVersion(mx.__version__)
if MX_VERSION.version[0] == 1 and MX_VERSION.version[1] < 5:
raise Exception("DGL has to work with MXNet version >= 1.5")
raise RuntimeError("DGL requires mxnet >= 1.5")
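Note that the condition above only fires when the major version is exactly 1, so a 0.x release would pass it silently. A minimum-version check that compares whole versions could look like the sketch below. The helpers `version_tuple` and `meets_minimum` are hypothetical and use only the standard library. They ignore pre-release suffixes, so `2.2.0rc1` is treated as `2.2.0`, which is a simplification.

```python
import re


def version_tuple(version):
    """Parse the leading numeric part of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, required):
    """True if the installed version is at least the required one."""
    have, need = version_tuple(installed), version_tuple(required)
    width = max(len(have), len(need))
    # Pad with zeros so that "1.2" and "1.2.0" compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

A backend module could then raise or warn when `meets_minimum(installed_version, "1.5")` is false, whatever the major version.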
# After MXNet 1.5, empty tensors aren't supported by default.
# After we turn on the numpy compatible flag, MXNet supports empty NDArray.
......
......@@ -2,6 +2,7 @@ from __future__ import absolute_import
from distutils.version import LooseVersion
import scipy # Weird bug in new pytorch when import scipy after import torch
import torch as th
import builtins
from torch.utils import dlpack
......@@ -9,8 +10,11 @@ from torch.utils import dlpack
from ... import ndarray as nd
from ... import kernel as K
from ...function.base import TargetCode
from ...base import dgl_warning
TH_VERSION = LooseVersion(th.__version__)
if LooseVersion(th.__version__) < LooseVersion("1.2.0"):
dgl_warning("Detected an old version of PyTorch. Suggest using torch>=1.2.0 "
"for the best experience.")
def data_type_dict():
return {'float16' : th.float16,
......
import argparse
import os
import json
def set_default_backend(backend_name):
default_dir = os.path.join(os.path.expanduser('~'), '.dgl')
if not os.path.exists(default_dir):
os.makedirs(default_dir)
config_path = os.path.join(default_dir, 'config.json')
with open(config_path, "w") as config_file:
json.dump({'backend': backend_name.lower()}, config_file)
print('Set the default backend to "{}". You can change it in the '
'~/.dgl/config.json file or export the DGLBACKEND environment variable.'.format(
backend_name))
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("backend", nargs=1, type=str, choices=[
'pytorch', 'tensorflow', 'mxnet'], help="Set default backend")
args = parser.parse_args()
set_default_backend(args.backend[0])
import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
from .tensor import *
"""Tensorflow backend implementation"""
from __future__ import absolute_import
from distutils.version import LooseVersion
......@@ -6,16 +6,41 @@ from distutils.version import LooseVersion
import tensorflow as tf
from tensorflow.python.eager import context
import builtins
import tfdlpack
import numpy as np
from tfdlpack import to_dlpack, from_dlpack
import os
from ... import ndarray as nd
from ... import kernel as K
from ...function.base import TargetCode
TF_VERSION = LooseVersion(tf.__version__)
if os.getenv("USE_OFFICIAL_TFDLPACK", False):
if LooseVersion(tf.__version__) < LooseVersion("2.2.0"):
raise RuntimeError("DGL requires tensorflow>=2.2.0 for the official DLPack support.")
def zerocopy_to_dlpack(input):
return tf.experimental.dlpack.to_dlpack(input)
def zerocopy_from_dlpack(dlpack_tensor):
# TODO(Jinjing): Tensorflow requires memory to be 64-bit aligned. We check the
# alignment and make a copy if needed. The functionality is better in TF's main repo.
aligned = nd.from_dlpack(dlpack_tensor).to_dlpack(64)
return tf.experimental.dlpack.from_dlpack(aligned)
else:
# Use our own DLPack solution
try:
import tfdlpack
except ImportError:
raise ImportError('Cannot find tfdlpack, which is required by the Tensorflow backend. '
'Please follow https://github.com/VoVAllen/tf-dlpack for installation.')
if LooseVersion(tf.__version__) < LooseVersion("2.1.0"):
raise RuntimeError("DGL requires tensorflow>=2.1.0.")
def zerocopy_to_dlpack(input):
return tfdlpack.to_dlpack(input)
def zerocopy_from_dlpack(input):
return tfdlpack.from_dlpack(input)
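One thing to keep in mind about the `USE_OFFICIAL_TFDLPACK` switch above: `os.getenv` returns a string, so any non-empty value, including `0` or `false`, selects the first branch. If stricter parsing were wanted, a helper along the following lines could be used. `env_flag` and its accepted spellings are assumptions of this sketch, not part of DGL.

```python
import os

# Spellings treated as "on" in this sketch (an assumption, not a standard).
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable as a boolean flag."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        # Variable not set at all: fall back to the default.
        return default
    return value.strip().lower() in _TRUE_VALUES
```

With this helper, `USE_OFFICIAL_TFDLPACK=0` would keep the `tfdlpack` code path instead of switching to `tf.experimental.dlpack`.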
def data_type_dict():
return {'float16': tf.float16,
......@@ -27,11 +52,9 @@ def data_type_dict():
'int32': tf.int32,
'int64': tf.int64}
def cpu():
return "/cpu:0"
def tensor(data, dtype=None):
return tf.convert_to_tensor(data, dtype=dtype)
......@@ -355,16 +378,7 @@ def rand_shuffle(arr):
return tf.random.shuffle(arr)
def zerocopy_to_dlpack(input):
return tfdlpack.to_dlpack(input)
def zerocopy_from_dlpack(dlpack_tensor):
return tfdlpack.from_dlpack(dlpack_tensor)
def zerocopy_to_numpy(input):
# NOTE: not zerocopy
return np.asarray(memoryview(input))
......
......@@ -13,12 +13,24 @@
#include <dgl/graph_interface.h>
#include <algorithm>
#include <vector>
#include <string>
using dgl::runtime::operator<<;
/*! \brief Output the string representation of device context.*/
inline std::ostream& operator << (std::ostream& os, const DLContext& ctx) {
return os << ctx.device_type << ":" << ctx.device_id;
inline std::ostream& operator<<(std::ostream& os, const DLContext& ctx) {
std::string device_name;
switch (ctx.device_type) {
case kDLCPU:
device_name = "CPU";
break;
case kDLGPU:
device_name = "GPU";
break;
default:
device_name = "Unknown device";
}
return os << device_name << ":" << ctx.device_id;
}
namespace dgl {
......