    [Feature] x_dot_x builtin kernel support (#831) · 0a56d652
    xiang song(charlie.song) authored
    * upd
    
    * Fix EdgeBatch edges
    
    * add test
    
    * trigger
    
    * Update README.md for pytorch PinSage example.
    
    Add a note that the PinSage model example under
    example/pytorch/recommendation only works with Python 3.6+,
    as its dataset loader depends on the stanfordnlp package,
    which works only with Python 3.6+.
    
    * Provide a framework-agnostic API to test nn modules on both the CPU and CUDA sides (a sketch of the backend helper follows below).
    
    1. make dgl.nn.xxx framework-agnostic
    2. make test.backend include dgl.nn modules
    3. modify test_edge_softmax in test/mxnet/test_nn.py and
        test/pytorch/test_nn.py to work on both CPU and GPU
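    
    A minimal sketch of the idea behind such a backend helper, assuming a
    PyTorch build; the helper names (ctx, tensor) and the DGLTESTDEV
    environment variable are illustrative, not the actual contents of
    tests/backend:
    
        # Hypothetical tests/backend helper: pick the device once so every
        # test builds its inputs the same way on CPU and on GPU.
        import os
        import torch
        
        _dev = os.environ.get("DGLTESTDEV", "cpu")  # assumed variable name
        
        def ctx():
            """Device that every test should place its tensors on."""
            return torch.device(_dev)
        
        def tensor(data):
            """Create a test tensor directly on the selected device."""
            return torch.tensor(data, device=ctx())
    
    A test then writes feat = tensor([[1.0, 2.0]]) instead of hard-coding
    .cuda(), so the same test body runs under both device settings.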
    
    * Fix style
    
    * Delete unused code
    
    * Make the agnostic tests depend only on tests/backend
    
    1. remove all agnostic-related code from dgl.nn
    2. make test_graph_conv agnostic to CPU/GPU
    
    * Fix code style
    
    * fix
    
    * doc
    
    * Make all test code in test/mxnet/test_nn.py and test/pytorch/test_nn.py
    work on both CPU and GPU.
    
    * Fix syntax
    
    * Remove rand
    
    * Start implementing masked-mm kernel.
    
    Add base control flow code.
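    
    "Masked matrix multiplication" here means computing pair-wise dot
    products only for node pairs that are connected by an edge, rather than
    the full dense product. A numpy sketch of the intended semantics (not of
    the kernel itself):
    
        import numpy as np
        
        # Toy graph: edge list (src, dst) and per-node features.
        src = np.array([0, 0, 1])
        dst = np.array([1, 2, 2])
        X = np.random.randn(3, 4)   # source-side features
        Y = np.random.randn(3, 4)   # destination-side features
        
        # Dense product then masking: wasteful, but it defines the result.
        dense = X @ Y.T                  # (3, 3) all-pairs dot products
        masked = dense[src, dst]         # keep only entries backed by an edge
        
        # Per-edge computation: what the masked-mm kernel evaluates directly.
        per_edge = (X[src] * Y[dst]).sum(axis=1)
        
        assert np.allclose(masked, per_edge)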
    
    * Add masked dot declaration
    
    * Update function/variable names
    
    * Skeleton compiles OK
    
    * Update implementation: unify BinaryDot with BinaryReduce
    
    * New implementation of x_dot_x, reusing the binary reduce template
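    
    Unifying BinaryDot with BinaryReduce presumably means dot becomes one
    more entry in the table of binary operators, so the existing per-edge
    kernel and its reducers are reused instead of a separate masked-mm code
    path. A rough Python analogy of that dispatch (the real code is the
    C++/CUDA template behind binary_reduce_impl.h):
    
        import numpy as np
        
        # One generic per-edge computation, parameterized by the binary op.
        BINARY_OPS = {
            "add": lambda a, b: a + b,
            "sub": lambda a, b: a - b,
            "mul": lambda a, b: a * b,
            "div": lambda a, b: a / b,
            # dot is just another operator: it folds the sum over the
            # feature axis into the op instead of needing its own kernel.
            "dot": lambda a, b: (a * b).sum(axis=-1, keepdims=True),
        }
        
        def edge_compute(op, src, dst, lhs, rhs):
            """Apply a binary op over every edge (src[i], dst[i])."""
            return BINARY_OPS[op](lhs[src], rhs[dst])
        
        X, Y = np.random.randn(3, 4), np.random.randn(3, 4)
        src, dst = np.array([0, 1]), np.array([2, 2])
        assert edge_compute("dot", src, dst, X, Y).shape == (2, 1)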
    
    * Compiles OK.
    
    TODO:
    1. make sure x_add_x, x_sub_x, x_mul_x, x_div_x work
    2. let x_dot_x work
    3. make sure the backward of x_add_x, x_sub_x, x_mul_x, x_div_x works
    4. let the x_dot_x backward work (a gradient check is sketched below)
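    
    For the dot builtin, the forward per edge is z = sum_k u_k * v_k, so
    each side's expected gradient is the other side (scaled by the incoming
    gradient). A small PyTorch check of that expectation, using plain
    tensors rather than the DGL kernels:
    
        import torch
        
        u = torch.randn(4, requires_grad=True)   # source feature
        v = torch.randn(4, requires_grad=True)   # destination feature
        
        z = (u * v).sum()        # per-edge dot product
        z.backward()
        
        # Gradient of a dot product: each side receives the other side.
        assert torch.allclose(u.grad, v)
        assert torch.allclose(v.grad, u)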
    
    * Fix code style
    
    * Now tests/compute/test_kernel.py passes for add/sub/mul/div forward and backward
    
    * Fix mxnet test code
    
    * Add u_dot_v, u_dot_e, v_dot_e unit tests.
    
    * Update doc
    
    * Now also support v_dot_u, e_dot_u, e_dot_v
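    
    A usage sketch of one of these builtins, assuming it is exposed as
    dgl.function.u_dot_v the way the unit tests exercise it; the graph
    construction details here are illustrative:
    
        import torch
        import dgl
        import dgl.function as fn
        
        # Tiny graph with 3 nodes and 3 edges.
        g = dgl.DGLGraph()
        g.add_nodes(3)
        g.add_edges([0, 0, 1], [1, 2, 2])
        g.ndata['h'] = torch.randn(3, 4)
        
        # Compute a scalar score per edge: dot(src 'h', dst 'h').
        g.apply_edges(fn.u_dot_v('h', 'h', 'score'))
        print(g.edata['score'].shape)   # expected: (3, 1)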
    
    * Add unrolling for some loops
    
    * Add some optimizations for the CUDA backward of the dot builtin.
    
    The backward pass is still slow for dot.
    
    * Apply the UnravelRavel optimization for the broadcast backward
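    
    UnravelRavel presumably refers to converting a flat element index into
    its multi-dimensional coordinates under broadcasting and back, and
    hoisting that mapping out of the inner loop. A numpy illustration of the
    index arithmetic involved (conceptual only; the kernel does this in
    C++/CUDA):
    
        import numpy as np
        
        out_shape = (2, 3, 4)        # broadcast output shape
        lhs_shape = (2, 1, 4)        # operand that broadcasts along axis 1
        
        flat = 17                                    # a flat output index
        coord = np.unravel_index(flat, out_shape)    # -> (1, 1, 1)
        
        # Clamp coordinates on broadcast axes, then re-flatten against the
        # operand's own shape to find the element the output reads from.
        lhs_coord = tuple(min(c, s - 1) for c, s in zip(coord, lhs_shape))
        lhs_flat = np.ravel_multi_index(lhs_coord, lhs_shape)
        
        a = np.random.randn(*lhs_shape)
        assert a.flat[lhs_flat] == np.broadcast_to(a, out_shape).flat[flat]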
    
    * Update docstring
Changed file: binary_reduce_impl.h (8.58 KB)