Add Layer Normalization (#2213)
* wip: layer normalization on cpu
* wip: add cuda implementation, not working yet
* wip: try to fix cuda implementation
* swap grid_stride_range and grid_stride_range_y: does not work yet
* fix CUDA implementation
* implement cuda gradient
* add documentation, move layer_norm, update bn_visitor
* add tests
* use stddev instead of variance in test (they are both 1, anyway)
* add test for means and invstds on CPU and CUDA
* rename visitor to disable_duplicative_bias
* handle more cases in the visitor_disable_input_bias
* Add tests for visitor_disable_input_bias
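
For readers unfamiliar with the operation, here is a minimal CPU sketch of layer normalization in plain C++. It is not the code from this commit: the names `layer_norm_result` and `layer_normalize_sketch` are hypothetical, the scalar `gamma`/`beta` and the `eps` default are simplifying assumptions, and the input is assumed to be a flat buffer of `num_samples` rows with `k` values each. It does show the per-sample `means` and `invstds` that the tests listed above refer to.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch; not the library's API.
struct layer_norm_result
{
    std::vector<float> out;      // normalized values, same layout as the input
    std::vector<float> means;    // one mean per sample
    std::vector<float> invstds;  // one 1/sqrt(variance + eps) per sample
};

// Normalizes each sample (a row of k contiguous values) to zero mean and
// unit variance, then applies a scalar scale (gamma) and shift (beta).
// Scalar gamma/beta are an assumption made to keep the sketch short.
layer_norm_result layer_normalize_sketch(
    const std::vector<float>& x,
    std::size_t num_samples,
    std::size_t k,
    float gamma = 1.0f,
    float beta = 0.0f,
    float eps = 1e-5f)
{
    assert(k > 0 && x.size() == num_samples * k);

    layer_norm_result r;
    r.out.resize(x.size());
    r.means.resize(num_samples);
    r.invstds.resize(num_samples);

    for (std::size_t n = 0; n < num_samples; ++n)
    {
        const float* p = x.data() + n * k;

        // Mean over all values of this sample.
        float sum = 0.0f;
        for (std::size_t i = 0; i < k; ++i)
            sum += p[i];
        const float mean = sum / static_cast<float>(k);

        // Biased variance over the same values.
        float var = 0.0f;
        for (std::size_t i = 0; i < k; ++i)
            var += (p[i] - mean) * (p[i] - mean);
        var /= static_cast<float>(k);

        const float invstd = 1.0f / std::sqrt(var + eps);
        r.means[n] = mean;
        r.invstds[n] = invstd;

        // Normalize, then scale and shift.
        for (std::size_t i = 0; i < k; ++i)
            r.out[n * k + i] = (p[i] - mean) * invstd * gamma + beta;
    }
    return r;
}
```

Unlike batch normalization, the statistics here are computed within each sample rather than across the batch, so the result for one sample does not depend on the others.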