issue/127: refactor elementwise framework, complete CUDA implementation, refactor swiglu using the generic elementwise framework