* make it simple
* batched gemm+softmax+gemm
* remove program server
* specify launch bounds per kernel instance