attempt to load dependent cpu/cuda submodules before resorting to JIT
add pyproject.toml to facilitate package install-time dependency on pytorch and ninja
add setup.py
add cpu-gpu dispatcher
refactor variable names
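
A minimal sketch of how the load-before-JIT fallback and the cpu-gpu dispatcher described in these entries might fit together. The names `load_extension`, `jit_build`, and `dispatch` are hypothetical and do not come from the repository. The JIT step is passed in as a callable, so the sketch does not assume any particular build call; in a PyTorch project it would typically wrap `torch.utils.cpp_extension.load`.

```python
import importlib
from types import ModuleType
from typing import Callable, Optional


def load_extension(module_name: str, jit_build: Callable[[], ModuleType]) -> ModuleType:
    """Return a prebuilt extension module if importable, else build it on demand.

    `module_name` and `jit_build` are illustrative assumptions, not the
    project's real identifiers.
    """
    try:
        # Prebuilt path: succeeds when the submodule was compiled at install time.
        return importlib.import_module(module_name)
    except ImportError:
        # Fallback path: compile on first use (for example, a wrapper around
        # torch.utils.cpp_extension.load supplied by the caller).
        return jit_build()


def dispatch(is_cuda: bool, cpu_impl: Callable, cuda_impl: Optional[Callable]) -> Callable:
    """Pick the implementation matching the device of the input."""
    if not is_cuda:
        return cpu_impl
    if cuda_impl is None:
        # The CUDA submodule was neither prebuilt nor JIT-compiled.
        raise RuntimeError("CUDA implementation is not available")
    return cuda_impl
```

With tensors, the caller would pass `tensor.is_cuda` as the first argument to `dispatch`, so a single public function can forward to whichever backend was loaded.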