Enforce memory alignment to improve performance of vector operations. Also fix bugs in an earlier optimization.
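For illustration only, a minimal sketch of one portable way to obtain aligned buffers for vector loads and stores. The helper names (alignedMalloc, alignedFree, isAligned) and the 32-byte alignment used in the note below are assumptions made for this example, not code from this change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: return a block of `bytes` bytes whose address is a
// multiple of `alignment` (which must be a nonzero power of two).
inline void* alignedMalloc(std::size_t bytes, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Reserve room for the payload, worst-case padding, and the saved raw pointer.
    void* raw = std::malloc(bytes + alignment + sizeof(void*));
    if (raw == nullptr)
        return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    std::uintptr_t aligned = (start + mask) & ~mask;  // round up to the alignment
    // Store the raw pointer just below the aligned block so it can be freed later.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Hypothetical helper: release a block obtained from alignedMalloc.
inline void alignedFree(void* p) {
    if (p != nullptr)
        std::free(reinterpret_cast<void**>(p)[-1]);
}

// True if the address of `p` is a multiple of `alignment`.
inline bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A buffer from alignedMalloc(n * sizeof(float), 32) can be checked with isAligned(ptr, 32) before it is handed to code that assumes aligned vector access, and must be released with alignedFree rather than free.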
Optimize CPU nonbonded forces: improve load balancing between threads and use linear splines instead of cubic ones.
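For illustration only, a minimal sketch of a lookup table evaluated with linear interpolation, the general technique a linear spline refers to. The class name LinearTable, its constructor parameters, and the clamping behavior at the table ends are assumptions made for this example. The function that is actually tabulated and the table layout in the real code are not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class: samples f on a uniform grid over [xmin, xmax] and
// evaluates it by linear interpolation between neighboring samples.
class LinearTable {
public:
    LinearTable(double (*f)(double), double xmin, double xmax, std::size_t intervals)
        : xmin_(xmin), invDx_(intervals / (xmax - xmin)), values_(intervals + 1) {
        assert(intervals > 0 && xmax > xmin);
        for (std::size_t i = 0; i <= intervals; ++i)
            values_[i] = f(xmin + i * (xmax - xmin) / intervals);
    }

    double operator()(double x) const {
        const double last = static_cast<double>(values_.size() - 1);
        double t = (x - xmin_) * invDx_;       // position in units of grid spacing
        if (t < 0.0) t = 0.0;                  // clamp below the table range
        if (t > last) t = last;                // clamp above the table range
        std::size_t i = static_cast<std::size_t>(t);
        if (i >= values_.size() - 1)
            i = values_.size() - 2;            // keep i + 1 inside the table
        const double frac = t - static_cast<double>(i);
        // One multiply-add per lookup, against several for a cubic segment.
        return values_[i] + frac * (values_[i + 1] - values_[i]);
    }

private:
    double xmin_;
    double invDx_;
    std::vector<double> values_;
};
```

Each lookup reads two adjacent samples and does one multiply-add. The trade-off is accuracy: for the same grid spacing a linear table has a larger interpolation error than a cubic one, so it typically needs a finer grid.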