    Improve latencies, handling of streams and events, multi-GPU support · 70771a51
    Anton Gorenko authored
    Use a small kernel for copying interactionCounts to host memory
    
        hipMemcpy's CopyDeviceToHost operation has higher latency.
    
    Do not set stream and event blocking/spin related flags
    
        Let the runtime choose the best option because overriding does not
        improve performance in most cases.
    
    Remove NULL streams and use nonblocking streams explicitly
    
    Make HipContext::pushAsCurrent/popAsCurrent thread-safe as they can be
    called simultaneously from different threads via ContextSelector.
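
    One way to make push/pop safe under concurrent callers is to keep the
    saved-device stack per thread. The following is a minimal sketch under
    that assumption, not the actual HipContext code: ContextSketch,
    SelectorSketch, currentDevice and savedDevices are hypothetical names,
    and a plain thread_local integer stands in for the runtime's per-thread
    current device.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the runtime's per-thread current device (assumption:
// real code would query and set it through the HIP runtime instead).
thread_local int currentDevice = 0;

// Per-thread stack of previously current devices. Nothing here is shared
// between threads, so concurrent push/pop calls cannot corrupt it.
thread_local std::vector<int> savedDevices;

// Hypothetical, simplified model of a context bound to one device.
class ContextSketch {
public:
    explicit ContextSketch(int deviceIndex) : deviceIndex(deviceIndex) {}

    // Remember the calling thread's current device, then switch to ours.
    void pushAsCurrent() const {
        savedDevices.push_back(currentDevice);
        currentDevice = deviceIndex;
    }

    // Restore the device that was current before the matching push.
    void popAsCurrent() const {
        assert(!savedDevices.empty());
        currentDevice = savedDevices.back();
        savedDevices.pop_back();
    }

private:
    int deviceIndex;
};

// RAII helper in the spirit of ContextSelector: push on entry, pop on exit.
class SelectorSketch {
public:
    explicit SelectorSketch(const ContextSketch& c) : context(c) {
        context.pushAsCurrent();
    }
    ~SelectorSketch() { context.popAsCurrent(); }
    SelectorSketch(const SelectorSketch&) = delete;
    SelectorSketch& operator=(const SelectorSketch&) = delete;

private:
    const ContextSketch& context;
};
```

    Because each thread only touches its own stack, nested selectors unwind
    in the right order without a lock. A mutex around shared state would be
    an alternative design with different trade-offs.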
    
    Allow peer access to be enabled more than once (if there are multiple
    simulations one after another, like in benchmark.py).
    
    Create peerCopyStream on the corresponding device
    
    Use two-speed load balancing for multi GPU runs
    
        The first 100 steps do coarse balancing, the next 100 do fine tuning.
        Also ignore the slowest device (usually 0) if its fraction has
        reached 0 (i.e. no work can be transferred to other devices) and
        balance the other devices.
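
    The two-speed scheme described above can be sketched as follows. This is
    an illustration under stated assumptions, not the code from this commit:
    balanceSketch, COARSE_STEP and FINE_STEP are hypothetical names, and the
    step sizes 0.01 and 0.001 are placeholder values that the commit message
    does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical step sizes (assumption: the real values are not stated).
constexpr double COARSE_STEP = 0.01;
constexpr double FINE_STEP = 0.001;

// Moves a small fraction of work from the slowest device to the fastest one.
// Returns true if any work was actually transferred.
bool balanceSketch(int stepCount, std::vector<double>& fractions,
                   const std::vector<double>& completionTimes) {
    assert(fractions.size() == completionTimes.size());
    const std::size_t n = fractions.size();
    if (stepCount >= 200 || n < 2)
        return false;  // balance only during the first 200 steps
    // Steps 0..99 use the coarse step, steps 100..199 the fine step.
    const double step = (stepCount < 100) ? COARSE_STEP : FINE_STEP;

    // Find the overall slowest device.
    std::size_t overallSlowest = 0;
    for (std::size_t i = 1; i < n; i++)
        if (completionTimes[i] > completionTimes[overallSlowest])
            overallSlowest = i;
    // If it has no work left to give away, leave it out of the balancing.
    const bool ignoreSlowest = (fractions[overallSlowest] <= 0.0);

    std::size_t fastest = n, slowest = n;  // n means "not found yet"
    for (std::size_t i = 0; i < n; i++) {
        if (ignoreSlowest && i == overallSlowest)
            continue;
        if (fastest == n || completionTimes[i] < completionTimes[fastest])
            fastest = i;
        if (slowest == n || completionTimes[i] > completionTimes[slowest])
            slowest = i;
    }
    if (fastest == n || fastest == slowest)
        return false;  // nothing to balance among the remaining devices

    // Never take more work than the slowest device currently has.
    const double transfer = std::min(step, fractions[slowest]);
    fractions[fastest] += transfer;
    fractions[slowest] -= transfer;
    return transfer > 0.0;
}
```

    Called once per step with the measured completion times, this converges
    quickly at first and then settles with smaller corrections. Skipping a
    device whose fraction is already 0 keeps the remaining devices from
    stalling when that device is slow for reasons unrelated to its share.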
    
    Do not download interactionCounts in parallel nonbonded tasks
    
        This is not required because updateNeighborListSize has already been
        called and the valid flag has changed.
    
    Initialize tilesAfterReorder properly
    
        It may contain a garbage value, and if that value is large,
        updateNeighborListSize does not force atoms to be reordered after
        25 steps in extreme cases.
HipArray.cpp 6.38 KB