-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 400000000 (elements), Offset = 0 (elements)
Memory per array = 3051.8 MiB (= 3.0 GiB).
Total memory required = 9155.3 MiB (= 8.9 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 32
Number of Threads counted = 32
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 19105 microseconds.
   (= 19105 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           342008.3     0.018755     0.018713     0.018895
Scale:          342409.6     0.018737     0.018691     0.018802
Add:            343827.7     0.028050     0.027921     0.028269
Triad:          363208.7     0.026599     0.026431     0.026855
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
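The figures above are related by simple arithmetic: each "Best Rate" is the number of bytes a kernel moves divided by its Min time, and the memory lines are the array size times 8 bytes. The sketch below recomputes them from the numbers in this run. It assumes the byte-counting convention used by stream.c (Copy and Scale touch two arrays, Add and Triad touch three; MB means 10^6 bytes, MiB means 2^20 bytes). The names `best_rate_mb_s` and `memory_mib` are hypothetical helpers for illustration only. Because the printed Min times are rounded to microseconds, the recomputed rates agree with the table only to within a few MB/s.

```python
# Sketch: recompute STREAM's reported figures from this run's parameters.
# Assumption: byte counts follow stream.c's convention, i.e. Copy and Scale
# touch 2 arrays (1 read + 1 write), Add and Triad touch 3 (2 reads + 1 write).

ARRAY_SIZE = 400_000_000   # elements, from "Array size" above
BYTES_PER_ELEMENT = 8      # "This system uses 8 bytes per array element."

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Min times in seconds, copied from the table above (already rounded).
MIN_TIME_S = {"Copy": 0.018713, "Scale": 0.018691, "Add": 0.027921, "Triad": 0.026431}


def best_rate_mb_s(kernel: str, min_time_s: float) -> float:
    """Bandwidth in MB/s (10^6 bytes/s) based on the kernel's best (minimum) time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_SIZE
    return 1.0e-6 * bytes_moved / min_time_s


def memory_mib(arrays: int = 1) -> float:
    """Memory footprint in MiB (2^20 bytes) for the given number of arrays."""
    return arrays * ARRAY_SIZE * BYTES_PER_ELEMENT / 2**20


if __name__ == "__main__":
    print(f"Memory per array = {memory_mib(1):.1f} MiB")
    print(f"Total memory required = {memory_mib(3):.1f} MiB")
    for kernel, t in MIN_TIME_S.items():
        print(f"{kernel + ':':8s}{best_rate_mb_s(kernel, t):12.1f} MB/s")
```

Using the minimum rather than the average time is what makes the "Best Rate" column a best case; substituting the Avg time column into `best_rate_mb_s` would give a slightly lower, more typical figure.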