    llamarunner: Record the time for all batches during prompt processing · a8d9c264
    Jesse Gross authored
    Currently, we only record the time for the last batch when processing
    the prompt, so earlier batches are left out of the prompt eval duration.
    This results in unrealistically high prompt eval rates for the old
    llama runner.
    
    Before:
    total duration:       31.273112939s
    load duration:        4.97054657s
    prompt eval count:    32768 token(s)
    prompt eval duration: 235.137439ms
    prompt eval rate:     139356.80 tokens/s
    eval count:           1873 token(s)
    eval duration:        18.173182374s
    eval rate:            103.06 tokens/s
    
    After:
    total duration:       30.024798033s
    load duration:        4.758588663s
    prompt eval count:    32768 token(s)
    prompt eval duration: 7.779621548s
    prompt eval rate:     4212.03 tokens/s
    eval count:           1769 token(s)
    eval duration:        17.148014223s
    eval rate:            103.16 tokens/s
runner.go 23.5 KB