    llamarunner: update metrics · bbbc73d6
    Michael Yang authored
    this change updates how metrics are collected. until now, performance
    metrics, specifically the initial input processing and subsequent
    generation durations, were derived from three timestamps: sequence
    creation, first token generation, and generation completion. the
    processing duration was first token generation minus sequence
    creation, and the generation duration was generation completion minus
    first token generation.
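
a minimal sketch of the timestamp-based calculation described above. the type, field, and function names here are hypothetical and are not taken from runner.go; only `time.Time.Sub` is a real standard library call.

```go
package main

import (
	"fmt"
	"time"
)

// endToEndTimestamps is a hypothetical stand-in for the three timestamps
// described in the commit message; it does not mirror runner.go.
type endToEndTimestamps struct {
	created    time.Time // when the sequence was created
	firstToken time.Time // when the first token was generated
	finished   time.Time // when generation completed
}

// processing is first token generation minus sequence creation.
func (t endToEndTimestamps) processing() time.Duration {
	return t.firstToken.Sub(t.created)
}

// generation is generation completion minus first token generation.
func (t endToEndTimestamps) generation() time.Duration {
	return t.finished.Sub(t.firstToken)
}

// exampleTimestamps returns made-up values used only for illustration.
func exampleTimestamps() endToEndTimestamps {
	start := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	return endToEndTimestamps{
		created:    start,
		firstToken: start.Add(300 * time.Millisecond),
		finished:   start.Add(1500 * time.Millisecond),
	}
}

func main() {
	ts := exampleTimestamps()
	fmt.Println("processing:", ts.processing()) // prints "processing: 300ms"
	fmt.Println("generation:", ts.generation()) // prints "generation: 1.2s"
}
```

because both durations are wall-clock differences, any time the sequence spends waiting between those timestamps is included in the result.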
    
    while this approach is an accurate end-to-end metric of processing and
    generation, it's not comparable to other tools which only measure the
    active, i.e. decode, duration.
    
    this change updates the metrics to only capture decode duration so
    they can be more directly compared to other tools.
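
one way to picture decode-only accounting is to time each decode call and add the elapsed time to a per-phase total. this is a sketch under assumptions, not the code in runner.go: `decodeMetrics`, `observe`, and `simulatedDecode` are hypothetical names, and it assumes each decode call belongs entirely to either prompt processing or generation.

```go
package main

import (
	"fmt"
	"time"
)

// decodeMetrics is a hypothetical accumulator; it only grows while a
// decode call is actually running.
type decodeMetrics struct {
	processing time.Duration // time spent decoding prompt input
	generation time.Duration // time spent decoding generated tokens
}

// observe runs one decode call and adds its elapsed time to the
// matching phase. Time spent outside decode is never counted.
func (m *decodeMetrics) observe(isPrompt bool, decode func()) {
	start := time.Now()
	decode()
	elapsed := time.Since(start)
	if isPrompt {
		m.processing += elapsed
	} else {
		m.generation += elapsed
	}
}

// simulatedDecode stands in for a real decode call by sleeping.
func simulatedDecode(ms int) func() {
	return func() { time.Sleep(time.Duration(ms) * time.Millisecond) }
}

func main() {
	var m decodeMetrics

	m.observe(true, simulatedDecode(20)) // prompt processing
	time.Sleep(50 * time.Millisecond)    // idle time, not measured
	m.observe(false, simulatedDecode(5)) // first generated token
	m.observe(false, simulatedDecode(5)) // second generated token

	fmt.Println("processing:", m.processing)
	fmt.Println("generation:", m.generation)
}
```

in this sketch the 50ms idle period does not appear in either total, which is the difference from the end-to-end timestamps.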
runner.go 22.4 KB