    feat(router): add header option to disable buffering for the generate_stream response (#498) · 70f485bf
    Robert Kimball authored
    This PR adds an HTTP header option to disable buffering for the
    generate_stream endpoint response stream.
    
    Problem: If a model runs behind a proxy server such as nginx with
    buffering enabled, the response stream from generate_stream is
    aggregated into a single response, which effectively disables streaming.
    Instead of a chunked response in which tokens arrive over time, the
    client receives everything at once.
    
    Solution: This change adds the `X-Accel-Buffering` HTTP header to the
    generate_stream response, which tells the proxy not to buffer it and
    allows the response to stream properly.
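
As a rough illustration of the idea, and not the code in server.rs, the sketch below builds the headers for a streamed response using only the standard library. The names `streaming_headers`, `render_head`, and the `disable_proxy_buffering` flag are hypothetical, as are the `Content-Type` and `Cache-Control` values. The `X-Accel-Buffering: no` pair is the value nginx documents for turning off buffering on a single response.

```rust
// Hypothetical helper (not taken from server.rs): builds the response
// headers for a streamed, server-sent-events style response.
fn streaming_headers(disable_proxy_buffering: bool) -> Vec<(&'static str, &'static str)> {
    let mut headers = vec![
        // Assumed values for a typical event stream.
        ("Content-Type", "text/event-stream"),
        ("Cache-Control", "no-cache"),
    ];
    if disable_proxy_buffering {
        // nginx reads this response header and disables buffering for
        // this response only, so chunks are forwarded as they arrive.
        headers.push(("X-Accel-Buffering", "no"));
    }
    headers
}

// Hypothetical helper: renders a status line and the headers as an
// HTTP/1.1 response head, terminated by an empty line.
fn render_head(headers: &[(&str, &str)]) -> String {
    let mut head = String::from("HTTP/1.1 200 OK\r\n");
    for (name, value) in headers {
        head.push_str(name);
        head.push_str(": ");
        head.push_str(value);
        head.push_str("\r\n");
    }
    head.push_str("\r\n");
    head
}
```

Calling `render_head(&streaming_headers(true))` yields a response head that includes the line `X-Accel-Buffering: no`. Sending the header from the application means the proxy configuration does not have to disable buffering globally.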
server.rs 30.2 KB