    Rebase TRT-llm (#2331) · 2b19d671
    Nicolas Patry authored
    * wip
    
    wip
    
    refacto
    
    refacto
    
    Initial setup for CXX binding to TRTLLM
    
    Working FFI call for TGI and TRTLLM backend
    
    Remove unused parameters and force tokenizer name to be set
    
    Overall build TRTLLM and deps through CMake build system
    
    Enable end to end CMake build
    
    First version loading engines and making it ready for inference
    
    Remembering to check how we can detect support for chunked context
    
    Move to latest TensorRT-LLM version
    
    Specify which default log level to use depending on CMake build type
    
    make leader executor mode work
    
    unconditionally call InitializeBackend on the FFI layer
    
    bind to CUDA::nvml to retrieve compute capabilities at runtime
    
    updated logic and comment to detect CUDA compute capabilities
    
    implement the Stream method to send new tokens through a callback
    
    use spdlog release 1.14.1 moving forward
    
    update trtllm to latest version a96cccafcf6365c128f004f779160951f8c0801c
    
    correctly tell CMake to build dependent tensorrt...
Dockerfile 9.49 KB