    feat: Initial Granite support (#1271) · 7d0c9386
    Graham King authored
    - Add Granite to our tokenizer
    - Fix pre-processor to load context length correctly
    - Add strftime_now Jinja function for prompt templates
    - Update llama.cpp
    - Handle trtllm errors when not using trtllm
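
The `strftime_now` item refers to a helper that prompt templates call to insert the current date. Below is a minimal std-only sketch of the formatting such a helper performs. It assumes UTC and supports only `%Y`, `%m`, `%d`, `%B` and `%%`. The names `strftime_utc` and `strftime_now` are hypothetical, and this is not the code in this commit. A production version would normally use a full date library and register the function with the template engine.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

const MONTHS: [&str; 12] = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
];

/// Convert days since 1970-01-01 into (year, month, day) in the proleptic
/// Gregorian calendar (Howard Hinnant's civil_from_days algorithm).
fn civil_from_days(z: i64) -> (i64, u32, u32) {
    let z = z + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = (z - era * 146_097) as u64; // day of era, [0, 146096]
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let y = yoe as i64 + era * 400;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    (if m <= 2 { y + 1 } else { y }, m, d)
}

/// Hypothetical helper: format a UNIX timestamp (seconds, UTC) with a small
/// subset of strftime specifiers. Unknown specifiers are copied verbatim.
fn strftime_utc(unix_secs: i64, fmt: &str) -> String {
    let (y, m, d) = civil_from_days(unix_secs.div_euclid(86_400));
    let mut out = String::new();
    let mut chars = fmt.chars();
    while let Some(c) = chars.next() {
        if c != '%' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('Y') => out.push_str(&y.to_string()),
            Some('m') => out.push_str(&format!("{:02}", m)),
            Some('d') => out.push_str(&format!("{:02}", d)),
            Some('B') => out.push_str(MONTHS[(m - 1) as usize]),
            Some('%') => out.push('%'),
            Some(other) => {
                out.push('%');
                out.push(other);
            }
            None => out.push('%'),
        }
    }
    out
}

/// Hypothetical `strftime_now`: format the current time (UTC).
fn strftime_now(fmt: &str) -> String {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() as i64)
        .unwrap_or(0);
    strftime_utc(secs, fmt)
}

fn main() {
    println!("{}", strftime_utc(0, "%B %d, %Y")); // January 01, 1970
    println!("{}", strftime_now("%Y-%m-%d"));
}
```

Computing the date from days since the epoch keeps the sketch dependency-free. The supported subset is enough for date lines such as `"%B %d, %Y"`.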
    
    Support depends on the engine:
    
    - `mistral.rs`, our default engine, doesn't support Granite yet.
    
    - `llama.cpp` supports it and works very well:
    ```
    dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
    ```
    
    - `vllm` also works very well:
    ```
    dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
    ```
    
    - `sglang` mostly works, but it doesn't catch the stop token, so we catch it in the HTTP ingress and log an error. The text ingress doesn't catch it, because I disabled that check to make the raw echo engine work. There is still some work to do here.
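
The sglang workaround (the ingress catches a stop token the engine missed and logs an error) can be pictured with a minimal sketch. It assumes the ingress sees generated token IDs in chunks and knows the model's stop token IDs. `StopGuard`, its methods and the token IDs are hypothetical, not dynamo's actual types, and real detection may operate on decoded text instead.

```rust
/// Hypothetical guard an ingress could apply to an engine's token stream:
/// forward tokens until a stop token ID appears, drop it and everything
/// after it, and report that the engine failed to stop on its own.
struct StopGuard {
    stop_ids: Vec<u32>,
    stopped: bool,
}

impl StopGuard {
    fn new(stop_ids: Vec<u32>) -> Self {
        Self { stop_ids, stopped: false }
    }

    /// Filter one chunk of token IDs; returns the tokens to forward.
    fn filter(&mut self, chunk: &[u32]) -> Vec<u32> {
        if self.stopped {
            return Vec::new();
        }
        match chunk.iter().position(|t| self.stop_ids.contains(t)) {
            Some(i) => {
                self.stopped = true;
                eprintln!("error: engine did not stop at stop token {}", chunk[i]);
                chunk[..i].to_vec()
            }
            None => chunk.to_vec(),
        }
    }

    fn is_stopped(&self) -> bool {
        self.stopped
    }
}

fn main() {
    // Made-up token IDs; 0 stands in for the model's stop token.
    let mut guard = StopGuard::new(vec![0]);
    let chunks: Vec<Vec<u32>> = vec![vec![11, 12], vec![13, 0, 14], vec![15]];
    let mut forwarded = Vec::new();
    for chunk in &chunks {
        forwarded.extend(guard.filter(chunk));
        if guard.is_stopped() {
            break; // stop consuming the engine's stream
        }
    }
    println!("{:?}", forwarded); // [11, 12, 13]
}
```

Keeping the `stopped` flag lets the ingress drop any further output even if the engine keeps generating after the missed stop token.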
    
    Closes: #1245 