    New engine: vision models and auto-fallback (#9113) · 1fdb351c
    Daniel Hiltgen authored
    * Include unified vision layers in memory prediction
    
    For newer vision models with a single gguf, include
    the projection estimates.
    
    * Adjust CLI to handle both styles of vision model metadata
    
    * Wire up new tokenizers for new engine
    
    If we're loading the new engine, use the new model
    text processor instead of calling into the cgo wrappers
    for llama.cpp. This also cleans up some tech debt from
    the older tokenization flow for the C++ server, which
    was no longer used.
    
    This also adjusts the grammar handling logic to pass
    through to the new engine instead of using the cgo
    schema-to-grammar call.
    
    * Lay the foundation for automatic selection of the new engine
routes.go 40.7 KB