### High-Level Overview

The `sgl-router` tokenizer subsystem exposes a single `Tokenizer` facade around multiple backends
(Hugging Face JSON tokenizers, OpenAI/tiktoken models, and an in-memory mock). It packages the
shared behaviours the router needs, namely encoding user text, incrementally decoding streamed
tokens, tracking per-request state, and detecting stop conditions, behind trait objects so the rest
of the router can remain backend-agnostic.
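
To make the trait-object idea concrete, here is a minimal sketch of how an `Encoder`/`Decoder`/`Tokenizer` split can hide a backend from its callers. The method signatures, the `MockTokenizer` internals, and the `round_trip` helper are assumptions made for illustration; the real traits in the crate may return richer types (for example, results with error handling) and should be checked against the source.

```rust
use std::collections::HashMap;

/// Turns text into token ids (simplified, assumed signature).
trait Encoder {
    fn encode(&self, text: &str) -> Vec<u32>;
}

/// Turns token ids back into text (simplified, assumed signature).
trait Decoder {
    fn decode(&self, ids: &[u32]) -> String;
}

/// Facade trait: anything that can both encode and decode.
trait Tokenizer: Encoder + Decoder {
    fn vocab_size(&self) -> usize;
}

/// Toy whitespace-based backend standing in for an in-memory mock.
struct MockTokenizer {
    vocab: HashMap<String, u32>,
    reverse: HashMap<u32, String>,
}

impl MockTokenizer {
    /// Assigns ids 0, 1, 2, ... to the given words in order.
    fn new(words: &[&str]) -> Self {
        let mut vocab = HashMap::new();
        let mut reverse = HashMap::new();
        for (i, w) in words.iter().enumerate() {
            vocab.insert(w.to_string(), i as u32);
            reverse.insert(i as u32, w.to_string());
        }
        Self { vocab, reverse }
    }
}

impl Encoder for MockTokenizer {
    fn encode(&self, text: &str) -> Vec<u32> {
        // Unknown words are silently skipped in this toy backend.
        text.split_whitespace()
            .filter_map(|w| self.vocab.get(w).copied())
            .collect()
    }
}

impl Decoder for MockTokenizer {
    fn decode(&self, ids: &[u32]) -> String {
        ids.iter()
            .filter_map(|id| self.reverse.get(id).map(String::as_str))
            .collect::<Vec<_>>()
            .join(" ")
    }
}

impl Tokenizer for MockTokenizer {
    fn vocab_size(&self) -> usize {
        self.vocab.len()
    }
}

/// Caller-side code depends only on the trait object, not on a backend.
fn round_trip(tok: &dyn Tokenizer, text: &str) -> String {
    tok.decode(&tok.encode(text))
}
```

Because `round_trip` accepts `&dyn Tokenizer`, a different backend can be swapped in without changing the caller, which is the property the facade is meant to provide.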

Key capabilities:

- trait-based split between `Encoder`, `Decoder`, and `Tokenizer` for shared APIs across backends
- factory-style auto-detection and creation of the appropriate tokenizer type from files or model names
- Hugging Face tokenizer loading (with optional chat templates) and HF Hub downloads for model IDs
- heuristic selection of OpenAI/tiktoken encodings for GPT model names
- incremental decoding utilities (`DecodeStream`, `Sequence`) that track per-request token state and
  handle UTF-8 boundaries
- stop sequence handling via `StopSequenceDecoder` with token-level and string-level triggers and
  "jail" buffering
- optional Jinja2 chat-template rendering that matches Hugging Face semantics

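
The UTF-8 boundary handling mentioned in the list can be illustrated with a small byte-level sketch. This is not the crate's `DecodeStream` algorithm: `Utf8Stream` and `step` are hypothetical names, and the sketch assumes each token arrives as raw bytes. It only shows the core idea of holding back an incomplete multi-byte character until the bytes that complete it arrive.

```rust
/// Buffers raw bytes and releases only complete UTF-8 text.
/// Hypothetical helper for illustration, not part of the crate.
struct Utf8Stream {
    pending: Vec<u8>,
}

impl Utf8Stream {
    fn new() -> Self {
        Self { pending: Vec::new() }
    }

    /// Feed the bytes of one token; returns newly completed text, if any.
    fn step(&mut self, token_bytes: &[u8]) -> Option<String> {
        self.pending.extend_from_slice(token_bytes);

        // Length of the longest prefix of `pending` that is valid UTF-8.
        let valid_len = match std::str::from_utf8(&self.pending) {
            Ok(s) => s.len(),
            Err(e) => e.valid_up_to(),
        };
        if valid_len == 0 {
            // Nothing complete yet; keep buffering.
            // Note: bytes that can never become valid are held forever in
            // this sketch; a real decoder needs a policy for that case.
            return None;
        }

        // Keep the incomplete tail buffered and emit the validated prefix.
        let rest = self.pending.split_off(valid_len);
        let done = std::mem::replace(&mut self.pending, rest);
        String::from_utf8(done).ok()
    }
}
```

For example, feeding `b"h"`, then the single byte `0xC3`, then `[0xA9, b'!']` yields `Some("h")`, `None`, and `Some("é!")`: the first half of `é` is withheld until its second byte arrives.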
The implementation deliberately keeps the surface area small: the metrics, batching, and SentencePiece
support mentioned in earlier drafts do **not** exist today. This document reflects the actual code