The reasoning parser layer provides a unified interface for detecting and extracting reasoning content from Large Language Model (LLM) outputs, particularly from models that support Chain-of-Thought (CoT) reasoning with explicit thinking blocks. The architecture follows a trait-based design pattern enabling pluggable parser implementations while maintaining consistent APIs across different model families that use various reasoning token formats.
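As a rough illustration of the trait-based design, the sketch below defines a cut-down `ReasoningParser` trait and one implementation for models that wrap reasoning in a start/end token pair. Only the trait name comes from this document. The method names (`model_type`, `parse`), the `ParsedOutput` struct, and the `TagParser` type are assumptions made for the example, and the real trait may look quite different.

```rust
/// Output split into reasoning and user-visible text (hypothetical type).
#[derive(Debug, Default, PartialEq)]
pub struct ParsedOutput {
    pub reasoning_text: String,
    pub normal_text: String,
}

/// Cut-down stand-in for the `ReasoningParser` trait; the method set is assumed.
pub trait ReasoningParser: Send {
    fn model_type(&self) -> &str;
    fn parse(&self, text: &str) -> ParsedOutput;
}

/// Hypothetical parser for models that delimit reasoning with a token pair.
pub struct TagParser {
    model_type: String,
    start: String,
    end: String,
}

impl TagParser {
    pub fn new(model_type: &str, start: &str, end: &str) -> Self {
        TagParser {
            model_type: model_type.to_string(),
            start: start.to_string(),
            end: end.to_string(),
        }
    }
}

impl ReasoningParser for TagParser {
    fn model_type(&self) -> &str {
        &self.model_type
    }

    fn parse(&self, text: &str) -> ParsedOutput {
        let start_idx = match text.find(self.start.as_str()) {
            Some(idx) => idx,
            // No start token: the whole output is normal text.
            None => {
                return ParsedOutput {
                    reasoning_text: String::new(),
                    normal_text: text.to_string(),
                }
            }
        };
        let before = &text[..start_idx];
        let after_start = &text[start_idx + self.start.len()..];
        match after_start.find(self.end.as_str()) {
            // Complete block: reasoning sits between the two tokens.
            Some(end_idx) => {
                let after_end = &after_start[end_idx + self.end.len()..];
                ParsedOutput {
                    reasoning_text: after_start[..end_idx].trim().to_string(),
                    normal_text: format!("{}{}", before, after_end).trim().to_string(),
                }
            }
            // Unterminated block: treat the remainder as reasoning.
            None => ParsedOutput {
                reasoning_text: after_start.trim().to_string(),
                normal_text: before.trim().to_string(),
            },
        }
    }
}
```

Because callers depend only on the trait, a model family with a different token format can be supported by adding another implementation, for example behind a `Box<dyn ReasoningParser>`.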
**Key Components:**
- **Factory Pattern**: Registry-based creation and pooling of model-specific parsers
- **Trait System**: `ReasoningParser` trait for implementation flexibility
- **Parser Pooling**: Efficient reuse of parser instances across concurrent requests
- **Streaming Support**: Incremental parsing with partial token buffering
- **Model Detection**: Pattern-based matching for automatic parser selection
- **State Management**: Stateful parsing for streaming scenarios with buffer management
- **Thread Safety**: `Arc<Mutex>`-based sharing for high-concurrency environments
- **Extensibility**: Easy addition of new model-specific parsers
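
The components above can be sketched together in a few dozen lines. This is a minimal illustration under stated assumptions, not the actual implementation: `StreamingParser`, `ParserFactory`, `feed`, `reset`, and `get_pooled` are hypothetical names, the registry entries are illustrative, and the reasoning tokens are assumed to be ASCII. It shows pattern-based model detection, a pool of `Arc<Mutex<...>>` parsers, and streaming that holds back a partial token until the next chunk arrives.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stateful parser; tokens are assumed to be ASCII.
pub struct StreamingParser {
    start: &'static str,
    end: &'static str,
    buffer: String, // text that may still turn out to be part of a token
    in_reasoning: bool,
}

impl StreamingParser {
    pub fn new(start: &'static str, end: &'static str) -> Self {
        StreamingParser { start, end, buffer: String::new(), in_reasoning: false }
    }

    /// Clears per-request state so a pooled instance can be reused.
    pub fn reset(&mut self) {
        self.buffer.clear();
        self.in_reasoning = false;
    }

    /// Feeds one chunk and returns (reasoning_delta, normal_delta).
    pub fn feed(&mut self, chunk: &str) -> (String, String) {
        self.buffer.push_str(chunk);
        let (mut reasoning, mut normal) = (String::new(), String::new());
        loop {
            let token = if self.in_reasoning { self.end } else { self.start };
            if let Some(idx) = self.buffer.find(token) {
                // Complete token: flush the text before it, then flip state.
                let before = self.buffer[..idx].to_string();
                self.buffer = self.buffer[idx + token.len()..].to_string();
                if self.in_reasoning { reasoning.push_str(&before); } else { normal.push_str(&before); }
                self.in_reasoning = !self.in_reasoning;
                continue;
            }
            // Hold back the longest buffer suffix that is a prefix of the token.
            let hold = (1..token.len())
                .rev()
                .find(|&n| self.buffer.ends_with(&token[..n]))
                .unwrap_or(0);
            let emit_len = self.buffer.len() - hold;
            let emit: String = self.buffer.drain(..emit_len).collect();
            if self.in_reasoning { reasoning.push_str(&emit); } else { normal.push_str(&emit); }
            return (reasoning, normal);
        }
    }
}

pub type PooledParser = Arc<Mutex<StreamingParser>>;

/// Hypothetical factory: (model-id pattern, start token, end token) registry plus a pool.
pub struct ParserFactory {
    registry: Vec<(&'static str, &'static str, &'static str)>,
    pool: Mutex<HashMap<&'static str, PooledParser>>,
}

impl ParserFactory {
    pub fn new() -> Self {
        ParserFactory {
            // Illustrative entries only; a real registry would be configurable.
            registry: vec![
                ("deepseek-r1", "<think>", "</think>"),
                ("qwen3", "<think>", "</think>"),
            ],
            pool: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the shared parser for the first pattern contained in `model_id`.
    pub fn get_pooled(&self, model_id: &str) -> Option<PooledParser> {
        let id = model_id.to_lowercase();
        let &(pattern, start, end) =
            self.registry.iter().find(|(p, _, _)| id.contains(*p))?;
        let mut pool = self.pool.lock().unwrap();
        let parser = pool
            .entry(pattern)
            .or_insert_with(|| Arc::new(Mutex::new(StreamingParser::new(start, end))));
        Some(Arc::clone(parser))
    }
}
```

In this sketch a caller would lock the pooled parser, call `reset()`, and then `feed` chunks as they arrive. Holding the lock for the whole request serializes requests that share a parser, so a production pool would more plausibly hand out one instance per in-flight request.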
**Data Flow:**
1. Request → Factory (model detection) → Pooled Parser Retrieval