1. StdOrchestrator: Tokenizes the requests and sends them to the scheduler.
1. TokenizerManager: Tokenizes the requests and sends them to the scheduler.
2. Scheduler (subprocess): Receives requests from the Tokenizer Manager, schedules batches, forwards them, and sends the output tokens to the Detokenizer Manager.
2. Scheduler (subprocess): Receives requests from the Tokenizer Manager, schedules batches, forwards them, and sends the output tokens to the Detokenizer Manager.
3. DetokenizerManager (subprocess): Detokenizes the output tokens and sends the result back to the Tokenizer Manager.
3. DetokenizerManager (subprocess): Detokenizes the output tokens and sends the result back to the Tokenizer Manager.
Note:
Note:
1. The HTTP server, Engine, and StdOrchestrator both run in the main process.
1. The HTTP server, Engine, and TokenizerManager both run in the main process.
2. Inter-process communication is done through ICP (each process uses a different port) via the ZMQ library.
2. Inter-process communication is done through ICP (each process uses a different port) via the ZMQ library.
- HTTP server: A FastAPI server that routes requests to the engine.
- HTTP server: A FastAPI server that routes requests to the engine.
- The engine consists of three components:
- The engine consists of three components:
1. StdOrchestrator: Tokenizes the requests and sends them to the scheduler.
1. TokenizerManager: Tokenizes the requests and sends them to the scheduler.
2. Scheduler (subprocess): Receives requests from the Tokenizer Manager, schedules batches, forwards them, and sends the output tokens to the Detokenizer Manager.
2. Scheduler (subprocess): Receives requests from the Tokenizer Manager, schedules batches, forwards them, and sends the output tokens to the Detokenizer Manager.
3. DetokenizerManager (subprocess): Detokenizes the output tokens and sends the result back to the Tokenizer Manager.
3. DetokenizerManager (subprocess): Detokenizes the output tokens and sends the result back to the Tokenizer Manager.
Note:
Note:
1. The HTTP server, Engine, and StdOrchestrator both run in the main process.
1. The HTTP server, Engine, and TokenizerManager both run in the main process.
2. Inter-process communication is done through ICP (each process uses a different port) via the ZMQ library.
2. Inter-process communication is done through ICP (each process uses a different port) via the ZMQ library.