    @unittest.skip("Skip this test because this feature has a bug. See comments below.")
    def test_stateful_custom_logit_processor(self):
        """Test custom logit processor with a single request."""
        """
        NOTE: This feature has a race condition bug.
        The lines at https://github.com/sgl-project/sglang/blob/ef8ec07b2ce4c70c2a33ec5acda4ce529bc3cda4/test/srt/test_srt_endpoint.py#L395-L396 can be executed by two concurrent threads at the same time, and the order of their accesses is not guaranteed.
        In sglang, we use two Python threads to overlap GPU computation with CPU scheduling.
        Thread 1 (the CPU scheduling thread) updates `param_dict["__req__"].output_ids`.
        Thread 2 (the GPU computation thread) calls `DeterministicStatefulLogitProcessor`, because sampling is treated as GPU computation.
        As a result, the processor may read `output_ids` either before or after Thread 1 updates them, so its output is nondeterministic.
        We can fix this by moving the call to `DeterministicStatefulLogitProcessor` into the CPU scheduling thread.