Commit 6116488d authored by myhloli

perf(inference): optimize batch processing for different GPU memory sizes

- Set jit_compile to False via torch.npu.set_compile_mode for better performance on NPU
- Adjust batch ratio thresholds: add a 16x tier for >= 20 GB VRAM and lower the 8x tier boundary from 16 GB to 15 GB
parent 7a856804
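
The jit_compile change corresponds to the single added line in the hunk below. Read in isolation, the NPU branch looks roughly like the sketch that follows; the try/except import guard is an addition of this example (not the repo's code) so it also runs on machines without the Ascend stack:

    import torch

    npu_support = False
    try:
        import torch_npu  # Ascend adapter; importing it patches the torch.npu namespace
        if torch_npu.npu.is_available():
            npu_support = True
            # Disable JIT graph compilation on NPU; per the commit message,
            # eager mode performs better for this inference workload.
            torch.npu.set_compile_mode(jit_compile=False)
    except ImportError:
        pass  # no torch_npu installed; the CUDA/CPU paths are used instead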
@@ -165,12 +165,14 @@ def doc_analyze(
         import torch_npu
         if torch_npu.npu.is_available():
             npu_support = True
+            torch.npu.set_compile_mode(jit_compile=False)
     if torch.cuda.is_available() and device != 'cpu' or npu_support:
         gpu_memory = int(os.getenv("VIRTUAL_VRAM_SIZE", round(get_vram(device))))
         if gpu_memory is not None and gpu_memory >= 8:
-            if gpu_memory >= 16:
+            if gpu_memory >= 20:
+                batch_ratio = 16
+            elif gpu_memory >= 15:
                 batch_ratio = 8
             elif gpu_memory >= 10:
                 batch_ratio = 4
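
After this change, the batch-ratio ladder can be read as a standalone function. Below is a minimal sketch, assuming a hypothetical query_vram_gb stand-in for the repo's get_vram and a fallback ratio of 1 for the branch the hunk truncates:

    import os

    def query_vram_gb(device: str) -> float:
        # Hypothetical stand-in for the repo's get_vram(device); reports VRAM in GB.
        return 16.0

    def select_batch_ratio(device: str = 'cuda') -> int:
        # VIRTUAL_VRAM_SIZE overrides the detected VRAM, as in the hunk above.
        gpu_memory = int(os.getenv('VIRTUAL_VRAM_SIZE', round(query_vram_gb(device))))
        batch_ratio = 1  # assumed fallback; the hunk cuts off before any else-branch
        if gpu_memory >= 8:
            if gpu_memory >= 20:
                batch_ratio = 16  # new top tier added by this commit
            elif gpu_memory >= 15:
                batch_ratio = 8   # tier boundary lowered from 16 GB to 15 GB
            elif gpu_memory >= 10:
                batch_ratio = 4
        return batch_ratio

    print(select_batch_ratio())  # the 16 GB placeholder lands in the 8x tier -> 8

Lowering the 8x boundary to 15 GB presumably keeps cards that report slightly under 16 GB (for example, after driver reservations) in the larger-batch tier.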