1. 25 Mar, 2025 1 commit
    • Matthew Douglas's avatar
      PyTorch Custom Operator Integration (#1544) · e82f72b3
      Matthew Douglas authored
      
      
      * Sketch out first custom op registration
      
      * Add note
      
      * Initial int8 op registration
      
      * Cleanup some deprecated functions.
      
      * Int8 ops updates; tests
      
      * Implement 4bit quant/dequant ops
      
      * Fix nested quant
      
      * cleanup
      
      * Test improvements
      
      * Clean up and improve tests
      
      * Add higher level custom op for int8 matmul + dequant + bias
      
      * Add gemv 4bit custom op
      
      * Cleanup
      
      * Implement out kwarg overloads for custom ops
      
      * Update PyTorch minimum to 2.1
      
      * Deprecation updates
      
      * Deprecation updates
      
      * Cleanup; rename int8_linear_dequant -> int8_scaled_mm
      
      * Bump min pytorch to 2.2
      
      * cleanup
      
      * Test reorganization
      
      * Remove deprecated supports_igemmlt
      
      * More cleanup
      
      * Cleanup obsolete C++/CUDA code
      
      * Cleanup
      
      * Create 'default' backend for fallback op implementations; initial CPU nf4 work
      
      * Stub out for multi-platform
      
      * Fix serialization tests for torch>=2.6.0
      
      * Add example for torch.compile e2e inference
      
      * Test update
      
      ---------
      Co-authored-by: default avatarTitus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
      e82f72b3
  2. 06 Mar, 2024 1 commit
  3. 21 Feb, 2024 1 commit
  4. 01 Feb, 2024 1 commit
    • Aarni Koskela's avatar
      Test improvements (#1001) · 2336a45c
      Aarni Koskela authored
      * test_nvidia_transform: fix variable reference
      
      `out_order` is the global parametrization list, not the test fixture argument
      
      * Make `parametrize` use more idiomatic
      
      * Use a more deterministic helper for `dim*` determination
      
      * Convert NO_CUBLASLT errors into skips too
      
      * Mark slow and benchmark tests as such (allows `-k "not benchmark"`)
      2336a45c