[Carver] Remove legacy todo items in carver's readme (#74) (a26a4315) · Commits · OpenDAS / tilelang

Commit a26a4315, authored Feb 11, 2025 by Lei Wang; committed by GitHub on Feb 11, 2025

[Carver] Remove legacy todo items in carver's readme (#74)

* [Enhancement] Add VectorizeLoop function and update imports for compatibility

* [CI][Test] Improve test cases for vectorization and fix typos in parser comments

* lint fix

* Fix incorrect module reference for VectorizeLoop transformation

* Refactor vectorize_loop transformation by removing unused extent mutation logic

* [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen

* Fix formatting in CUDA FP8 header file for consistency

* Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity

* Update submodule 'tvm' to latest commit for improved functionality

* Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.

* Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.

* Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency

* Add CUDA requirements to FP8 test cases and update references for clarity

* Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py

* Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py

* Add CUDA requirements and FP8 test cases for matmul and gemv simulations

* Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py

* Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py

* Add BF16 support to matrix multiplication and introduce corresponding test cases

* Add a blank line for improved readability in BF16 GEMM test

* Update acknowledgements in README to include supervision by Zhi Yang at Peking University

* enhance acknowledgement

* Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives

* Update subproject commit for TVM dependency

* Update subproject commit for TVM dependency

* Add int4_t type and functions for packing char values in CUDA common header

* Add plot_layout example and implement GetForwardVars method in layout classes

* Refactor code for improved readability by adjusting line breaks and formatting in layout and test files

* Fix formatting by removing unnecessary line break in layout.h

* Refactor make_int4 function for improved readability by adjusting parameter formatting

* Add legend to plot_layout for improved clarity of thread and local IDs

* Remove unnecessary dependencies from requirements files for cleaner setup

* Remove flash_mha.py and add .gitkeep to deepseek_mla directory

* Add build requirements and update installation scripts for improved setup

* Introduce carver

* Refactor imports and improve code formatting for consistency

* Add unit tests for carver recommendation hints

* lint fix

* Enhance ElementwiseTemplate and BaseTemplate with detailed docstrings for improved code documentation and clarity

* Refactor import statements and clean up whitespace in template files for improved readability

* Add README.md for Carver framework with usage examples and architecture support

* Refactor import statement in matmul_analysis.py for consistency

* Refactor TileDict and TensorCorePolicy methods for improved clarity and functionality

* Add tests for general matrix multiplication emit configurations

* Refactor formatting in test_tilelang_carver_generate_hints.py for improved readability

* Add FlashAttentionTemplate and related functionality for hint recommendations

* Refactor whitespace in FlashAttentionTemplate and test_tilelang_carver_recommend_hints for improved readability

* Update README.md to include FlashAttentionTemplate in the carver section

parent 1ef644e7


@@ -196,6 +196,7 @@ This helps quickly test multiple configurations without manually guessing.

 Carver abstracts common loop patterns through templates:
 - `GeneralReductionTemplate`: For general `Spatial-Spatial-Reduce` (SSR) structures or similar.
+- `FlashAttentionTemplate`: For attention-like operations with flash memory.
 - `MatmulTemplate`: For standard matrix multiplication `C = A * B`.
 - `GEMVTemplate`: For `y = Ax` or `y = xA` style operations.
 - `ElementwiseTemplate`: For elementwise transformations or pointwise ops.
@@ -205,6 +206,5 @@ You can also create your own specialized templates if you have unique loop struc

 ## TODO Items

-- [ ] Flash Attention and its variants: Support search-space generation for specialized attention kernels.
 - [ ] Adapt to tile language: Provide ready-made scheduling calls or wrappers for [tilelang](https://github.com/LeiYanggh/tilelang) to streamline end-to-end integration.
