💡 Refer to the [Model Structure Documentation](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/model_structure.html) to quickly get started with LightX2V
This document provides a comprehensive introduction to the model directory structure of the LightX2V project, designed to help users organize model files efficiently. With a well-structured directory layout, users can enjoy "one-click startup" without manually configuring complex path parameters, while flexible manual path configuration remains available for users who need it.
## 🗂️ Model Directory Structure

LightX2V is a flexible video generation inference framework that supports multiple model sources and formats:

- ✅ **Wan Official Models**: Directly compatible with officially released complete models from Wan2.1 and Wan2.2
- ✅ **Single-File Models**: Supports single-file format models released by LightX2V (including quantized versions)
- ✅ **LoRA Models**: Supports loading distilled LoRAs released by LightX2V

This document provides detailed instructions on how to use each model format, the relevant configuration parameters, and best practices.

### LightX2V Official Model List

View all available models: [LightX2V Official Model Repository](https://huggingface.co/lightx2v)

---

## 🗂️ Format 1: Wan Official Models

### Standard Directory Structure
Using `Wan2.1-I2V-14B-480P-LightX2V` as an example, the standard file structure is as follows:

```
Wan2.1-I2V-14B-480P-LightX2V/
├── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors  # Distillation LoRA
├── models_t5_umt5-xxl-enc-bf16.pth                                  # T5 text encoder
├── Wan2.1_VAE.pth                                                   # VAE encoder/decoder
├── configuration.json                                               # Model configuration
├── google/                                                          # T5 tokenizer
├── assets/                                                          # Example assets (optional)
└── examples/                                                        # Example files (optional)
```

### 💾 Storage Recommendations

**It is strongly recommended to store model files on SSDs**, as this can significantly improve model loading speed and inference performance.
#### Usage

Before starting to download models, please ensure that the Hugging Face CLI is properly installed:

```bash
# Install huggingface_hub
pip install huggingface_hub

# Or install huggingface-cli
pip install huggingface-cli

# Login to Hugging Face (optional, but strongly recommended)
huggingface-cli login
```

```bash
# Download model
huggingface-cli download Wan-AI/Wan2.2-I2V-A14B \
    --local-dir ./models/Wan2.2-I2V-A14B

# Configure launch script
model_path=./models/Wan2.2-I2V-A14B
lightx2v_path=/path/to/LightX2V

# Run inference
cd LightX2V/scripts
bash wan22/run_wan22_moe_i2v.sh
```
### Method 1: Complete Model Download (Recommended)

**Advantage**: After downloading the complete model, the system will automatically identify all component paths without manual configuration, providing the most convenient user experience.
> 💡 **Quantized Model Usage**: To use quantized models, refer to the [Model Conversion Script](https://github.com/ModelTC/LightX2V/blob/main/tools/convert/readme_zh.md) for conversion, or directly use pre-converted quantized models in Format 2 below
>
> 💡 **Memory Optimization**: For devices with RTX 4090 24GB or smaller memory, it's recommended to combine quantization techniques with CPU offload features (a configuration sketch follows this note):
> - Quantization Configuration: Refer to [Quantization Documentation](../method_tutorials/quantization.md)
> - CPU Offload: Refer to [Parameter Offload Documentation](../method_tutorials/offload.md)
> - Wan2.1 Configuration: Refer to [offload config files](https://github.com/ModelTC/LightX2V/tree/main/configs/offload)
> - Wan2.2 Configuration: Refer to [wan22 config files](https://github.com/ModelTC/LightX2V/tree/main/configs/wan22) with `4090` suffix
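The sketch below illustrates such a combined setup. The keys are the ones referenced throughout this document, but treat the exact combination as illustrative and start from the official config files linked above:

```json
{
    // Quantization (see the Quantization Documentation for the available schemes)
    "dit_quantized": true,
    "dit_quant_scheme": "fp8",
    "t5_quantized": true,
    "t5_quant_scheme": "fp8",
    "clip_quantized": true,
    "clip_quant_scheme": "fp8",

    // CPU offload (see the Parameter Offload Documentation);
    // enable the per-component switches only if memory is still insufficient
    "cpu_offload": true,
    "t5_cpu_offload": false,
    "clip_cpu_offload": false,
    "vae_cpu_offload": false
}
```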
Modify the configuration in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh):
- `model_path`: Set to the downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`
- `lightx2v_path`: Set to the LightX2V project root directory path
###### Scenario 2: Using Quantized Model

When using the complete model, if you need to enable quantization, add the following configuration to the [configuration file](https://github.com/ModelTC/LightX2V/tree/main/configs/distill/wan_i2v_distill_4step_cfg.json):

```json
{
    "mm_config": {
        ...
    },                              // DIT model quantization scheme
    "t5_quantized": true,           // Enable T5 quantization
    "t5_quant_scheme": "fp8",       // T5 quantization mode
    "clip_quantized": true,         // Enable CLIP quantization
    "clip_quant_scheme": "fp8"      // CLIP quantization mode
}
```

> **Important Note**: Quantization configurations for each model can be flexibly combined. Quantization paths do not need to be manually specified, as the system will automatically locate the quantized versions of each model.

---

## 🗂️ Format 2: LightX2V Single-File Models (Recommended)

- **Distillation Acceleration**: Supports 4-step fast inference
- **Tool Compatibility**: Compatible with ComfyUI and other tools
### Available Model List

**Examples**:

- `wan2.1_i2v_720p_lightx2v_4step.safetensors` - 720P I2V original precision

**Step 2: Configure Launch Script**

```bash
# Set in launch script (point to directory containing model file)
model_path=/path/to/model_directory   # placeholder: the directory that contains the .safetensors file
lightx2v_path=/path/to/LightX2V

# Image-to-video inference (I2V)
cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```

> 💡 **Tip**: When there's only one model file in the directory, LightX2V will automatically load it.

When performing inference through the Gradio interface, simply specify the model root directory path at startup; the lightweight VAE can be flexibly selected through the frontend interface buttons.

#### Scenario B: Download Multiple Model Files

When you download multiple models with different precisions to the same directory, you need to explicitly specify which model to use in the configuration file.

**Step 1: Download Multiple Models**

```
wan2.1_i2v_720p_multi/
├── wan2.1_i2v_720p_lightx2v_4step.safetensors   # Original precision
└── ...                                          # Other precisions (e.g., quantized versions)
```

#### 1. Selective Download

**Advantage**: Only download the required versions (quantized or non-quantized), effectively saving storage space and download time.

Use the Hugging Face CLI with the `--include` parameter to download only the files you need, as sketched below.
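For example, a selective download might look like the following. The repository name and `--include` pattern here are illustrative assumptions — check the [LightX2V Official Model Repository](https://huggingface.co/lightx2v) for the actual repository and file names:

```bash
# Illustrative only: adjust the repository name and --include pattern to the files you need
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --include "*.safetensors" \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V
```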
> **Important Note**: When starting inference scripts or Gradio, the `model_path` parameter still needs to be specified as the complete path without the `--include` parameter. For example: `model_path=./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`, not `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/distill_int8`.

Set the `model_path` in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh) to your downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/`, and set `lightx2v_path` to your LightX2V project path.
> **Important Note**: In this case, each model can only be specified as a quantized version. Quantization paths do not need to be manually specified, as the system will automatically locate the quantized version of each model.
> 💡 **Configuration Parameter Description** (see the example after this note):
> - **dit_original_ckpt**: Used to specify the path to original precision models (BF16/FP32/FP16)
> - **dit_quantized_ckpt**: Used to specify the path to quantized models (FP8/INT8), must be used with `dit_quantized` and `dit_quant_scheme` parameters
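For instance, when one directory contains both original-precision and quantized weights, the two parameters above can be used to pick which file to load. This is a minimal sketch — the quantized file name is illustrative:

```json
{
    // Option 1: load the original-precision weights
    "dit_original_ckpt": "./wan2.1_i2v_720p_multi/wan2.1_i2v_720p_lightx2v_4step.safetensors"

    // Option 2: load a quantized version instead
    // "dit_quantized": true,
    // "dit_quant_scheme": "fp8",
    // "dit_quantized_ckpt": "./wan2.1_i2v_720p_multi/wan2.1_i2v_720p_lightx2v_4step_fp8.safetensors"
}
```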
**Step 3: Start Inference**

Set the `model_path` in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh) to your downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`, and set `lightx2v_path` to your LightX2V project path.

```bash
cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```

###### Scenario 2: Using FP8 DIT + Original Precision T5 + Original Precision CLIP

Since only quantized weights were downloaded, you need to manually download the original precision versions of T5 and CLIP and point the configuration file's `t5_original_ckpt` and `clip_original_ckpt` at them, for example:

```json
{
    "mm_config": {
        ...
    },
    "t5_original_ckpt": "/path/to/models_t5_umt5-xxl-enc-bf16.pth",
    "clip_original_ckpt": "/path/to/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth"
}
```

```bash
cd LightX2V/scripts
bash wan/run_wan_t2v_distill_4step_cfg.sh
```

### Wan2.2 Single-File Models

#### Directory Structure Requirements

When using Wan2.2 single-file models, you need to manually create a specific directory structure:

```
wan2.2_models/
├── high_noise_model/                                           # High-noise model directory (required)
│   └── wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors  # High-noise model file
└── low_noise_model/                                            # Low-noise model directory (required)
    └── ...                                                     # Low-noise model file
```

```bash
# Configure launch script (point to parent directory)
model_path=./models/wan2.2_models
lightx2v_path=/path/to/LightX2V

# Run script
cd LightX2V/scripts
bash wan22/run_wan22_moe_i2v_distill.sh
```
##### Gradio Interface Startup

When performing inference through the Gradio interface, specify the model root directory path at startup.

> **Important Note**: Since the model root directory only contains quantized versions of each model, the quantization precision for the DIT/T5/CLIP models can only be selected as fp8 in the frontend. If you need to use non-quantized versions of T5/CLIP, manually download the non-quantized weights and place them in the gradio_demo `model_path` directory (`./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/`). The T5/CLIP quantization precision can then be set to bf16/fp16.

> 💡 **Tip**: When there's only one model file in each subdirectory, LightX2V will automatically load it.

#### Scenario B: Multiple Model Files Per Directory

When you place multiple models with different precisions in both `high_noise_model/` and `low_noise_model/` directories, you need to explicitly specify which files to use in the configuration file.

**Directory Structure**:

```
wan2.2_models_multi/
├── high_noise_model/
│   ├── wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors   # Original precision
│   └── ...                                                     # Other precisions
└── low_noise_model/
    └── ...
```

> - **high_noise_quantized_ckpt** / **low_noise_quantized_ckpt**: Used to specify the path to quantized models (FP8/INT8); must be used with the `dit_quantized` and `dit_quant_scheme` parameters
> - Wan2.2 models use a dual-noise architecture, requiring both high-noise and low-noise models to be downloaded
> - Refer to the "Wan2.2 Single-File Models" section above for detailed directory organization

### Method 3: Manual Configuration

Users can flexibly configure quantization options and paths for each component according to actual needs, mixing quantized and non-quantized components. Please ensure that the required model weights have been correctly downloaded and placed in the specified paths.

> - Quantized weights and original precision weights can be flexibly mixed; the system will automatically select the corresponding model based on the configuration
> - The choice of quantization mode depends on your hardware; FP8 is recommended on high-end GPUs like H100/A100
> - Lightweight VAE can significantly improve inference speed but may slightly affect generation quality

---

## 🗂️ Format 3: LightX2V LoRA Models

LoRA (Low-Rank Adaptation) models provide a lightweight fine-tuning solution that enables customization for specific effects without modifying the base model.

**Configuration Parameters**:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `alpha` | LoRA scaling factor; uses the model's built-in value when `null`, defaults to 1 if there is no built-in value | null |
| `name` | (Wan2.2 only) Specifies which model the LoRA applies to | Required |
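These parameters are typically supplied as part of a LoRA list in the config file. The sketch below is illustrative only — the `lora_configs` key, the `path` field, and the placement shown are assumptions; check the LoRA configuration files shipped with the repository for the exact schema:

```json
{
    "lora_configs": [
        {
            "path": "/path/to/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
            "alpha": 1.0
            // For Wan2.2, also set "name" to choose the target model (high-noise or low-noise)
        }
    ]
}
```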
**Advantages**:

- ✅ Flexible switching between different LoRAs
- ✅ Saves storage space
- ✅ Can dynamically adjust LoRA strength

**Disadvantages**:

- ❌ Additional loading time during inference
- ❌ Slightly increases memory usage

---

## 💡 Best Practices

### Recommended Configurations

**Complete Model Users**:

- Download complete models to enjoy the convenience of automatic path discovery
- Only need to configure quantization schemes and component switches
- Recommended to use the bash scripts for quick startup

**General Recommendations**:

- **Use SSD Storage**: Significantly improves model loading speed and inference performance
- **Choose an Appropriate Quantization Scheme**:
  - FP8: Suitable for high-end GPUs like H100/A100, high precision
  - INT8: Suitable for general GPUs, small memory footprint
- **Enable Lightweight VAE**: `use_tiny_vae: true` can improve inference speed
- **Reasonable CPU Offload Configuration**: `t5_cpu_offload: true` can save GPU memory

### Download Optimization Recommendations

- **Use Hugging Face CLI**: More stable than `git clone`, supports resuming downloads
- **Selective Download**: Only download the required quantized versions, saving time and storage space
- **Network Optimization**: Use a stable network connection and a proxy when necessary
- **Resume Download**: Use the `--resume-download` parameter to continue after an interruption
## 🚨 Frequently Asked Questions
### Q: Model files are too large and download speed is slow, what should I do?
A: It is recommended to use selective download to fetch only the quantized versions you need, or to download from a Hugging Face mirror source
### Q: Model path does not exist when starting up?
A: Check that the model has been downloaded correctly, that the path configuration is correct, and that the automatic discovery mechanism is working properly
### Q: How to switch between different quantization schemes?
A: Modify parameters such as `mm_type`, `t5_quant_scheme`, and `clip_quant_scheme` in the configuration file; refer to the [Quantization Documentation](../method_tutorials/quantization.md)

### Q: How to mix quantized and original precision components?
A: Control this through the `t5_quantized` and `clip_quantized` parameters, and manually specify the original precision paths

---

## 📚 Related Resources

### Official Repositories

- [LightX2V GitHub Repository](https://github.com/ModelTC/LightX2V)
- [LightX2V Official Model Repository](https://huggingface.co/lightx2v)

---

Well-organized model files and flexible configuration options let LightX2V cover multiple usage scenarios: downloading the complete model provides maximum convenience, selective download saves storage space, and manual configuration offers maximum flexibility. The automatic path discovery mechanism means users do not need to remember complex path configurations while keeping the system extensible.
Through this document, you should be able to:
✅ Understand all model formats supported by LightX2V
✅ Select appropriate models and precisions based on your needs
✅ Correctly download and organize model files
✅ Configure launch parameters and successfully run inference
✅ Resolve common model loading issues
If you have other questions, feel free to ask in [GitHub Issues](https://github.com/ModelTC/LightX2V/issues).
---

## 🔄 Parameter Offload Mechanism

LightX2V implements an advanced parameter offload mechanism designed for efficient large model inference under limited hardware resources. The system provides an excellent speed-memory balance by intelligently managing model weights across different memory hierarchies, dynamically scheduling them between GPU, CPU, and disk storage.

**Core Features:**

- **Block/Phase-level Offload**: Supports both Block and Phase offloading granularities for flexible memory control
  - **Block**: A complete Transformer layer (self-attention, cross-attention, feed-forward network, etc.), serving as the larger memory management unit
  - **Phase**: A finer-grained computational stage within a block (an individual component such as self-attention or the feed-forward network), providing more precise memory control
- **Multi-level Storage Architecture**: GPU → CPU → Disk three-tier storage hierarchy with intelligent caching strategies
- **Asynchronous Parallel Processing**: CUDA stream-based overlap of computation and data transfer for maximum hardware utilization
- **Persistent Storage Support**: SSD/NVMe disk storage for ultra-large model inference when memory is insufficient
## 🎯 Offload Strategies

### Strategy 1: GPU-CPU Block/Phase Offload

**Use Case**: Insufficient GPU memory but sufficient system memory

**How It Works**: Manages model weights in Block or Phase units between GPU and CPU memory, using CUDA streams to overlap computation and data transfer. Blocks contain complete Transformer layers, while Phases are individual computational components within blocks.

**Granularity Selection Guide**:

- **Block Granularity**: Larger management unit containing complete Transformer layers (self-attention, cross-attention, feed-forward networks, etc.); suitable for memory-sufficient environments, reduces management overhead and improves overall performance
- **Phase Granularity**: Finer-grained unit containing individual computational components; suitable for memory-constrained environments, provides more flexible memory control
**Key Features:**
- **Asynchronous Transfer**: Uses three CUDA streams with different priorities for parallel computation and transfer
  - Compute stream (priority=-1): High priority, handles the current computation
  - GPU load stream (priority=0): Medium priority, handles CPU → GPU prefetching
  - CPU load stream (priority=0): Medium priority, handles GPU → CPU offloading
- **Prefetch Mechanism**: Preloads the next block/phase to the GPU in advance (see the sketch after this list)
- **Intelligent Caching**: Maintains a weight cache in CPU memory
- **Stream Synchronization**: Ensures the correctness of data transfer and computation
- **Swap Operation**: Rotates block/phase positions after computation for continuous execution
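The stream interplay can be pictured with the following simplified PyTorch sketch. It only illustrates the prefetch/compute overlap for two of the streams (the GPU → CPU offload stream is omitted), and the names and structure are made up for illustration — this is not LightX2V's actual implementation:

```python
import torch

compute_stream = torch.cuda.Stream(priority=-1)   # high priority: current block's compute
load_stream = torch.cuda.Stream(priority=0)       # prefetches the next block CPU -> GPU


def block_forward(x, weight):
    # Placeholder for a full transformer block forward pass.
    return x @ weight


def run_blocks(cpu_weights, x):
    """cpu_weights: list of pinned CPU tensors, one per 'block'."""
    compute_stream.wait_stream(torch.cuda.current_stream())
    current = cpu_weights[0].to("cuda", non_blocking=True)      # warm-up prefetch of block 0
    for i in range(len(cpu_weights)):
        compute_stream.wait_stream(load_stream)                 # block i's weights must have arrived
        nxt = None
        if i + 1 < len(cpu_weights):
            with torch.cuda.stream(load_stream):                # prefetch block i+1 ...
                nxt = cpu_weights[i + 1].to("cuda", non_blocking=True)
        with torch.cuda.stream(compute_stream):                 # ... while block i computes
            x = block_forward(x, current)
        current = nxt                                           # swap: next iteration reuses the prefetch
    torch.cuda.current_stream().wait_stream(compute_stream)
    return x


if __name__ == "__main__":
    weights = [torch.randn(64, 64).pin_memory() for _ in range(4)]
    print(run_blocks(weights, torch.randn(1, 64, device="cuda")).shape)
```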
### Strategy 2: Disk-CPU-GPU Block/Phase Offload

**Use Case**: Both GPU memory and system memory are insufficient

**How It Works**: Builds upon Strategy 1 by introducing a disk storage layer, forming a three-tier Disk → CPU → GPU hierarchy. The CPU serves as a cache pool of configurable size, making this strategy suitable for devices with limited CPU memory (a schematic sketch follows).
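Below is a schematic sketch of the disk tier, not LightX2V's actual code: worker threads prefetch per-block weight files from disk into a bounded CPU cache, and the GPU pulls weights from that cache. The `num_disk_workers` / `max_memory` names mirror the parameters discussed in the troubleshooting section, but the class and its interface are invented for illustration:

```python
import collections
import threading
from concurrent.futures import ThreadPoolExecutor

from safetensors.torch import load_file


class DiskWeightCache:
    """Bounded CPU cache over per-block weight files stored on disk (illustrative only)."""

    def __init__(self, block_files, num_disk_workers=2, max_memory=4):
        self.block_files = block_files              # {block_idx: path to a .safetensors file}
        self.cache = collections.OrderedDict()      # block_idx -> dict of CPU tensors
        self.lock = threading.Lock()
        self.max_memory = max_memory                # here: max number of cached blocks
        self.pool = ThreadPoolExecutor(max_workers=num_disk_workers)

    def _load(self, idx):
        weights = load_file(self.block_files[idx], device="cpu")
        with self.lock:
            self.cache[idx] = weights
            while len(self.cache) > self.max_memory:
                self.cache.popitem(last=False)      # evict the oldest cached block
        return weights

    def prefetch(self, idx):
        # Submit disk I/O ahead of time so it overlaps with GPU compute.
        return self.pool.submit(self._load, idx)

    def get_on_gpu(self, idx):
        with self.lock:
            weights = self.cache.get(idx)
        if weights is None:                         # cache miss: read synchronously
            weights = self._load(idx)
        return {name: t.to("cuda", non_blocking=True) for name, t in weights.items()}
```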
For memory-constrained devices, a progressive offload strategy is recommended (a config sketch follows this list):

1. **Step 1**: Only enable `cpu_offload`; keep `t5_cpu_offload`, `clip_cpu_offload`, and `vae_cpu_offload` disabled
2. **Step 2**: If memory is still insufficient, gradually enable CPU offload for T5, CLIP, and VAE
3. **Step 3**: If memory is still not enough, combine quantization with CPU offload or enable `lazy_load`
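A step-1 starting point might look like the following sketch. The key names are the ones used elsewhere in this document; the exact set of options should be taken from the official offload configs:

```json
{
    // Step 1: offload only the DIT blocks/phases
    "cpu_offload": true,
    "offload_granularity": "block",   // or "phase" for tighter memory control
    "t5_cpu_offload": false,
    "clip_cpu_offload": false,
    "vae_cpu_offload": false
    // Steps 2/3: flip the switches above to true, add quantization, or enable "lazy_load"
}
```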
**Practical Experience**:
- **RTX 4090 24GB + 14B Model**: Usually only need to enable `cpu_offload`, manually set the other component offload switches to `false`, and use the FP8 quantized version
- **Smaller Memory GPUs**: Need to combine quantization, CPU offload, and lazy loading
- **Quantization Schemes**: Refer to the [Quantization Documentation](../method_tutorials/quantization.md) to select an appropriate quantization strategy
## 🎯 Deployment Strategy Recommendations

**Configuration File Reference**:

- **Wan2.1 Series Models**: Refer to [offload config files](https://github.com/ModelTC/lightx2v/tree/main/configs/offload)
- **Wan2.2 Series Models**: Refer to [wan22 config files](https://github.com/ModelTC/lightx2v/tree/main/configs/wan22) with `4090` suffix

**Strategy Selection**:

- 🔄 **GPU-CPU Block/Phase Offload**: Suitable for insufficient GPU memory (RTX 3090/4090 24G) but sufficient system memory (>64G)
  - Advantages: Balances performance and memory usage, suitable for medium-scale model inference
- 💾 **Disk-CPU-GPU Block/Phase Offload**: Suitable for both insufficient GPU memory (RTX 3060/4090 8G) and insufficient system memory (16–32G)
  - Advantages: Supports ultra-large model inference with the lowest hardware threshold
- 🚫 **No Offload**: Suitable for high-end hardware configurations pursuing optimal inference performance
  - Advantages: Maximizes computational efficiency, suitable for latency-sensitive applications
## 🔍 Troubleshooting

### Common Issues and Solutions
1. **Disk I/O Performance Bottleneck**
   - Problem Symptoms: Slow model loading, high inference latency
   - Solutions:
     - Upgrade to an NVMe SSD storage device
     - Increase the `num_disk_workers` parameter value
     - Optimize the file system configuration
2. **Memory Buffer Overflow**
   - Problem Symptoms: Insufficient system memory, abnormal program exit
   - Solutions:
     - Increase the `max_memory` parameter value
     - Decrease the `num_disk_workers` parameter value
     - Adjust `offload_granularity` to "phase"
3. **Model Loading Timeout**
   - Problem Symptoms: Timeout errors during model loading
   - Solutions:
     - Check disk read/write performance
     - Optimize file system parameters
     - Verify the health status of the storage device

**Note**: This offload mechanism is specifically designed for LightX2V, fully utilizing the asynchronous computing capabilities of modern hardware and significantly lowering the hardware threshold for large model inference.
## 📚 Technical Summary
LightX2V's offload mechanism is designed for modern AI inference scenarios, fully leveraging the GPU's asynchronous computing capabilities and a multi-level storage architecture. Through intelligent weight management and efficient parallel processing, it significantly reduces the hardware threshold for large model inference and provides developers with a flexible and efficient deployment solution.
---

## 📊 Quantization Scheme Overview

LightX2V supports quantized inference for the linear layers in `Dit`, covering `w8a8-int8`, `w8a8-fp8`, `w8a8-fp8block`, `w8a8-mxfp8`, and `w4a4-nvfp4` matrix multiplication. The T5 and CLIP encoders can also be quantized to further improve inference performance.

Quantized inference for the DIT, T5, and CLIP models reduces memory usage and improves inference speed by lowering model precision.

### DIT Model Quantization

LightX2V supports multiple DIT matrix multiplication quantization schemes, configured through the `mm_type` parameter; see the [configuration files](https://github.com/ModelTC/lightx2v/blob/main/configs/quantization) for the available schemes.

---

Download pre-converted quantized models from the [LightX2V Official Model Repository](https://huggingface.co/lightx2v); refer to the [Model Structure Documentation](../deploy_guides/model_structure.md) for details. Alternatively, use LightX2V's convert tool to convert models into quantized models; refer to the [Model Conversion Documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme.md).

---
## 📥 Loading Quantized Models for Inference
### DIT Model Configuration
Write the path of the converted quantized weights to the `dit_quantized_ckpt` field in the [configuration file](https://github.com/ModelTC/lightx2v/blob/main/configs/quantization).
By specifying `--config_json` to the specific config file, you can load the quantized model for inference.
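For example, the relevant entries in such a config might look like the following minimal sketch (paths are placeholders):

```json
{
    "dit_quantized": true,
    "dit_quant_scheme": "fp8",
    "dit_quantized_ckpt": "/path/to/converted/quantized_dit_weights",
    // Optional: only needed when the quantized T5/CLIP files are not inside model_path
    "t5_quantized_ckpt": "/path/to/models_t5_umt5-xxl-enc-fp8.pth",
    "clip_quantized_ckpt": "/path/to/models_clip_open-clip-xlm-roberta-large-vit-huge-14-fp8.pth"
}
```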
> 💡 **Tip**: When a T5 quantized model exists in the script's specified `model_path` (such as `models_t5_umt5-xxl-enc-fp8.pth` or `models_t5_umt5-xxl-enc-int8.pth`), `t5_quantized_ckpt` doesn't need to be specified separately.

> 💡 **Tip**: When a CLIP quantized model exists in the script's specified `model_path` (such as `models_clip_open-clip-xlm-roberta-large-vit-huge-14-fp8.pth` or `models_clip_open-clip-xlm-roberta-large-vit-huge-14-int8.pth`), `clip_quantized_ckpt` doesn't need to be specified separately.

[Here](https://github.com/ModelTC/lightx2v/tree/main/scripts/quantization) are some ready-to-use running scripts.
For details, please refer to the documentation of the quantization tool [LLMC](https://github.com/ModelTC/llmc/blob/main/docs/en/source/backend/lightx2v.md)
### Performance Optimization Strategy

If memory is insufficient, you can combine parameter offloading to further reduce memory usage. Refer to the [Parameter Offload Documentation](../method_tutorials/offload.md):

> - **Wan2.1 Configuration**: Refer to [offload config files](https://github.com/ModelTC/LightX2V/tree/main/configs/offload)
> - **Wan2.2 Configuration**: Refer to [wan22 config files](https://github.com/ModelTC/LightX2V/tree/main/configs/wan22) with `4090` suffix

---

### Custom Quantization Kernels

LightX2V supports custom quantization kernels, which can be added as follows (a generic sketch follows the list):

1. **Register New mm_type**: Add new quantization classes in `mm_weight.py`
2. **Implement Quantization Functions**: Define quantization methods for weights and activations
3. **Integrate Compute Kernels**: Use custom matrix multiplication implementations
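As a rough illustration of what steps 2 and 3 involve, the snippet below shows generic per-channel INT8 weight quantization with a dequantize-then-matmul forward. It is a standalone PyTorch sketch, not the `mm_weight.py` registration API itself; a real kernel would call an INT8 GEMM instead of dequantizing:

```python
import torch


def quantize_per_channel_int8(weight: torch.Tensor):
    """Symmetric per-output-channel INT8 quantization of a [out, in] weight."""
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    qweight = torch.round(weight / scale).clamp(-128, 127).to(torch.int8)
    return qweight, scale


def int8_linear(x: torch.Tensor, qweight: torch.Tensor, scale: torch.Tensor):
    """Dequantize-and-matmul fallback used here for clarity."""
    w = qweight.to(x.dtype) * scale.to(x.dtype)    # [out, in]
    return x @ w.t()


if __name__ == "__main__":
    w = torch.randn(256, 128)
    qw, s = quantize_per_channel_int8(w)
    x = torch.randn(4, 128)
    print((int8_linear(x, qw, s) - x @ w.t()).abs().max())   # small quantization error
```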