Merge pull request #702 from Azure-Tang/update-readme

[UPDATE] Update documents.

ca93cf75 · Azure · GitHub · 3ebe17eb · c05ebb74 · ca93cf75
Commit ca93cf75 (unverified), authored Feb 26, 2025 by Azure; committed by GitHub Feb 26, 2025

Showing with 12 additions and 10 deletions

doc/en/fp8_kernel.md +11 -9

doc/en/install.md +1 -1

--- a/doc/en/fp8_kernel.md
+++ b/doc/en/fp8_kernel.md
@@ -10,15 +10,17 @@ The DeepSeek-AI team provides FP8 safetensors for DeepSeek-R1/V3 models. We achi
 So those who are persuing the best performance can use the FP8 linear kernel for DeepSeek-V3/R1.

 ## Key Features
-✅ Hybrid Precision Architecture (FP8 + GGML)
+
+✅ Hybrid Precision Architecture (FP8 + GGML)<br>
 ✅ Memory Optimization (~19GB VRAM usage)

 ## Quick Start
 ### Using Pre-Merged Weights

-Pre-merged weights are available on Hugging Face:
-[KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-V3)
+Pre-merged weights are available on Hugging Face:<br>
+[KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-V3)<br>
 [KVCache-ai/DeepSeek-R1-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-R1)
+
 > Please confirm the weights are fully uploaded before downloading. The large file size may extend Hugging Face upload time.


@@ -32,12 +34,12 @@ pip install -U huggingface_hub
 huggingface-cli download --resume-download KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid --local-dir <local_dir>
 ```
 ### Using merge scripts
-If you got local DeepSeek-R1/V3 fp8 safetensors and q4km gguf weights, you can merge them using the following scripts.
+If you got local DeepSeek-R1/V3 fp8 safetensors and gguf weights(eg.q4km), you can merge them using the following scripts.

 ```shell
-python convert_model.py \
+python merge_tensors/merge_safetensor_gguf.py \
  --safetensor_path <fp8_safetensor_path> \
-  --gguf_path <q4km_gguf_folder_path> \
+  --gguf_path <gguf_folder_path> \
  --output_path <merged_output_path>
 ```

@@ -60,15 +62,15 @@ python ktransformers/local_chat.py \

 ## Notes

-⚠️ Hardware Requirements
+⚠️ Hardware Requirements<br>
 * Recommended minimum 19GB available VRAM for FP8 kernel.
 * Requires GPU with FP8 support (e.g., 4090)

 ⏳ First-Run Optimization
 JIT compilation causes longer initial execution (subsequent runs retain optimized speed).

-🔄 Temporary Interface
+🔄 Temporary Interface<br>
 Current weight loading implementation is provisional - will be refined in future versions

-📁 Path Specification
+📁 Path Specification<br>
 Despite hybrid quantization, merged weights are stored as .safetensors - pass the containing folder path to `--gguf_path`
\ No newline at end of file
--- a/doc/en/install.md
+++ b/doc/en/install.md
@@ -121,7 +121,7 @@ We provide a simple command-line local chat Python script that you can run for t
 mkdir DeepSeek-V2-Lite-Chat-GGUF
 cd DeepSeek-V2-Lite-Chat-GGUF

-wget https://huggingface.co/mzwing/DeepSeek-V2-Lite-Chat-GGUF/resolve/main/DeepSeek-V2-Lite-Chat.Q4_K_M.gguf -O DeepSeek-V2-Lite-Chat.Q4_K_M.gguf
+wget https://huggingface.co/mradermacher/DeepSeek-V2-Lite-GGUF/resolve/main/DeepSeek-V2-Lite.Q4_K_M.gguf -O DeepSeek-V2-Lite-Chat.Q4_K_M.gguf

 cd .. # Move to repo's root dir