OpenDAS / ktransformers · Commits

Commit c05ebb74, authored Feb 26, 2025 by Azure

Update fp8 doc; Update install.md broken link
parent bb6920ed

Showing 2 changed files with 12 additions and 10 deletions (+12 −10):

- doc/en/fp8_kernel.md (+11 −9)
- doc/en/install.md (+1 −1)
doc/en/fp8_kernel.md
@@ -10,15 +10,17 @@ The DeepSeek-AI team provides FP8 safetensors for DeepSeek-R1/V3 models. We achi
 So those who are pursuing the best performance can use the FP8 linear kernel for DeepSeek-V3/R1.

 ## Key Features
-✅ Hybrid Precision Architecture (FP8 + GGML)
+✅ Hybrid Precision Architecture (FP8 + GGML)<br>
 ✅ Memory Optimization (~19GB VRAM usage)

 ## Quick Start
 ### Using Pre-Merged Weights
-Pre-merged weights are available on Hugging Face: [KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-V3)
+Pre-merged weights are available on Hugging Face:<br>
+[KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-V3)<br>
+[KVCache-ai/DeepSeek-R1-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-R1)
 > Please confirm the weights are fully uploaded before downloading. The large file size may extend Hugging Face upload time.
@@ -32,12 +34,12 @@ pip install -U huggingface_hub
 huggingface-cli download --resume-download KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid --local-dir <local_dir>
 ```

 ### Using merge scripts
-If you got local DeepSeek-R1/V3 fp8 safetensors and q4km gguf weights, you can merge them using the following scripts.
+If you have local DeepSeek-R1/V3 fp8 safetensors and gguf weights (e.g. q4km), you can merge them using the following script.

 ```shell
-python convert_model.py \
+python merge_tensors/merge_safetensor_gguf.py \
   --safetensor_path <fp8_safetensor_path> \
-  --gguf_path <q4km_gguf_folder_path> \
+  --gguf_path <gguf_folder_path> \
   --output_path <merged_output_path>
 ```
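After merging, it can be handy to confirm what actually landed in the output folder. Below is a minimal sketch (not part of ktransformers; it relies only on the standard safetensors layout: an 8-byte little-endian header length followed by a JSON header) for listing tensor dtypes in a merged `.safetensors` file without loading any tensor data:

```python
import json
import struct

def list_tensor_dtypes(path):
    """Read only the JSON header of a .safetensors file and map tensor names to dtypes."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # u64 little-endian header size
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional metadata entry is not a tensor
    return {name: entry["dtype"] for name, entry in header.items()}
```

Running this over the merged output should show a mix of FP8 and GGML-quantized dtypes; the exact dtype strings depend on how the merge script writes them.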
@@ -60,15 +62,15 @@ python ktransformers/local_chat.py \
 ## Notes
-⚠️ Hardware Requirements
+⚠️ Hardware Requirements<br>
 * Recommended minimum 19GB available VRAM for FP8 kernel.
 * Requires GPU with FP8 support (e.g., 4090)

 ⏳ First-Run Optimization
 JIT compilation causes longer initial execution (subsequent runs retain optimized speed).

-🔄 Temporary Interface
+🔄 Temporary Interface<br>
 Current weight loading implementation is provisional - will be refined in future versions

-📁 Path Specification
+📁 Path Specification<br>
 Despite hybrid quantization, merged weights are stored as .safetensors - pass the containing folder path to `--gguf_path`
\ No newline at end of file
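The FP8-support requirement above can be checked programmatically. A minimal sketch, assuming FP8 tensor cores are available from compute capability 8.9 (Ada Lovelace, e.g. the RTX 4090 mentioned above) and 9.0 (Hopper); the `supports_fp8` helper is illustrative, not a ktransformers API:

```python
def supports_fp8(major: int, minor: int) -> bool:
    """True if a CUDA compute capability (major, minor) has FP8 tensor-core support.

    FP8 support starts at SM 8.9 (Ada Lovelace, e.g. RTX 4090) and SM 9.0 (Hopper).
    """
    return (major, minor) >= (8, 9)

# With PyTorch installed, the current GPU's capability comes from:
#   major, minor = torch.cuda.get_device_capability()
```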
doc/en/install.md
@@ -121,7 +121,7 @@ We provide a simple command-line local chat Python script that you can run for t
 mkdir DeepSeek-V2-Lite-Chat-GGUF
 cd DeepSeek-V2-Lite-Chat-GGUF
-wget https://huggingface.co/mzwing/DeepSeek-V2-Lite-Chat-GGUF/resolve/main/DeepSeek-V2-Lite-Chat.Q4_K_M.gguf -O DeepSeek-V2-Lite-Chat.Q4_K_M.gguf
+wget https://huggingface.co/mradermacher/DeepSeek-V2-Lite-GGUF/resolve/main/DeepSeek-V2-Lite.Q4_K_M.gguf -O DeepSeek-V2-Lite-Chat.Q4_K_M.gguf
 cd .. # Move to repo's root dir
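Before pointing ktransformers at the downloaded file, a quick sanity check on the magic bytes can catch a truncated or failed `wget`. This is a minimal sketch based on the public GGUF format (files start with the 4-byte magic `GGUF` followed by a little-endian u32 version); it is not part of the install steps:

```python
import struct

def gguf_version(path):
    """Return the GGUF format version, or None if the file lacks the GGUF magic."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            return None
        return struct.unpack("<I", f.read(4))[0]  # u32 little-endian version
```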