update readme

Commit 440d827e authored Aug 29, 2024 by TangJingqi
Showing with 320 additions and 279 deletions

README.md +2 -2

doc/en/long_context_introduction.md +316 -0

doc/en/long_context_tutorial.md +2 -277

--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ Our vision for KTransformers is to serve as a flexible platform for experimentin
 <h2 id="Updates">🔥 Updates</h2>
-* **Aug 28, 2024**: Support 1M context under the InternLM2.5-7B-Chat-1M model, utilizing 24GB of VRAM and 150GB of DRAM.
+* **Aug 28, 2024**: Support 1M context under the InternLM2.5-7B-Chat-1M model, utilizing 24GB of VRAM and 150GB of DRAM. The detailed tutorial is [here](./doc/en/long_context_tutorial.md).
 * **Aug 28, 2024**: Decrease DeepseekV2's required VRAM from 21G to 11G.
 * **Aug 15, 2024**: Update detailed [TUTORIAL](doc/en/injection_tutorial.md) for injection and multi-GPU. 
 * **Aug 14, 2024**: Support llamfile as linear backend. 
@@ -52,7 +52,7 @@ https://github.com/user-attachments/assets/a865e5e4-bca3-401e-94b8-af3c080e6c12
 * **Enhanced Speed**: Reaches 16.91 tokens/s for generation with a 1M context using sparse attention, powered by llamafile kernels. This method is over 10 times faster than full attention approach of llama.cpp.
-* **Flexible Sparse Attention Framework**: Offers a flexible block sparse attention framework for CPU offloaded decoding. Compatible with SnapKV, Quest, and InfLLm. Further information is available [here](./doc/en/long_context_tutorial.md).
+* **Flexible Sparse Attention Framework**: Offers a flexible block sparse attention framework for CPU offloaded decoding. Compatible with SnapKV, Quest, and InfLLm. Further information is available [here](./doc/en/long_context_introduction.md).
 <div>
 <h3>GPT-4-level Local VSCode Copilot on a Desktop with only 24GB VRAM</h3>

--- a/doc/en/long_context_introduction.md
+++ b/doc/en/long_context_introduction.md
--- a/doc/en/long_context_tutorial.md
+++ b/doc/en/long_context_tutorial.md