Merge pull request #53 from MoonshotAI/xyf_kimiaudio

update README

Commit 27cc8d46 (unverified), authored Apr 29, 2025 by xinyifei99; committed by GitHub on Apr 29, 2025

Showing with 9 additions and 0 deletions

README.md +9 -0

--- a/README.md
+++ b/README.md
@@ -57,7 +57,16 @@ Kimi-Audio consists of three main components:
 2.  **Audio LLM:** A transformer-based model (initialized from a pre-trained text LLM like Qwen 2.5 7B) with shared layers processing multimodal inputs, followed by parallel heads for autoregressively generating text tokens and discrete audio semantic tokens.
 3.  **Audio Detokenizer:** Converts the predicted discrete semantic audio tokens back into high-fidelity waveforms using a flow-matching model and a vocoder (BigVGAN), supporting chunk-wise streaming with a look-ahead mechanism for low latency.

+## Getting Started

+### Step1: Get the Code
+
+```bash
+git clone https://github.com/MoonshotAI/Kimi-Audio.git
+cd Kimi-Audio
+git submodule update --init --recursive
+pip install -r requirements.txt
+```

 ## Quick Start