Commit 39ac40a9 authored by chenzk's avatar chenzk
Browse files

v1.0

parents
Pipeline #2747 failed with stages
in 0 seconds
This diff is collapsed.
# 模型编码
modelCode=1582
# 模型名称
modelName=VITA-Audio_pytorch
# 模型描述
modelDescription=在生成首个音频片段时大幅提升响应速度,解决实时语音关键瓶颈,整体推理速度相比同规模模型提升3–5倍。
# 应用场景
appScenario=推理,语音合成,广媒,影视,动漫,医疗,家居,教育
# 框架类型
frameType=pytorch
# GLM-4-Voice-Decoder
GLM-4-Voice 是智谱 AI 推出的端到端语音模型。GLM-4-Voice 能够直接理解和生成中英文语音,进行实时语音对话,并且能够根据用户的指令改变语音的情感、语调、语速、方言等属性。
GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions.
本仓库是 GLM-4-Voice 的 speech decoder 部分。GLM-4-Voice-Decoder 是基于 [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) 重新训练的支持流式推理的语音解码器,将离散化的语音 token 转化为连续的语音输出。最少只需要 10 个音频 token 即可开始生成,降低对话延迟。
The repo provides the speech decoder of GLM-4-Voice. GLM-4-Voice-Decoder is a speech decoder supporting streaming inference, retrained based on [CosyVoice](https://github.com/FunAudioLLM/CosyVoice), converting discrete speech tokens into continuous speech output. Generation can start with as few as 10 audio tokens, reducing conversation latency.
更多信息请参考我们的仓库 [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice).
For more information please refer to our repo [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice).
\ No newline at end of file
# GLM-4-Voice-Tokenizer
GLM-4-Voice 是智谱 AI 推出的端到端语音模型。GLM-4-Voice 能够直接理解和生成中英文语音,进行实时语音对话,并且能够根据用户的指令改变语音的情感、语调、语速、方言等属性。
GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions.
本仓库是 GLM-4-Voice 的 speech tokenizer 部分。通过在 [Whisper](https://github.com/openai/whisper) 的 encoder 部分增加 vector quantization 进行训练,将连续的语音输入转化为离散的 token。每秒音频转化为 12.5 个离散 token。
The repo provides the speech tokenzier of GLM-4-Voice, which is trained by adding vector quantization to the encoder part of [Whisper](https://github.com/openai/whisper) and converts continuous speech input into discrete tokens. Each second of audio is converted into 12.5 discrete tokens.
更多信息请参考我们的仓库 [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice).
For more information please refer to our repo [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice).
\ No newline at end of file
File added
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment