mirror of
https://github.com/kyutai-labs/delayed-streams-modeling.git
synced 2025-12-23 03:19:57 +00:00
Replace "pre-print coming soon"
This commit is contained in:
@@ -3,12 +3,10 @@
 This repo contains instructions and examples of how to run
 [Kyutai Speech-To-Text](#kyutai-speech-to-text)
 and [Kyutai Text-To-Speech](#kyutai-text-to-speech) models.
 These models are powered by delayed streams modeling (DSM),
 a flexible formulation for streaming, multimodal sequence-to-sequence learning.
 See also [Unmute](https://github.com/kyutai-labs/unmute), an voice AI system built using Kyutai STT and Kyutai TTS.
 
 But wait, what is "Delayed Streams Modeling"? It is a technique for solving many streaming X-to-Y tasks (with X, Y in `{speech, text}`)
-that formalize the approach we had with Moshi and Hibiki. A pre-print paper is coming soon!
+that formalize the approach we had with Moshi and Hibiki. See our [pre-print about DSM](https://arxiv.org/abs/2509.08753).
 
 ## Kyutai Speech-To-Text