Add the rust indications.

2026-01-04 08:22:56 +00:00 · 2025-06-17 08:37:44 +02:00
parent e53c94d6b6
commit 581e1d338b
3 changed files with 217 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -3,18 +3,58 @@ Delayed Streams Modeling (DSM) is a flexible formulation for streaming, multimod

 ## Speech To Text

-### PyTorch implementation
+### Leaderboard model
+The leaderboard model handles English only and has ~2.6B parameters.
+
+#### PyTorch implementation
+[[Hugging Face]](https://huggingface.co/kyutai/stt)

 ```bash
-python -m moshi.run_inference --hf-repo kyutai/stt input.mp3
+# wget https://github.com/metavoiceio/metavoice-src/raw/main/assets/bria.mp3
+python -m moshi.run_inference --hf-repo kyutai/stt bria.mp3
 ```

-### MLX implementation
+#### MLX implementation
+[[Hugging Face]](https://huggingface.co/kyutai/stt-mlx)

 ```bash
-python -m moshi_mlx.run_inference --hf-repo kyutai/stt-mlx ~/tmp/bria-24khz.mp3 --temp 0
+# wget https://github.com/metavoiceio/metavoice-src/raw/main/assets/bria.mp3
+python -m moshi_mlx.run_inference --hf-repo kyutai/stt-mlx bria.mp3 --temp 0
 ```

+#### Rust implementation
+[[Hugging Face]](https://huggingface.co/kyutai/stt-candle)
+
+The Rust implementation provides a server that can process multiple streaming
+queries in parallel. Depending on the amount of memory on your GPU, you may
+have to adjust the batch size in the config file. For an L40S GPU, a batch size
+of 64 works well.
+
+In order to run the server, install the `moshi-server` crate via the following
+command. The server code can be found in the
+[kyutai-labs/moshi](https://github.com/kyutai-labs/moshi/tree/main/rust/moshi-server)
+repository.
+```bash
+cargo install --features cuda moshi-server
+```
+
+Then the server can be started via the following command using the config file
+from this repository.
+```bash
+moshi-server worker --config configs/config-stt-hf.toml
+```
+
+Once the server has started, you can run a streaming inference with the following
+script.
+```bash
+# wget https://github.com/metavoiceio/metavoice-src/raw/main/assets/bria.mp3
+uv run scripts/asr-streaming-query.py bria.mp3
+```
+
+The script simulates real-time processing of the audio. Faster processing
+can be triggered by setting the real-time factor; for example, `--rtf 500`
+processes the data as fast as possible.
+
 ## License

 The present code is provided under the MIT license for the Python parts, and Apache license for the Rust backend.