Moving over STT inference scripts (#7)

* Adding links to STT example scripts

One script for HF dataset inference; another for retrieving timestamps.

* Moving inference scripts to the delayed-streams-repo

---------

Co-authored-by: Eugene <eugene@kyutai.org>
2025-12-23 03:19:57 +00:00 · 2025-06-20 15:53:45 +02:00
parent dd5cbcbeef
commit ef864a6f38
3 changed files with 643 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -60,6 +60,22 @@ uvx --with moshi python -m moshi.run_inference --hf-repo kyutai/stt-2.6b-en bria
 ```
 It will install the moshi package in a temporary environment and run the speech-to-text.

+Additionally, we provide two scripts that highlight different usage scenarios. The first script illustrates how to extract word-level timestamps from the model's outputs:
+
+```bash
+uv run \
+  scripts/streaming_stt_timestamps.py \
+  --hf-repo kyutai/stt-2.6b-en \
+  --file bria.mp3
+```
+
+The second script runs a model on an existing Hugging Face dataset and computes its performance metrics:
+```bash
+uv run scripts/streaming_stt.py  \
+  --dataset meanwhile  \
+  --hf-repo kyutai/stt-2.6b-en
+```
+
 ### Rust server
 <a href="https://huggingface.co/kyutai/stt-2.6b-en-candle" target="_blank" style="margin: 2px;">
    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue" style="display: inline-block; vertical-align: middle;"/>