A beginner in Rust here. I'm trying to listen to the microphone, get the input and output, with Rust, something like the Web Speech API in JS. I found the soundio & portaudio crates, couldn't get them to work, they are not even properly documented; I found nothing solid so far. Is there something else?
I recently worked on something like this. For wasm targets, you can actually use the Web Speech API. It's a little wonky because it's only supported in Chrome-based browsers with the webkit prefix, but it can work after jumping through some hoops.
For native apps, I had some success with vosk for speech recognition (speech-to-text) and I used cpal for the audio capture.
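One glue step between the capture side and the recognizer is sample conversion. Below is a minimal sketch, assuming the capture callback delivers interleaved `f32` samples in the range [-1.0, 1.0] and the recognizer wants mono 16-bit PCM (I believe the vosk bindings accept `i16` slices, but treat that as an assumption and check the crate docs). The helper name `to_mono_i16` is hypothetical and uses only the standard library:

```rust
/// Downmix interleaved f32 frames to mono and convert to 16-bit PCM.
/// Hypothetical helper, not part of cpal or vosk.
fn to_mono_i16(interleaved: &[f32], channels: usize) -> Vec<i16> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        // One chunk per frame; a trailing partial frame is dropped.
        .chunks_exact(channels)
        .map(|frame| {
            // Average all channels of this frame into one mono sample.
            let mono = frame.iter().sum::<f32>() / channels as f32;
            // Clamp to [-1.0, 1.0], then scale to the i16 range.
            (mono.clamp(-1.0, 1.0) * i16::MAX as f32) as i16
        })
        .collect()
}

fn main() {
    // Two stereo frames: (0.5, 0.5) and (-1.0, 1.0).
    let samples = [0.5_f32, 0.5, -1.0, 1.0];
    let pcm = to_mono_i16(&samples, 2);
    println!("{:?}", pcm);
}
```

In a real app you would call something like this inside the input stream callback and hand the resulting buffer to the recognizer, ideally over a channel so the audio thread never blocks.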
I also evaluated coqui-stt for speech recognition, but the transcription quality seemed poor for my use case.
As part of the evaluation, I ended up with a list of subjective pros and cons for these options:
- vosk
  - Pros:
    - Nice thread-safe API with internal reference counting (no lifetime issues to deal with).
    - The language models are small (about 50 MB total).
    - Transcription accuracy is pretty good!
    - Totally offline.
    - Recording sample rate is configurable.
    - Free, open source, MIT + Apache 2.0.
  - Cons:
    - Requires dynamic linking (WASM cannot be supported directly).
    - macOS pre-built library is older than Linux and Windows builds (not auto built by CI).
- coqui-stt
  - Pros:
    - The acoustic language models are small (about 50 MB total).
    - Totally offline.
    - Free, open source, MPL 2.0.
    - Pre-built libraries are all available for macOS, Linux, and Windows.
  - Cons:
    - Requires dynamic linking (WASM cannot be supported directly).
    - The language models are only trained for a 16 kHz sample rate (requires resampling audio streams).
    - Transcription quality is very poor, even with the 1 GB external scorer.
    - Lifetime issues with the streaming API. Requires leaking to work around: Stream API issues with threads · Issue #17 · tazz4843/coqui-stt · GitHub
For WASM (browser) targets we can use the Web Speech API.
- Pros:
  - Available in most browsers.
  - No external build or runtime dependencies.
  - Transcription accuracy is VERY good.
- Cons:
  - Does not work in Firefox.
  - Offline transcription is not supported.
There is also picovoice, but I haven't tried it because their access key policy was not acceptable for us.
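On the resampling point raised for coqui-stt above: if the microphone runs at, say, 48 kHz and the model expects 16 kHz, the stream has to be converted first. Here is a rough sketch using plain linear interpolation on mono `f32` samples. The function name `resample_linear` is hypothetical, and this is not what any of the crates above do internally. Linear interpolation without a low-pass filter can alias when downsampling, so treat it as a starting point rather than a production resampler:

```rust
/// Resample mono f32 audio from `from_hz` to `to_hz` by linear interpolation.
/// Hypothetical helper; no anti-aliasing filter is applied.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let last = input.len() - 1;
    // Number of output samples covering the same duration (rounded down).
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // Distance between output samples, measured in input samples.
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = (pos as usize).min(last);
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // Reuse the last sample when there is no right-hand neighbor.
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}

fn main() {
    // Six samples at 48 kHz become two samples at 16 kHz.
    let input = [0.0_f32, 3.0, 6.0, 9.0, 12.0, 15.0];
    let output = resample_linear(&input, 48_000, 16_000);
    println!("{:?}", output);
}
```

For streaming use you would also need to carry the fractional position across buffers so that chunk boundaries do not introduce clicks; the sketch above only handles one complete buffer at a time.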
Another option is to look at other audio libraries for Rust. For example, you could try the cpal crate, which provides a cross-platform audio API. It supports capturing audio input from microphones and playing audio output through speakers or other audio devices.
Thanks a lot parasyte, I'll definitely check 'em out.
I worked with the Web Speech API in JS; you could check it out: GitHub - contigen/speak-notes