A crate that listens to the microphone

I'm a Rust beginner. I'm trying to listen to the microphone and work with its input and output in Rust, something like the Web Speech API in JS. I found the soundio and portaudio crates but couldn't get them to work, and they aren't properly documented. I've found nothing solid so far. Is there something else?

I recently worked on something like this. For wasm targets, you can actually use the Web Speech API. It's a little wonky because it's only supported in Chrome-based browsers with the webkit prefix, but it can work after jumping through some hoops.

For native apps, I had some success with vosk for speech recognition (speech-to-text) and I used cpal for the audio capture.
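To give a rough idea of the glue between the two, here is a minimal sketch of the sample conversion step. It assumes the capture callback hands you interleaved `f32` samples (common, but it depends on the device and config) and that the recognizer wants mono 16-bit PCM. The function name `to_mono_i16` is made up for illustration, and neither crate is actually called here; check their docs for the exact types they expect.

```rust
/// Downmix interleaved f32 samples (nominally in -1.0..=1.0) to mono
/// and convert them to 16-bit PCM.
///
/// `channels` is the number of interleaved channels in `input`.
/// Any trailing partial frame is ignored.
pub fn to_mono_i16(input: &[f32], channels: usize) -> Vec<i16> {
    assert!(channels > 0, "channel count must be non-zero");
    input
        .chunks_exact(channels)
        .map(|frame| {
            // Average all channels of this frame into a single sample.
            let avg = frame.iter().sum::<f32>() / channels as f32;
            // Clamp so out-of-range input cannot overflow the i16 range.
            (avg.clamp(-1.0, 1.0) * i16::MAX as f32) as i16
        })
        .collect()
}
```

You would call something like this inside the audio callback and pass the resulting buffer on to the recognizer, ideally over a channel so the audio thread never blocks.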

I also evaluated coqui-stt for speech recognition, but the transcription quality seemed poor for my use case.

As part of the evaluation, I ended up with a list of subjective pros and cons for these options:


  1. vosk
    • Pros:
      • Nice thread-safe API with internal reference counting (no lifetime issues to deal with).
      • The language models are small (about 50 MB total).
      • Transcription accuracy is pretty good!
      • Totally offline.
      • Recording sample rate is configurable.
      • Free, open source, MIT + Apache 2.0.
    • Cons:
      • Requires dynamic linking (WASM cannot be supported directly).
      • macOS pre-built library is older than Linux and Windows builds (not auto built by CI).
  2. coqui-stt
    • Pros:
      • The acoustic language models are small (about 50 MB total).
      • Totally offline.
      • Free, open source, MPL 2.0.
      • Pre-built libraries are all available for macOS, Linux, and Windows.
    • Cons:
      • Requires dynamic linking (WASM cannot be supported directly).
      • The language models are only trained for a 16 kHz sample rate (requires resampling audio streams).
      • Transcription quality is very poor, even with the 1 GB external scorer.
      • Lifetime issues with the streaming API. Requires leaking to work around: Stream API issues with threads · Issue #17 · tazz4843/coqui-stt · GitHub
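On the resampling point: if the microphone runs at, say, 48 kHz and the model wants 16 kHz, the stream has to be converted first. Below is a deliberately naive sketch using linear interpolation on mono `i16` samples. The function name `resample_linear` is hypothetical, and it applies no low-pass filter, so downsampling this way will alias. Treat it as an illustration of the idea rather than something to ship; a real pipeline would want a proper filter or a dedicated resampling crate.

```rust
/// Naive linear-interpolation resampler for mono 16-bit PCM.
///
/// No anti-aliasing filter is applied, so quality is limited when
/// downsampling. Intended only to illustrate the resampling step.
pub fn resample_linear(input: &[i16], from_hz: u32, to_hz: u32) -> Vec<i16> {
    assert!(from_hz > 0 && to_hz > 0, "sample rates must be non-zero");
    if input.is_empty() {
        return Vec::new();
    }
    let last = input.len() - 1;
    // Number of output samples covering the same duration as the input.
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // Distance in input samples between two consecutive output samples.
    let step = from_hz as f64 / to_hz as f64;

    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = (pos.floor() as usize).min(last);
            let frac = pos - idx as f64;
            let a = input[idx] as f64;
            // At the end of the buffer, reuse the last sample as the neighbor.
            let b = input[(idx + 1).min(last)] as f64;
            (a + (b - a) * frac).round() as i16
        })
        .collect()
}
```

For example, `resample_linear(&samples, 48_000, 16_000)` keeps roughly every third sample, interpolating between neighbors when the ratio is not a whole number.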

For WASM (browser) targets we can use the Web Speech API.

  • Pros:
    • Available in most browsers.
    • No external build or runtime dependencies.
    • Transcription accuracy is VERY good.
  • Cons:
    • Does not work in Firefox.
    • Offline transcription is not supported.

There is also picovoice but I haven't tried it because their access key policy was not acceptable for us.

Another option is to look at other audio libraries that work with Rust. For example, you could try the cpal crate, which provides a simple, cross-platform audio API. It supports capturing audio input from microphones and playing audio output through speakers or other audio devices.

Thanks a lot parasyte, I'll definitely check 'em out.
I've worked with the Web Speech API in JS; you could check it out: GitHub - contigen/speak-notes
