A crate that listens to the microphone

contigen · February 16, 2023, 6:32am

A beginner in Rust here. I'm trying to listen to the microphone and work with the audio input and output in Rust, something like the Web Speech API in JS. I found the soundio and portaudio crates, but I couldn't get them to work and they aren't properly documented. I've found nothing solid so far. Is there something else?

parasyte · February 16, 2023, 7:39am

I recently worked on something like this. For wasm targets, you can actually use the Web Speech API. It's a little wonky because it's only supported in Chrome-based browsers with the webkit prefix, but it can work after jumping through some hoops.

For native apps, I had some success with vosk for speech recognition (speech-to-text) and I used cpal for the audio capture.
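One piece of glue that usually sits between capture and recognition is sample conversion. The sketch below assumes the capture side delivers interleaved `f32` samples and the recognizer wants mono 16-bit PCM; both are assumptions about a typical setup, not something stated above, so check the formats your device and model actually use. The function name `to_mono_i16` is hypothetical and uses only the standard library.

```rust
/// Downmix interleaved f32 frames to mono and convert them to 16-bit PCM.
/// `channels` is the number of interleaved channels (e.g. 2 for stereo).
/// Hypothetical helper for illustration; not part of cpal or vosk.
fn to_mono_i16(interleaved: &[f32], channels: usize) -> Vec<i16> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        // Incomplete trailing frames are ignored by `chunks_exact`.
        .chunks_exact(channels)
        .map(|frame| {
            // Average the channels of one frame into a single mono sample.
            let mono = frame.iter().sum::<f32>() / channels as f32;
            // Clamp to [-1.0, 1.0] before scaling so out-of-range input cannot overflow.
            (mono.clamp(-1.0, 1.0) * i16::MAX as f32) as i16
        })
        .collect()
}
```

Calling `to_mono_i16(&buffer, 2)` on a stereo buffer yields one `i16` per frame, which can then be handed to whatever recognizer you settle on.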

I also evaluated coqui-stt for speech recognition, but the transcription quality seemed poor for my use case.

As part of the evaluation, I ended up with a list of subjective pros and cons for these options:

vosk
- Pros:
  - Nice thread-safe API with internal reference counting (no lifetime issues to deal with).
  - The language models are small (about 50 MB total).
  - Transcription accuracy is pretty good!
  - Totally offline.
  - Recording sample rate is configurable.
  - Free, open source, MIT + Apache 2.0.
- Cons:
  - Requires dynamic linking (WASM cannot be supported directly).
  - The macOS pre-built library is older than the Linux and Windows builds (it is not built automatically by CI).

coqui-stt
- Pros:
  - The acoustic language models are small (about 50 MB total).
  - Totally offline.
  - Free, open source, MPL 2.0.
  - Pre-built libraries are all available for macOS, Linux, and Windows.
- Cons:
  - Requires dynamic linking (WASM cannot be supported directly).
  - The language models are only trained for a 16 kHz sample rate (requires resampling audio streams).
  - Transcription quality is very poor, even with the 1 GB external scorer.
  - Lifetime issues with the streaming API. Requires leaking to work around: Stream API issues with threads · Issue #17 · tazz4843/coqui-stt · GitHub
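
For the 16 kHz resampling step mentioned in the cons, here is a minimal sketch using linear interpolation on mono `f32` samples. It is an illustration only: the function name `resample_linear` is hypothetical, and it applies no low-pass filter, so downsampling this way can alias. A dedicated resampling library is likely the better choice for real transcription.

```rust
/// Resample mono samples from `from_hz` to `to_hz` using linear interpolation.
/// Hypothetical helper for illustration; no anti-aliasing filter is applied.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    // Number of output samples that cover the same duration as the input.
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // How far to advance in the input for each output sample.
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // At the end of the buffer, hold the last sample.
            let b = input[(idx + 1).min(input.len() - 1)];
            a + (b - a) * frac
        })
        .collect()
}
```

For example, `resample_linear(&captured, 48_000, 16_000)` would bring a 48 kHz capture down to the rate the model expects.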

For WASM (browser) targets we can use the Web Speech API.

Pros:
- Available in most browsers.
- No external build or runtime dependencies.
- Transcription accuracy is VERY good.
Cons:
- Does not work in Firefox.
- Offline transcription is not supported.

There is also picovoice but I haven't tried it because their access key policy was not acceptable for us.

mathildebuxton · February 16, 2023, 4:43pm

Another option is the cpal crate, which provides a simple, cross-platform audio API for Rust. It supports capturing audio input from microphones and playing audio output through speakers or other audio devices.
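
Capture libraries of this kind typically deliver samples through a callback that runs on an audio thread, so a common pattern is to forward each chunk over a channel and do the heavy work elsewhere. The sketch below shows only that hand-off with the standard library. `spawn_consumer` and `run_pipeline` are hypothetical names, and the loop in `run_pipeline` stands in for the real audio callback, which is an assumption about how you would wire it up.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawn a worker that drains audio chunks from a channel and returns the
/// total number of samples it saw. A real worker would feed each chunk to a
/// recognizer instead of just counting.
fn spawn_consumer(rx: mpsc::Receiver<Vec<i16>>) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        let mut total = 0;
        // `recv` returns Err once every Sender has been dropped, ending the loop.
        while let Ok(chunk) = rx.recv() {
            total += chunk.len();
        }
        total
    })
}

/// Stand-in for the capture side: sends each chunk the way an audio callback
/// would, then closes the channel and waits for the worker to finish.
fn run_pipeline(chunks: Vec<Vec<i16>>) -> usize {
    let (tx, rx) = mpsc::channel();
    let consumer = spawn_consumer(rx);
    for chunk in chunks {
        // `send` on an unbounded channel does not block the sending thread.
        tx.send(chunk).expect("consumer thread hung up");
    }
    // Dropping the sender closes the channel so the consumer can exit.
    drop(tx);
    consumer.join().expect("consumer thread panicked")
}
```

Keeping the callback limited to a `send` keeps the audio thread responsive; the trade-off is that an unbounded channel can grow if the consumer falls behind.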

contigen · February 16, 2023, 8:15pm

Thanks a lot parasyte, I'll definitely check 'em out.
I worked with the Web Speech API in JS; you could check it out: GitHub - contigen/speak-notes
