Get native microphone input - Audio Engine?

Dear Rustaceans, I am thinking about how to get my default microphone input into Rust code. What can I use? How does it work?

I would like to lay a foundation for voice-to-text recognition. 🙂

I already found cpal and SoundIO. I'm looking to learn more about them!

Thank you my Rustaceans in advance!

Best regards



Which OS? Each OS has different APIs for getting mic input.


Should be Windows and Arduino / Raspberry!

For Windows, I'm not familiar with the required APIs. You can probably look for a C example and then use the windows-sys crate to call the same APIs.

As for Linux (Raspberry Pi), you'd have to use the PulseAudio API for recording. libpulse-binding seems to be a fairly popular crate for binding against libpulse. Here too, I would recommend looking for a C example to get an overview of which parts of the API you need to interact with.

And finally for Arduino it will depend on whatever hat you are using for plugging your microphone in. It likely has an associated C++ library you need to use. In any case an Arduino will not be fast enough for speech recognition.


I would love to go 100% Rust with that. What could be a way to get that thing going?

I have used vosk and coqui-stt with cpal in a simple transcription app. Both worked fine, but vosk was a little more user friendly and provided higher quality transcriptions.

The disappointing thing is that neither of these STT crates is "pure Rust"; they have C/C++ components. But they were the best available local-only models that I was able to find at the time.
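A small piece of glue usually sits between the capture side and the recognizer, and it can be written in plain Rust. The sketch below assumes the capture callback hands over interleaved `f32` samples (common with cpal, though the actual format depends on the device) and that the recognizer wants mono 16-bit PCM (which is what vosk's API is built around, as far as I know). The function names `downmix_to_mono` and `f32_to_i16` are made up for illustration and are not part of either crate. Resampling to the model's sample rate is left out.

```rust
/// Averages each interleaved frame into a single mono sample.
/// `channels` is the number of channels per frame (2 for stereo).
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels) // a trailing incomplete frame is dropped
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Converts float samples in [-1.0, 1.0] to signed 16-bit PCM.
/// Out-of-range input is clamped instead of wrapping around.
pub fn f32_to_i16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)
        .collect()
}
```

In a real app these would be called on each buffer from the input stream before passing the resulting `i16` slice to the recognizer.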


What @bjorn3 is saying is to first look at how it's done in C and then use the windows-sys crate to do the same thing but in Rust.


Is there a specific method developers follow to rewrite X in Rust?

Thank you guys so much! Your information is invaluable!

Not a specific method in general.

For this, though, it would involve finding a project's source code, reading (some of) the code, identifying what code is responsible for getting the default microphone input, identifying which syscalls were used to achieve that, and finally finding the API for those syscalls in the windows-sys crate.

It's much easier to do if you can read C and know what a syscall is.

If not, you might have better luck googling for "syscall to get default microphone input on windows" or something like that, then you can skip to finding the API in windows-sys.
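To make the "find the C call, then call the same thing from Rust" step concrete, here is a minimal sketch. It picks `waveInGetNumDevs` from the legacy waveform-audio API in `winmm` purely as an illustration, because it takes no arguments; it only counts input devices and does not record anything, and newer code typically goes through WASAPI instead. The declaration is written by hand so the example has no dependencies; with windows-sys you would import the crate's binding for the same function instead of declaring it yourself. The wrapper name `input_device_count` is hypothetical.

```rust
// C prototype from the Windows SDK: UINT waveInGetNumDevs(void);
// `unsafe extern` needs Rust 1.82 or newer; on older toolchains drop the `unsafe`.
#[cfg(windows)]
#[link(name = "winmm")]
unsafe extern "system" {
    fn waveInGetNumDevs() -> u32;
}

/// Number of waveform-audio input devices reported by Windows.
#[cfg(windows)]
pub fn input_device_count() -> Option<u32> {
    // SAFETY: the function takes no arguments and has no preconditions.
    Some(unsafe { waveInGetNumDevs() })
}

/// On other platforms this API does not exist, so there is nothing to report.
#[cfg(not(windows))]
pub fn input_device_count() -> Option<u32> {
    None
}
```

The same pattern (read the C prototype, mirror its types, wrap the `unsafe` call in a safe function) applies to the larger capture APIs, which just have more parameters and structs to translate.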