Extract speech from video and audio files

Hello everyone

This is not a Rust-specific topic, except that I want to use the solution from a Rust program.

Is there any Rust-accessible library (ideally a Rust crate) that can take an input video file (e.g. video.mp4) or an input audio file and extract the spoken content from it as text output?
Or even a command-line tool or utility that can do this?

I want to be able to analyze, index and search for words/phrases said in video and audio files.
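
To make the goal more concrete, here is a rough sketch of the indexing/search side, assuming some speech-to-text step has already produced timestamped pieces of text. The `Segment` struct and the `build_index` / `search` functions are names I made up for illustration; a real transcription tool will have its own output format. It only handles single words, not phrases.

```rust
use std::collections::HashMap;

/// One piece of transcribed speech and where it starts in the recording.
/// (Hypothetical shape, not the output format of any particular tool.)
struct Segment {
    start_secs: u32,
    text: String,
}

/// Build an inverted index: lowercase word -> start times of the segments that contain it.
fn build_index(segments: &[Segment]) -> HashMap<String, Vec<u32>> {
    let mut index: HashMap<String, Vec<u32>> = HashMap::new();
    for seg in segments {
        // Split on anything that is not a letter or digit, so punctuation is dropped.
        for word in seg.text.split(|c: char| !c.is_alphanumeric()) {
            if word.is_empty() {
                continue;
            }
            let times = index.entry(word.to_lowercase()).or_default();
            // Record each segment at most once per word.
            if times.last() != Some(&seg.start_secs) {
                times.push(seg.start_secs);
            }
        }
    }
    index
}

/// Look up a single word (case-insensitive) and return the matching start times.
fn search<'a>(index: &'a HashMap<String, Vec<u32>>, word: &str) -> &'a [u32] {
    match index.get(&word.to_lowercase()) {
        Some(times) => times.as_slice(),
        None => &[],
    }
}

fn main() {
    // Made-up transcript data, standing in for real speech-to-text output.
    let segments = vec![
        Segment { start_secs: 0, text: "Hello everyone, welcome to the Rust meetup.".to_string() },
        Segment { start_secs: 12, text: "Today we talk about audio in Rust.".to_string() },
    ];
    let index = build_index(&segments);
    println!("'rust' is said at: {:?}", search(&index, "rust"));
    println!("'audio' is said at: {:?}", search(&index, "audio"));
}
```

So the missing piece is really the step that turns the video/audio file into those text segments.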

I just Googled for answers, and the only cloud option I found was the Microsoft Azure Cognitive Services API. I have not tried it yet (it could be great!), but perhaps there are more options to investigate, such as AWS, Google Cloud (GCP), C/C++ libraries or Rust crates?

I also came across this: Picovoice (github.com)

Any more ideas? Or perhaps someone has used something else?

Many thanks.