Interest in (or advice for) a voice-recognition-based computer-control app?

I've been thinking (due to some wrist pain) of writing a little voice programming app (sort of like Dragonfly for Python). I've used Dragonfly in the past, but have been frustrated by the lack of static typing and the bitrot that inevitably results. It's a little silly to be writing a new app in Rust for this, but it's also appealing to have something that I can feel more confident in, and I imagine that others here might be in a similar boat.

I would love any advice y'all might have about this possible task. If there is a project already started, that would be awesome. Otherwise, ideas for how to accomplish the tasks below would be greatly appreciated, or ideas for other crates (e.g. that could be useful for command grammar).
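To make the static-typing motivation concrete, here is a minimal sketch of what a typed command grammar could look like using only the standard library. The `Command` enum, its variants, and `parse_command` are hypothetical names invented for illustration, and the three spoken forms ("press", "type", "repeat") are assumptions rather than a proposed design.

```rust
/// A closed set of voice commands. All names here are hypothetical.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Command {
    /// Press a single named key, e.g. "press enter".
    Press(String),
    /// Type literal text, e.g. "type hello world".
    Type(String),
    /// Repeat the previous command a number of times, e.g. "repeat 3".
    Repeat(u32),
}

/// Parse one recognized utterance into a `Command`.
/// Returns `None` when the utterance matches no rule.
/// The utterance is lowercased first, so dictated text loses its case.
pub fn parse_command(utterance: &str) -> Option<Command> {
    let lowered = utterance.trim().to_lowercase();
    let mut words = lowered.split_whitespace();
    let verb = words.next()?;
    let rest: Vec<&str> = words.collect();
    match (verb, rest.as_slice()) {
        // Exactly one word after "press" names the key.
        ("press", [key]) => Some(Command::Press(key.to_string())),
        // One or more words after "type" are joined back into text.
        ("type", [_, ..]) => Some(Command::Type(rest.join(" "))),
        // "repeat" needs a single word that parses as a number.
        ("repeat", [count]) => count.parse().ok().map(Command::Repeat),
        _ => None,
    }
}
```

For example, `parse_command("repeat 3")` yields `Some(Command::Repeat(3))`. Because `Command` is an enum, any code that executes commands has to handle every variant, so adding a variant later produces compile errors at the places that need updating instead of silent bitrot.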

Speech recognition: It looks like coqui-stt might be a good option for the actual audio-to-text task. I have no idea how well it works, since I haven't used it yet.

Audio input: It looks like coqui-stt requires another library to provide the actual sound input data, and cpal might be appropriate for that.

Sending keystrokes: It seems like for Linux (my primary platform) the input crate (a binding to libinput) might be my best choice. It may also be able to control the mouse, but that seems less essential.

Interacting with the window manager: This is not necessary, but it is convenient for making the voice commands depend on which application has the keyboard focus. It seems like this might be hard, but under GNOME it seems D-Bus could be used, which might be done with the dbus crate.
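As a rough sketch of the focus-dependent idea, the lookup below prefers a binding for the focused application and falls back to a global one. It uses only the standard library. `ContextualBindings` and its methods are hypothetical names, the key-chord strings are placeholders, and the focused application name is assumed to arrive from elsewhere (for example from the window manager); no D-Bus call is shown.

```rust
use std::collections::HashMap;

/// Maps spoken phrases to key chords, optionally per application.
/// All names and the string format of the chords are hypothetical.
#[derive(Default)]
pub struct ContextualBindings {
    per_app: HashMap<String, HashMap<String, String>>,
    global: HashMap<String, String>,
}

impl ContextualBindings {
    pub fn new() -> Self {
        Self::default()
    }

    /// Bind a phrase that applies regardless of the focused application.
    pub fn bind_global(&mut self, phrase: &str, keys: &str) {
        self.global.insert(phrase.to_string(), keys.to_string());
    }

    /// Bind a phrase that applies only while `app` has keyboard focus.
    pub fn bind_for_app(&mut self, app: &str, phrase: &str, keys: &str) {
        self.per_app
            .entry(app.to_string())
            .or_default()
            .insert(phrase.to_string(), keys.to_string());
    }

    /// Look up a phrase, preferring a binding for the focused application
    /// and falling back to the global bindings. `focused_app` is `None`
    /// when the focused application is unknown.
    pub fn lookup(&self, focused_app: Option<&str>, phrase: &str) -> Option<&str> {
        focused_app
            .and_then(|app| self.per_app.get(app))
            .and_then(|bindings| bindings.get(phrase))
            .or_else(|| self.global.get(phrase))
            .map(String::as_str)
    }
}
```

Passing the focused application as an `Option<&str>` keeps the window-manager query optional: if it turns out to be too hard on a given desktop, the caller passes `None` and only the global bindings apply.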
